localization sequence analysis: Topics by Science.gov

Sample records for localization sequence analysis

DLocalMotif: a discriminative approach for discovering local motifs in protein sequences.

PubMed

Mehdi, Ahmed M; Sehgal, Muhammad Shoaib B; Kobe, Bostjan; Bailey, Timothy L; Bodén, Mikael

2013-01-01

Local motifs are patterns of DNA or protein sequences that occur within a sequence interval relative to a biologically defined anchor or landmark. Current protein motif discovery methods do not adequately consider such constraints to identify biologically significant motifs that are only weakly over-represented but spatially confined. Using negatives, i.e. sequences known to not contain a local motif, can further increase the specificity of their discovery. This article introduces the method DLocalMotif that makes use of positional information and negative data for local motif discovery in protein sequences. DLocalMotif combines three scoring functions, measuring degrees of motif over-representation, entropy and spatial confinement, specifically designed to discriminatively exploit the availability of negative data. The method is shown to outperform current methods that use only a subset of these motif characteristics. We apply the method to several biological datasets. The analysis of peroxisomal targeting signals uncovers several novel motifs that occur immediately upstream of the dominant peroxisomal targeting signal-1 signal. The analysis of proline-tyrosine nuclear localization signals uncovers multiple novel motifs that overlap with C2H2 zinc finger domains. We also evaluate the method on classical nuclear localization signals and endoplasmic reticulum retention signals and find that DLocalMotif successfully recovers biologically relevant sequence properties. http://bioinf.scmb.uq.edu.au/dlocalmotif/
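The abstract above names over-representation and spatial confinement as two of the properties DLocalMotif scores, using negative sequences as a contrast. The following is only a minimal sketch of those two ideas, not the authors' scoring functions: the function names, the pseudocount, and the assumption that all sequences are already aligned to the anchor at index 0 are illustrative.

```python
import math
import re

def occurrences(seqs, pattern):
    """Start offsets of every match of `pattern`, one list per sequence."""
    rx = re.compile(pattern)
    return [[m.start() for m in rx.finditer(s)] for s in seqs]

def over_representation(pos_hits, neg_hits, pseudo=1.0):
    """Log2 ratio of the fraction of positive vs. negative sequences hit.

    `pseudo` is an assumed pseudocount that avoids division by zero.
    """
    p = (sum(1 for h in pos_hits if h) + pseudo) / (len(pos_hits) + 2 * pseudo)
    n = (sum(1 for h in neg_hits if h) + pseudo) / (len(neg_hits) + 2 * pseudo)
    return math.log2(p / n)

def confinement(hits, lo, hi):
    """Fraction of all matches whose start lies in the interval [lo, hi)."""
    flat = [x for h in hits for x in h]
    if not flat:
        return 0.0
    return sum(lo <= x < hi for x in flat) / len(flat)

# Toy data: the motif "KR" sits at offset 3 in every positive sequence.
positives = ["MAAKRSKL", "GGAKRSKL", "TTTKRSKL"]
negatives = ["KRAAAAAA", "GGGGGGGG", "TTTTTTTT"]
pos_hits = occurrences(positives, "KR")
neg_hits = occurrences(negatives, "KR")
```

A motif that scores well on both measures is frequent in the positives, rare in the negatives, and concentrated in one interval relative to the anchor.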
CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing.

PubMed

Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian

2011-08-30

Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines that are distributed pre-packaged with pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing.
High-speed all-optical DNA local sequence alignment based on a three-dimensional artificial neural network.

PubMed

Maleki, Ehsan; Babashah, Hossein; Koohi, Somayyeh; Kavehvash, Zahra

2017-07-01

This paper presents an optical processing approach for exploring a large number of genome sequences. Specifically, we propose an optical correlator for global alignment and an extended moiré matching technique for local analysis of spatially coded DNA, whose output is fed to a novel three-dimensional artificial neural network for local DNA alignment. All-optical implementation of the proposed 3D artificial neural network is developed and its accuracy is verified in Zemax. Thanks to its parallel processing capability, the proposed structure performs local alignment of 4 million sequences of 150 base pairs in a few seconds, which is much faster than its electrical counterparts, such as the basic local alignment search tool.
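For readers unfamiliar with the task the optical system accelerates, local alignment in software is classically defined by the Smith-Waterman dynamic program. The sketch below shows that standard algorithm only as a reference point; it is unrelated to the optical implementation described above, and the scoring parameters (match, mismatch, gap) are assumed values.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming table, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below 0: that is what makes the alignment local.
            cur[j] = max(0, prev[j - 1] + step, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

The running time is proportional to the product of the two sequence lengths, which is why heuristic or parallel approaches are attractive for millions of reads.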
2-D to 3-D global/local finite element analysis of cross-ply composite laminates

NASA Technical Reports Server (NTRS)

Thompson, D. Muheim; Griffin, O. Hayden, Jr.

1990-01-01

An example of two-dimensional to three-dimensional global/local finite element analysis of a laminated composite plate with a hole is presented. The 'zoom' technique of global/local analysis is used, where displacements of the global/local interface from the two-dimensional global model are applied to the edges of the three-dimensional local model. Three different hole diameters, one, three, and six inches, are considered in order to compare the effect of hole size on the three-dimensional stress state around the hole. In addition, three different stacking sequences are analyzed for the six inch hole case in order to study the effect of stacking sequence. The existence of a 'critical' hole size, where the interlaminar stresses are maximum, is indicated. Dispersion of plies at the same angle, as opposed to clustering, is found to reduce the magnitude of some interlaminar stress components and increase others.
Infrared thermal facial image sequence registration analysis and verification

NASA Astrophysics Data System (ADS)

Chen, Chieh-Li; Jian, Bo-Lin

2015-03-01

To study the emotional responses of subjects to the International Affective Picture System (IAPS), an infrared thermal facial image sequence is preprocessed for registration before further analysis, so that the variance caused by minor and irregular subject movements is reduced. This study proposes an infrared thermal facial image sequence registration process that reduces the deviations caused by the subjects' unconscious head shaking without affecting their comfort and with minimal harm. A fixed image for registration is produced through the localization of the centroid of the eye region as well as image translation and rotation processes. The thermal image sequence is then automatically registered using the proposed two-stage genetic algorithm. The deviation before and after image registration is demonstrated by image quality indices. The results show that the proposed infrared thermal image sequence registration process is effective in localizing facial images accurately, which will be beneficial to the correlation analysis of psychological information related to the facial area.
Sequence-dependent modelling of local DNA bending phenomena: curvature prediction and vibrational analysis.

PubMed

Vlahovicek, K; Munteanu, M G; Pongor, S

1999-01-01

Bending is a local conformational micropolymorphism of DNA in which the original B-DNA structure is only distorted but not extensively modified. Bending can be predicted by simple static geometry models as well as by a recently developed elastic model that incorporates sequence-dependent anisotropic bendability (SDAB). The SDAB model qualitatively explains phenomena including affinity of protein binding, kinking, as well as sequence-dependent vibrational properties of DNA. The vibrational properties of DNA segments can be studied by finite element analysis of a model subjected to an initial bending moment. The frequency spectrum is obtained by applying Fourier analysis to the displacement values in the time domain. This analysis shows that the spectrum of the bending vibrations depends quite sensitively on the sequence; for example, the spectrum of a curved sequence is characteristically different from the spectrum of straight sequence motifs of identical basepair composition. Curvature distributions are genome-specific, and pronounced differences are found between protein-coding and regulatory regions; that is, sites of extreme curvature and/or bendability are less frequent in protein-coding regions. A WWW server is set up for the prediction of curvature and generation of 3D models from DNA sequences (http://www.icgeb.trieste.it/dna).
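The step from time-domain displacements to a frequency spectrum can be illustrated with a plain discrete Fourier transform. This is a generic sketch, not the authors' finite element workflow: the function name, the sampling interval `dt`, and the synthetic sine signal are assumptions made for the example.

```python
import cmath
import math

def amplitude_spectrum(samples, dt):
    """Return (frequency, amplitude) pairs for a real-valued time series.

    A direct O(n^2) DFT over the non-negative frequencies; the mean is
    removed first so the zero-frequency term does not dominate.
    """
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        spectrum.append((k / (n * dt), abs(coeff) / n))
    return spectrum

# Synthetic displacement record: a 5 Hz vibration sampled every 0.01 s.
signal = [math.sin(2 * math.pi * 5 * t * 0.01) for t in range(100)]
peak_freq, peak_amp = max(amplitude_spectrum(signal, 0.01), key=lambda p: p[1])
```

Comparing such spectra for two segments of identical base composition is the kind of contrast the abstract describes between curved and straight sequences.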
CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

PubMed Central

2011-01-01

Background Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines that are distributed pre-packaged with pre-configured software. Results We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. Conclusion The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing. PMID:21878105
Food Fish Identification from DNA Extraction through Sequence Analysis

ERIC Educational Resources Information Center

Hallen-Adams, Heather E.

2015-01-01

This experiment exposed 3rd- and 4th-year undergraduates and graduate students taking a course in advanced food analysis to DNA extraction, polymerase chain reaction (PCR), and DNA sequence analysis. Students provided their own fish sample, purchased from local grocery stores, and the class as a whole extracted DNA, which was then subjected to PCR,…
Using Phylogenetic Analysis to Detect Market Substitution of Atlantic Salmon for Pacific Salmon: An Introductory Biology Laboratory Experiment

ERIC Educational Resources Information Center

Cline, Erica; Gogarten, Jennifer

2012-01-01

We describe a laboratory exercise developed for the cell and molecular biology quarter of a year-long majors' undergraduate introductory biology sequence. In an analysis of salmon samples collected by students in their local stores and restaurants, DNA sequencing and phylogenetic analysis were used to detect market substitution of Atlantic salmon…
Effectiveness of sodium azide alone compared to sodium azide in combination with methyl nitrosourea for rice mutagenesis

USDA-ARS's Scientific Manuscript database

Rice seeds of the temperate japonica cultivar Kitaake were mutagenized with sodium azide alone and in combination with methyl nitrosourea. Using the reduced representation sequencing method Restriction Enzyme Sequence Comparative Analysis (RESCAN), the mutation densities, types and local sequence co...
CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects

PubMed Central

Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

2014-01-01

CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB PMID:25281234
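The cross-sample filtering idea (keep variants seen in every affected sample and in no unaffected one) can be sketched with a small relational query. CanvasDB itself is driven from R against its own database; the Python `sqlite3` code below, the `calls` table, and its column names are a hypothetical stand-in chosen for illustration, and both sample lists are assumed to be non-empty.

```python
import sqlite3

def make_db(rows):
    """Build an in-memory table of variant calls (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE calls (sample TEXT, chrom TEXT, pos INTEGER, ref TEXT, alt TEXT)"
    )
    conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?, ?)", rows)
    return conn

def shared_case_variants(conn, cases, controls):
    """Variants called in every case sample and in no control sample."""
    case_marks = ",".join("?" * len(cases))
    ctrl_marks = ",".join("?" * len(controls))
    sql = f"""
        SELECT chrom, pos, ref, alt FROM calls
        WHERE sample IN ({case_marks})
        GROUP BY chrom, pos, ref, alt
        HAVING COUNT(DISTINCT sample) = ?
        EXCEPT
        SELECT chrom, pos, ref, alt FROM calls
        WHERE sample IN ({ctrl_marks})
        ORDER BY chrom, pos
    """
    params = [*cases, len(cases), *controls]
    return conn.execute(sql, params).fetchall()

# Toy family: two affected samples (c1, c2) and one unaffected (n1).
conn = make_db([
    ("c1", "1", 100, "A", "G"), ("c2", "1", 100, "A", "G"),
    ("c1", "2", 200, "C", "T"),
    ("c1", "3", 300, "G", "A"), ("c2", "3", 300, "G", "A"),
    ("n1", "3", 300, "G", "A"),
])
candidates = shared_case_variants(conn, ["c1", "c2"], ["n1"])
```

Pushing the grouping into the database rather than looping over samples in application code is what keeps this kind of query fast as the number of samples grows.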
CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects.

PubMed

Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

2014-01-01

CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB. © The Author(s) 2014. Published by Oxford University Press.
Uterus segmentation in dynamic MRI using LBP texture descriptors

NASA Astrophysics Data System (ADS)

Namias, R.; Bellemare, M.-E.; Rahim, M.; Pirró, N.

2014-03-01

Pelvic floor disorders cover pathologies whose physiopathology is not well understood, yet cases are becoming more prevalent with an ageing population. Within the context of a project aiming at modelling the dynamics of pelvic organs, we have developed an efficient segmentation process. It aims to relieve the radiologist of tedious image-by-image analysis. From a first contour delineating the uterus-vagina set, the organ border is tracked along a dynamic MRI sequence. The process combines movement prediction, local intensity and texture analysis, and active contour geometry control. Movement prediction allows a contour initialization for the next image in the sequence. Intensity analysis provides image-based local contour detection enhanced by local binary pattern (LBP) texture descriptors. Geometry control prohibits self-intersections and smooths the contour. Results show the efficiency of the method with images produced in clinical routine.
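A local binary pattern encodes the texture around a pixel by comparing each neighbour with the centre value. The sketch below shows the basic 8-neighbour, radius-1 variant on a plain list-of-lists image; the abstract does not state which LBP variant or neighbour ordering the authors used, so both are assumptions here.

```python
# Neighbour offsets (row, col), clockwise starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, r, c):
    """8-bit LBP code of the interior pixel at (r, c).

    Bit i is set when the i-th neighbour is at least as bright as the centre.
    """
    centre = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

Histograms of these codes taken on either side of a candidate contour give a texture signature that is insensitive to uniform brightness changes, which is useful when intensity alone does not separate neighbouring tissues.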
Transcript variations, phylogenetic tree and chromosomal localization of porcine aryl hydrocarbon receptor (AhR) and AhR nuclear translocator (ARNT) genes.

PubMed

Sadowska, Agnieszka; Paukszto, Lukasz; Nynca, Anna; Szczerbal, Izabela; Orlowska, Karina; Swigonska, Sylwia; Ruszkowska, Monika; Molcan, Tomasz; Jastrzebski, Jan P; Panasiewicz, Grzegorz; Ciereszko, Renata E

2017-03-01

Aryl hydrocarbon receptor (AhR) is a ligand-activated transcription factor best known for mediating xenobiotic-induced toxicity. AhR requires aryl hydrocarbon receptor nuclear translocator (ARNT) to form an active transcription complex and promote the activation of genes that have a dioxin-responsive element in their regulatory regions. The present study was performed to determine the complete cDNA sequences of porcine AhR and ARNT genes and their chromosomal localization. Total RNA from porcine livers was used to obtain the sequence of the entire porcine transcriptome by next-generation sequencing (NGS; Illumina HiSeq2500). In addition, both in silico analysis and fluorescence in situ hybridization (FISH) were used to determine the chromosomal localization of porcine AhR and ARNT genes. In silico analysis of nucleotide sequences showed that there were two transcript variants of AhR and ARNT genes in the pig. In addition, computer analysis revealed that the AhR gene in the pig is located on chromosome 9 and ARNT on chromosome 4. The results of the FISH experiment confirmed the localization of porcine AhR and ARNT genes. In the present study, for the first time, the full cDNAs of AhR and ARNT were demonstrated in the pig. In future, it would be interesting to determine the tissue distribution of AhR and ARNT transcript variants in the pig and to test whether these variants are associated with different biological functions and/or different activation pathways.
VISA--Vector Integration Site Analysis server: a web-based server to rapidly identify retroviral integration sites from next-generation sequencing.

PubMed

Hocum, Jonah D; Battrell, Logan R; Maynard, Ryan; Adair, Jennifer E; Beard, Brian C; Rawlings, David J; Kiem, Hans-Peter; Miller, Daniel G; Trobridge, Grant D

2015-07-07

Analyzing the integration profile of retroviral vectors is a vital step in determining their potential genotoxic effects and developing safer vectors for therapeutic use. Identifying retroviral vector integration sites is also important for retroviral mutagenesis screens. We developed VISA, a vector integration site analysis server, to analyze next-generation sequencing data for retroviral vector integration sites. Sequence reads that contain a provirus are mapped to the human genome, sequence reads that cannot be localized to a unique location in the genome are filtered out, and then unique retroviral vector integration sites are determined based on the alignment scores of the remaining sequence reads. VISA offers a simple web interface to upload sequence files and results are returned in a concise tabular format to allow rapid analysis of retroviral vector integration sites.
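The filtering logic described above (drop reads that cannot be placed uniquely, then collapse the rest into distinct sites) can be sketched as follows. The input tuple layout, the function name, and the rule that a tie for the best alignment score makes a read ambiguous are assumptions for illustration; the abstract does not give VISA's actual criteria.

```python
from collections import Counter, defaultdict

def integration_sites(alignments):
    """Collapse read alignments into uniquely supported integration sites.

    `alignments` holds (read_id, chrom, pos, strand, score) tuples.
    Returns {(chrom, pos, strand): number_of_supporting_reads}.
    """
    by_read = defaultdict(list)
    for read_id, chrom, pos, strand, score in alignments:
        by_read[read_id].append((score, chrom, pos, strand))

    sites = Counter()
    for hits in by_read.values():
        hits.sort(reverse=True)  # best score first
        if len(hits) > 1 and hits[0][0] == hits[1][0]:
            continue  # tie for best score: read cannot be localized uniquely
        _, chrom, pos, strand = hits[0]
        sites[(chrom, pos, strand)] += 1
    return dict(sites)

# Toy input: r3 maps equally well to two loci and is discarded.
reads = [
    ("r1", "chr1", 100, "+", 60),
    ("r2", "chr1", 100, "+", 55),
    ("r3", "chr2", 500, "-", 40),
    ("r3", "chr7", 900, "+", 40),
    ("r4", "chr3", 10, "+", 50),
    ("r4", "chr9", 20, "-", 30),
]
sites = integration_sites(reads)
```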
Inferring coarse-grain histone-DNA interaction potentials from high-resolution structures of the nucleosome

NASA Astrophysics Data System (ADS)

Meyer, Sam; Everaers, Ralf

2015-02-01

The histone-DNA interaction in the nucleosome is a fundamental mechanism of genomic compaction and regulation, which remains largely unknown despite increasing structural knowledge of the complex. In this paper, we propose a framework for the extraction of a nanoscale histone-DNA force-field from a collection of high-resolution structures, which may be adapted to a larger class of protein-DNA complexes. We applied the procedure to a large crystallographic database extended by snapshots from molecular dynamics simulations. The comparison of the structural models first shows that, at histone-DNA contact sites, the DNA base-pairs are shifted outwards locally, consistent with locally repulsive forces exerted by the histones. The second step shows that the various force profiles of the structures under analysis derive locally from a unique, sequence-independent, quadratic repulsive force-field, while the sequence preferences are entirely due to internal DNA mechanics. We have thus obtained the first knowledge-derived nanoscale interaction potential for histone-DNA in the nucleosome. The conformations obtained by relaxation of nucleosomal DNA with high-affinity sequences in this potential accurately reproduce the experimental values of binding preferences. Finally, we address the more generic binding mechanisms relevant to the 80% of genomic sequences incorporated in nucleosomes, by computing the conformation of nucleosomal DNA with sequence-averaged properties. This conformation differs from those found in crystals, and the analysis suggests that repulsive histone forces are related to local stretch tension in nucleosomal DNA, mostly between adjacent contact points. This tension could play a role in the stability of the complex.
Local backbone structure prediction of proteins

PubMed Central

De Brevern, Alexandre G.; Benros, Cristina; Gautier, Romain; Valadié, Hélène; Hazout, Serge; Etchebest, Catherine

2004-01-01

Summary A statistical analysis of the PDB structures has led us to define a new set of small 3D structural prototypes called Protein Blocks (PBs). This structural alphabet includes 16 PBs, each one is defined by the (φ, Ψ) dihedral angles of 5 consecutive residues. The amino acid distributions observed in sequence windows encompassing these PBs are used to predict by a Bayesian approach the local 3D structure of proteins from the sole knowledge of their sequences. LocPred is a software which allows the users to submit a protein sequence and performs a prediction in terms of PBs. The prediction results are given both textually and graphically. PMID:15724288
galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.

PubMed

Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M

2004-06-12

The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes the reference database to be of extensive sequence sampling. This is often not the case, restraining the correctness of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and consequently sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are found at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se
Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.

PubMed

Angiuoli, Samuel V; White, James R; Matalka, Malcolm; White, Owen; Fricke, W Florian

2011-01-01

The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. 
Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.
Resources and Costs for Microbial Sequence Analysis Evaluated Using Virtual Machines and Cloud Computing

PubMed Central

Angiuoli, Samuel V.; White, James R.; Matalka, Malcolm; White, Owen; Fricke, W. Florian

2011-01-01

Background The widespread popularity of genomic applications is threatened by the “bioinformatics bottleneck” resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. Results We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. 
Conclusions Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers. PMID:22028928

Mosaic organization of DNA nucleotides

NASA Technical Reports Server (NTRS)

Peng, C. K.; Buldyrev, S. V.; Havlin, S.; Simons, M.; Stanley, H. E.; Goldberger, A. L.

1994-01-01

Long-range power-law correlations have been reported recently for DNA sequences containing noncoding regions. We address the question of whether such correlations may be a trivial consequence of the known mosaic structure ("patchiness") of DNA. We analyze two classes of controls consisting of patchy nucleotide sequences generated by different algorithms--one without and one with long-range power-law correlations. Although both types of sequences are highly heterogenous, they are quantitatively distinguishable by an alternative fluctuation analysis method that differentiates local patchiness from long-range correlations. Application of this analysis to selected DNA sequences demonstrates that patchiness is not sufficient to account for long-range correlation properties.
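The abstract does not spell out the fluctuation analysis, so the following is a sketch in the style of detrended fluctuation analysis applied to a "DNA walk": the sequence is turned into a cumulative step profile, the profile is cut into boxes, a straight-line trend is removed from each box, and the root-mean-square residual is reported. The step convention (pyrimidine +1, purine -1), the skipping of other symbols, and the function names are assumptions.

```python
import math

def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C, T), -1 for a purine (A, G)."""
    profile, total = [], 0
    for base in seq.upper():
        if base in "CT":
            total += 1
        elif base in "AG":
            total -= 1
        else:
            continue  # ambiguous symbols are skipped (assumed policy)
        profile.append(total)
    return profile

def dfa_fluctuation(profile, box):
    """RMS deviation from a least-squares line fitted in each box."""
    n_boxes = len(profile) // box
    if box < 2 or n_boxes == 0:
        raise ValueError("box must be >= 2 and no longer than the profile")
    x_mean = (box - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in range(box))
    squared = 0.0
    for b in range(n_boxes):
        seg = profile[b * box:(b + 1) * box]
        y_mean = sum(seg) / box
        slope = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(seg)) / sxx
        squared += sum((v - (y_mean + slope * (x - x_mean))) ** 2
                       for x, v in enumerate(seg))
    return math.sqrt(squared / (n_boxes * box))
```

Evaluating `dfa_fluctuation` over a range of box sizes and plotting it on log-log axes gives a slope; removing the local trend in each box is what keeps patchy composition alone from mimicking a long-range correlation.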
DROMPA: easy-to-handle peak calling and visualization software for the computational analysis and validation of ChIP-seq data.

PubMed

Nakato, Ryuichiro; Itoh, Takehiko; Shirahige, Katsuhiko

2013-07-01

Chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) can identify genomic regions that bind proteins involved in various chromosomal functions. Although the development of next-generation sequencers offers the technology needed to identify these protein-binding sites, the analysis can be computationally challenging because sequencing data sometimes consist of >100 million reads/sample. Herein, we describe a cost-effective and time-efficient protocol that is generally applicable to ChIP-seq analysis; this protocol uses a novel peak-calling program termed DROMPA to identify peaks and an additional program, parse2wig, to preprocess read-map files. This two-step procedure drastically reduces computational time and memory requirements compared with other programs. DROMPA enables the identification of protein localization sites in repetitive sequences and efficiently identifies both broad and sharp protein localization peaks. Specifically, DROMPA outputs a protein-binding profile map in pdf or png format, which can be easily manipulated by users who have a limited background in bioinformatics. © 2013 The Authors Genes to Cells © 2013 by the Molecular Biology Society of Japan and Wiley Publishing Asia Pty Ltd.
Local-global analysis of crack growth in continuously reinforced ceramic matrix composites

NASA Technical Reports Server (NTRS)

Ballarini, Roberto; Ahmed, Shamim

1989-01-01

This paper describes the development of a mathematical model for predicting the strength and micromechanical failure characteristics of continuously reinforced ceramic matrix composites. The local-global analysis models the vicinity of a propagating crack tip as a local heterogeneous region (LHR) consisting of spring-like representation of the matrix, fibers and interfaces. Parametric studies are conducted to investigate the effects of LHR size, component properties, and interface conditions on the strength and sequence of the failure processes in the unidirectional composite system.
Lineage divergence detected in the malaria vector Anopheles marajoara (Diptera: Culicidae) in Amazonian Brazil

PubMed Central

2010-01-01

Background Cryptic species complexes are common among anophelines. Previous phylogenetic analysis based on the complete mtDNA COI gene sequences detected paraphyly in the Neotropical malaria vector Anopheles marajoara. The "Folmer region" detects a single taxon using a 3% divergence threshold. Methods To test the paraphyletic hypothesis and examine the utility of the Folmer region, genealogical trees based on a concatenated (white + 3' COI sequences) dataset and pairwise differentiation of COI fragments were examined. The population structure and demographic history were based on partial COI sequences for 294 individuals from 14 localities in Amazonian Brazil. 109 individuals from 12 localities were sequenced for the nDNA white gene, and 57 individuals from 11 localities were sequenced for the ribosomal DNA (rDNA) internal transcribed spacer 2 (ITS2). Results Distinct A. marajoara lineages were detected by combined genealogical analysis and were also supported among COI haplotypes using a median joining network and AMOVA, with time since divergence during the Pleistocene (<100,000 ya). COI sequences at the 3' end were more variable, demonstrating significant pairwise differentiation (3.82%) compared to the more moderate 2.92% detected by the Folmer region. Lineage 1 was present in all localities, whereas lineage 2 was restricted mainly to the west. Mismatch distributions for both lineages were bimodal, likely due to multiple colonization events and spatial expansion (~798 - 81,045 ya). There appears to be gene flow within, not between lineages, and a partial barrier was detected near Rio Jari in Amapá state, separating western and eastern populations. In contrast, both nDNA data sets (white gene sequences with or without the retention of the 4th intron, and ITS2 sequences and length) detected a single A. marajoara lineage. 
Conclusions Strong support for combined data with significant differentiation detected in the COI and absent in the nDNA suggest that the divergence is recent, and detectable only by the faster evolving mtDNA. A within subgenus threshold of >2% may be more appropriate among sister taxa in cryptic anopheline complexes than the standard 3%. Differences in demographic history and climatic changes may have contributed to mtDNA lineage divergence in A. marajoara. PMID:20929572
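The divergence thresholds discussed above (3% standard, >2% proposed within subgenus) can be illustrated with a minimal sketch. It assumes two pre-aligned sequences and uses an uncorrected p-distance; the abstract does not state which distance model the authors used, and the function name and gap handling here are hypothetical.

```python
def p_distance_percent(seq_a, seq_b, gap_chars="-N"):
    """Uncorrected pairwise distance (%) between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    This is an illustrative measure, not necessarily the one used in the study.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differences / compared


def exceeds_threshold(distance_percent, threshold_percent):
    """True when the observed divergence is above the chosen threshold."""
    return distance_percent > threshold_percent
```

With the values reported in the abstract, a Folmer-region divergence of 2.92% falls below a 3% threshold but above a 2% one, which is the case the authors make for the lower cutoff.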
Utility of COX1 phylogenetics to differentiate between locally acquired and imported Plasmodium knowlesi infections in Singapore

PubMed Central

Loh, Jin Phang; Gao, Qiu Han Christine; Lee, Vernon J; Tetteh, Kevin; Drakeley, Chris

2016-01-01

INTRODUCTION Although there have been several phylogenetic studies on Plasmodium knowlesi (P. knowlesi), only cytochrome c oxidase subunit 1 (COX1) gene analysis has shown some geographical differentiation between the isolates of different countries. METHODS Phylogenetic analysis of locally acquired P. knowlesi infections, based on circumsporozoite, small subunit ribosomal ribonucleic acid (SSU rRNA), merozoite surface protein 1 and COX1 gene targets, was performed. The results were compared with the published sequences of regional isolates from Malaysia and Thailand. RESULTS Phylogenetic analysis of the circumsporozoite, SSU rRNA and merozoite surface protein 1 gene sequences for regional P. knowlesi isolates showed no obvious differentiation that could be attributed to their geographical origin. However, COX1 gene analysis showed that it was possible to differentiate between Singapore-acquired P. knowlesi infections and P. knowlesi infections from Peninsular Malaysia and Sarawak, Borneo, Malaysia. CONCLUSION The ability to differentiate between locally acquired P. knowlesi infections and imported P. knowlesi infections has important utility for the monitoring of P. knowlesi malaria control programmes in Singapore. PMID:26805667
Utility of COX1 phylogenetics to differentiate between locally acquired and imported Plasmodium knowlesi infections in Singapore.

PubMed

Loh, Jin Phang; Gao, Qiu Han Christine; Lee, Vernon J; Tetteh, Kevin; Drakeley, Chris

2016-12-01

Although there have been several phylogenetic studies on Plasmodium knowlesi (P. knowlesi), only cytochrome c oxidase subunit 1 (COX1) gene analysis has shown some geographical differentiation between the isolates of different countries. Phylogenetic analysis of locally acquired P. knowlesi infections, based on circumsporozoite, small subunit ribosomal ribonucleic acid (SSU rRNA), merozoite surface protein 1 and COX1 gene targets, was performed. The results were compared with the published sequences of regional isolates from Malaysia and Thailand. Phylogenetic analysis of the circumsporozoite, SSU rRNA and merozoite surface protein 1 gene sequences for regional P. knowlesi isolates showed no obvious differentiation that could be attributed to their geographical origin. However, COX1 gene analysis showed that it was possible to differentiate between Singapore-acquired P. knowlesi infections and P. knowlesi infections from Peninsular Malaysia and Sarawak, Borneo, Malaysia. The ability to differentiate between locally acquired P. knowlesi infections and imported P. knowlesi infections has important utility for the monitoring of P. knowlesi malaria control programmes in Singapore. Copyright: © Singapore Medical Association
Randomizing world trade. II. A weighted network analysis

NASA Astrophysics Data System (ADS)

Squartini, Tiziano; Fagiolo, Giorgio; Garlaschelli, Diego

2011-10-01

Based on the misleading expectation that weighted network properties always offer a more complete description than purely topological ones, current economic models of the International Trade Network (ITN) generally aim at explaining local weighted properties, not local binary ones. Here we complement our analysis of the binary projections of the ITN by considering its weighted representations. We show that, unlike the binary case, all possible weighted representations of the ITN (directed and undirected, aggregated and disaggregated) cannot be traced back to local country-specific properties, which are therefore of limited informativeness. Our two papers show that traditional macroeconomic approaches systematically fail to capture the key properties of the ITN. In the binary case, they do not focus on the degree sequence and hence cannot characterize or replicate higher-order properties. In the weighted case, they generally focus on the strength sequence, but the knowledge of the latter is not enough in order to understand or reproduce indirect effects.
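The two local properties contrasted above, the degree sequence (binary) and the strength sequence (weighted), can be sketched as follows. This assumes an undirected network stored as a symmetric matrix of non-negative weights; the function name and the toy matrix are illustrative and are not taken from the paper.

```python
def degree_and_strength(weights):
    """Return (degree sequence, strength sequence) of a weighted network.

    weights[i][j] is the non-negative weight of the link between i and j;
    zero means no link. Degree counts links, strength sums their weights.
    """
    degrees = [sum(1 for w in row if w > 0) for row in weights]
    strengths = [sum(row) for row in weights]
    return degrees, strengths


# Toy three-node example (hypothetical trade volumes).
toy = [
    [0, 2, 0],
    [2, 0, 5],
    [0, 5, 0],
]
degrees, strengths = degree_and_strength(toy)
```

Two networks can share a strength sequence while having different degree sequences, which is one way to read the abstract's point that weighted local properties alone are not sufficient.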
Localization of human coagulation factor VIII (hFVIII) in transgenic rabbit by FISH-TSA: identification of transgene copy number and transmission to the next generation.

PubMed

Krylov, V; Tlapáková, T; Mácha, J; Curlej, J; Ryban, L; Chrenek, P

2008-01-01

For chromosomal localization of the hFVIII human transgene in F2 and F3 generation of transgenic rabbits, FISH-TSA was applied. A short cDNA probe (1250 bp) targeted chromosomes 3, 7, 8, 9 and 18 of an F2 male (animal 1-3-8). Two transgenic offspring (F3) revealed signal positions in chromosome 3 and chromosomes 3 and 7, respectively. Sequencing and structure analysis of the rabbit orthologous gene revealed high similarity to its human counterpart. Part of the sequenced cDNA (1310 bp) served as a probe for FISH-TSA analysis. The rabbit gene was localized in the q arm terminus of the X chromosome. This result is in agreement with reciprocal chromosome painting between the rabbit and the human. The presented FISH-TSA method provides strong signals without any interspecies reactivity.
Analysis of time in establishing synchronization radio communication system with expanded spectrum conditions for communication with mobile robots

NASA Astrophysics Data System (ADS)

Latinovic, T. S.; Kalabic, S. B.; Barz, C. R.; Petrica, P. Paul; Pop-Vădean, A.

2018-01-01

This paper analyzes the influence of the Doppler effect on the time needed to establish synchronization of pseudorandom sequences in radio communication systems with an expanded spectrum. The paper also explores the possibility of using secure wireless communication for modular robots. Wireless communication could be used for local and global communication. We analyzed a radio communication system integrator, including the two effects of the Doppler signal on the duration of establishing synchronization of the received and locally generated pseudorandom sequences. The effects of phase variability between these sequences were analyzed, together with the correspondence of the phases of these signals with the acquisition time interval of the received sequences. An analysis of these impacts is essential for signal transmission and for protecting the transfer of information in communication systems with an expanded range (telecommunications, mobile telephony, Global Navigation Satellite System GNSS, and wireless communication). Results show that wireless communication can provide a safe approach for communication with mobile robots.
Morphological and genetic evidence of contemporary intersectional hybridisation in Mediterranean Helichrysum (Asteraceae, Gnaphalieae).

PubMed

Galbany-Casals, M; Carnicero-Campmany, P; Blanco-Moreno, J M; Smissen, R D

2012-09-01

Hybridisation is considered an important evolutionary phenomenon in Gnaphalieae, but contemporary hybridisation has been little explored within the tribe. Here, hybridisation between Helichrysum orientale and Helichrysum stoechas is studied at two different localities in the islands of Crete and Rhodes (Greece). Using three different types of molecular data (AFLP, nrDNA ITS sequences and cpDNA ndhF sequences) and morphological data, the aim is to provide simultaneous and direct comparisons between molecular and morphological variation among the parental species and the studied hybrid populations. AFLP profiles, ITS sequences and morphological data support the existence of hybrids at the two localities studied, shown as morphological and genetic intermediates between the parental species. Chloroplast DNA sequences show that both parental species can act either as pollen donor or as maternal parent. Fertility of hybrids is demonstrated by the viability of seeds produced by hybrids from both localities, and the detection of a backcross specimen to H. orientale. Although there is general congruence of morphological and molecular data, the analysis of morphology and ITS sequences can fail to detect backcross hybrids. © 2012 German Botanical Society and The Royal Botanical Society of the Netherlands.
Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains.

PubMed

Xia, Li C; Ai, Dongmei; Cram, Jacob A; Liang, Xiaoyi; Fuhrman, Jed A; Sun, Fengzhu

2015-09-21

Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in dynamics of biological systems. However, slow permutation procedures to evaluate the statistical significance of local trend scores have limited its applications to high-throughput time series data analysis, e.g., data from next-generation sequencing studies. By extending the theories for the tail probability of the range of sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at time points >20 with no delay and >30 with delay of at most three time steps) in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations, making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large scale all-versus-all comparisons possible. We also propose a hybrid approach by integrating the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Plymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data and find interesting organism co-occurrence dynamic patterns. The software tool is integrated into the eLSA software package that now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.
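To make the permutation baseline concrete, here is a simplified sketch of a local trend score with no time delay and its permutation p-value. It is not the eLSA implementation: the trend is taken as the plain sign of consecutive differences (no change threshold), the score is unnormalized, and all function names are hypothetical.

```python
import random


def trend(series):
    """Map a series to its trend series: +1 up, -1 down, 0 unchanged."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]


def local_trend_score(x, y):
    """Best contiguous run of agreeing (or opposing) trends, no delay.

    Uses a max-subarray style recursion on the product of the two trend
    series, tracking positive and negative association separately.
    """
    best = pos = neg = 0
    for a, b in zip(trend(x), trend(y)):
        pos = max(0, pos + a * b)
        neg = max(0, neg - a * b)
        best = max(best, pos, neg)
    return best


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Fraction of shuffles of x scoring at least as high as observed."""
    rng = random.Random(seed)
    observed = local_trend_score(x, y)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if local_trend_score(shuffled, y) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Each p-value costs n_perm score evaluations, which is the expense that motivates the closed-form approximation described in the abstract for all-versus-all comparisons.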
Dynamic assessment of microbial ecology (DAME): a web app for interactive analysis and visualization of microbial sequencing data.

PubMed

Piccolo, Brian D; Wankhade, Umesh D; Chintapalli, Sree V; Bhattacharyya, Sudeepa; Chunqiao, Luo; Shankar, Kartik

2018-03-15

Dynamic assessment of microbial ecology (DAME) is a Shiny-based web application for interactive analysis and visualization of microbial sequencing data. DAME provides researchers not familiar with R programming the ability to access the most current R functions utilized for ecology and gene sequencing data analyses. Currently, DAME supports group comparisons of several ecological estimates of α-diversity and β-diversity, along with differential abundance analysis of individual taxa. Using the Shiny framework, the user has complete control of all aspects of the data analysis, including sample/experimental group selection and filtering, estimate selection, statistical methods and visualization parameters. Furthermore, graphical and tabular outputs are supported by R packages using D3.js and are fully interactive. DAME was implemented in R but can be modified by Hypertext Markup Language (HTML), Cascading Style Sheets (CSS), and JavaScript. It is freely available on the web at https://acnc-shinyapps.shinyapps.io/DAME/. Local installation and source code are available through Github (https://github.com/bdpiccolo/ACNC-DAME). Any system with R can launch DAME locally provided the shiny package is installed. bdpiccolo@uams.edu.
Effects of the Laramide Structures on the Regional Distribution of Tight-Gas Sandstone in the Upper Mesaverde Group, Uinta Basin, Utah

NASA Astrophysics Data System (ADS)

Sitaula, R. P.; Aschoff, J.

2013-12-01

Regional-scale sequence stratigraphic correlation, well log analysis, syntectonic unconformity mapping, isopach maps, and depositional environment maps of the upper Mesaverde Group (UMG) in Uinta basin, Utah suggest higher accommodation in northeastern part (Natural Buttes area) and local development of lacustrine facies due to increased subsidence caused by uplift of San Rafael Swell (SRS) in southern and Uinta Uplift in northern parts. Recently discovered lacustrine facies in Natural Buttes area are completely different than the dominant fluvial facies in outcrops along Book Cliffs and could have implications for significant amount of tight-gas sand production from this area. Data used for sequence stratigraphic correlation, isopach maps and depositional environmental maps include > 100 well logs, 20 stratigraphic profiles, 35 sandstone thin sections and 10 outcrop-based gamma ray profiles. Seven 4th order depositional sequences (~0.5 my duration) are identified and correlated within UMG. Correlation was constructed using a combination of fluvial facies and stacking patterns in outcrops, chert-pebble conglomerates and tidally influenced strata. These surfaces were extrapolated into subsurface by matching GR profiles. GR well logs and core log of Natural Buttes area show intervals of coarsening upward patterns suggesting possible lacustrine intervals that might contain high TOC. Locally, younger sequences are completely truncated across SRS whereas older sequences are truncated and thinned toward SRS. The cycles of truncation and thinning represent phases of SRS uplift. Thinning possibly related with the Uinta Uplift is also observed in northwestern part. Paleocurrents are consistent with interpretation of periodic segmentation and deflection of sedimentation. Regional paleocurrents are generally E-NE-directed in Sequences 1-4, and N-directed in Sequences 5-7. 
From isopach maps and paleocurrent direction it can be interpreted that uplift of SRS changed route of sediment supply from west to southwest. Locally, paleocurrents are highly variable near SRS further suggesting UMG basin-fill was partitioned by uplift of SRS. Sandstone composition analysis also suggests the uplift of SRS causing the variation of source rocks in upper sequences than the lower sequences. In conclusion, we suggest that Uinta basin was episodically partitioned during the deposition of UMG due to uplift of Laramide structures in the basin and accommodation was localized in northeastern part. Understanding of structural controls on accommodation, sedimentation patterns and depositional environments will aid prediction of the best-producing gas reservoirs.
Sirius PSB: a generic system for analysis of biological sequences.

PubMed

Koh, Chuan Hock; Lin, Sharene; Jedd, Gregory; Wong, Limsoon

2009-12-01

Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) Building a classifier, (2) Deploying a classifier, (3) Search for proteins similar to query proteins, (4) Preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple and interactive graphical user interface. Besides being a convenient tool, Sirius PSB has also introduced two novelties in sequence analysis. Firstly, genetic algorithm is used to identify interesting features in the feature space. Secondly, instead of the conventional method of searching for similar proteins via sequence similarity, we introduced searching via features' similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models - one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive against current state-of-the-art models based on evaluation of public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at: http://compbio.ddns.comp.nus.edu.sg/~sirius.
A Polyglot Approach to Bioinformatics Data Integration: A Phylogenetic Analysis of HIV-1

PubMed Central

Reisman, Steven; Hatzopoulos, Thomas; Läufer, Konstantin; Thiruvathukal, George K.; Putonti, Catherine

2016-01-01

As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 sequences. Phylogenetic analyses were conducted for >6,000 HIV-1 sequences revealing spatial and temporal factors influence the evolution of the individual genes uniquely. Nevertheless, signatures of origin can be extrapolated even despite increased globalization. The approach developed here can easily be customized for any species of interest. PMID:26819543
Cytogenetic Analysis of Populus trichocarpa - Ribosomal DNA, Telomere Repeat Sequence, and Marker-selected BACs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tuskan, Gerald A; Gunter, Lee E; DiFazio, Stephen P

The 18S-28S rDNA and 5S rDNA loci in Populus trichocarpa were localized using fluorescent in situ hybridization (FISH). Two 18S-28S rDNA sites and one 5S rDNA site were identified and located at the ends of 3 different chromosomes. FISH signals from the Arabidopsis-type telomere repeat sequence were observed at the distal ends of each chromosome. Six BAC clones selected from 2 linkage groups based on genome sequence assembly (LG-I and LG-VI) were localized on 2 chromosomes, as expected. BACs from LG-I hybridized to the longest chromosome in the complement. All BAC positions were found to be concordant with sequence assembly positions. BAC-FISH will be useful for delineating each of the Populus trichocarpa chromosomes and improving the sequence assembly of this model angiosperm tree species.
Preparing and Analyzing Expressed Sequence Tags (ESTs) Library for the Mammary Tissue of Local Turkish Kivircik Sheep

PubMed Central

Omeroglu Ulu, Zehra; Ulu, Salih; Un, Cemal; Ozdem Oztabak, Kemal; Altunatmaz, Kemal

2017-01-01

Kivircik sheep is an important local Turkish sheep breed because of its meat quality and milk productivity. The aim of this study was to analyze gene expression profiles of both prenatal and postnatal stages for the Kivircik sheep. Therefore, two different cDNA libraries, which were taken from the same Kivircik sheep mammary gland tissue at prenatal and postnatal stages, were constructed. A total of 3072 colonies, randomly selected from the two libraries, were sequenced to develop a sheep EST collection. We used the Phred/Phrap programs to analyze the raw ESTs, and readable EST sequences were assembled with the CAP3 software. Putative functions of all unique sequences and statistical analysis were determined by Geneious software. A total of 422 ESTs had over 80% similarity to known sequences of other organisms in NCBI and were classified by the Panther database according to Gene Ontology (GO) category. By comparing gene expression profiles, we observed some putative genes that may be related to reproductive performance or play important roles in milk synthesis and secretion. A total of 2414 ESTs have been deposited in the NCBI GenBank database (GW996847–GW999260). EST data in this study have provided a new source of information for functional genome studies of sheep. PMID:28239610
Overlapping local and long-range RNA-RNA interactions modulate dengue virus genome cyclization and replication.

PubMed

de Borba, Luana; Villordo, Sergio M; Iglesias, Nestor G; Filomatori, Claudia V; Gebhard, Leopoldo G; Gamarnik, Andrea V

2015-03-01

The dengue virus genome is a dynamic molecule that adopts different conformations in the infected cell. Here, using RNA folding predictions, chemical probing analysis, RNA binding assays, and functional studies, we identified new cis-acting elements present in the capsid coding sequence that facilitate cyclization of the viral RNA by hybridization with a sequence involved in a local dumbbell structure at the viral 3' untranslated region (UTR). The identified interaction differentially enhances viral replication in mosquito and mammalian cells. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Information theory applications for biological sequence analysis.

PubMed

Vinga, Susana

2014-05-01

Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
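As a small illustration of the entropy concept mentioned above, the sketch below computes the Shannon entropy of a sequence's symbol frequencies and a sliding-window version for local analysis. The windowed function is a simplification chosen for illustration; it is not the "entropic profiles" method cited in the review, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy in bits per symbol of the symbol frequencies in seq."""
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def windowed_entropy(seq, window):
    """Entropy of every length-`window` substring, left to right."""
    return [
        shannon_entropy(seq[i:i + window])
        for i in range(len(seq) - window + 1)
    ]
```

For DNA the entropy ranges from 0 bits (a homopolymer run) to 2 bits (all four bases equally frequent), so low values in a windowed scan flag low-complexity regions.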
Transcriptome analysis in Concholepas concholepas (Gastropoda, Muricidae): mining and characterization of new genomic and molecular markers.

PubMed

Cárdenas, Leyla; Sánchez, Roland; Gomez, Daniela; Fuenzalida, Gonzalo; Gallardo-Escárate, Cristián; Tanguy, Arnaud

2011-09-01

The marine gastropod Concholepas concholepas, locally known as the "loco", is the main target species of the benthonic Chilean fisheries. Genetic and genomic tools are necessary to study the genome of this species in order to understand the molecular basis of its development, growth, and other key traits to improve the management strategies and to identify local adaptation to prevent loss of biodiversity. Here, we use pyrosequencing technologies to generate the first transcriptomic database from adult specimens of the loco. After trimming, a total of 140,756 Expressed Sequence Tag sequences were achieved. Clustering and assembly analysis identified 19,219 contigs and 105,435 singleton sequences. BlastN analysis showed a significant identity with Expressed Sequence Tags of different gastropod species available in public databases. Similarly, BlastX results showed that only 895 out of the total 124,654 had significant hits and may represent novel genes for marine gastropods. From this database, simple sequence repeat motifs were also identified and a total of 38 primer pairs were designed and tested to assess their potential as informative markers and to investigate their cross-species amplification in different related gastropod species. This dataset represents the first publicly available 454 data for a marine gastropod endemic to the southeastern Pacific coast, providing a valuable transcriptomic resource for future efforts of gene discovery and development of functional markers in other marine gastropods. Copyright © 2011 Elsevier B.V. All rights reserved.

Topological frustration in βα-repeat proteins: sequence diversity modulates the conserved folding mechanisms of α/β/α sandwich proteins

PubMed Central

Hills, Ronald D.; Kathuria, Sagar V.; Wallace, Louise A.; Day, Iain J.; Brooks, Charles L.; Matthews, C. Robert

2010-01-01

The thermodynamic hypothesis of Anfinsen postulates that structures and stabilities of globular proteins are determined by their amino acid sequences. Chain topology, however, is known to influence the folding reaction, in that motifs with a preponderance of local interactions typically fold more rapidly than those with a larger fraction of non-local interactions. Together, the topology and sequence can modulate the energy landscape and influence the rate at which the protein folds to the native conformation. To explore the relationship of sequence and topology in the folding of βα–repeat proteins, which are dominated by local interactions, a combined experimental and simulation analysis was performed on two members of the flavodoxin-like, α/β/α sandwich fold. Spo0F and the N-terminal receiver domain of NtrC (NT-NtrC) have similar topologies but low sequence identity, enabling a test of the effects of sequence on folding. Experimental results demonstrated that both response-regulator proteins fold via parallel channels through highly structured sub-millisecond intermediates before accessing their cis prolyl peptide bond-containing native conformations. Global analysis of the experimental results preferentially places these intermediates off the productive folding pathway. Sequence-sensitive Gō-model simulations conclude that frustration in the folding in Spo0F, corresponding to the appearance of the off-pathway intermediate, reflects competition for intra-subdomain van der Waals contacts between its N- and C-terminal subdomains. The extent of transient, premature structure appears to correlate with the number of isoleucine, leucine and valine (ILV) side-chains that form a large sequence-local cluster involving the central β-sheet and helices α2, α3 and α4. The failure to detect the off-pathway species in the simulations of NT-NtrC may reflect the reduced number of ILV side-chains in its corresponding hydrophobic cluster. 
The location of the hydrophobic clusters in the structure may also be related to the differing functional properties of these response regulators. Comparison with the results of previous experimental and simulation analyses on the homologous CheY argues that prematurely-folded unproductive intermediates are a common property of the βα-repeat motif. PMID:20226790
Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.

PubMed

Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai

2015-12-01

The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance to identify similarities of protein structures and infer their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method on protein comparison is more efficient than commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant feature for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
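The building block of the RFT is the Ramanujan sum c_q(n), the sum of cos(2πkn/q) over all k in 1..q coprime to q. The sketch below computes it directly and uses it in a naive finite-length coefficient estimate for a numeric signal. This is not the fast algorithm proposed in the paper; the normalization by φ(q)·N is one common finite-length convention and is an assumption here, as are the function names. Mapping amino acids to numbers (as the RRM does) is left to the caller.

```python
import math


def ramanujan_sum(q, n):
    """Ramanujan sum c_q(n); always an integer, so the result is rounded."""
    total = sum(
        math.cos(2 * math.pi * k * n / q)
        for k in range(1, q + 1)
        if math.gcd(k, q) == 1
    )
    return round(total)


def euler_phi(q):
    """Euler's totient: how many k in 1..q are coprime to q."""
    return sum(1 for k in range(1, q + 1) if math.gcd(k, q) == 1)


def rft_coefficient(signal, q):
    """Naive RFT coefficient of a non-empty numeric signal at period q.

    Samples are indexed from n = 1; cost is O(N * q) per coefficient.
    """
    acc = sum(x * ramanujan_sum(q, n) for n, x in enumerate(signal, start=1))
    return acc / (euler_phi(q) * len(signal))
```

Because c_q(n) takes only integer values, each coefficient needs no complex arithmetic, which is part of the efficiency argument the abstract makes against the DFT.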
Complete genome sequence and integrated protein localization and interaction map for alfalfa dwarf virus, which combines properties of both cytoplasmic and nuclear plant rhabdoviruses

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bejerman, Nicolás; Giolitti, Fabián

Summary: We have determined the full-length 14,491-nucleotide genome sequence of a new plant rhabdovirus, alfalfa dwarf virus (ADV). Seven open reading frames (ORFs) were identified in the antigenomic orientation of the negative-sense, single-stranded viral RNA, in the order 3′-N-P-P3-M-G-P6-L-5′. The ORFs are separated by conserved intergenic regions and the genome coding region is flanked by complementary 3′ leader and 5′ trailer sequences. Phylogenetic analysis of the nucleoprotein amino acid sequence indicated that this alfalfa-infecting rhabdovirus is related to viruses in the genus Cytorhabdovirus. When transiently expressed as GFP fusions in Nicotiana benthamiana leaves, most ADV proteins accumulated in the cell periphery, but unexpectedly P protein was localized exclusively in the nucleus. ADV P protein was shown to have homotypic and heterotypic nuclear interactions with N, P3 and M proteins by bimolecular fluorescence complementation. ADV appears unique in that it combines properties of both cytoplasmic and nuclear plant rhabdoviruses. Highlights: • The complete genome of alfalfa dwarf virus is obtained. • An integrated localization and interaction map for ADV is determined. • ADV has a genome sequence similarity and evolutionary links with cytorhabdoviruses. • ADV protein localization and interaction data show an association with the nucleus. • ADV combines properties of both cytoplasmic and nuclear plant rhabdoviruses.
CDC Vital Signs: Preventing Norovirus Outbreaks

MedlinePlus

... source of norovirus outbreaks using genome sequencing and analysis. State and local governments can Adopt and enforce all provisions of the FDA model Food Code to better safeguard food. Investigate norovirus outbreaks ...
Development of a monoclonal antibody to immuno-cytochemical analysis of the cellular localization of the peripheral benzodiazepine receptor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dussossoy, D.; Carayon, P.; Feraut, D.

1996-05-01

Based on the amino acid sequence deduced from the cloned human peripheral benzodiazepine receptor (PBR) gene, a monoclonal antibody (Mab 8D7) was produced against the C-terminal fragment of the receptor. Immunoblot experiments, performed against purified PBR, indicated that the antipeptide antibody recognized, under denaturing conditions, the corresponding amino acid sequence of the PBR. When mitochondrial membranes from PBR-transfected yeast or from THP1 and U937 cells were used in immunoblot analysis, a high level of immunoreactivity was observed at 18 kDa, the PBR molecular mass deduced from cDNA, establishing the specificity of the antibody for the receptor. Moreover, binding experiments performed with intact mitochondria demonstrated that the immunogenic sequence was accessible to the antibody, indicating that the C-terminal fragment of the PBR faces the cytosol. Using this Mab we developed a technique which allowed precise quantification of PBR density per cell. Furthermore, cellular localization studies by flow cytometric analysis and confocal microscopy on cell lines displaying different levels of PBR showed that Mab 8D7 was entirely colocalized with an antimitochondria Mab. 34 refs., 7 figs.
Identification of two internal signal peptide sequences: critical for classical swine fever virus non-structural protein 2 to trans-localize to the endoplasmic reticulum.

PubMed

Guo, Kang-kang; Tang, Qing-hai; Zhang, Yan-ming; Kang, Kai; He, Lei

2011-05-18

The membrane topology and molecular mechanisms for endoplasmic reticulum (ER) localization of classical swine fever virus (CSFV) non-structural 2 (NS2) protein are unclear. In this study, we attempted to elucidate the subcellular localization of this protein and the molecular mechanisms responsible for it. The NS2 gene was amplified by reverse transcription polymerase chain reaction, and the transmembrane region and hydrophilicity of the NS2 protein were predicted by bioinformatics analysis. Twelve cDNAs of the NS2 gene were amplified by the PCR deletion method and cloned into a eukaryotic expression vector, which was transfected into a swine umbilical vein endothelial cell line (SUVEC). Subcellular localization of the NS2 protein was characterized by confocal microscopy, and western blots were carried out to analyze protein expression. Our results showed that the -NH2 terminal of the CSFV NS2 protein was highly hydrophobic and the protein localized in the ER. At least four transmembrane regions and two internal signal peptide sequences (amino acids 103-138 and 220-262) were identified and thought to be critical for its trans-localization to the ER. This is the first study to identify the internal signal peptide sequences of the CSFV NS2 protein and its subcellular localization, providing the foundation for further exploration of this protein's function and its role in CSFV pathogenesis.
Analysis of the regulatory region of the protease III (ptr) gene of Escherichia coli K-12.

PubMed

Claverie-Martin, F; Diaz-Torres, M R; Kushner, S R

1987-01-01

The ptr gene of Escherichia coli encodes protease III (Mr 110,000) and a 50-kDa polypeptide, both of which are found in the periplasmic space. The gene is physically located between the recC and recB loci on the E. coli chromosome. The nucleotide sequence of a 1167-bp EcoRV-ClaI fragment of chromosomal DNA containing the promoter region and 885 bp of the ptr coding sequence has been determined. S1 nuclease mapping analysis showed that the major 5' end of the ptr mRNA was localized 127 bp upstream from the ATG start codon. The open reading frame (ORF), preceded by a Shine-Dalgarno sequence, extends to the end of the sequenced DNA. Downstream from the -35 and -10 regions is a sequence that strongly fits the consensus sequence of known nitrogen-regulated promoters. A signal peptide of 23 amino acid residues is present at the N terminus of the derived amino acid sequence. The cleavage site as well as the ORF were confirmed by sequencing the N terminus of mature protease III.
Prediction of cis/trans isomerization in proteins using PSI-BLAST profiles and secondary structure information.

PubMed

Song, Jiangning; Burrage, Kevin; Yuan, Zheng; Huber, Thomas

2006-03-09

The majority of peptide bonds in proteins are found to occur in the trans conformation. However, for proline residues, a considerable fraction of prolyl peptide bonds adopt the cis form. Proline cis/trans isomerization is known to play a critical role in protein folding, splicing, cell signaling and transmembrane active transport. Accurate prediction of proline cis/trans isomerization in proteins would have many important applications towards the understanding of protein structure and function. In this paper, we propose a new approach to predict the proline cis/trans isomerization in proteins using support vector machine (SVM). The preliminary results indicated that using Radial Basis Function (RBF) kernels could lead to better prediction performance than that of polynomial and linear kernel functions. We used single sequence information of different local window sizes, amino acid compositions of different local sequences, multiple sequence alignment obtained from PSI-BLAST and the secondary structure information predicted by PSIPRED. We explored these different sequence encoding schemes in order to investigate their effects on the prediction performance. The training and testing of this approach was performed on a newly enlarged dataset of 2424 non-homologous proteins determined by X-ray diffraction, using 5-fold cross-validation. Selecting the window size 11 provided the best performance for determining the proline cis/trans isomerization based on the single amino acid sequence. It was found that using multiple sequence alignments in the form of PSI-BLAST profiles could significantly improve the prediction performance: the prediction accuracy increased from 62.8% with single sequence to 69.8%, and the Matthews Correlation Coefficient (MCC) improved from 0.26 with single local sequence to 0.40. Furthermore, if coupled with the secondary structure information predicted by PSIPRED, our method yielded a prediction accuracy of 71.5% and MCC of 0.43, 9% and 0.17 higher than the accuracy achieved based on the single sequence information, respectively. A new method has been developed to predict the proline cis/trans isomerization in proteins based on support vector machine, which used the single amino acid sequence with different local window sizes, the amino acid compositions of local sequence flanking centered proline residues, the position-specific scoring matrices (PSSMs) extracted by PSI-BLAST and the predicted secondary structures generated by PSIPRED. The successful application of SVM approach in this study reinforced that SVM is a powerful tool in predicting proline cis/trans isomerization in proteins and biological sequence analysis.
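The abstract above relies on two simple building blocks: local sequence windows centred on each proline, and the Matthews Correlation Coefficient used to score predictions. The following is a minimal sketch of both, not the authors' implementation. The function names, the padding character "X" and the handling of a zero denominator are assumptions made for illustration; the SVM training itself is not shown.

```python
import math


def proline_windows(seq, size=11, pad="X"):
    """Return (position, window) pairs for every proline in seq.

    Each window has `size` residues and is centred on the proline.
    Positions beyond the sequence ends are filled with `pad`
    (an illustrative choice, not taken from the paper).
    """
    half = size // 2
    padded = pad * half + seq + pad * half
    # Residue i of seq sits at index i + half of padded, so the slice
    # padded[i:i + size] is centred on it.
    return [(i, padded[i:i + size]) for i, aa in enumerate(seq) if aa == "P"]


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom
```

Each window could then be encoded (for example as one-hot vectors or PSSM rows) and passed to a classifier; MCC is informative here because cis prolines are much rarer than trans prolines, so accuracy alone can be misleading.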
A computational proposal for designing structured RNA pools for in vitro selection of RNAs.

PubMed

Kim, Namhee; Gan, Hin Hark; Schlick, Tamar

2007-04-01

Although in vitro selection technology is a versatile experimental tool for discovering novel synthetic RNA molecules, finding complex RNA molecules is difficult because most RNAs identified from random sequence pools are simple motifs, consistent with recent computational analysis of such sequence pools. Thus, enriching in vitro selection pools with complex structures could increase the probability of discovering novel RNAs. Here we develop an approach for engineering sequence pools that links RNA sequence space regions with corresponding structural distributions via a "mixing matrix" approach combined with a graph theory analysis. We define five classes of mixing matrices motivated by covariance mutations in RNA; these constructs define nucleotide transition rates and are applied to chosen starting sequences to yield specific nonrandom pools. We examine the coverage of sequence space as a function of the mixing matrix and starting sequence via clustering analysis. We show that, in contrast to random sequences, which are associated only with a local region of sequence space, our designed pools, including a structured pool for GTP aptamers, can target specific motifs. It follows that experimental synthesis of designed pools can benefit from using optimized starting sequences, mixing matrices, and pool fractions associated with each of our constructed pools as a guide. Automation of our approach could provide practical tools for pool design applications for in vitro selection of RNAs and related problems.
Deciphering the shape and deformation of secondary structures through local conformation analysis

PubMed Central

2011-01-01

Background Protein deformation has been extensively analysed through global methods based on RMSD, torsion angles and Principal Components Analysis calculations. Here we use a local approach, able to distinguish among the different backbone conformations within loops, α-helices and β-strands, to address the question of secondary structures' shape variation within proteins and deformation at interface upon complexation. Results Using a structural alphabet, we translated the 3 D structures of large sets of protein-protein complexes into sequences of structural letters. The shape of the secondary structures can be assessed by the structural letters that modeled them in the structural sequences. The distribution analysis of the structural letters in the three protein compartments (surface, core and interface) reveals that secondary structures tend to adopt preferential conformations that differ among the compartments. The local description of secondary structures highlights that curved conformations are preferred on the surface while straight ones are preferred in the core. Interfaces display a mixture of local conformations either preferred in core or surface. The analysis of the structural letters transition occurring between protein-bound and unbound conformations shows that the deformation of secondary structure is tightly linked to the compartment preference of the local conformations. Conclusion The conformation of secondary structures can be further analysed and detailed thanks to a structural alphabet which allows a better description of protein surface, core and interface in terms of secondary structures' shape and deformation. Induced-fit modification tendencies described here should be valuable information to identify and characterize regions under strong structural constraints for functional reasons. PMID:21284872
Deciphering the shape and deformation of secondary structures through local conformation analysis.

PubMed

Baussand, Julie; Camproux, Anne-Claude

2011-02-01

Protein deformation has been extensively analysed through global methods based on RMSD, torsion angles and Principal Components Analysis calculations. Here we use a local approach, able to distinguish among the different backbone conformations within loops, α-helices and β-strands, to address the question of secondary structures' shape variation within proteins and deformation at interface upon complexation. Using a structural alphabet, we translated the 3 D structures of large sets of protein-protein complexes into sequences of structural letters. The shape of the secondary structures can be assessed by the structural letters that modeled them in the structural sequences. The distribution analysis of the structural letters in the three protein compartments (surface, core and interface) reveals that secondary structures tend to adopt preferential conformations that differ among the compartments. The local description of secondary structures highlights that curved conformations are preferred on the surface while straight ones are preferred in the core. Interfaces display a mixture of local conformations either preferred in core or surface. The analysis of the structural letters transition occurring between protein-bound and unbound conformations shows that the deformation of secondary structure is tightly linked to the compartment preference of the local conformations. The conformation of secondary structures can be further analysed and detailed thanks to a structural alphabet which allows a better description of protein surface, core and interface in terms of secondary structures' shape and deformation. Induced-fit modification tendencies described here should be valuable information to identify and characterize regions under strong structural constraints for functional reasons.
The Genetic Diversity, Haplotype Analysis, and Phylogenetic Relationship of Aedes albopictus (Diptera: Culicidae) Based on the Cytochrome Oxidase 1 Marker: A Malaysian Scenario.

PubMed

Ismail, Nurul-Ain; Adilah-Amrannudin, Nurul; Hamsidi, Mayamin; Ismail, Rodziah; Dom, Nazri Che; Ahmad, Abu Hassan; Mastuki, Mohd Fahmi; Camalxaman, Siti Nazrina

2017-11-07

The global expansion of Ae. albopictus from its native range in Southeast Asia has been implicated in the recent emergence of dengue endemicity in Malaysia. Genetic variability studies of Ae. albopictus are currently lacking in the Malaysian setting, yet are crucial to enhancing the existing vector control strategies. The study was conducted to establish the genetic variability of maternally inherited mitochondrial DNA encoding for cytochrome oxidase subunit 1 (CO1) gene in Ae. albopictus. Twelve localities were selected in the Subang Jaya district based on temporal indices utilizing 120 mosquito samples. Genetic polymorphism and phylogenetic analysis were conducted to unveil the genetic variability and geographic origins of Ae. albopictus. The haplotype network was mapped to determine the genealogical relationship of sequences among groups of population in the Asian region. Comparison of Malaysian CO1 sequences with sequences derived from five Asian countries revealed genetically distinct Ae. albopictus populations. Phylogenetic analysis revealed that all sequences from other Asian countries descended from the same genetic lineage as the Malaysian sequences. Noteworthy, our study highlights the discovery of 20 novel haplotypes within the Malaysian population which to date had not been reported. These findings could help determine the genetic variation of this invasive species, which in turn could possibly improve the current dengue vector surveillance strategies, locally and regionally. © The Authors 2017. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Meteor localization via statistical analysis of spatially temporal fluctuations in image sequences

NASA Astrophysics Data System (ADS)

Kukal, Jaromír; Klimt, Martin; Šihlík, Jan; Fliegel, Karel

2015-09-01

Meteor detection is one of the most important procedures in astronomical imaging. The meteor path in Earth's atmosphere is traditionally reconstructed from a double-station video observation system generating 2D image sequences. However, atmospheric turbulence and other factors cause spatio-temporal fluctuations of the image background, which make localization of the meteor path more difficult. Our approach is based on nonlinear preprocessing of image intensity using the Box-Cox transform, with the logarithmic transform as a particular case. The transformed image sequences are then differentiated along discrete coordinates to obtain a statistical description of sky background fluctuations, which can be modeled by a multivariate normal distribution. After verification and hypothesis testing, we use the statistical model for outlier detection. Isolated outlier points are ignored, whereas a compact cluster of outliers indicates the presence of a meteoroid after ignition.
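The processing chain described in this abstract (Box-Cox transform, differencing along the discrete coordinates, a multivariate normal background model, and rejection of isolated outliers) can be sketched as follows. This is an illustrative sketch using NumPy, not the authors' code: the function names, the Mahalanobis threshold, and the 3x3 neighbour rule for "compact" clusters are all assumptions, and the input is assumed to be a strictly positive array of shape (time, y, x).

```python
import numpy as np


def box_cox(x, lam):
    """Box-Cox transform; lam == 0 gives the logarithmic special case."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam


def mahalanobis_sq(features):
    """Squared Mahalanobis distance of each row from the sample mean."""
    centered = features - features.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(features, rowvar=False))
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)


def detect_outliers(frames, lam=0.0, threshold=16.0):
    """Flag voxels whose (dt, dy, dx) differences are unlikely under a
    multivariate normal background model. The threshold is an assumption."""
    z = box_cox(frames, lam)
    # Crop each difference so all three share shape (T-1, Y-1, X-1).
    dt = np.diff(z, axis=0)[:, :-1, :-1]
    dy = np.diff(z, axis=1)[:-1, :, :-1]
    dx = np.diff(z, axis=2)[:-1, :-1, :]
    feats = np.stack([dt.ravel(), dy.ravel(), dx.ravel()], axis=1)
    return mahalanobis_sq(feats).reshape(dt.shape) > threshold


def compact_clusters(mask, min_neighbours=2):
    """Keep outliers with enough outlier neighbours in the same frame
    (3x3 neighbourhood); isolated outliers are dropped."""
    m = mask.astype(int)
    padded = np.pad(m, ((0, 0), (1, 1), (1, 1)))
    counts = np.zeros(mask.shape, dtype=int)
    for oy in range(3):
        for ox in range(3):
            counts += padded[:, oy:oy + mask.shape[1], ox:ox + mask.shape[2]]
    counts -= m  # do not count the voxel itself
    return mask & (counts >= min_neighbours)
```

A typical call would be `compact_clusters(detect_outliers(frames))`, where the surviving voxels are candidate meteor-path points. Estimating the background model from the same frames that contain the meteor is a simplification; a real pipeline would presumably fit it on meteor-free data.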
Chromosomal localization and sequence analysis of a human episomal sequence with in vitro differentiating activity

DOE Office of Scientific and Technical Information (OSTI.GOV)

Boccaccio, C.; Deshatrette, J.; Meunier-Rotival, M.

1994-05-01

The genomic fragment carrying the human activator of liver function, previously described as an episome capable of inducing differentiation upon transfection into a dedifferentiated rat hepatoma cell line, was mapped on human chromosome 12q24.2-12q24.3. This chromosomal location was indistinguishable by in situ hybridization from that of the gene coding for the hepatic transcription factor HNF1. The sequence of the integrated form of the episome as well as its flanking sequences show that it is rich in retroposons. It contains a human ribosomal protein L21 processed pseudogene, one truncated L1Hs sequence, and 10 Alu repeats, which belong to different subfamilies.
BnNHL18A shows a localization change by stress-inducing chemical treatments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, Suk-Bae; Ham, Byung-Kook; Park, Jeong Mee

2006-01-06

Two genes, named BnNHL18A and BnNHL18B, showing sequence homology with Arabidopsis NDR1/HIN1-like (NHL) genes, were isolated from a cDNA library prepared from oilseed rape (Brassica napus) seedlings treated with NaCl. The transcript level of BnNHL18A was increased by sodium chloride, ethephon, hydrogen peroxide, methyl jasmonate, or salicylic acid treatment. The coding regions of BnNHL18A and BnNHL18B contain a sarcolipin (SLN)-like sequence. Analysis of the localization of smGFP fusion proteins showed that BnNHL18A is mainly localized to the endoplasmic reticulum (ER). This result suggests that the SLN-like sequence plays a role in retaining proteins in the ER membrane in plants. In response to NaCl, hydrogen peroxide, ethephon, and salicylic acid treatments, the protein localization of BnNHL18A was changed. Our findings suggest a common function of BnNHL18A in biotic and abiotic stresses, and demonstrate the presence of a shared mechanism of protein translocalization between the responses to plant pathogens and to osmotic stress.
Analysis of plastid and mitochondrial DNA insertions in the nucleus (NUPTs and NUMTs) of six plant species: size, relative age and chromosomal localization.

PubMed

Michalovova, M; Vyskot, B; Kejnovsky, E

2013-10-01

We analysed the size, relative age and chromosomal localization of nuclear sequences of plastid and mitochondrial origin (NUPTs-nuclear plastid DNA and NUMTs-nuclear mitochondrial DNA) in six completely sequenced plant species. We found that the largest insertions showed lower divergence from organelle DNA than shorter insertions in all species, indicating their recent origin. The largest NUPT and NUMT insertions were localized in the vicinity of the centromeres in the small genomes of Arabidopsis and rice. They were also present in other chromosomal regions in the large genomes of soybean and maize. Localization of NUPTs and NUMTs correlated positively with distribution of transposable elements (TEs) in Arabidopsis and sorghum, negatively in grapevine and soybean, and did not correlate in rice or maize. We propose a model where new plastid and mitochondrial DNA sequences are inserted close to centromeres and are later fragmented by TE insertions and reshuffled away from the centromere or removed by ectopic recombination. The mode and tempo of TE dynamism determines the turnover of NUPTs and NUMTs resulting in their species-specific chromosomal distributions.
Automatic segmentation of low-visibility moving objects through energy analysis of the local 3D spectrum

NASA Astrophysics Data System (ADS)

Nestares, Oscar; Miravet, Carlos; Santamaria, Javier; Fonolla Navarro, Rafael

1999-05-01

Automatic object segmentation in highly noisy image sequences, composed of a translating object over a background having a different motion, is achieved through joint motion-texture analysis. Local motion and/or texture is characterized by the energy of the local spatio-temporal spectrum, as different textures undergoing different translational motions display distinctive features in their 3D (x,y,t) spectra. Measurements of local spectrum energy are obtained using a bank of directional 3rd order Gaussian derivative filters in a multiresolution pyramid in space-time (10 directions, 3 resolution levels). These 30 energy measurements form a feature vector describing texture-motion for every pixel in the sequence. To improve discrimination capability and reduce computational cost, we automatically select the 4 features (channels) that best discriminate object from background, under the assumptions that the object is smaller than the background and has a different velocity or texture. In this way we reject features that are irrelevant or dominated by noise and that could yield wrong segmentation results. This method has been successfully applied to sequences with extremely low visibility and to objects that are even invisible to the eye in the absence of motion.
Classifying Facial Actions

PubMed Central

Donato, Gianluca; Bartlett, Marian Stewart; Hager, Joseph C.; Ekman, Paul; Sejnowski, Terrence J.

2010-01-01

The Facial Action Coding System (FACS) [23] is an objective method for quantifying facial movement in terms of component actions. This system is widely used in behavioral investigations of emotion, cognitive processes, and social interaction. The coding is presently performed by highly trained human experts. This paper explores and compares techniques for automatically recognizing facial actions in sequences of images. These techniques include analysis of facial motion through estimation of optical flow; holistic spatial analysis, such as principal component analysis, independent component analysis, local feature analysis, and linear discriminant analysis; and methods based on the outputs of local filters, such as Gabor wavelet representations and local principal components. Performance of these systems is compared to naive and expert human subjects. Best performances were obtained using the Gabor wavelet representation and the independent component representation, both of which achieved 96 percent accuracy for classifying 12 facial actions of the upper and lower face. The results provide converging evidence for the importance of using local filters, high spatial frequencies, and statistical independence for classifying facial actions. PMID:21188284
Functional analysis of the C-terminal region of human adenovirus E1A reveals a misidentified nuclear localization signal

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cohen, Michael J.; King, Cason R.; Dikeakos, Jimmy D.

The immortalizing function of the human adenovirus 5 E1A oncoprotein requires efficient localization to the nucleus. In 1987, a consensus monopartite nuclear localization sequence (NLS) was identified at the C-terminus of E1A. Since that time, various experiments have suggested that other regions of E1A influence nuclear import. In addition, a novel bipartite NLS was recently predicted at the C-terminal region of E1A in silico. In this study, we used immunofluorescence microscopy and co-immunoprecipitation analysis with importin-α to verify that full nuclear localization of E1A requires the well characterized NLS spanning residues 285–289, as well as a second basic patch situated between residues 258 and 263 (residues 258–289: RVGGRRQAVECIEDLLNEPGQPLDLSCKRPRP). Thus, the originally described NLS located at the C-terminus of E1A is actually a bipartite signal, which had been misidentified in the existing literature as a monopartite signal, altering our understanding of one of the oldest documented NLSs. - Highlights: • Human adenovirus E1A is localized to the nucleus. • The C-terminus of E1A contains a bipartite nuclear localization signal (NLS). • This signal was previously misidentified to be a monopartite NLS. • Key basic amino acid residues within this sequence are highly conserved.
Structural analysis of the HLA-A/HLA-F subregion: Precise localization of two new multigene families closely associated with the HLA class I sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pichon, L.; Carn, G.; Bouric, P.

1996-03-01

Positional cloning strategies for the hemochromatosis gene have previously concentrated on a target area restricted to a maximum genomic expanse of 400 kb around the HLA-A and HLA-F loci. Recently, the candidate region has been extended to 2-3 Mb on the distal side of the MHC. In this study, 10 coding sequences [hemochromatosis candidate genes (HCG) I to X] were isolated by cDNA selection using YACs covering the HLA-A/HLA-F subregion. Two of these (HCG II and HCG IV) belong to multigene families, as well as other sequences already described in this region, i.e., P5, pMC 6.7, and HLA class I. Fingerprinting of the four YACs overlapping the region was performed and allowed partial localization of the different multigene family sequences on each YAC without defining their exact positions. Fingerprinting on cosmids isolated from the ICRF chromosome 6-specific cosmid library allowed more precise localization of the redundant sequences in all of the multigene families and revealed their apparent organization in clusters. Further examination of these intertwined sequences demonstrated that this structural organization resulted from a succession of complex phenomena, including duplications and contractions. This study presents a precise description of the structural organization of the HLA-A/HLA-F region and a determination of the sequences involved in the megabase size polymorphism observed among the A3, A24, and A31 haplotypes. 29 refs., 2 figs., 2 tabs.

Experiences Building Globus Genomics: A Next-Generation Sequencing Analysis Service using Galaxy, Globus, and Amazon Web Services

PubMed Central

Madduri, Ravi K.; Sulakhe, Dinanath; Lacinski, Lukasz; Liu, Bo; Rodriguez, Alex; Chard, Kyle; Dave, Utpal J.; Foster, Ian T.

2014-01-01

We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing (NGS) genomic data. This system achieves a high degree of end-to-end automation that encompasses every stage of data analysis including initial data retrieval from remote sequencing centers or storage (via the Globus file transfer system); specification, configuration, and reuse of multi-step processing pipelines (via the Galaxy workflow system); creation of custom Amazon Machine Images and on-demand resource acquisition via a specialized elastic provisioner (on Amazon EC2); and efficient scheduling of these pipelines over many processors (via the HTCondor scheduler). The system allows biomedical researchers to perform rapid analysis of large NGS datasets in a fully automated manner, without software installation or a need for any local computing infrastructure. We report performance and cost results for some representative workloads. PMID:25342933
Experiences Building Globus Genomics: A Next-Generation Sequencing Analysis Service using Galaxy, Globus, and Amazon Web Services.

PubMed

Madduri, Ravi K; Sulakhe, Dinanath; Lacinski, Lukasz; Liu, Bo; Rodriguez, Alex; Chard, Kyle; Dave, Utpal J; Foster, Ian T

2014-09-10

We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing (NGS) genomic data. This system achieves a high degree of end-to-end automation that encompasses every stage of data analysis including initial data retrieval from remote sequencing centers or storage (via the Globus file transfer system); specification, configuration, and reuse of multi-step processing pipelines (via the Galaxy workflow system); creation of custom Amazon Machine Images and on-demand resource acquisition via a specialized elastic provisioner (on Amazon EC2); and efficient scheduling of these pipelines over many processors (via the HTCondor scheduler). The system allows biomedical researchers to perform rapid analysis of large NGS datasets in a fully automated manner, without software installation or a need for any local computing infrastructure. We report performance and cost results for some representative workloads.
Net Metering | State, Local, and Tribal Governments | NREL

Science.gov Websites

research organizations have explored this question by conducting solar cost-benefit studies. Program Design Sequencing for State Distributed PV Policies: A Quantitative Analysis of Policy Impacts and Interactions
Shotgun metagenomic data streams: surfing without fear

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berendzen, Joel R

2010-12-06

Timely information about bio-threat prevalence, consequence, propagation, attribution, and mitigation is needed to support decision-making, both routinely and in a crisis. One DNA sequencer can stream 25 Gbp of information per day, but sampling strategies and analysis techniques are needed to turn raw sequencing power into actionable knowledge. Shotgun metagenomics can enable biosurveillance at the level of a single city, hospital, or airplane. Metagenomics characterizes viruses and bacteria from complex environments such as soil, air filters, or sewage. Unlike targeted-primer-based sequencing, shotgun methods are not blind to sequences that are truly novel, and they can measure absolute prevalence. Shotgun metagenomic sampling can be non-invasive, efficient, and inexpensive while being informative. We have developed analysis techniques for shotgun metagenomic sequencing that rely upon phylogenetic signature patterns. They work by indexing local sequence patterns in a manner similar to web search engines. Our methods are laptop-fast, and favorable scaling properties ensure they will be sustainable as sequencing methods grow. We show examples of application to soil metagenomic samples.
Cytogenetic Analysis of Populus trichocarpa - Ribosomal DNA, Telomere Repeat Sequence, and Marker-selected BACs

Treesearch

M.N. Islam-Faridi; C.D. Nelson; S.P. DiFazio; L.E. Gunter; G.A. Tuskan

2009-01-01

The 18S-28S rDNA and 5S rDNA loci in Populus trichocarpa were localized using fluorescent in situ hybridization (FISH). Two 18S-28S rDNA sites and one 5S rDNA site were identified and located at the ends of 3 different chromosomes. FISH signals from the Arabidopsis-type telomere repeat sequence were observed at the distal ends of each chromosome. Six BAC clones...
PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods

PubMed Central

2012-01-01

Background With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide reproducible and comparable results needed for a global scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated since no user-friendly tool exists to analyze and explore it. Results PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. Conclusions PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net. PMID:22568821
Modeling photo-bleaching kinetics to map local variations in rod rhodopsin density

NASA Astrophysics Data System (ADS)

Ehler, M.; Dobrosotskaya, J.; King, E. J.; Czaja, W.; Bonner, R. F.

2011-03-01

Localized rod photoreceptor and rhodopsin losses have been observed in post mortem histology both in normal aging and in age-related maculopathy. We propose to noninvasively map local rod rhodopsin density through analysis of the brightening of the underlying lipofuscin autofluorescence (LAF) in confocal scanning laser ophthalmoscopy (cSLO) imaging sequences starting in the dark-adapted eye. The detected LAF increases as rhodopsin is bleached (time constant ~25 s) by the average retinal irradiance of the cSLO 488 nm laser beam. We fit parameters of analytical expressions for the kinetics of rhodopsin bleaching that Lamb validated using electroretinogram recordings in humans. By performing localized (~100 μm) kinetic analysis, we create high resolution maps of the rhodopsin density. This new noninvasive imaging and analysis approach appears well-suited for measuring localized changes in the rod photoreceptors and correlating them at high spatial resolution with localized pathological changes of the retinal pigment epithelium (RPE) seen in steady-state LAF images.
Association mining of dependency between time series

NASA Astrophysics Data System (ADS)

Hafez, Alaaeldin

2001-03-01

Time series analysis is considered a crucial component of strategic control over a broad variety of disciplines in business, science and engineering. Time series data is a sequence of observations collected over intervals of time. Each time series describes a phenomenon as a function of time. Analysis of time series data includes discovering trends (or patterns) in a time series sequence. In the last few years, data mining has emerged and been recognized as a new technology for data analysis. Data mining is the process of discovering potentially valuable patterns, associations, trends, sequences and dependencies in data. Data mining techniques can discover information that many traditional business analysis and statistical techniques fail to deliver. In this paper, we adapt and innovate data mining techniques to analyze time series data. By using data mining techniques, maximal frequent patterns are discovered and used in predicting future sequences or trends, where trends describe the behavior of a sequence. In order to include different types of time series (e.g. irregular and non-systematic), we consider past frequent patterns of the same time sequences (local patterns) and of other dependent time sequences (global patterns). We use the word 'dependent' instead of the word 'similar' to emphasize real-life time series where two time series sequences could be completely different (in values, shapes, etc.), but still react to the same conditions in a dependent way. In this paper, we propose the Dependence Mining Technique, which could be used in predicting time series sequences. The proposed technique consists of three phases: (a) for all time series sequences, generate their trend sequences, (b) discover maximal frequent trend patterns and generate pattern vectors (to keep information of frequent trend patterns), and (c) use trend pattern vectors to predict future time series sequences.
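The first two phases named in this abstract (generating trend sequences, then discovering maximal frequent trend patterns) can be illustrated with a small sketch. The abstract does not define the trend alphabet, the support measure, or the pattern-vector format, so everything below is an assumption made for illustration: trends are encoded as "U" (up), "D" (down) and "S" (steady), patterns are contiguous substrings, support is the number of (possibly overlapping) occurrences, and a pattern is "maximal" if it is not contained in a longer frequent pattern.

```python
from collections import Counter


def trend_sequence(values, tol=0.0):
    """Encode consecutive changes as 'U' (up), 'D' (down) or 'S' (steady)."""
    symbols = []
    for prev, cur in zip(values, values[1:]):
        if cur - prev > tol:
            symbols.append("U")
        elif prev - cur > tol:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def frequent_patterns(trends, min_support, max_len):
    """Count contiguous trend patterns up to max_len symbols and keep
    those occurring at least min_support times."""
    frequent = {}
    for k in range(1, max_len + 1):
        counts = Counter(trends[i:i + k] for i in range(len(trends) - k + 1))
        frequent.update({p: c for p, c in counts.items() if c >= min_support})
    return frequent


def maximal_patterns(frequent):
    """Keep only frequent patterns not contained in a longer frequent one."""
    return {
        p: c
        for p, c in frequent.items()
        if not any(p != q and p in q for q in frequent)
    }
```

For example, the series `[1, 2, 3, 2, 3, 4, 3]` yields the trend sequence `UUDUUD`, whose only maximal frequent pattern at a support of 2 and length up to 3 is `UUD`. A prediction phase could then match the tail of a new trend sequence against the prefixes of such patterns, but the abstract gives too little detail to sketch that step faithfully.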
Complete genome sequence of Enterobacter sp. IIT-BT 08: A potential microbial strain for high rate hydrogen production.

PubMed

Khanna, Namita; Ghosh, Ananta Kumar; Huntemann, Marcel; Deshpande, Shweta; Han, James; Chen, Amy; Kyrpides, Nikos; Mavrommatis, Kostas; Szeto, Ernest; Markowitz, Victor; Ivanova, Natalia; Pagani, Ioanna; Pati, Amrita; Pitluck, Sam; Nolan, Matt; Woyke, Tanja; Teshima, Hazuki; Chertkov, Olga; Daligault, Hajnalka; Davenport, Karen; Gu, Wei; Munk, Christine; Zhang, Xiaojing; Bruce, David; Detter, Chris; Xu, Yan; Quintana, Beverly; Reitenga, Krista; Kunde, Yulia; Green, Lance; Erkkila, Tracy; Han, Cliff; Brambilla, Evelyne-Marie; Lang, Elke; Klenk, Hans-Peter; Goodwin, Lynne; Chain, Patrick; Das, Debabrata

2013-12-20

Enterobacter sp. IIT-BT 08 belongs to Phylum: Proteobacteria, Class: Gammaproteobacteria, Order: Enterobacteriales, Family: Enterobacteriaceae. The organism was isolated from the leaves of a local plant near the Kharagpur railway station, Kharagpur, West Bengal, India. It has been extensively studied for fermentative hydrogen production because of its high hydrogen yield. For further enhancement of hydrogen production by strain development, complete genome sequence analysis was carried out. Sequence analysis revealed that the genome was linear, 4.67 Mbp long and had a GC content of 56.01%. The genome properties encode 4,393 protein-coding and 179 RNA genes. Additionally, a putative pathway of hydrogen production was suggested based on the presence of formate hydrogen lyase complex and other related genes identified in the genome. Thus, in the present study we describe the specific properties of the organism and the generation, annotation and analysis of its genome sequence as well as discuss the putative pathway of hydrogen production by this organism.
Genome wide assessment of mRNA in astrocyte protrusions by direct RNA sequencing reveals mRNA localization for the intermediate filament protein nestin.

PubMed

Thomsen, Rune; Pallesen, Jonatan; Daugaard, Tina F; Børglum, Anders D; Nielsen, Anders L

2013-11-01

Subcellular RNA localization plays an important role in development, cell differentiation, and cell migration. For a comprehensive description of the population of protrusion localized mRNAs in astrocytes we separated protrusions from cell bodies in a Boyden chamber and performed high-throughput direct RNA sequencing. The mRNAs with localization in astrocyte protrusions encode proteins belonging to a variety of functional groups indicating involvement of RNA localization for a palette of cellular functions. The mRNA encoding the intermediate filament protein Nestin was among the identified mRNAs. By RT-qPCR and RNA FISH analysis we confirmed Nestin mRNA localization in cell protrusions and also protrusion localization of Nestin protein. Nestin mRNA localization was dependent on Fragile X mental retardation syndrome proteins Fmrp and Fxr1, and the Nestin 3'-UTR was sufficient to mediate protrusion mRNA localization. The mRNAs for two other intermediate filament proteins in astrocytes, Gfap and Vimentin, have moderate and no protrusion localization, respectively, showing that individual intermediate filament components have different localization mechanisms. The correlated localization of Nestin mRNA with Nestin protein in cell protrusions indicates the presence of a regulatory mechanism at the mRNA localization level for the Nestin intermediate filament protein with potential importance for astrocyte functions during brain development and maintenance. Copyright © 2013 Wiley Periodicals, Inc.
Chromosomal localization and partial genomic structure of the human peroxisome proliferator activated receptor-gamma (hPPAR gamma) gene.

PubMed

Beamer, B A; Negri, C; Yen, C J; Gavrilova, O; Rumberger, J M; Durcan, M J; Yarnall, D P; Hawkins, A L; Griffin, C A; Burns, D K; Roth, J; Reitman, M; Shuldiner, A R

1997-04-28

We determined the chromosomal localization and partial genomic structure of the coding region of the human PPAR gamma gene (hPPAR gamma), a nuclear receptor important for adipocyte differentiation and function. Sequence analysis and long PCR of human genomic DNA with primers that span putative introns revealed that intron positions and sizes of hPPAR gamma are similar to those previously determined for the mouse PPAR gamma gene[13]. Fluorescent in situ hybridization localized hPPAR gamma to chromosome 3, band 3p25. Radiation hybrid mapping with two independent primer pairs was consistent with hPPAR gamma being within 1.5 Mb of marker D3S1263 on 3p25-p24.2. These sequences of the intron/exon junctions of the 6 coding exons shared by hPPAR gamma 1 and hPPAR gamma 2 will facilitate screening for possible mutations. Furthermore, D3S1263 is a suitable polymorphic marker for linkage analysis to evaluate PPAR gamma's potential contribution to genetic susceptibility to obesity, lipoatrophy, insulin resistance, and diabetes.
Molecular characterization and physical localization of highly repetitive DNA sequences from Brazilian Alstroemeria species.

PubMed

Kuipers, A G J; Kamstra, S A; de Jeu, M J; Visser, R G F

2002-01-01

Highly repetitive DNA sequences were isolated from genomic DNA libraries of Alstroemeria psittacina and A. inodora. Among the repetitive sequences that were isolated, tandem repeats as well as dispersed repeats could be discerned. The tandem repeats belonged to a family of interlinked Sau3A subfragments with sizes varying from 68-127 bp, and constituted a larger HinfI repeat of approximately 400 bp. Southern hybridization showed a similar molecular organization of the tandem repeats in each of the Brazilian Alstroemeria species tested. None of the repeats hybridized with DNA from Chilean Alstroemeria species, which indicates that they are specific for the Brazilian species. In-situ localization studies revealed the tandem repeats to be localized in clusters on the chromosomes of A. inodora and A. psittacina: distal hybridization sites were found on chromosome arms 2PS, 6PL, 7PS, 7PL and 8PL, interstitial sites on chromosome arms 2PL, 3PL, 4PL and 5PL. The applicability of the tandem repeats for cytogenetic analysis of interspecific hybrids and their role in heterochromatin organization are discussed.
myPhyloDB: a local web server for the storage and analysis of metagenomics data

USDA-ARS's Scientific Manuscript database

myPhyloDB is a user-friendly personal database with a browser-interface designed to facilitate the storage, processing, analysis, and distribution of metagenomics data. MyPhyloDB archives raw sequencing files, and allows for easy selection of project(s)/sample(s) of any combination from all availab...
Ebbie: automated analysis and storage of small RNA cloning data using a dynamic web server

PubMed Central

Ebhardt, H Alexander; Wiese, Kay C; Unrau, Peter J

2006-01-01

Background DNA sequencing is used ubiquitously: from deciphering genomes[1] to determining the primary sequence of small RNAs (smRNAs) [2-5]. The cloning of smRNAs is currently the most conventional method to determine the actual sequence of these important regulators of gene expression. Typical smRNA cloning projects involve the sequencing of hundreds to thousands of smRNA clones that are delimited at their 5' and 3' ends by fixed sequence regions. These primers result from the biochemical protocol used to isolate and convert the smRNA into clonable PCR products. Recently we completed a smRNA cloning project involving tobacco plants, where analysis was required for ~700 smRNA sequences[6]. Finding no easily accessible research tool to enter and analyze smRNA sequences we developed Ebbie to assist us with our study. Results Ebbie is a semi-automated smRNA cloning data processing algorithm, which initially searches for any substring within a DNA sequencing text file, which is flanked by two constant strings. The substring, also termed smRNA or insert, is stored in a MySQL and BlastN database. These inserts are then compared using BlastN to locally installed databases allowing the rapid comparison of the insert to both the growing smRNA database and to other static sequence databases. Our laboratory used Ebbie to analyze scores of DNA sequencing data originating from an smRNA cloning project[6]. Through its built-in instant analysis of all inserts using BlastN, we were able to quickly identify 33 groups of smRNAs from ~700 database entries. This clustering allowed the easy identification of novel and highly expressed clusters of smRNAs. Ebbie is available under GNU GPL and currently implemented on Conclusion Ebbie was designed for medium sized smRNA cloning projects with about 1,000 database entries [6-8]. Ebbie can be used for any type of sequence analysis where two constant primer regions flank a sequence of interest. The reliable storage of inserts, and their annotation in a MySQL database, BlastN[9] comparison of new inserts to dynamic and static databases make it a powerful new tool in any laboratory using DNA sequencing. Ebbie also prevents manual mistakes during the excision process and speeds up annotation and data-entry. Once the server is installed locally, its access can be restricted to protect sensitive new DNA sequencing data. Ebbie was primarily designed for smRNA cloning projects, but can be applied to a variety of RNA and DNA cloning projects[2,3,10,11]. PMID:16584563
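The excision step described here, finding any substring flanked by two constant strings, can be sketched with a regular expression. This is not Ebbie's code: the function name and the two flanking sequences are hypothetical placeholders, and the storage and BlastN steps are left out.

```python
import re

def extract_inserts(read, left, right):
    """Return every substring of `read` that lies between `left` and `right`.

    The non-greedy group stops at the first `right` flank after each `left` flank.
    """
    pattern = re.compile(re.escape(left) + r"(.+?)" + re.escape(right))
    return pattern.findall(read)

LEFT = "GGATCC"   # hypothetical 5' constant region
RIGHT = "GAATTC"  # hypothetical 3' constant region

# A toy read carrying two inserts, each delimited by the constant regions.
read = ("NNNN" + LEFT + "ACGTACGTACGT" + RIGHT +
        "NNNN" + LEFT + "TTGCATTGCA" + RIGHT + "NN")

inserts = extract_inserts(read, LEFT, RIGHT)
print(inserts)  # ['ACGTACGTACGT', 'TTGCATTGCA']
```

Escaping the flanks with `re.escape` keeps the sketch safe if a constant region ever contains a character with special meaning in a regular expression.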
Molecular characterization of a phloem-specific gene encoding the filament protein, phloem protein 1 (PP1), from Cucurbita maxima.

PubMed

Clark, A M; Jacobsen, K R; Bostwick, D E; Dannenhoffer, J M; Skaggs, M I; Thompson, G A

1997-07-01

Sieve elements in the phloem of most angiosperms contain proteinaceous filaments and aggregates called P-protein. In the genus Cucurbita, these filaments are composed of two major proteins: PP1, the phloem filament protein, and PP2, the phloem lactin. The gene encoding the phloem filament protein in pumpkin (Cucurbita maxima Duch.) has been isolated and characterized. Nucleotide sequence analysis of the reconstructed gene gPP1 revealed a continuous 2430 bp protein coding sequence, with no introns, encoding an 809 amino acid polypeptide. The deduced polypeptide had characteristics of PP1 and contained a 15 amino acid sequence determined by N-terminal peptide sequence analysis of PP1. The sequence of PP1 was highly repetitive with four 200 amino acid sequence domains containing structural motifs in common with cysteine proteinase inhibitors. Expression of the PP1 gene was detected in roots, hypocotyls, cotyledons, stems, and leaves of pumpkin plants. PP1 and its mRNA accumulated in pumpkin hypocotyls during the period of rapid hypocotyl elongation after which mRNA levels declined, while protein levels remained elevated. PP1 was immunolocalized in slime plugs and P-protein bodies in sieve elements of the phloem. Occasionally, PP1 was detected in companion cells. PP1 mRNA was localized by in situ hybridization in companion cells at early stages of vascular differentiation. The developmental accumulation and localization of PP1 and its mRNA paralleled the phloem lactin, further suggesting an interaction between these phloem-specific proteins.
Dissection of a nuclear localization signal.

PubMed

Hodel, M R; Corbett, A H; Hodel, A E

2001-01-12

The regulated process of protein import into the nucleus of a eukaryotic cell is mediated by specific nuclear localization signals (NLSs) that are recognized by protein import receptors. This study seeks to decipher the energetic details of NLS recognition by the receptor importin alpha through quantitative analysis of variant NLSs. The relative importance of each residue in two monopartite NLS sequences was determined using an alanine scanning approach. These measurements yield an energetic definition of a monopartite NLS sequence where a required lysine residue is followed by two other basic residues in the sequence K(K/R)X(K/R). In addition, the energetic contributions of the second basic cluster in a bipartite NLS (approximately 3 kcal/mol) as well as the energy of inhibition of the importin alpha importin beta-binding domain (approximately 3 kcal/mol) were also measured. These data allow the generation of an energetic scale of nuclear localization sequences based on a peptide's affinity for the importin alpha-importin beta complex. On this scale, a functional NLS has a binding constant of approximately 10 nM, whereas a nonfunctional NLS has a 100-fold weaker affinity of 1 µM. Further correlation between the current in vitro data and in vivo function will provide the foundation for a comprehensive quantitative model of protein import.
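The consensus K(K/R)X(K/R) given in this abstract translates directly into a pattern scan. The sketch below is only a sequence-level illustration and says nothing about binding energy; the function name is hypothetical, and the toy sequence is built around PKKKRKV, the well-known SV40 large T antigen NLS.

```python
import re

# K, then K or R, then any residue, then K or R.
# The lookahead lets overlapping matches be reported.
NLS_PATTERN = re.compile(r"(?=(K[KR].[KR]))")

def find_monopartite_nls(protein):
    """Return (start_index, matched_residues) for each K(K/R)X(K/R) occurrence."""
    return [(m.start(), m.group(1)) for m in NLS_PATTERN.finditer(protein)]

hits = find_monopartite_nls("MPKKKRKVEDP")
print(hits)  # [(2, 'KKKR'), (3, 'KKRK')]
```

Indices are zero-based. A match only shows that the consensus is present; by the abstract's own energetic scale, whether a candidate functions as an NLS depends on its measured affinity.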
Prevalence and Identity of Taenia multiceps cysts "Coenurus cerebralis" in Sheep in Egypt.

PubMed

Amer, Said; ElKhatam, Ahmed; Fukuda, Yasuhiro; Bakr, Lamia I; Zidan, Shereif; Elsify, Ahmed; Mohamed, Mostafa A; Tada, Chika; Nakai, Yutaka

2017-12-01

Coenurosis is a parasitic disease caused by the larval stage (Coenurus cerebralis) of the canid cestode Taenia multiceps. C. cerebralis particularly infects sheep and goats, and poses a public health concern. The present study aimed to determine the occurrence and molecular identity of C. cerebralis infecting sheep in Egypt. Infection rate was determined by postmortem inspection of heads of the cases that showed neurological manifestations. Species identification and genetic diversity were analyzed based on PCR-sequence analysis of nuclear ITS1 and mitochondrial cytochrome oxidase (COI) and nicotinamide adenine dinucleotide dehydrogenase (ND1) gene markers. Out of 3668 animals distributed in 50 herds at localities of Ashmoun and El Sadat cities, El Menoufia Province, Egypt, 420 (11.45%) sheep showed neurological disorders. Postmortem examination of these animals after slaughter at local abattoirs indicated the occurrence of C. cerebralis cysts in the brain of 111 out of 420 (26.4%), with an overall infection rate of 3.03% of the involved sheep population. Molecular analysis of representative samples of coenuri at the ITS1 gene marker showed extensive intra- and inter-sequence diversity due to deletions/insertions in the microsatellite regions. In contrast to the nuclear gene marker, considerably low genetic diversity was seen in the analyzed mitochondrial gene markers. Phylogenetic analysis based on COI and ND1 gene sequences indicated that the generated sequences in the present study and the reference sequences in the database clustered in 4 haplogroups, with more or less similar topologies. Clustering pattern of the phylogenetic tree showed no effect of the geographic location or the host species. Copyright © 2017 Elsevier B.V. All rights reserved.
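The three percentages quoted in this abstract follow from the three counts it reports, as the short check below shows. The variable names are illustrative; the counts are taken from the abstract.

```python
total_sheep = 3668         # animals examined across 50 herds
neurological_cases = 420   # sheep showing neurological disorders
cyst_positive = 111        # cases with C. cerebralis cysts at postmortem

def percent(part, whole):
    """Express part/whole as a percentage."""
    return 100.0 * part / whole

neuro_rate = percent(neurological_cases, total_sheep)
cyst_rate_among_cases = percent(cyst_positive, neurological_cases)
overall_rate = percent(cyst_positive, total_sheep)

print(f"{neuro_rate:.2f}%")             # 11.45%
print(f"{cyst_rate_among_cases:.1f}%")  # 26.4%
print(f"{overall_rate:.2f}%")           # 3.03%
```

The overall rate uses the full population of 3668 as denominator, which is why it is much lower than the rate among symptomatic animals.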
SPAR: small RNA-seq portal for analysis of sequencing experiments.

PubMed

Kuksa, Pavel P; Amlie-Wolf, Alexandre; Katanic, Živadin; Valladares, Otto; Wang, Li-San; Leung, Yuk Yee

2018-05-04

The introduction of new high-throughput small RNA sequencing protocols that generate large-scale genomics datasets along with increasing evidence of the significant regulatory roles of small non-coding RNAs (sncRNAs) have highlighted the urgent need for tools to analyze and interpret large amounts of small RNA sequencing data. However, it remains challenging to systematically and comprehensively discover and characterize sncRNA genes and specifically-processed sncRNA products from these datasets. To fill this gap, we present Small RNA-seq Portal for Analysis of sequencing expeRiments (SPAR), a user-friendly web server for interactive processing, analysis, annotation and visualization of small RNA sequencing data. SPAR supports sequencing data generated from various experimental protocols, including smRNA-seq, short total RNA sequencing, microRNA-seq, and single-cell small RNA-seq. Additionally, SPAR includes publicly available reference sncRNA datasets from our DASHR database and from ENCODE across 185 human tissues and cell types to produce highly informative small RNA annotations across all major small RNA types and other features such as co-localization with various genomic features, precursor transcript cleavage patterns, and conservation. SPAR allows the user to compare the input experiment against reference ENCODE/DASHR datasets. SPAR currently supports analyses of human (hg19, hg38) and mouse (mm10) sequencing data. SPAR is freely available at https://www.lisanwanglab.org/SPAR.
Sequential associative memory with nonuniformity of the layer sizes.

PubMed

Teramae, Jun-Nosuke; Fukai, Tomoki

2007-01-01

Sequence retrieval has a fundamental importance in information processing by the brain, and has extensively been studied in neural network models. Most previous sequential associative memory models embedded sequences of memory patterns that have nearly equal sizes. It was recently shown that local cortical networks display many diverse yet repeatable precise temporal sequences of neuronal activities, termed "neuronal avalanches." Interestingly, these avalanches displayed size and lifetime distributions that obey power laws. Inspired by these experimental findings, here we consider an associative memory model of binary neurons that stores sequences of memory patterns with highly variable sizes. Our analysis includes the case where the statistics of these size variations obey the above-mentioned power laws. We study the retrieval dynamics of such memory systems by analytically deriving the equations that govern the time evolution of macroscopic order parameters. We calculate the critical sequence length beyond which the network cannot retrieve memory sequences correctly. As an application of the analysis, we show how the present variability in sequential memory patterns degrades the power-law lifetime distribution of retrieved neural activities.
Markov models of genome segmentation

NASA Astrophysics Data System (ADS)

Thakur, Vivek; Azad, Rajeev K.; Ramaswamy, Ram

2007-01-01

We introduce Markov models for segmentation of symbolic sequences, extending a segmentation procedure based on the Jensen-Shannon divergence that has been introduced earlier. Higher-order Markov models are more sensitive to the details of local patterns and in application to genome analysis, this makes it possible to segment a sequence at positions that are biologically meaningful. We show the advantage of higher-order Markov-model-based segmentation procedures in detecting compositional inhomogeneity in chimeric DNA sequences constructed from genomes of diverse species, and in application to the E. coli K12 genome, boundaries of genomic islands, cryptic prophages, and horizontally acquired regions are accurately identified.
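The earlier procedure that this abstract extends scores each candidate cut point by the Jensen-Shannon divergence between the two resulting subsequences. The sketch below implements only that zero-order version (single-symbol frequencies, length-weighted entropies); it is not the higher-order Markov method of the paper, and the function names and the `min_len` parameter are assumptions.

```python
import math
from collections import Counter

def entropy(seq):
    """Shannon entropy (bits) of the symbol frequencies in seq."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())

def js_divergence(left, right):
    """Length-weighted Jensen-Shannon divergence between two subsequences."""
    n = len(left) + len(right)
    return (entropy(left + right)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

def best_split(seq, min_len=1):
    """Return (position, divergence) of the cut that maximizes the divergence."""
    best_pos, best_score = None, -1.0
    for i in range(min_len, len(seq) - min_len + 1):
        score = js_divergence(seq[:i], seq[i:])
        if score > best_score:
            best_pos, best_score = i, score
    return best_pos, best_score

# A toy "chimeric" sequence: an AT-only half joined to a GC-only half.
chimera = "AT" * 20 + "GC" * 20
pos, score = best_split(chimera, min_len=5)
print(pos, score)  # 40 1.0
```

A full segmentation would apply the split recursively and stop when the divergence is no longer statistically significant; a higher-order variant would replace symbol frequencies with counts of short words, in the spirit of the Markov models described above.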

Spectral Analysis of CLU Galaxies

NASA Astrophysics Data System (ADS)

Sutter, Jessica; Cook, David O.; Kasliwal, Mansi M.; Dale, Daniel A.

2017-01-01

In order to help select possible EM signals from gravitational wave-emitting sources, a more complete catalog of local galaxies is being created. This catalog, called the Census of the Local Universe (CLU), will attempt to find the position of all star-forming galaxies within 200 Mpc. By doing this, the area on the sky from which a gravitational wave could possibly have originated is reduced by a factor of 100. Besides providing this valuable resource for gravitational wave follow-up, the CLU survey provides an exciting new opportunity for better understanding the properties of galaxies near the same age as the Milky Way. Using spectra obtained with the Palomar 200-inch double-prime spectrograph as well as data from the WISE survey, we have created a main sequence for the CLU survey. By analyzing how this main sequence behaves in local galaxies, we can better understand the relationship between current star formation rate and total galaxy stellar mass.
Comparative analysis of long non-coding RNAs in Atlantic and Coho salmon reveals divergent transcriptome responses associated with immunity and tissue repair during sea lice infestation.

PubMed

Valenzuela-Muñoz, Valentina; Valenzuela-Miranda, Diego; Gallardo-Escárate, Cristian

2018-05-24

The increasing capacity of transcriptomic analysis by high throughput sequencing has highlighted the presence of a large proportion of transcripts that do not encode proteins. In particular, long non-coding RNAs (lncRNAs) are sequences with low coding potential and conservation among species. Moreover, cumulative evidence has revealed important roles in post-transcriptional gene modulation in several taxa. In fish, the role of lncRNAs has been scarcely studied and even less so during the immune response against sea lice. In the present study we mined for lncRNAs in Atlantic salmon (Salmo salar) and Coho salmon (Oncorhynchus kisutch), which are affected by the sea louse Caligus rogercresseyi, evaluating the degree of sequence conservation between these two fish species and their putative roles during the infection process. Herein, Atlantic and Coho salmon were infected with 35 lice/fish and evaluated after 7 and 14 days post-infestation (dpi). For RNA sequencing, samples from skin and head kidney were collected. A total of 5658/4140 and 3678/2123 lncRNAs were identified in uninfected/infected Atlantic and Coho salmon transcriptomes, respectively. Species-specific transcription patterns were observed in exclusive lncRNAs according to the tissue analyzed. Furthermore, neighbor gene GO enrichment analysis of the top 100 highly regulated lncRNAs in Atlantic salmon showed that lncRNAs were localized near genes related to the immune response. On the other hand, in Coho salmon the highly regulated lncRNAs were localized near genes involved in tissue repair processes. This study revealed high regulation of lncRNAs closely localized to immune and tissue repair-related genes in Atlantic and Coho salmon, respectively, suggesting putative roles for lncRNAs in salmon against sea lice infestation. Copyright © 2018 Elsevier Ltd. All rights reserved.
Evolutionarily conserved ELOVL4 gene expression in the vertebrate retina.

PubMed

Lagali, Pamela S; Liu, Jiafan; Ambasudhan, Rajesh; Kakuk, Laura E; Bernstein, Steven L; Seigel, Gail M; Wong, Paul W; Ayyagari, Radha

2003-07-01

The gene elongation of very long chain fatty acids-4 (ELOVL4) has been shown to underlie phenotypically heterogeneous forms of autosomal dominant macular degeneration. In this study, the extent of evolutionary conservation and the existence and localization of retinal expression of this gene was investigated across a wide variety of species. Southern blot analysis of genomic DNA and bioinformatic analysis using the human ELOVL4 cDNA and protein sequences, respectively, were performed to identify species in which ELOVL4 orthologues and/or homologues are present. Retinal RNA and protein extracts derived from different species were assessed by Northern hybridization and immunoblot techniques to assess evolutionary conservation of gene expression. Immunohistochemical analysis of tissue sections prepared from various mammalian retinas was performed to determine the distribution of ELOVL4 and homologous proteins within specific retinal cell layers. The existence of ELOVL4 sequence orthologues and homologues was confirmed by both Southern blot analysis and in silico searches of protein sequence databases. Phylogenetic analysis places ELOVL4 among a large family of known and putative fatty acid elongase proteins. Northern blot analysis revealed the presence of multiple transcripts corresponding to ELOVL4 homologues expressed in the retina of several different mammalian species. Conserved proteins were also detected among retinal extracts of different mammals and were found to localize predominantly to the photoreceptor cell layer within retinal tissue preparations. The ELOVL4 gene is highly conserved throughout evolution and is expressed in the photoreceptor cells of the retina in a variety of different species, which suggests that it plays a critical role in retinal cell biology.
Dipeptide frequency/bias analysis identifies conserved sites of nonrandomness shared by cysteine-rich motifs.

PubMed

Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N

2001-08-15

This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families, by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for employment of the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high-frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.
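The kind of tally this abstract describes can be sketched as counting ordered dipeptides and comparing each count with what residue composition alone would predict. The abstract does not define how AAPAIR.TAB computes bias, so the observed/expected ratio below is an assumed definition, and the three short sequences are invented toy data.

```python
from collections import Counter

def dipeptide_bias(sequences):
    """Return {dipeptide: (count, observed/expected)} for ordered residue pairs.

    Expected counts assume residues pair independently of their neighbours;
    this definition of bias is an assumption, not taken from AAPAIR.TAB.
    """
    pair_counts = Counter()
    residue_counts = Counter()
    for seq in sequences:
        residue_counts.update(seq)
        pair_counts.update(seq[i:i + 2] for i in range(len(seq) - 1))
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    result = {}
    for pair, count in pair_counts.items():
        freq_first = residue_counts[pair[0]] / total_residues
        freq_second = residue_counts[pair[1]] / total_residues
        expected = freq_first * freq_second * total_pairs
        result[pair] = (count, count / expected)
    return result

toy_family = ["CGYGC", "CGFNC", "NGYAC"]  # invented sequences, illustration only
bias = dipeptide_bias(toy_family)
count, ratio = bias["GY"]
print(count, round(ratio, 4))  # 2 4.6875
```

A ratio well above 1 marks a dipeptide that occurs more often than composition predicts, which is the sense in which the abstract singles out pairs such as Gly-Tyr and Gly-Phe.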
Comparative sensitivities of functional MRI sequences in detection of local recurrence of prostate carcinoma after radical prostatectomy or external-beam radiotherapy.

PubMed

Roy, Catherine; Foudi, Fatah; Charton, Jeanne; Jung, Michel; Lang, Hervé; Saussine, Christian; Jacqmin, Didier

2013-04-01

The aim of this retrospective study was to determine the respective accuracies of three types of functional MRI sequences-diffusion-weighted imaging (DWI), dynamic contrast-enhanced (DCE) MRI, and 3D (1)H-MR spectroscopy (MRS)-in the depiction of local prostate cancer recurrence after two different initial therapy options. From a cohort of 83 patients with suspicion of local recurrence based on prostate-specific antigen (PSA) kinetics who were imaged on a 3-T MRI unit using an identical protocol including the three functional sequences with an endorectal coil, we selected 60 patients (group A, 28 patients who underwent radical prostatectomy; group B, 32 patients who underwent external-beam radiation) who had local recurrence ascertained on the basis of a transrectal ultrasound-guided biopsy results and a reduction in PSA level after salvage therapy. All patients presented with a local relapse. Sensitivity with T2-weighted MRI and 3D (1)H-MRS sequences was 57% and 53%, respectively, for group A and 71% and 78%, respectively, for group B. DCE-MRI alone showed a sensitivity of 100% and 96%, respectively, for groups A and B. DWI alone had a higher sensitivity for group B (96%) than for group A (71%). The combination of T2-weighted imaging plus DWI plus DCE-MRI provided a sensitivity as high as 100% in group B. The performance of functional imaging sequences for detecting recurrence is different after radical prostatectomy and external-beam radiotherapy. DCE-MRI is a valid and efficient tool to detect prostate cancer recurrence in radical prostatectomy as well as in external-beam radiotherapy. The combination of DCE-MRI and DWI is highly efficient after radiation therapy. Three-dimensional (1)H-MRS needs to be improved. Even though it is not accurate enough, T2-weighted imaging remains essential for the morphologic analysis of the area.
Local alignment of two-base encoded DNA sequence

PubMed Central

Homer, Nils; Merriman, Barry; Nelson, Stanley F

2009-01-01

Background DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and facilitating genome re-sequencing efforts based on this form of sequence data. PMID:19508732
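The sketch below illustrates why two-base encoded reads cannot simply be decoded and then aligned. The abstract does not spell out the encoding, so the code assumes the widely used scheme in which each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3); the function names are illustrative, and the sketch is not the dynamic programming method of the paper.

```python
BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}  # assumed 2-bit base codes

def encode_colors(seq):
    """Return the first base plus one color (0-3) per adjacent base pair."""
    colors = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors

def decode_colors(first_base, colors):
    """Rebuild the base sequence by walking the colors from the known first base."""
    bases = [first_base]
    for c in colors:
        bases.append(BASES[CODE[bases[-1]] ^ c])
    return "".join(bases)

reference = "ACGGTCA"
first, colors = encode_colors(reference)
print(colors)                        # [1, 3, 0, 1, 2, 1]
print(decode_colors(first, colors))  # ACGGTCA

# One mis-measured color corrupts every base downstream of it.
noisy = list(colors)
noisy[2] = 1
print(decode_colors(first, noisy))   # ACGTGAC
```

Because decoding is deterministic but error-propagating, a naive decode turns one measurement error into a run of apparent mismatches, which is the motivation the abstract gives for decoding and aligning simultaneously.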
CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline.

PubMed

Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M; Tettelin, Hervé; White, Owen; Angiuoli, Samuel V; Mahurkar, Anup; Fricke, W Florian

2017-04-27

The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data up- and download, pipeline configuration and monitoring, and access to Sybil are managed through CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2. CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.
PNMA family: Protein interaction network and cell signalling pathways implicated in cancer and apoptosis.

PubMed

Pang, Siew Wai; Lahiri, Chandrajit; Poh, Chit Laa; Tan, Kuan Onn

2018-05-01

Paraneoplastic Ma Family (PNMA) comprises a growing number of family members which share relatively conserved protein sequences encoded by the human genome and is localized to several human chromosomes, including the X-chromosome. Based on sequence analysis, PNMA family members share sequence homology to the Gag protein of LTR retrotransposon, and several family members with aberrant protein expressions have been reported to be closely associated with the human Paraneoplastic Disorder (PND). In addition, gene mutations of specific members of PNMA family are known to be associated with human mental retardation or 3-M syndrome consisting of restrictive post-natal growth or dwarfism, and development of skeletal abnormalities. Other than sequence homology, the physiological function of many members in this family remains unclear. However, several members of this family have been characterized, including cell signalling events mediated by these proteins that are associated with apoptosis, and cancer in different cell types. Furthermore, while certain PNMA family members show restricted gene expression in the human brain and testis, other PNMA family members exhibit broader gene expression or preferential and selective protein interaction profiles, suggesting functional divergence within the family. Functional analysis of some members of this family have identified protein domains that are required for subcellular localization, protein-protein interactions, and cell signalling events which are the focus of this review paper. Copyright © 2018 Elsevier Inc. All rights reserved.
Integrated Automatic Workflow for Phylogenetic Tree Analysis Using Public Access and Local Web Services.

PubMed

Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat

2016-11-28

Coding sequences (CDSs) continue to be discovered, and larger CDSs are revealed frequently. Approaches and related tools have been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for phylogenetic tree inference using public access web services at the European Bioinformatics Institute (EMBL-EBI) and the Swiss Institute of Bioinformatics (SIB), together with our own deployed local web services. The workflow input is a set of CDSs in FASTA format. The workflow supports 1,000 to 20,000 bootstrap replicates. It performs tree inference with the Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of the EMBOSS PHYLIPNEW package, based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed in two ways, using Soaplab2 and Apache Axis2. SOAP and Java Web Service (JWS) services provide WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, its performance has been measured, and its results have been verified. The workflow's execution time is less than ten minutes for inferring a tree with 10,000 bootstrap replicates. The workflow will benefit bioinformaticians with an intermediate level of knowledge and experience. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.
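As an illustration of what a bootstrap replicate means in this setting, the following minimal Python sketch resamples the columns of a multiple sequence alignment with replacement. It shows the general bootstrapping idea only; it is not the workflow's implementation (which relies on EMBOSS PHYLIPNEW services), and the function names, sequence names and seed are hypothetical.

```python
import random
from typing import Dict, List


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Build one pseudo-alignment by sampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # The same sampled column indices are applied to every sequence,
    # so each resampled column is an intact column of the original.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Dict[str, str], n_replicates: int,
                         seed: int = 0) -> List[Dict[str, str]]:
    """Generate n_replicates resampled alignments from a seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    aln = {"s1": "ACGT", "s2": "AGGT", "s3": "ACGA"}  # toy alignment
    for replicate in bootstrap_replicates(aln, 3, seed=1):
        print(replicate)
```

Each replicate would then be passed to a tree-inference method, and the support for a branch is the fraction of replicate trees that contain it.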
Wasabi: An Integrated Platform for Evolutionary Sequence Analysis and Data Visualization.

PubMed

Veidenberg, Andres; Medlar, Alan; Löytynoja, Ari

2016-04-01

Wasabi is an open source, web-based environment for evolutionary sequence analysis. Wasabi visualizes sequence data together with a phylogenetic tree within a modern, user-friendly interface: The interface hides extraneous options, supports context sensitive menus, drag-and-drop editing, and displays additional information, such as ancestral sequences, associated with specific tree nodes. The Wasabi environment supports reproducibility by automatically storing intermediate analysis steps and includes built-in functions to share data between users and publish analysis results. For computational analysis, Wasabi supports PRANK and PAGAN for phylogeny-aware alignment and alignment extension, and it can be easily extended with other tools. Along with drag-and-drop import of local files, Wasabi can access remote data through URL and import sequence data, GeneTrees and EPO alignments directly from Ensembl. To demonstrate a typical workflow using Wasabi, we reproduce key findings from recent comparative genomics studies, including a reanalysis of the EGLN1 gene from the tiger genome study: These case studies can be browsed within Wasabi at http://wasabiapp.org:8000?id=usecases. Wasabi runs inside a web browser and does not require any installation. One can start using it at http://wasabiapp.org. All source code is licensed under the AGPLv3. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Rapid evolutionary change of common bean (Phaseolus vulgaris L) plastome, and the genomic diversification of legume chloroplasts

PubMed Central

Guo, Xianwu; Castillo-Ramírez, Santiago; González, Víctor; Bustos, Patricia; Luís Fernández-Vázquez, José; Santamaría, Rosa Isela; Arellano, Jesús; Cevallos, Miguel A; Dávila, Guillermo

2007-01-01

Background: Fabaceae (legumes) is one of the largest families of flowering plants, and some members are important crops. In contrast to what we know about their great diversity or economic importance, our knowledge at the genomic level of chloroplast genomes (cpDNAs or plastomes) for these crops is limited. Results: We sequenced the complete genome of the common bean (Phaseolus vulgaris cv. Negro Jamapa) chloroplast. The plastome of P. vulgaris is a 150,285 bp circular molecule. It has gene content similar to that of other legume plastomes, but contains two pseudogenes, rpl33 and rps16. A distinct inversion occurred at the junction points of trnH-GUG/rpl14 and rps19/rps8, as in adzuki bean [1]. These two pseudogenes and the inversion were confirmed in 10 varieties representing the two domestication centers of the bean. Genomic comparative analysis indicated that inversions generally occur in legume plastomes and the magnitude and localization of insertions/deletions (indels) also vary. The analysis of repeat sequences demonstrated that patterns and sequences of tandem repeats had an important impact on sequence diversification between legume plastomes and tandem repeats did not belong to dispersed repeats. Interestingly, P. vulgaris plastome had higher evolutionary rates of change on both genomic and gene levels than G. max, which could be the consequence of pressure from both mutation and natural selection. Conclusion: Legume chloroplast genomes are widely diversified in gene content, gene order, indel structure, abundance and localization of repetitive sequences, intracellular sequence exchange and evolutionary rates. The P. vulgaris plastome is a rapidly evolving genome. PMID:17623083
Evolutionary insight into the ionotropic glutamate receptor superfamily of photosynthetic organisms.

PubMed

De Bortoli, Sara; Teardo, Enrico; Szabò, Ildikò; Morosinotto, Tomas; Alboresi, Alessandro

2016-11-01

Photosynthetic eukaryotes have a complex evolutionary history shaped by multiple endosymbiosis events that required a tight coordination between the organelles and the rest of the cell. Plant ionotropic glutamate receptors (iGLRs) form a large superfamily of proteins with a predicted or proven non-selective cation channel activity regulated by a broad range of amino acids. They are involved in different physiological processes such as C/N sensing, resistance against fungal infection, root and pollen tube growth and response to wounding and pathogens. Most of the present knowledge is limited to iGLRs located in plasma membranes. However, recent studies localized different iGLR isoforms to mitochondria and/or chloroplasts, suggesting the possibility that they play a specific role in bioenergetic processes. In this work, we performed a comparative analysis of GLR sequences from bacteria and various photosynthetic eukaryotes. In particular, novel types of selectivity filters of bacteria are reported adding new examples of the great diversity of the GLR superfamily. The highest variability in GLR sequences was found among the algal sequences (cryptophytes, diatoms, brown and green algae). GLRs of land plants are not closely related to the GLRs of green algae analyzed in this work. The GLR family underwent a great expansion in vascular plants. Among plant GLRs, Clade III includes sequences from Physcomitrella patens, Marchantia polymorpha and gymnosperms and can be considered the most ancient, while other clades likely emerged later. In silico analysis allowed the identification of sequences with a putative target to organelles. Sequences with a predicted localization to mitochondria and chloroplasts are randomly distributed among different type of GLRs, suggesting that no compartment-related specific function has been maintained across the species. Copyright © 2016 Elsevier B.V. All rights reserved.
Localization of migraine susceptibility genes in human brain by single-cell RNA sequencing.

PubMed

Renthal, William

2018-01-01

Background: Migraine is a debilitating disorder characterized by severe headaches and associated neurological symptoms. A key challenge to understanding migraine has been the cellular complexity of the human brain and the multiple cell types implicated in its pathophysiology. The present study leverages recent advances in single-cell transcriptomics to localize the specific human brain cell types in which putative migraine susceptibility genes are expressed. Methods: The cell-type specific expression of both familial and common migraine-associated genes was determined bioinformatically using data from 2,039 individual human brain cells across two published single-cell RNA sequencing datasets. Enrichment of migraine-associated genes was determined for each brain cell type. Results: Analysis of single-brain cell RNA sequencing data from five major subtypes of cells in the human cortex (neurons, oligodendrocytes, astrocytes, microglia, and endothelial cells) indicates that over 40% of known migraine-associated genes are enriched in the expression profiles of a specific brain cell type. Further analysis of neuronal migraine-associated genes demonstrated that approximately 70% were significantly enriched in inhibitory neurons and 30% in excitatory neurons. Conclusions: This study takes the next step in understanding the human brain cell types in which putative migraine susceptibility genes are expressed. Both familial and common migraine may arise from dysfunction of discrete cell types within the neurovascular unit, and localization of the affected cell type(s) in an individual patient may provide insight into their susceptibility to migraine.
Widespread signatures of local mRNA folding structure selection in four Dengue virus serotypes

PubMed Central

2015-01-01

Background: It is known that mRNA folding can affect and regulate various gene expression steps both in living organisms and in viruses. Previous studies have recognized functional RNA structures in the genome of the Dengue virus. However, these studies usually focused either on the viral untranslated regions or on very specific and limited regions at the beginning of the coding sequences, in a limited number of strains, and without considering evolutionary selection. Results: Here we performed the first large-scale comprehensive genomics analysis of selection for local mRNA folding strength in the Dengue virus coding sequences, based on a total of 1,670 genomes and 4 serotypes. Our analysis identified clusters of positions along the coding regions that may undergo a conserved evolutionary selection for strong or weak local folding maintained across different viral variants. Specifically, 53-66 clusters for strong folding and 49-73 clusters for weak folding (depending on serotype) were recognized; each cluster is an aggregate of positions with significantly conserved folding energy signals (related to partially overlapping local genomic regions). In addition, up to 7% of these positions were found to be conserved in more than 90% of the viral genomes. Although some of the identified positions undergo frequent synonymous / non-synonymous substitutions, the selection for folding strength therein is preserved, and thus cannot be trivially explained based on sequence conservation alone. Conclusions: The fact that many of the positions with significant folding-related signals are conserved among different Dengue variants suggests that a better understanding of the mRNA structures in the corresponding regions may promote the development of prospective anti-Dengue vaccination strategies. The comparative genomics approach described here can be employed in the future for detecting functional regions in other pathogens with very high mutation rates. PMID:26449467
A mechanistic insight into the amyloidogenic structure of hIAPP peptide revealed from sequence analysis and molecular dynamics simulation.

PubMed

Chakraborty, Sandipan; Chatterjee, Barnali; Basu, Soumalee

2012-07-01

A combined approach of sequence analysis, phylogenetic tree construction and in silico prediction of amyloidogenicity using bioinformatics tools has been used to correlate the observed species-specific variations in IAPP sequences with the amyloid-forming propensity. Observed substitution patterns indicate that probable changes in local hydrophobicity are instrumental in altering the aggregation propensity of the peptide. In particular, residues at the 17th, 22nd and 23rd positions of the IAPP peptide are found to be crucial for amyloid formation. Proline25 primarily dictates the observed non-amyloidogenicity in rodents. Furthermore, an extensive molecular dynamics simulation of 0.24 μs has been carried out with human IAPP (hIAPP) fragment 19-27, the portion showing maximum sequence variation across different species, to understand the native folding characteristics of this region. Principal component analysis in combination with free energy landscape analysis illustrates a four-residue turn spanning residues 22 to 25. The results provide a structural insight into the intramolecular β-sheet structure of amylin, which probably is the template for nucleation of fibril formation and growth, a pathogenic feature of type II diabetes. Copyright © 2012 Elsevier B.V. All rights reserved.
A Bayesian Framework for Human Body Pose Tracking from Depth Image Sequences

PubMed Central

Zhu, Youding; Fujimura, Kikuo

2010-01-01

This paper addresses the problem of accurate and robust tracking of 3D human body pose from depth image sequences. Recovering the large number of degrees of freedom in human body movements from a depth image sequence is challenging due to the need to resolve the depth ambiguity caused by self-occlusions and the difficulty to recover from tracking failure. Human body poses could be estimated through model fitting using dense correspondences between depth data and an articulated human model (local optimization method). Although it usually achieves a high accuracy due to dense correspondences, it may fail to recover from tracking failure. Alternately, human pose may be reconstructed by detecting and tracking human body anatomical landmarks (key-points) based on low-level depth image analysis. While this method (key-point based method) is robust and recovers from tracking failure, its pose estimation accuracy depends solely on image-based localization accuracy of key-points. To address these limitations, we present a flexible Bayesian framework for integrating pose estimation results obtained by methods based on key-points and local optimization. Experimental results are shown and performance comparison is presented to demonstrate the effectiveness of the proposed approach. PMID:22399933
A combined de novo protein sequencing and cDNA library approach to the venomic analysis of Chinese spider Araneus ventricosus.

PubMed

Duan, Zhigui; Cao, Rui; Jiang, Liping; Liang, Songping

2013-01-14

In past years, spider venoms have attracted increasing attention due to their extraordinary chemical and pharmacological diversity. The recently popularized proteomic method has greatly improved our ability to analyze the proteins in the venom. However, the lack of sequence information for isolated venom proteins dramatically limits the ability to confidently identify venom proteins. In the present paper, the venom from Araneus ventricosus was analyzed using two complementary approaches: 2-DE/Shotgun-LC-MS/MS coupled to MASCOT search and 2-DE/Shotgun-LC-MS/MS coupled to manual de novo sequencing followed by local venom protein database (LVPD) search. The LVPD was constructed with toxin-like protein sequences obtained from the analysis of a cDNA library from A. ventricosus venom glands. Our results indicate that a total of 130 toxin-like protein sequences were unambiguously identified by manual de novo sequencing coupled to LVPD search, accounting for 86.67% of all toxin-like proteins in the LVPD. Thus manual de novo sequencing coupled to LVPD search proved to be an extremely effective approach for the analysis of venom proteins. In addition, the approach shows a clear advantage in validating mutant positions of isoforms from the same toxin-like family. Intriguingly, methyl esterification of glutamic acid was discovered for the first time in animal venom proteins by manual de novo sequencing. Crown Copyright © 2012. Published by Elsevier B.V. All rights reserved.
Nuclear localization and transactivation by Vitis CBF transcription factors are regulated by combinations of conserved amino acid domains.

PubMed

Carlow, Chevonne E; Faultless, J Trent; Lee, Christine; Siddiqua, Mahbuba; Edge, Alison; Nassuth, Annette

2017-09-01

The highly conserved CBF pathway is crucial in the regulation of plant responses to low temperatures. Extensive analysis of Arabidopsis CBF proteins revealed that their functions rely on several conserved amino acid domains although the exact function of each domain is disputed. The question was what functions similar domains have in CBFs from other, overwintering woody plants such as Vitis, which likely have a more involved regulation than the model plant Arabidopsis. A total of seven CBF genes were cloned and sequenced from V. riparia and the less frost tolerant V. vinifera. The deduced species-specific amino acid sequences differ in only a few amino acids, mostly in non-conserved regions. Amino acid sequence comparison and phylogenetic analysis showed two distinct groups of Vitis CBFs. One group contains CBF1, CBF2, CBF3 and CBF8 and the other group contains CBF4, CBF5 and CBF6. Transient transactivation assays showed that all Vitis CBFs except CBF5 activate via a CRT or DRE promoter element, whereby Vitis CBF3 and 4 prefer a CRT element. The hydrophobic domains in the C-terminal end of VrCBF6 were shown to be important for how well it activates. The putative nuclear localization domain of Vitis CBF1 was shown to be sufficient for nuclear localization, in contrast to previous reports for AtCBF1, and also important for transactivation. The latter highlights the value of careful analysis of domain functions instead of reliance on computer predictions and published data for other related proteins. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Identification and functional characterization of effectors in expressed sequence tags from various life cycle stages of the potato cyst nematode Globodera pallida.

PubMed

Jones, John T; Kumar, Amar; Pylypenko, Liliya A; Thirugnanasambandam, Amarnath; Castelli, Lydia; Chapman, Sean; Cock, Peter J A; Grenier, Eric; Lilley, Catherine J; Phillips, Mark S; Blok, Vivian C

2009-11-01

In this article, we describe the analysis of over 9000 expressed sequence tags (ESTs) from cDNA libraries obtained from various life cycle stages of Globodera pallida. We have identified over 50 G. pallida effectors from this dataset using bioinformatics analysis, by screening clones in order to identify secreted proteins up-regulated after the onset of parasitism and using in situ hybridization to confirm the expression in pharyngeal gland cells. A substantial gene family encoding G. pallida SPRYSEC proteins has been identified. The expression of these genes is restricted to the dorsal pharyngeal gland cell. Different members of the SPRYSEC family of proteins from G. pallida show different subcellular localization patterns in plants, with some localized to the cytoplasm and others to the nucleus and nucleolus. Differences in subcellular localization may reflect diverse functional roles for each individual protein or, more likely, variety in the compartmentalization of plant proteins targeted by the nematode. Our data are therefore consistent with the suggestion that the SPRYSEC proteins suppress host defences, as suggested previously, and that they achieve this through interaction with a range of host targets.

A cataract-causing connexin 50 mutant is mislocalized to the ER due to loss of the fourth transmembrane domain and cytoplasmic domain.

PubMed

Somaraju Chalasani, Madhavi Latha; Muppirala, Madhavi; G Ponnam, Surya Prakash; Kannabiran, Chitra; Swarup, Ghanshyam

2013-01-01

Mutations in the eye lens gap junction protein connexin 50 cause cataract. Earlier we identified a frameshift mutant of connexin 50 (c.670insA; p.Thr203AsnfsX47) in a family with autosomal recessive cataract. The mutant protein is smaller and contains 46 aberrant amino acids at the C-terminus after amino acid 202. Here, we have analysed this frameshift mutant and observed that it localized to the endoplasmic reticulum (ER) but not in the plasma membrane. Moreover, overexpression of the mutant resulted in disintegration of the ER-Golgi intermediate compartment (ERGIC), reduction in the level of ERGIC-53 protein and breakdown of the Golgi in many cells. Overexpression of the frameshift mutant partially inhibited the transport of wild type connexin 50 to the plasma membrane. A deletion mutant lacking the aberrant sequence showed predominant localization in the ER and inhibited anterograde protein transport suggesting, therefore, that the aberrant sequence is not responsible for improper localization of the frameshift mutant. Further deletion analysis showed that the fourth transmembrane domain and a membrane proximal region (231-294 amino acids) of the cytoplasmic domain are needed for transport from the ER and localization to the plasma membrane. Our results show that a frameshift mutant of connexin 50 mislocalizes to the ER and causes disintegration of the ERGIC and Golgi. We have also identified a sequence of connexin 50 crucial for transport from the ER and localization to the plasma membrane.
Sequencing, bioinformatic characterization and expression pattern of a putative amino acid transporter from the parasitic cestode Echinococcus granulosus.

PubMed

Camicia, Federico; Paredes, Rodolfo; Chalar, Cora; Galanti, Norbel; Kamenetzky, Laura; Gutierrez, Ariana; Rosenzvit, Mara C

2008-03-31

We have sequenced and partially characterized an Echinococcus granulosus cDNA, termed egat1, from a protoscolex signal sequence trap (SST) cDNA library. The isolated 1627 bp long cDNA contains an ORF of 489 amino acids and shows an amino acid identity of 30% with neutral and excitatory amino acid transporter members of the Dicarboxylate/Amino Acid Na+ and/or H+ Cation Symporter family (DAACS) (TC 2.A.23). Additional bioinformatics analysis of EgAT1 confirmed the results obtained by similarity searches and showed the presence of 9 to 10 transmembrane domains, consensus sequences for N-glycosylation between the third and fourth transmembrane domains, a highly similar hydropathy profile to ASCT1 (a known member of the DAACS family), a high score with SDF (Sodium Dicarboxylate Family) and similar motifs to EDTRANSPORT, a fingerprint of excitatory amino acid transporters. The localization of the putative amino acid transporter was analyzed by in situ hybridization and immunofluorescence in protoscoleces and associated germinal layer. The in situ hybridization labelling indicates the distribution of egat1 mRNA throughout the tegument. EgAT1 protein, which showed a molecular mass of approximately 60 kD in Western blots, is localized in the subtegumental region of the metacestode, particularly around suckers and rostellum of protoscoleces and layers from brood capsules. The sequence and expression analyses of EgAT1 pave the way for functional analysis of amino acid transporters of E. granulosus and their evaluation as new drug targets against cystic echinococcosis.
Predicting RNA-protein binding sites and motifs through combining local and global deep convolutional neural networks.

PubMed

Pan, Xiaoyong; Shen, Hong-Bin

2018-05-02

RNA-binding proteins (RBPs) account for 5-10% of the eukaryotic proteome and play key roles in many biological processes, e.g. gene regulation. Experimental detection of RBP binding sites is still time-intensive and costly. Computational prediction of RBP binding sites using patterns learned from existing annotation knowledge is a fast alternative. From the biological point of view, the local structure context derived from local sequences will be recognized by specific RBPs. However, in computational modeling using deep learning, to the best of our knowledge, only global representations of entire RNA sequences are employed; so far, the local sequence information is ignored in the deep model construction process. In this study, we present a computational method, iDeepE, to predict RNA-protein binding sites from RNA sequences by combining global and local convolutional neural networks (CNNs). For the global CNN, we pad the RNA sequences to the same length. For the local CNN, we split an RNA sequence into multiple overlapping fixed-length subsequences, where each subsequence is a signal channel of the whole sequence. Next, we train deep CNNs on the multiple subsequences and on the padded sequences, respectively, to learn high-level features. Finally, the outputs from local and global CNNs are combined to improve the prediction. iDeepE demonstrates better performance than state-of-the-art methods on two large-scale datasets derived from CLIP-seq. We also find that the local CNN runs 1.8 times faster than the global CNN with comparable performance when using GPUs. Our results show that iDeepE has captured experimentally verified binding motifs. Availability: https://github.com/xypan1232/iDeepE. Contact: xypan172436@gmail.com or hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online.
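The two input preparations described in this abstract, padding for the global view and overlapping fixed-length windows for the local view, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the iDeepE code: the function names, the pad character "N", and the window and overlap sizes are hypothetical choices made for the example.

```python
from typing import List


def pad_sequence(seq: str, target_len: int, pad_char: str = "N") -> str:
    """Global view: truncate or right-pad a sequence to a fixed length."""
    return seq[:target_len].ljust(target_len, pad_char)


def split_overlapping(seq: str, window: int, overlap: int,
                      pad_char: str = "N") -> List[str]:
    """Local view: cut a sequence into overlapping fixed-length windows.

    Consecutive windows share `overlap` characters; the last window is
    right-padded so that every piece has exactly `window` characters.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    pieces = []
    start = 0
    while True:
        pieces.append(seq[start:start + window].ljust(window, pad_char))
        if start + window >= len(seq):
            break
        start += step
    return pieces


if __name__ == "__main__":
    rna = "ACGUACGUAC"
    print(pad_sequence(rna, 12))         # ACGUACGUACNN
    print(split_overlapping(rna, 4, 2))  # ['ACGU', 'GUAC', 'ACGU', 'GUAC']
```

In a CNN setting each window would then be one-hot encoded and stacked, so that all inputs to the local network have the same shape regardless of the original sequence length.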
Meteor tracking via local pattern clustering in spatio-temporal domain

NASA Astrophysics Data System (ADS)

Kukal, Jaromír; Klimt, Martin; Švihlík, Jan; Fliegel, Karel

2016-09-01

Reliable meteor detection is one of the crucial disciplines in astronomy. A variety of imaging systems is used for meteor path reconstruction. The traditional approach is based on analysis of 2D image sequences obtained from a double station video observation system. Precise localization of meteor path is difficult due to atmospheric turbulence and other factors causing spatio-temporal fluctuations of the image background. The proposed technique performs non-linear preprocessing of image intensity using Box-Cox transform as recommended in our previous work. Both symmetric and asymmetric spatio-temporal differences are designed to be robust in the statistical sense. Resulting local patterns are processed by data whitening technique and obtained vectors are classified via cluster analysis and Self-Organized Map (SOM).
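For readers unfamiliar with the Box-Cox transform mentioned above, a minimal Python sketch of the standard one-parameter form follows. The value of the parameter lambda and the way the transform would be applied to pixel intensities are assumptions for illustration; the abstract does not specify them.

```python
import math
from typing import List


def box_cox(x: float, lam: float) -> float:
    """Standard one-parameter Box-Cox transform of a positive value.

    (x**lam - 1) / lam for lam != 0, and ln(x) for lam == 0.
    """
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def transform_intensities(pixels: List[float], lam: float) -> List[float]:
    """Apply the transform to a flat list of positive pixel intensities."""
    return [box_cox(p, lam) for p in pixels]


if __name__ == "__main__":
    # lam = 0.5 is an arbitrary illustrative choice.
    print(transform_intensities([1.0, 4.0, 9.0], 0.5))  # [0.0, 2.0, 4.0]
```

Zero-valued pixels would need an offset before the transform, since the function is defined only for positive input.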
Gene discovery in Eimeria tenella by immunoscreening cDNA expression libraries of sporozoites and schizonts with chicken intestinal antibodies.

PubMed

Réfega, Susana; Girard-Misguich, Fabienne; Bourdieu, Christiane; Péry, Pierre; Labbé, Marie

2003-04-02

Specific antibodies were produced ex vivo from intestinal culture of Eimeria tenella infected chickens. The specificity of these intestinal antibodies was tested against different parasite stages. These antibodies were used to immunoscreen first generation schizont and sporozoite cDNA libraries permitting the identification of new E. tenella antigens. We obtained a total of 119 cDNA clones which were subjected to sequence analysis. The sequences coding for the proteins inducing local immune responses were compared with nucleotide or protein databases and with expressed sequence tags (ESTs) databases. We identified new Eimeria genes coding for heat shock proteins, a ribosomal protein, a pyruvate kinase and a pyridoxine kinase. Specific features of other sequences are discussed.
Genetic analysis of tumorigenesis: XXXII. Localization of constitutionally amplified KRAS sequences to Chinese hamster chromosomes X and Y by in situ hybridization.

PubMed

Stenman, G; Anisowicz, A; Sager, R

1988-11-01

The KRAS gene is constitutionally amplified in the Chinese hamster. We have mapped the amplified sequences by in situ hybridization to two major sites on the X and Y chromosomes, Xq4 and Yp2. No autosomal site was detected despite a search under relaxed hybridization conditions. KRAS DNA is amplified about 50-fold compared to a human cell line known to have a diploid number of KRAS sequences, whereas mRNA expression is 5- to 10-fold lower than in normal human cells. While mRNA expression levels do not necessarily parallel gene copy number, the low expression level strongly suggests that the amplified sequences are transcriptionally silent. It is suggested that the amplified sequences arose from the original KRAS gene on chromosome 8 and that the KRAS sequences on the Y chromosome arose by X-Y recombination.
Determining mutation density using Restriction Enzyme Sequence Comparative Analysis (RESCAN)

USDA-ARS's Scientific Manuscript database

The average mutation density of a mutant population is a major consideration when developing resources for the efficient, cost-effective implementation of reverse genetics methods such as Targeting of Induced Local Lesions in Genomes (TILLING). Reliable estimates of mutation density can be achieved ...
SAMSA2: a standalone metatranscriptome analysis pipeline.

PubMed

Westreich, Samuel T; Treiber, Michelle L; Mills, David A; Korf, Ian; Lemay, Danielle G

2018-05-21

Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because the experiments generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms. SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline that has been redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster due to the use of the DIAMOND aligner, and more flexible and reproducible because it uses local databases. SAMSA2 is available with detailed documentation, and example input and output files along with examples of master scripts for full pipeline execution. SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered or customized to fit the needs of any experiment.
Genomic organization, complete sequence, and chromosomal location of the gene for human eotaxin (SCYA11), an eosinophil-specific CC chemokine

DOE Office of Scientific and Technical Information (OSTI.GOV)

Garcia-Zepeda, E.A.; Sarafi, M.N.; Luster, A.D.

1997-05-01

Eotaxin is a CC chemokine that is a specific chemoattractant for eosinophils and is implicated in the pathogenesis of eosinophilic inflammatory diseases, such as asthma. We describe the genomic organization, complete sequence, including 1354 bp 5′ of the RNA initiation site, and chromosomal localization of the human eotaxin gene. Fluorescence in situ hybridization analysis localized eotaxin to human chromosome 17, in the region q21.1-q21.2, and the human gene name SCYA11 was assigned. We also present the 5′ flanking sequence of the mouse eotaxin gene and have identified several regulatory elements that are conserved between the murine and the human promoters. In particular, the presence of elements such as NF-κB, interferon-γ response element, and glucocorticoid response element may explain the observed regulation of the eotaxin gene by cytokines and glucocorticoids. 17 refs., 4 figs., 1 tab.
Predicting residue-wise contact orders in proteins by support vector regression.

PubMed

Song, Jiangning; Burrage, Kevin

2006-10-03

The residue-wise contact order (RWCO) describes the sequence separations between the residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. 
Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
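The two figures of merit quoted in this abstract, the Pearson correlation coefficient (CC) and the root mean square error (RMSE) between predicted and observed values, can be sketched as follows. This is a minimal illustration only: the function names and the toy numbers are assumptions, and nothing here reproduces the authors' SVR pipeline or their data.

```python
import math

def pearson_cc(pred, obs):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(pred) != len(obs) or len(pred) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_p = sum(pred) / len(pred)
    mean_o = sum(obs) / len(obs)
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_o)

def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    if len(pred) != len(obs) or not pred:
        raise ValueError("need two non-empty, equal-length lists")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))

# Toy values (hypothetical, not from the paper).
predicted = [1.0, 2.0, 3.0]
observed = [2.0, 4.0, 6.0]
print(pearson_cc(predicted, observed), rmse(predicted, observed))
```

A perfectly linear but mis-scaled prediction, as in the toy values, gives a CC of 1 with a non-zero RMSE, which is why the abstract reports both measures.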
Aquatic environmental DNA detects seasonal fish abundance and habitat preference in an urban estuary

PubMed Central

Soboleva, Lyubov; Charlop-Powers, Zachary

2017-01-01

The difficulty of censusing marine animal populations hampers effective ocean management. Analyzing water for DNA traces shed by organisms may aid assessment. Here we tested aquatic environmental DNA (eDNA) as an indicator of fish presence in the lower Hudson River estuary. A checklist of local marine fish and their relative abundance was prepared by compiling 12 traditional surveys conducted between 1988–2015. To improve eDNA identification success, 31 specimens representing 18 marine fish species were sequenced for two mitochondrial gene regions, boosting coverage of the 12S eDNA target sequence to 80% of local taxa. We collected 76 one-liter shoreline surface water samples at two contrasting estuary locations over six months beginning in January 2016. eDNA was amplified with vertebrate-specific 12S primers. Bioinformatic analysis of amplified DNA, using a reference library of GenBank and our newly generated 12S sequences, detected most (81%) locally abundant or common species and relatively few (23%) uncommon taxa, and corresponded to seasonal presence and habitat preference as determined by traditional surveys. Approximately 2% of fish reads were commonly consumed species that are rare or absent in local waters, consistent with wastewater input. Freshwater species were rarely detected despite Hudson River inflow. These results support further exploration and suggest eDNA will facilitate fine-scale geographic and temporal mapping of marine fish populations at relatively low cost. PMID:28403183
Molecular analysis of a 11 700-year-old rodent midden from the Atacama Desert, Chile

USGS Publications Warehouse

Kuch, M.; Rohland, N.; Betancourt, J.L.; Latorre, C.; Steppan, S.; Poinar, H.N.

2002-01-01

DNA was extracted from an 11 700-year-old rodent midden from the Atacama Desert, Chile and the chloroplast and animal mitochondrial DNA (mtDNA) gene sequences were analysed to investigate the floral environment surrounding the midden, and the identity of the midden agent. The plant sequences, together with the macroscopic identifications, suggest the presence of 13 plant families and three orders that no longer exist today at the midden locality, and thus point to a much more diverse and humid climate 11 700 years ago. The mtDNA sequences suggest the presence of at least four different vertebrates, which have been putatively identified as a camelid (vicuna), two rodents (Phyllotis and Abrocoma), and a cardinal bird (Passeriformes). To identify the midden agent, DNA was extracted from pooled faecal pellets, three small overlapping fragments of the mitochondrial cytochrome b gene were amplified and multiple clones were sequenced. These results were analysed along with complete cytochrome b sequences for several modern Phyllotis species to place the midden sequence phylogenetically. The results identified the midden agent as belonging to an ancestral P. limatus. Today, P. limatus is not found at the midden locality but it can be found 100 km to the north, indicating at least a small range shift. The more extensive sampling of modern Phyllotis reinforces the suggestion that P. limatus is recently derived from a peripheral isolate.
Sequence analysis of Leukemia DNA

NASA Astrophysics Data System (ADS)

Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad. Isa

2018-03-01

Cancer is a very deadly disease; one form is leukemia, better known as blood cancer. Cancer cells can be detected by analyzing DNA in laboratory tests. This study focused on local alignment of leukemia and non-leukemia DNA sequences obtained from NCBI, using the Smith-Waterman algorithm. The Smith-Waterman algorithm was published by T. F. Smith and M. S. Waterman in 1981. The algorithm tries to find the most similar region shared by a pair of sequences by giving a negative value to unequal bases (mismatch) and a positive value to equal bases (match). The maximum positive value is then obtained as the end of the alignment, and the minimum value as the start of the alignment. This study will use sequences of leukemia and 3 sequences of non-leukemia.
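The Smith-Waterman scoring and traceback described in this abstract can be sketched as below. This is a minimal illustration with a linear gap penalty; the scoring values (match 2, mismatch -1, gap -2) and the example sequences are assumptions chosen for the sketch, not parameters or data from the study.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    # Score matrix; row 0 and column 0 stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at zero, which is what makes the alignment local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the maximum cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

# Hypothetical toy sequences; the shared core is "ACG".
print(smith_waterman("TTTACGTTT", "GGACGGG"))
```

The traceback starts at the highest-scoring cell (the end of the alignment) and stops at the first zero cell (its start), matching the description in the abstract.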
Isolation and bioinformatics analysis of differentially methylated genomic fragments in human gastric cancer

PubMed Central

Liao, Ai-Jun; Su, Qi; Wang, Xun; Zeng, Bin; Shi, Wei

2008-01-01

AIM: To isolate and analyze the DNA sequences which are methylated differentially between gastric cancer and normal gastric mucosa. METHODS: The differentially methylated DNA sequences between gastric cancer and normal gastric mucosa were isolated by methylation-sensitive representational difference analysis (MS-RDA). Similarities between the separated fragments and the human genomic DNA were analyzed with Basic Local Alignment Search Tool (BLAST). RESULTS: Three differentially methylated DNA sequences were obtained, two of which have been accepted by GenBank. The accession numbers are AY887106 and AY887107. AY887107 was highly similar to the 11th exon of LOC440683 (98%), 3’ end of LOC440887 (99%), and promoter and exon regions of DRD5 (94%). AY887106 was consistent (98%) with a CpG island in ribosomal RNA isolated from colorectal cancer by Minoru Toyota in 1999. CONCLUSION: The methylation degree is different between gastric cancer and normal gastric mucosa. The differentially methylated DNA sequences can be isolated effectively by MS-RDA. PMID:18322944
Molecular cloning and sequence analysis of full-length growth hormone cDNAs from six important economic fishes.

PubMed

Zhang, Jing-Nan; Song, Ping; Hu, Jia-Rui; Mo, Sai-Jun; Peng, Mao-Yu; Zhou, Wei; Zou, Ji-Xing; Hu, Yin-Chang

2005-01-01

In this study, the full-length cDNAs of the GH (growth hormone) gene were isolated from six important economic fishes: Siniperca kneri, Epinephelus coioides, Monopterus albus, Silurus asotus, Misgurnus anguillicaudatus and Carassius auratus gibelio Bloch. This is the first time these GH sequences have been cloned, except for E. coioides GH. The lengths of the cDNAs are, respectively, 953 bp, 1 023 bp, 825 bp, 1 082 bp, 1 154 bp and 1 180 bp. Each sequence includes an ORF of about 600 bp which encodes a protein of about 200 amino acids: S. kneri, E. coioides and M. albus GHs of 204 amino acids, S. asotus GH of 200 amino acids, and M. anguillicaudatus and C. auratus gibelio GHs of 210 amino acids. A detailed sequence analysis of the six GHs together with many other fish sequences was then performed. The six sequences all showed high homology to other sequences, especially to sequences within the same order, and many conserved residues were identified, most localized in five domains. The phylogenetic trees (MP and NJ) of many fish GH ORF sequences (including the six new ones) with Amia calva as outgroup were generally resolved and largely congruent with the morphology-based tree, though some incongruities were observed, suggesting that the GH ORF deserves more attention in teleostean phylogeny.
Local linear regression for function learning: an analysis based on sample discrepancy.

PubMed

Cervellera, Cristiano; Macciò, Danilo

2014-11-01

Local linear regression models, a kind of nonparametric structure that locally performs a linear estimation of the target function, are analyzed in the context of empirical risk minimization (ERM) for function learning. The analysis is carried out with emphasis on geometric properties of the available data. In particular, the discrepancy of the observation points used both to build the local regression models and to compute the empirical risk is considered. This makes it possible to treat in the same way the case in which the samples come from a random external source and the one in which the input space can be freely explored. Both consistency of the ERM procedure and approximating capabilities of the estimator are analyzed, proving conditions to ensure convergence. Since the theoretical analysis shows that the estimation improves as the discrepancy of the observation points becomes smaller, low-discrepancy sequences, a family of sampling methods commonly employed for efficient numerical integration, are also analyzed. Simulation results involving two different examples of function learning are provided.
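As an illustration of what a low-discrepancy sequence looks like, the sketch below generates van der Corput points and combines two bases into a 2-D Halton sequence. The abstract does not say which low-discrepancy family the authors used, so the choice of Halton points here is an assumption made for illustration.

```python
def van_der_corput(n, base=2):
    """n-th element (n >= 1) of the van der Corput sequence in the given base."""
    q, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        # Mirror the base-b digits of n around the radix point.
        q += digit / denom
    return q

def halton_2d(count):
    """First `count` points of the 2-D Halton sequence (bases 2 and 3)."""
    return [(van_der_corput(i, 2), van_der_corput(i, 3))
            for i in range(1, count + 1)]

# Points fill the unit square evenly instead of clustering at random.
for point in halton_2d(5):
    print(point)
```

Such points could serve as the freely chosen observation points mentioned in the abstract, in contrast to samples drawn from a random external source.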
A palindrome-mediated mechanism distinguishes translocations involving LCR-B of chromosome 22q11.2.

PubMed

Gotter, Anthony L; Shaikh, Tamim H; Budarf, Marcia L; Rhodes, C Harker; Emanuel, Beverly S

2004-01-01

Two known recurrent constitutional translocations, t(11;22) and t(17;22), as well as a non-recurrent t(4;22), display derivative chromosomes that have joined to a common site within the low copy repeat B (LCR-B) region of 22q11.2. This breakpoint is located between two AT-rich inverted repeats that form a nearly perfect palindrome. Breakpoints within the 11q23, 17q11 and 4q35 partner chromosomes also fall near the center of palindromic sequences. In the present work the breakpoints of a fourth translocation involving LCR-B, a balanced ependymoma-associated t(1;22), were characterized not only to localize this junction relative to known genes, but also to further understand the mechanism underlying these rearrangements. FISH mapping was used to localize the 22q11.2 breakpoint to LCR-B and the 1p21 breakpoint to single BAC clones. STS mapping narrowed the 1p21.2 breakpoint to a 1990 bp AT-rich region, and junction fragments were amplified by nested PCR. Junction fragment-derived sequence indicates that the 1p21.2 breakpoint splits a 278 nt palindrome capable of forming stem-loop secondary structure. In contrast, the 1p21.2 reference genomic sequence from clones in the database does not exhibit this configuration, suggesting a predisposition for regional genomic instability perhaps etiologic for this rearrangement. Given its similarity to known chromosomal fragile site (FRA) sequences, this polymorphic 1p21.2 sequence may represent one of the FRA1 loci. Comparative analysis of the secondary structure of sequences surrounding translocation breakpoints that involve LCR-B with those not involving this region indicate a unique ability of the former to form stem-loop structures. The relative likelihood of forming these configurations appears to be related to the rate of translocation occurrence. Further analysis suggests that constitutional translocations in general occur between sequences of similar melting temperature and propensity for secondary structure.
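In DNA, a palindrome is a sequence that equals its own reverse complement, which is what allows the two arms to pair into a stem-loop. The sketch below checks this property, counts mismatches for a "nearly perfect" palindrome, and measures AT content. The function names and the short test sequences are hypothetical and are not taken from the breakpoint regions described above.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper- or lower-case A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def is_palindrome(seq):
    """True if the sequence reads the same as its reverse complement."""
    return seq.upper() == reverse_complement(seq)

def palindrome_mismatches(seq):
    """Number of positions where the sequence differs from its reverse complement.

    Zero means a perfect palindrome; a small count means a nearly perfect one.
    """
    return sum(1 for x, y in zip(seq.upper(), reverse_complement(seq)) if x != y)

def at_fraction(seq):
    """Fraction of A and T bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)

# "GAATTC" (the EcoRI site) is a classic perfect palindrome.
print(is_palindrome("GAATTC"), palindrome_mismatches("GAATTA"), at_fraction("GAATTC"))
```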
A palindrome-mediated mechanism distinguishes translocations involving LCR-B of chromosome 22q11.2

PubMed Central

Gotter, Anthony L.; Shaikh, Tamim H.; Budarf, Marcia L.; Rhodes, C. Harker; Emanuel, Beverly S.

2010-01-01

Two known recurrent constitutional translocations, t(11;22) and t(17;22), as well as a non-recurrent t(4;22), display derivative chromosomes that have joined to a common site within the low copy repeat B (LCR-B) region of 22q11.2. This breakpoint is located between two AT-rich inverted repeats that form a nearly perfect palindrome. Breakpoints within the 11q23, 17q11 and 4q35 partner chromosomes also fall near the center of palindromic sequences. In the present work the breakpoints of a fourth translocation involving LCR-B, a balanced ependymoma-associated t(1;22), were characterized not only to localize this junction relative to known genes, but also to further understand the mechanism underlying these rearrangements. FISH mapping was used to localize the 22q11.2 breakpoint to LCR-B and the 1p21 breakpoint to single BAC clones. STS mapping narrowed the 1p21.2 breakpoint to a 1990 bp AT-rich region, and junction fragments were amplified by nested PCR. Junction fragment-derived sequence indicates that the 1p21.2 breakpoint splits a 278 nt palindrome capable of forming stem–loop secondary structure. In contrast, the 1p21.2 reference genomic sequence from clones in the database does not exhibit this configuration, suggesting a predisposition for regional genomic instability perhaps etiologic for this rearrangement. Given its similarity to known chromosomal fragile site (FRA) sequences, this polymorphic 1p21.2 sequence may represent one of the FRA1 loci. Comparative analysis of the secondary structure of sequences surrounding translocation breakpoints that involve LCR-B with those not involving this region indicate a unique ability of the former to form stem–loop structures. The relative likelihood of forming these configurations appears to be related to the rate of translocation occurrence. Further analysis suggests that constitutional translocations in general occur between sequences of similar melting temperature and propensity for secondary structure. PMID:14613967
Initial steps towards a production platform for DNA sequence analysis on the grid.

PubMed

Luyf, Angela C M; van Schaik, Barbera D C; de Vries, Michel; Baas, Frank; van Kampen, Antoine H C; Olabarriaga, Silvia D

2010-12-14

Bioinformatics is confronted with a new data explosion due to the availability of high throughput DNA sequencers. Data storage and analysis become a problem on local servers, and a switch to other IT infrastructures is therefore needed. Grid and workflow technology can help to handle the data more efficiently, as well as facilitate collaborations. However, interfaces to grids are often unfriendly to novice users. In this study we reused a platform that was developed in the VL-e project for the analysis of medical images. Data transfer, workflow execution and job monitoring are operated from one graphical interface. We developed workflows for two sequence alignment tools (BLAST and BLAT) as a proof of concept. The analysis time was significantly reduced. All workflows and executables are available for the members of the Dutch Life Science Grid and the VL-e Medical virtual organizations. All components are open source and can be transported to other grid infrastructures. The availability of in-house expertise and tools facilitates the usage of grid resources by new users. Our first results indicate that this is a practical, powerful and scalable solution to address the capacity and collaboration issues raised by the deployment of next generation sequencers. We currently adopt this methodology on a daily basis for DNA sequencing and other applications. More information and source code are available via http://www.bioinformaticslaboratory.nl/
Endosymbiotic Microbiota of the Bamboo Pseudococcid Antonina crawii (Insecta, Homoptera)

PubMed Central

Fukatsu, Takema; Nikoh, Naruo

2000-01-01

We characterized the intracellular symbiotic microbiota of the bamboo pseudococcid Antonina crawii by performing a molecular phylogenetic analysis in combination with in situ hybridization. Almost the entire length of the bacterial 16S rRNA gene was amplified and cloned from A. crawii whole DNA. Restriction fragment length polymorphism analysis revealed that the clones obtained included three distinct types of sequences. Nucleotide sequences of the three types were determined and subjected to a molecular phylogenetic analysis. The first sequence was a member of the γ subdivision of the division Proteobacteria (γ-Proteobacteria) to which no sequences in the database were closely related, although the sequences of endosymbionts of other homopterans, such as psyllids and aphids, were distantly related. The second sequence was a β-Proteobacteria sequence and formed a monophyletic group with the sequences of endosymbionts from other pseudococcids. The third sequence exhibited a high level of similarity to sequences of Spiroplasma spp. from ladybird beetles and a tick. Localization of the endosymbionts was determined by using tissue sections of A. crawii and in situ hybridization with specific oligonucleotide probes. The γ- and β-Proteobacteria symbionts were packed in the cytoplasm of the same mycetocytes (or bacteriocytes) and formed a large mycetome (or bacteriome) in the abdomen. The spiroplasma symbionts were also present intracellularly in various tissues at a low density. We observed that the anterior poles of developing eggs in the ovaries were infected by the γ- and β-Proteobacteria symbionts in a systematic way, which ensured vertical transmission. Five representative pseudococcids were examined by performing diagnostic PCR experiments with specific primers; the β-Proteobacteria symbiont was detected in all five pseudococcids, the γ-Proteobacteria symbiont was found in three, and the spiroplasma symbiont was detected only in A. crawii. PMID:10653730

SU-C-17A-02: Sirius MRI Markers for Prostate Post-Implant Assessment: MR Protocol Development

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lim, T; Wang, J; Kudchadker, R

Purpose: Currently, CT is used to visualize prostate brachytherapy sources, at the expense of accurate structure contouring. MRI is superior to CT for anatomical delineation, but the sources appear as voids on MRI images. Previously we have developed Sirius MRI markers (C4 Imaging) to replace spacers to assist source localization on MRI images. Here we develop an MRI pulse sequence protocol that enhances the signal of these markers to enable MRI-only post-implant prostate dosimetric analysis. Methods: To simulate a clinical scenario, a CIRS multi-modality prostate phantom was implanted with 66 markers and 86 sources. The implanted phantom was imaged on both 1.5T and 3.0T GE scanners under various conditions, different pulse sequences (2D fast spin echo [FSE], 3D balanced steady-state free precession [bSSFP] and 3D fast spoiled gradient echo [FSPGR]), as well as varying amount of padding to simulate various patient sizes and associated signal fall-off from the surface coil elements. Standard FSE sequences from the current clinical protocols were also evaluated. Marker visibility, marker size, intra-marker distance, total scan time and artifacts were evaluated for various combinations of echo time, repetition time, flip angle, number of excitations, bandwidth, slice thickness and spacing, field-of-view, frequency/phase encoding steps and frequency direction. Results: We have developed a 3D FSPGR pulse sequence that enhances marker signal and ensures the integrity of the marker shape while maintaining reasonable scan time. For patients contraindicated for 3.0T, we have also developed a similar sequence for 1.5T scanners. Signal fall-off with distance from prostate to coil can be compensated mainly by decreasing bandwidth. The markers are not visible using standard FSE sequences. FSPGR sequences are more robust for consistent marker visualization as compared to bSSFP sequences. 
Conclusion: The developed MRI pulse sequence protocol for Sirius MRI markers assists source localization to enable MRI-only post-implant prostate dosimetric analysis. S.J. Frank is a co-founder of C4 Imaging (manufactures the MRI markers)
Microwave-assisted acid and base hydrolysis of intact proteins containing disulfide bonds for protein sequence analysis by mass spectrometry.

PubMed

Reiz, Bela; Li, Liang

2010-09-01

Controlled hydrolysis of proteins to generate peptide ladders combined with mass spectrometric analysis of the resultant peptides can be used for protein sequencing. In this paper, two methods of improving the microwave-assisted protein hydrolysis process are described to enable rapid sequencing of proteins containing disulfide bonds and increase sequence coverage, respectively. It was demonstrated that proteins containing disulfide bonds could be sequenced by MS analysis by first performing hydrolysis for less than 2 min, followed by 1 h of reduction to release the peptides originally linked by disulfide bonds. It was shown that a strong base could be used as a catalyst for microwave-assisted protein hydrolysis, producing complementary sequence information to that generated by microwave-assisted acid hydrolysis. However, using either acid or base hydrolysis, amide bond breakages in small regions of the polypeptide chains of the model proteins (e.g., cytochrome c and lysozyme) were not detected. Dynamic light scattering measurement of the proteins solubilized in an acid or base indicated that protein-protein interaction or aggregation was not the cause of the failure to hydrolyze certain amide bonds. It was speculated that there were some unknown local structures that might play a role in preventing an acid or base from reacting with the peptide bonds therein. 2010 American Society for Mass Spectrometry. Published by Elsevier Inc. All rights reserved.
Diversity of chloroplast genome among local clones of cocoa (Theobroma cacao, L.) from Central Sulawesi

NASA Astrophysics Data System (ADS)

Suwastika, I. Nengah; Pakawaru, Nurul Aisyah; Rifka, Rahmansyah, Muslimin, Ishizaki, Yoko; Cruz, André Freire; Basri, Zainuddin; Shiina, Takashi

2017-02-01

Chloroplast genomes typically range in size from 120 to 170 kilobase pairs (kb) and are relatively conserved among plant species. Recent evaluations in several species have shown that certain unique regions are highly variable and can be utilized in phylogenetic analysis. Many fragments of coding regions, introns, and intergenic spacers, such as atpB-rbcL, ndhF, rbcL, rpl16, trnH-psbA, trnL-F, trnS-G, etc., have been used for phylogenetic reconstructions at various taxonomic levels. On that basis, we analyze the diversity of the chloroplast genome within local cacao (Theobroma cacao L.) from Central Sulawesi. Our recent data showed that there are more than 20 clones from local farming in Central Sulawesi, which can be distinguished by phenotypic and nuclear-genome-based characterization with RAPD (Random Amplified Polymorphic DNA) and SSR (Simple Sequence Repeat) markers. In developing DNA markers for this local cacao, we also included analysis based on variation in the chloroplast genome. At least several regions, such as rpl32-trnL, can be considered as chloroplast markers for our local clones of cocoa. Furthermore, we could develop a phylogenetic analysis among the clones of cocoa.
A comparative analysis of MC4R gene sequence, polymorphism, and chromosomal localization in Chinese raccoon dog and Arctic fox.

PubMed

Skorczyk, Anna; Flisikowski, Krzysztof; Switonski, Marek

2012-05-01

Numerous mutations of the human melanocortin receptor type 4 (MC4R) gene are responsible for monogenic obesity, and some of them appear to be associated with predisposition or resistance to polygenic obesity. Thus, this gene is considered a functional candidate for fat tissue accumulation and body weight in domestic mammals. The aim of the study was comparative analysis of chromosome localization, nucleotide sequence, and polymorphism of the MC4R gene in two farmed species of the Canidae family, namely the Chinese raccoon dog (Nyctereutes procyonoides procyonoides) and the arctic fox (Alopex lagopus). The whole coding sequence, including fragments of 3'UTR and 5'UTR, shows 89% similarity between the arctic fox (1276 bp) and Chinese raccoon dog (1213 bp). Altogether, 30 farmed Chinese raccoon dogs and 30 farmed arctic foxes were searched for polymorphisms. In the Chinese raccoon dog, only one silent substitution in the coding sequence was identified, whereas in the arctic fox, four InDels and two single-nucleotide polymorphisms (SNPs) in the 5'UTR and six silent SNPs in the exon were found. The studied gene was mapped by FISH to the Chinese raccoon dog chromosome 9 (NPP9q1.2) and arctic fox chromosome 24 (ALA24q1.2-1.3). The obtained results are discussed in terms of genome evolution of species belonging to the family Canidae and their potential use in animal breeding.
Regional spread of HIV-1 M subtype B in middle-aged patients by random env-C2V4 region sequencing

PubMed Central

Stürmer, Martin; Zimmermann, Katrin; Fritzsche, Carlos; Reisinger, Emil; Doelken, Gottfried; Berger, Annemarie; Doerr, Hans W.; Eberle, Josef

2010-01-01

A transmission cluster of HIV-1 M:B was identified by C2V4 region sequencing of the env gene of HIV-1 in 11 patients with a median age of 52 (range 26–65) in North-East Germany who, with one exception, were not aware of any risky behaviour. According to the information they provided, the 10 male patients and 1 female patient deteriorated immunologically within 4 years after a putative HIV acquisition. Nucleic acid sequence analysis showed an R5 virus in all patients and, in 7 of 11, a rarely found crown motif of the V3 loop, GPGSALFTT. Analysis of the formation of this cluster showed that there is still a huge discrepancy between awareness and behaviour regarding HIV transmission in middle-aged patients, and that a local outbreak can be detected by nucleic acid analysis of the hypervariable env region. PMID:20217125
A basic analysis toolkit for biological sequences

PubMed Central

Giancarlo, Raffaele; Siragusa, Alessandro; Siragusa, Enrico; Utro, Filippo

2007-01-01

This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks. Namely, local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. Moreover, it also supports filtering operations to select strings from a set and establish their statistical significance, via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not been implemented in a single and consistent software package, as we do here. Therefore, our main contribution is to fill this gap between algorithmic theory and practice by providing an extensible and easy to use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at under the GNU GPL. PMID:17877802
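One of the building blocks named in this abstract, the longest common subsequence (LCS), can be sketched with the textbook dynamic program below. BATS itself is a C/C++ and Perl library; this Python version is an independent illustration and does not reproduce the BATS API or its implementation.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    # L[i][j] = LCS length of the prefixes a[:i] and b[:j].
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                L[i + 1][j + 1] = L[i][j] + 1
            else:
                L[i + 1][j + 1] = max(L[i][j + 1], L[i + 1][j])
    # Walk back through the table to recover the subsequence itself.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

# Hypothetical example strings.
print(lcs("AGGTAB", "GXTXAYB"))
```

The table costs time and memory proportional to the product of the two lengths, which is acceptable for short sequences but is one reason libraries in compiled languages are preferred for larger inputs.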
Genome sequence analysis of dengue virus 1 isolated in Key West, Florida.

PubMed

Shin, Dongyoung; Richards, Stephanie L; Alto, Barry W; Bettinardi, David J; Smartt, Chelsea T

2013-01-01

Dengue virus (DENV) is transmitted to humans through the bite of mosquitoes. In November 2010, a dengue outbreak was reported in Monroe County in southern Florida (FL), including greater than 20 confirmed human cases. The virus collected from the human cases was verified as DENV serotype 1 (DENV-1) and one isolate was provided for sequence analysis. RNA was extracted from the DENV-1 isolate and was used in reverse transcription polymerase chain reaction (RT-PCR) to amplify PCR fragments to sequence. Nucleic acid primers were designed to generate overlapping PCR fragments that covered the entire genome. The DENV-1 isolate found in Key West (KW), FL was sequenced for whole genome characterization. Sequence assembly, Genbank searches, and recombination analyses were performed to verify the identity of the genome sequences and to determine percent similarity to known DENV-1 sequences. We show that the KW DENV-1 strain is 99% identical to Nicaraguan and Mexican DENV-1 strains. Phylogenetic and recombination analyses suggest that the DENV-1 isolated in KW originated from Nicaragua (NI) and the KW strain may circulate in KW. Also, recombination analysis results detected recombination events in the KW strain compared to DENV-1 strains from Puerto Rico. We evaluate the relative growth of KW strain of DENV-1 compared to other dengue viruses to determine whether the underlying genetics of the strain is associated with a replicative advantage, an important consideration since local transmission of DENV may result because domestic tourism can spread DENVs.
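A figure such as "99% identical" is typically computed from a pairwise alignment. The sketch below shows one common convention, assumed here: identity is the share of matching columns among columns where neither sequence has a gap. The abstract does not state which convention or tool the authors used, and the sequences below are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gap columns are skipped under this convention
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared

# Hypothetical aligned fragments, not real DENV-1 sequence.
print(percent_identity("ACGT-A", "ACGAAA"))
```

Other conventions divide by the full alignment length or by the shorter sequence, so reported identities are only comparable when the denominator is stated.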
Molecular cloning and nucleotide sequence of the alpha and beta subunits of allophycocyanin from the cyanelle genome of Cyanophora paradoxa.

PubMed Central

Bryant, D A; de Lorimier, R; Lambert, D H; Dubbs, J M; Stirewalt, V L; Stevens, S E; Porter, R D; Tam, J; Jay, E

1985-01-01

The genes for the alpha- and beta-subunit apoproteins of allophycocyanin (AP) were isolated from the cyanelle genome of Cyanophora paradoxa and subjected to nucleotide sequence analysis. The AP beta-subunit apoprotein gene was localized to a 7.8-kilobase-pair Pst I restriction fragment from cyanelle DNA by hybridization with a tetradecameric oligonucleotide probe. Sequence analysis using that oligonucleotide and its complement as primers for the dideoxy chain-termination sequencing method confirmed the presence of both AP alpha- and beta-subunit genes on this restriction fragment. Additional oligonucleotide primers were synthesized as sequencing progressed and were used to determine rapidly the nucleotide sequence of a 1336-base-pair region of this cloned fragment. This strategy allowed the sequencing to be completed without a detailed restriction map and without extensive and time-consuming subcloning. The sequenced region contains two open reading frames whose deduced amino acid sequences are 81-85% homologous to cyanobacterial and red algal AP subunits whose amino acid sequences have been determined. The two open reading frames are in the same orientation and are separated by 39 base pairs. AP alpha is 5' to AP beta and both coding sequences are preceded by a polypurine, Shine-Dalgarno-type sequence. Sequences upstream from AP alpha closely resemble the Escherichia coli consensus promoter sequences and also show considerable homology to promoter sequences for several chloroplast-encoded psbA genes. A 56-base-pair palindromic sequence downstream from the AP beta gene could play a role in the termination of transcription or translation. The allophycocyanin apoprotein subunit genes are located on the large single-copy region of the cyanelle genome. PMID:2987916
SfiI genomic cleavage map of Escherichia coli K-12 strain MG1655.

PubMed Central

Perkins, J D; Heath, J D; Sharma, B R; Weinstock, G M

1992-01-01

An SfiI restriction map of Escherichia coli K-12 strain MG1655 is presented. The map contains thirty-one cleavage sites separating fragments ranging in size from 407 kb to 3.7 kb. Several techniques were used in the construction of this map, including CHEF pulsed field gel electrophoresis; physical analysis of a set of twenty-six auxotrophic transposon insertions; correlation with the restriction map of Kohara and coworkers using the commercially available E. coli Gene Mapping Membranes; analysis of publicly available sequence information; and correlation of the above data with the combined genetic and physical map developed by Rudd, et al. The combination of these techniques has yielded a map in which all but one site can be localized within a range of +/- 2 kb, and over half the sites can be localized precisely by sequence data. Two sites present in the EcoSeq5 sequence database are not cleaved in MG1655 and four sites are noted to be sensitive to methylation by the dcm methylase. This map, combined with the NotI physical map of MG1655, can aid in the rapid, precise mapping of several different types of genetic alterations, including transposon mediated mutations and other insertions, inversions, deletions and duplications. PMID:1312707
Application of time-resolved shadowgraph imaging and computer analysis to study micrometer-scale response of superfluid helium

NASA Astrophysics Data System (ADS)

Sajjadi, Seyed; Buelna, Xavier; Eloranta, Jussi

2018-01-01

Application of inexpensive light emitting diodes as backlight sources for time-resolved shadowgraph imaging is demonstrated. The two light sources tested are able to produce light pulse sequences in the nanosecond and microsecond time regimes. After determining their time response characteristics, the diodes were applied to study the gas bubble formation around laser-heated copper nanoparticles in superfluid helium at 1.7 K and to determine the local cavitation bubble dynamics around fast moving metal micro-particles in the liquid. A convolutional neural network algorithm for analyzing the shadowgraph images by a computer is presented and the method is validated against the results from manual image analysis. The second application employed the red-green-blue light emitting diode source that produces light pulse sequences of the individual colors such that three separate shadowgraph frames can be recorded onto the color pixels of a charge-coupled device camera. Such an image sequence can be used to determine the moving object geometry, local velocity, and acceleration/deceleration. These data can be used to calculate, for example, the instantaneous Reynolds number for the liquid flow around the particle. Although specifically demonstrated for superfluid helium, the technique can be used to study the dynamic response of any medium that exhibits spatial variations in the index of refraction.
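The step from a three-frame image sequence to velocity, acceleration, and Reynolds number can be sketched as follows. This is only an illustrative finite-difference calculation, assuming one-dimensional motion, equally spaced frames, and the standard definition Re = ρ v d / μ; the function names are hypothetical and the abstract does not specify how the authors computed these quantities.

```python
def kinematics_from_three_frames(x0, x1, x2, dt):
    """Velocities between consecutive frames and the resulting acceleration.

    x0, x1, x2 are object positions in three frames separated by dt
    (equal frame spacing is an assumption of this sketch).
    """
    v01 = (x1 - x0) / dt
    v12 = (x2 - x1) / dt
    acceleration = (v12 - v01) / dt
    return v01, v12, acceleration

def reynolds_number(density, speed, diameter, viscosity):
    """Standard Reynolds number Re = rho * v * d / mu for flow around a particle."""
    return density * speed * diameter / viscosity
```

A negative acceleration from `kinematics_from_three_frames` corresponds to the deceleration case mentioned in the abstract.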
Clinical and Molecular Epidemiology of Human Parainfluenza Viruses 1-4 in Children from Viet Nam.

PubMed

Linster, Martin; Do, Lien Anh Ha; Minh, Ngo Ngoc Quang; Chen, Yihui; Zhe, Zhu; Tuan, Tran Anh; Tuan, Ha Manh; Su, Yvonne C F; van Doorn, H Rogier; Moorthy, Mahesh; Smith, Gavin J D

2018-05-01

HPIVs are serologically and genetically grouped into four species that account for up to 10% of all hospitalizations due to acute respiratory infection in children under the age of five. Genetic and epidemiological data for the four HPIVs derived from two pediatric cohorts in Viet Nam are presented. Respiratory samples were screened for HPIV1-4 by real-time PCR. Demographic and clinical data of patients infected with different HPIV were compared. We used a hemi-nested PCR approach to generate viral genome sequences from HPIV-positive samples and conducted a comprehensive phylogenetic analysis. In total, 170 samples tested positive for HPIV. HPIV3 was most commonly detected in our cohort and 80 co-detections of HPIV with other respiratory viruses were found. Phylogenetic analyses suggest local endemic circulation as well as punctuated introductions of new HPIV lineages. Viral gene flow analysis revealed that Viet Nam is a net importer of viral genetic diversity. Epidemiological analyses imply similar disease severity for all HPIV species. HPIV sequences from Viet Nam formed local clusters and were interspersed with sequences from diverse geographic regions. Combined, this new knowledge will help to investigate global HPIV circulation patterns in more detail and ultimately define more suitable vaccine strains.
PRIMAL: Page Rank-Based Indoor Mapping and Localization Using Gene-Sequenced Unlabeled WLAN Received Signal Strength

PubMed Central

Zhou, Mu; Zhang, Qiao; Xu, Kunjie; Tian, Zengshan; Wang, Yanmeng; He, Wei

2015-01-01

Due to the wide deployment of wireless local area networks (WLAN), received signal strength (RSS)-based indoor WLAN localization has attracted considerable attention in both academia and industry. In this paper, we propose a novel page rank-based indoor mapping and localization (PRIMAL) by using the gene-sequenced unlabeled WLAN RSS for simultaneous localization and mapping (SLAM). Specifically, first of all, based on the observation of the motion patterns of the people in the target environment, we use the Allen logic to construct the mobility graph to characterize the connectivity among different areas of interest. Second, the concept of gene sequencing is utilized to assemble the sporadically-collected RSS sequences into a signal graph based on the transition relations among different RSS sequences. Third, we apply the graph drawing approach to exhibit both the mobility graph and signal graph in a more readable manner. Finally, the page rank (PR) algorithm is proposed to construct the mapping from the signal graph into the mobility graph. The experimental results show that the proposed approach achieves satisfactory localization accuracy and meanwhile avoids the intensive time and labor cost involved in the conventional location fingerprinting-based indoor WLAN localization. PMID:26404274
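For readers unfamiliar with the page rank step, the generic algorithm can be sketched as a power iteration over a directed graph. This is the textbook PageRank only, not the PRIMAL mapping procedure; the graph representation, damping factor, and iteration count are assumptions for illustration.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Generic PageRank by power iteration.

    graph maps each node to a list of nodes it links to; every linked
    node is assumed to appear as a key as well.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

Nodes that receive links from many well-ranked nodes end up with higher scores, which is the property a graph-matching step could exploit.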
Organizing, exploring, and analyzing antibody sequence data: the case for relational-database managers.

PubMed

Owens, John

2009-01-01

Technological advances in the acquisition of DNA and protein sequence information and the resulting onrush of data can quickly overwhelm the scientist unprepared for the volume of information that must be evaluated and carefully dissected to discover its significance. Few laboratories have the luxury of dedicated personnel to organize, analyze, or consistently record a mix of arriving sequence data. A methodology based on a modern relational-database manager is presented that is both a natural storage vessel for antibody sequence information and a conduit for organizing and exploring sequence data and accompanying annotation text. The expertise necessary to implement such a plan is equal to that required by electronic word processors or spreadsheet applications. Antibody sequence projects maintained as independent databases are selectively unified by the relational-database manager into larger database families that contribute to local analyses, reports, interactive HTML pages, or exported to facilities dedicated to sophisticated sequence analysis techniques. Database files are transposable among current versions of Microsoft, Macintosh, and UNIX operating systems.
Stratigraphic palaeobiology around the Pliocene-Pleistocene boundary at Altavilla Milicia (Sicily, Italy)

NASA Astrophysics Data System (ADS)

Dominici, Stefano; Benvenuti, Marco; Garilli, Vittorio; Uchman, Alfred; Pollina, Francesco

2017-04-01

The Pliocene-Pleistocene around Altavilla Milicia, near Palermo (Sicily), includes a thick siliciclastic succession rich with shell beds, dominated by molluscs, brachiopods and annelids in fine-grained, totally bioturbated sandstones. Taphonomy of fossil assemblages indicates the importance of taphonomic feedback and within-habitat time-averaging in proximity of maximum flooding intervals. The trace fossil suite is characterized by the abundance of Thalassinoides paradoxicus boxworks and by local occurrence of Scalichnus, Piscichnus, ?Scolicia, ?Bichordites, Ophiomorpha, ?Gyrolithes, Palaeophycus, Diopatrichnus and ?Taenidium. These trace fossils are typical of the archetypal Cruziana ichnofacies, with local elements of the proximal Cruziana ichnofacies, which point to deposition mainly below the fairweather wave base. Three depositional sequences, characterized by geometries driven by the interplay of eustatism and regional tectonics, were recognized through sedimentary facies analysis. Biostratigraphic data frame the oldest sequence in the upper Pliocene, whereas the thickest part of the succession, occupied by the second sedimentary sequence, includes biozone NN16b/17 of calcareous nannoplankton stratigraphy, thereby comprising the base of the Pleistocene. Transgressive deposits of the third and uppermost sequence are marked by encrusted and bioeroded pebbles with sparse oyster shells. The whole time interval is characterized by glacio-eustatic fluctuations in the 50-100 m range and with 100 ky-periodicity. We performed a multivariate analysis of 22 samples yielding 92 species of mollusks collected in the first and second sequences. Clustering and ordination analysis allowed us to recognize a gradient controlled by depth-related environmental variables. At one end of the continuum we have a very-shallow water assemblage dominated by the bivalve Loripes orbiculatus, indicating an organic-rich seagrass bottom.
Opposite in the continuum is an offshore assemblage dominated by Corbula gibba and the extinct gastropod Petaloconchus intortus. Both the shallowest and the deepest assemblages are from the first (Piacenzian) sequence. The gradient at intermediate depths is better characterized by restricting the analysis to 17 collections from the second sequence (Piacenzian-Gelasian). The shallowest assemblage is here dominated by upper shoreface species, such as Tellina spp. and Spisula subtruncata, and the deepest by muddy bottom, offshore transition species, such as Venus nux, the extinct gastropod Nassarius semistriatus and deposit-feeding nuculanoid bivalves. Plotting samples along the composite section allows us to recognize two deepening-upward trends and two intervals of maximum flooding, in accordance with the sequence-stratigraphic interpretation. Stratigraphic palaeobiology proves to be a powerful tool to understand factors that control the geologic record during an interval of intense climate change.
Characterization of Vibrio parahaemolyticus clinical strains from Maryland (2012-2013) and comparisons to a locally and globally diverse V. parahaemolyticus strains by whole-genome sequence analysis.

PubMed

Haendiges, Julie; Timme, Ruth; Allard, Marc W; Myers, Robert A; Brown, Eric W; Gonzalez-Escalona, Narjol

2015-01-01

Vibrio parahaemolyticus is the leading cause of foodborne illnesses in the US associated with the consumption of raw shellfish. Previous population studies of V. parahaemolyticus have used Multi-Locus Sequence Typing (MLST) or Pulsed Field Gel Electrophoresis (PFGE). Whole genome sequencing (WGS) provides a much higher level of resolution, but has been used to characterize only a few United States (US) clinical isolates. Here we report the WGS characterization of 34 genomes of V. parahaemolyticus strains that were isolated from clinical cases in the state of Maryland (MD) during 2 years (2012-2013). These 2 years saw an increase of V. parahaemolyticus cases compared to previous years. Among these MD isolates, 28% were negative for tdh and trh, 8% were tdh positive only, 11% were trh positive only, and 53% contained both genes. We compared this set of V. parahaemolyticus genomes to those of a collection of 17 archival strains from the US (10 previously sequenced strains and 7 from NCBI, collected between 1988 and 2004) and 15 international strains, isolated from geographically-diverse environmental and clinical sources (collected between 1980 and 2010). A WGS phylogenetic analysis of these strains revealed the regional outbreak strains from MD are highly diverse and yet genetically distinct from the international strains. Some MD strains caused outbreaks 2 years in a row, indicating a local source of contamination (e.g., ST631). Advances in WGS will enable this type of analysis to become routine, providing an excellent tool for improved surveillance. Databases built with phylogenetic data will help pinpoint sources of contamination in future outbreaks and contribute to faster outbreak control.
Characterization of Vibrio parahaemolyticus clinical strains from Maryland (2012–2013) and comparisons to a locally and globally diverse V. parahaemolyticus strains by whole-genome sequence analysis

PubMed Central

Haendiges, Julie; Timme, Ruth; Allard, Marc W.; Myers, Robert A.; Brown, Eric W.; Gonzalez-Escalona, Narjol

2015-01-01

Vibrio parahaemolyticus is the leading cause of foodborne illnesses in the US associated with the consumption of raw shellfish. Previous population studies of V. parahaemolyticus have used Multi-Locus Sequence Typing (MLST) or Pulsed Field Gel Electrophoresis (PFGE). Whole genome sequencing (WGS) provides a much higher level of resolution, but has been used to characterize only a few United States (US) clinical isolates. Here we report the WGS characterization of 34 genomes of V. parahaemolyticus strains that were isolated from clinical cases in the state of Maryland (MD) during 2 years (2012–2013). These 2 years saw an increase of V. parahaemolyticus cases compared to previous years. Among these MD isolates, 28% were negative for tdh and trh, 8% were tdh positive only, 11% were trh positive only, and 53% contained both genes. We compared this set of V. parahaemolyticus genomes to those of a collection of 17 archival strains from the US (10 previously sequenced strains and 7 from NCBI, collected between 1988 and 2004) and 15 international strains, isolated from geographically-diverse environmental and clinical sources (collected between 1980 and 2010). A WGS phylogenetic analysis of these strains revealed the regional outbreak strains from MD are highly diverse and yet genetically distinct from the international strains. Some MD strains caused outbreaks 2 years in a row, indicating a local source of contamination (e.g., ST631). Advances in WGS will enable this type of analysis to become routine, providing an excellent tool for improved surveillance. Databases built with phylogenetic data will help pinpoint sources of contamination in future outbreaks and contribute to faster outbreak control. PMID:25745421
Cascade detection for the extraction of localized sequence features; specificity results for HIV-1 protease and structure-function results for the Schellman loop.

PubMed

Newell, Nicholas E

2011-12-15

The extraction of the set of features most relevant to function from classified biological sequence sets is still a challenging problem. A central issue is the determination of expected counts for higher order features so that artifact features may be screened. Cascade detection (CD), a new algorithm for the extraction of localized features from sequence sets, is introduced. CD is a natural extension of the proportional modeling techniques used in contingency table analysis into the domain of feature detection. The algorithm is successfully tested on synthetic data and then applied to feature detection problems from two different domains to demonstrate its broad utility. An analysis of HIV-1 protease specificity reveals patterns of strong first-order features that group hydrophobic residues by side chain geometry and exhibit substantial symmetry about the cleavage site. Higher order results suggest that favorable cooperativity is weak by comparison and broadly distributed, but indicate possible synergies between negative charge and hydrophobicity in the substrate. Structure-function results for the Schellman loop, a helix-capping motif in proteins, contain strong first-order features and also show statistically significant cooperativities that provide new insights into the design of the motif. These include a new 'hydrophobic staple' and multiple amphipathic and electrostatic pair features. CD should prove useful not only for sequence analysis, but also for the detection of multifactor synergies in cross-classified data from clinical studies or other sources. Windows XP/7 application and data files available at: https://sites.google.com/site/cascadedetect/home. nacnewell@comcast.net Supplementary information is available at Bioinformatics online.
Dynamical decoupling of local transverse random telegraph noise in a two-qubit gate

NASA Astrophysics Data System (ADS)

D'Arrigo, A.; Falci, G.; Paladino, E.

2015-10-01

Achieving high-fidelity universal two-qubit gates is a central requisite of any implementation of quantum information processing. The presence of spurious fluctuators of various physical origin represents a limiting factor for superconducting nanodevices. Operating qubits at optimal points, where the qubit-fluctuator interaction is transverse with respect to the single qubit Hamiltonian, considerably improved single qubit gates. Further enhancement has been achieved by dynamical decoupling (DD). In this article we investigate DD of transverse random telegraph noise acting locally on each of the qubits forming an entangling gate. Our analysis is based on the exact numerical solution of the stochastic Schrödinger equation. We evaluate the gate error under local periodic, Carr-Purcell and Uhrig DD sequences. We find that a threshold value of the number, n, of pulses exists above which the gate error decreases with a sequence-specific power-law dependence on n. Below threshold, DD may even increase the error with respect to the unconditioned evolution, a behaviour reminiscent of the anti-Zeno effect.
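The three pulse sequences compared here differ only in where the n pulses are placed within the total gate time. The sketch below uses the pulse timings commonly quoted in the dynamical decoupling literature (periodic: j/(n+1); Carr-Purcell: (j - 1/2)/n; Uhrig: sin²(πj/(2n+2))); whether the paper adopts exactly these conventions is an assumption, and the function names are hypothetical.

```python
import math

def periodic_times(n, total_time=1.0):
    """Periodic DD: n equally spaced pulses at t_j = T * j / (n + 1)."""
    return [total_time * j / (n + 1) for j in range(1, n + 1)]

def carr_purcell_times(n, total_time=1.0):
    """Carr-Purcell: pulses at t_j = T * (j - 1/2) / n."""
    return [total_time * (j - 0.5) / n for j in range(1, n + 1)]

def uhrig_times(n, total_time=1.0):
    """Uhrig DD: pulses at t_j = T * sin^2(pi * j / (2n + 2))."""
    return [total_time * math.sin(math.pi * j / (2 * n + 2)) ** 2
            for j in range(1, n + 1)]
```

With these definitions the Uhrig and Carr-Purcell timings coincide for n = 1 and n = 2 and differ from n = 3 onwards.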
Six components observations of local earthquakes during the 2016 Central Italy seismic sequence

NASA Astrophysics Data System (ADS)

Simonelli, A.; Bernauer, F.; Chow, B.; Braun, T.; Wassermann, J. M.; Igel, H.

2017-12-01

For many years the seismological community has looked for a reliable, sensitive, broadband three-component portable rotational sensor. In this preliminary study, we show the possibility of measuring and extracting relevant seismological information from local earthquakes. We employ portable three-component rotational sensors, insensitive to translations, which operate on optical interferometry principles (Sagnac effect). Multiple sensors recording redundantly add significance to the measurements. During the Central Italy seismic sequence in November 2016, we deployed two portable fiber-optic gyroscopes (BlueSeis3A from iXBlue and LCG demonstrator from LITEF) and a broadband seismometer in Colfiorito, Italy. We present here the six-component observations, with analysis of rotational (three redundant components) and translational (three components) ground motions, generated by earthquakes at local distances. For each seismic event, we compare coherence between rotational sensors and estimate a back azimuth consistent with theoretical values. We also estimate Love and Rayleigh wave phase velocities in the 5 to 10 Hz frequency range.
HIV-1 diversity, transmission dynamics and primary drug resistance in Angola.

PubMed

Bártolo, Inês; Zakovic, Suzana; Martin, Francisco; Palladino, Claudia; Carvalho, Patrícia; Camacho, Ricardo; Thamm, Sven; Clemente, Sofia; Taveira, Nuno

2014-01-01

To assess HIV-1 diversity, transmission dynamics and prevalence of transmitted drug resistance (TDR) in Angola, five years after ART scale-up. Population sequencing of the pol gene was performed on 139 plasma samples collected in 2009 from drug-naive HIV-1 infected individuals living in Luanda. HIV-1 subtypes were determined using phylogenetic analysis. Drug resistance mutations were identified using the Calibrated Population Resistance Tool (CPR). Transmission networks were determined using phylogenetic analysis of all Angolan sequences present in the databases. Evolutionary trends were determined by comparison with a similar survey performed in 2001. 47.1% of the viruses were pure subtypes (all except B), 47.1% were recombinants and 5.8% were untypable. The prevalence of subtype A decreased significantly from 2001 to 2009 (40.0% to 10.8%, P = 0.0019) while the prevalence of unique recombinant forms (URFs) increased > 2-fold (40.0% to 83.1%, P < 0.0001). The most frequent URFs comprised untypable sequences with subtypes H (U/H, n = 7, 10.8%), A (U/A, n = 6, 9.2%) and G (G/U, n = 4, 6.2%). Newly identified U/H recombinants formed a highly supported monophyletic cluster suggesting a local and common origin. TDR mutation K103N was found in one (0.7%) patient (1.6% in 2001). Out of the 364 sequences sampled for transmission network analysis, 130 (35.7%) were part of a transmission network. Forty eight transmission clusters were identified; the majority (56.3%) comprised sequences sampled in 2008-2010 in Luanda which is consistent with a locally fuelled epidemic. Very low genetic distance was found in 27 transmission pairs sampled in the same year, suggesting recent transmission events. Transmission of drug resistant strains was still negligible in Luanda in 2009, five years after the scale-up of ART. 
The dominance of small and recent transmission clusters and the emergence of new URFs are consistent with a rising HIV-1 epidemics mainly driven by heterosexual transmission.

Molecular detection and species identification of Alexandrium (Dinophyceae) causing harmful algal blooms along the Chilean coastline

PubMed Central

Jedlicki, Ana; Fernández, Gonzalo; Astorga, Marcela; Oyarzún, Pablo; Toro, Jorge E.; Navarro, Jorge M.; Martínez, Víctor

2012-01-01

Background and aims: On the basis of morphological evidence, the species involved in South American Pacific coast harmful algal blooms (HABs) has been traditionally recognized as Alexandrium catenella (Dinophyceae). However, these observations have not been confirmed using evidence based on genomic sequence variability. Our principal objective was to accurately determine the species of Alexandrium involved in local HABs in order to implement a real-time polymerase chain reaction (PCR) assay for its rapid and easy detection on filter-feeding shellfish, such as mussels. Methodology: For species-specific determination, the intergenic spacer 1 (ITS1), 5.8S subunit, ITS2 and the hypervariable genomic regions D1–D5 of the large ribosomal subunit of local strains were sequenced and compared with two data sets of other Alexandrium sequences. Species-specific primers were used to amplify signature sequences within the genomic DNA of the studied species by conventional and real-time PCR. Principal results: Phylogenetic analysis determined that the Chilean strain falls into Group I of the tamarensis complex. Our results support the allocation of the Chilean Alexandrium species as a toxic Alexandrium tamarense rather than A. catenella, as currently defined. Once local species were determined to belong to Group I of the tamarensis complex, a highly sensitive and accurate real-time PCR procedure was developed to detect dinoflagellate presence in Mytilus spp. (Bivalvia) samples after being fed (challenged) in vitro with the Chilean Alexandrium strain. The results show that real-time PCR is useful to detect Alexandrium intake in filter-feeding molluscs. Conclusions: It has been shown that the classification of local Alexandrium using morphological evidence is not very accurate. Molecular methods enabled the HAB dinoflagellate species of the Chilean coast to be assigned as A. tamarense rather than A. catenella. Real-time PCR analysis based on A. tamarense primers allowed the detection of dinoflagellate DNA in Mytilus spp. samples exposed to this alga. Through the specific assignment of dinoflagellate species involved in HABs, more reliable preventive policies can be implemented. PMID:23259043
Primary structure and subcellular localization of two fimbrial subunit-like proteins involved in the biosynthesis of K99 fibrillae.

PubMed

Roosendaal, E; Jacobs, A A; Rathman, P; Sondermeyer, C; Stegehuis, F; Oudega, B; de Graaf, F K

1987-09-01

Analysis of the nucleotide sequence of the distal part of the fan gene cluster encoding the proteins involved in the biosynthesis of the fibrillar adhesin, K99, revealed the presence of two structural genes, fanG and fanH. The amino acid sequence of the gene products (FanG and FanH) showed significant homology to the amino acid sequence of the fibrillar subunit protein (FanC). Introduction of a site-specific frameshift mutation in fanG or fanH resulted in a simultaneous decrease in fibrillae production and adhesive capacity. Analysis of subcellular fractions showed that, in contrast to the K99 fibrillar subunit (FanC), both the FanH and the FanG protein were loosely associated with the outer membrane, possibly on the periplasmic side, but were not components of the fimbriae themselves.
Supervised classification of brain tissues through local multi-scale texture analysis by coupling DIR and FLAIR MR sequences

NASA Astrophysics Data System (ADS)

Poletti, Enea; Veronese, Elisa; Calabrese, Massimiliano; Bertoldo, Alessandra; Grisan, Enrico

2012-02-01

The automatic segmentation of brain tissues in magnetic resonance (MR) is usually performed on T1-weighted images, due to their high spatial resolution. T1w sequence, however, has some major downsides when brain lesions are present: the altered appearance of diseased tissues causes errors in tissues classification. In order to overcome these drawbacks, we employed two different MR sequences: fluid attenuated inversion recovery (FLAIR) and double inversion recovery (DIR). The former highlights both gray matter (GM) and white matter (WM), the latter highlights GM alone. We propose here a supervised classification scheme that does not require any anatomical a priori information to identify the 3 classes, "GM", "WM", and "background". Features are extracted by means of a local multi-scale texture analysis, computed for each pixel of the DIR and FLAIR sequences. The 9 textures considered are average, standard deviation, kurtosis, entropy, contrast, correlation, energy, homogeneity, and skewness, evaluated on a neighborhood of 3x3, 5x5, and 7x7 pixels. Hence, the total number of features associated to a pixel is 56 (9 textures x3 scales x2 sequences +2 original pixel values). The classifier employed is a Support Vector Machine with Radial Basis Function as kernel. From each of the 4 brain volumes evaluated, a DIR and a FLAIR slice have been selected and manually segmented by 2 expert neurologists, providing 1st and 2nd human reference observations which agree with an average accuracy of 99.03%. SVM performances have been assessed with a 4-fold cross-validation, yielding an average classification accuracy of 98.79%.
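The feature count quoted above (9 textures × 3 scales × 2 sequences + 2 original pixel values = 56) and the idea of a per-pixel neighborhood statistic can be checked with a short sketch. Only the average and standard deviation are computed here; clipping the window at the image border is an assumption, since the abstract does not say how borders were handled, and all names are hypothetical.

```python
import statistics

TEXTURES = ["average", "standard deviation", "kurtosis", "entropy", "contrast",
            "correlation", "energy", "homogeneity", "skewness"]
WINDOW_SIZES = [3, 5, 7]       # neighborhood side lengths in pixels
SEQUENCES = ["DIR", "FLAIR"]

def feature_count():
    """Textures x scales x sequences, plus one raw pixel value per sequence."""
    return len(TEXTURES) * len(WINDOW_SIZES) * len(SEQUENCES) + len(SEQUENCES)

def neighborhood(image, row, col, size):
    """Pixel values in a size x size window centred on (row, col), clipped at borders."""
    half = size // 2
    values = []
    for r in range(max(0, row - half), min(len(image), row + half + 1)):
        for c in range(max(0, col - half), min(len(image[0]), col + half + 1)):
            values.append(image[r][c])
    return values

def mean_and_std(image, row, col, size):
    """Two of the nine texture descriptors for one pixel at one scale."""
    values = neighborhood(image, row, col, size)
    return statistics.mean(values), statistics.pstdev(values)
```

Repeating `mean_and_std` (and the other seven descriptors) for each window size and each sequence yields the per-pixel feature vector that would be passed to the classifier.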
PreCisIon: PREdiction of CIS-regulatory elements improved by gene's positION.

PubMed

Elati, Mohamed; Nicolle, Rémy; Junier, Ivan; Fernández, David; Fekih, Rim; Font, Julio; Képès, François

2013-02-01

Conventional approaches to predict transcriptional regulatory interactions usually rely on the definition of a shared motif sequence on the target genes of a transcription factor (TF). These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices, which may match large numbers of sites and produce an unreliable list of target genes. To improve the prediction of binding sites, we propose to additionally use the unrelated knowledge of the genome layout. Indeed, it has been shown that co-regulated genes tend to be either neighbors or periodically spaced along the whole chromosome. This study demonstrates that respective gene positioning carries significant information. This novel type of information is combined with traditional sequence information by a machine learning algorithm called PreCisIon. To optimize this combination, PreCisIon builds a strong gene target classifier by adaptively combining weak classifiers based on either local binding sequence or global gene position. This strategy generically paves the way to the optimized incorporation of any future advances in gene target prediction based on local sequence, genome layout or on novel criteria. With the current state of the art, PreCisIon consistently improves methods based on sequence information only. This is shown by implementing a cross-validation analysis of the 20 major TFs from two phylogenetically remote model organisms. For Bacillus subtilis and Escherichia coli, respectively, PreCisIon achieves on average an area under the receiver operating characteristic curve of 70 and 60%, a sensitivity of 80 and 70% and a specificity of 60 and 56%. The newly predicted gene targets are demonstrated to be functionally consistent with previously known targets, as assessed by analysis of Gene Ontology enrichment or of the relevant literature and databases.
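The idea of adaptively combining weak classifiers, some based on a local sequence score and some on a gene position score, can be illustrated with a small boosting sketch. Discrete AdaBoost over threshold stumps is used here as a stand-in; the abstract does not name the exact boosting variant, and the two-feature representation and all function names are assumptions.

```python
import math

def stump_predict(x, feature, threshold, sign):
    """Weak classifier: returns sign if x[feature] >= threshold, else -sign."""
    return sign if x[feature] >= threshold else -sign

def fit_adaboost(X, y, rounds=3):
    """Discrete AdaBoost; X is a list of feature tuples, y a list of +1/-1 labels."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump with the lowest weighted error.
        for feature in range(len(X[0])):
            for threshold in sorted({x[feature] for x in X}):
                for sign in (1, -1):
                    err = sum(w for w, x, label in zip(weights, X, y)
                              if stump_predict(x, feature, threshold, sign) != label)
                    if best is None or err < best[0]:
                        best = (err, feature, threshold, sign)
        err, feature, threshold, sign = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, feature, threshold, sign))
        # Increase the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * label * stump_predict(x, feature, threshold, sign))
                   for w, x, label in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all weak classifiers."""
    score = sum(alpha * stump_predict(x, f, t, s) for alpha, f, t, s in ensemble)
    return 1 if score >= 0 else -1
```

In this toy setting, feature 0 could hold a motif match score and feature 1 a positional score, so a gene that is weak on one criterion can still be called a target when the other is strong.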
GenoMycDB: a database for comparative analysis of mycobacterial genes and genomes.

PubMed

Catanho, Marcos; Mascarenhas, Daniel; Degrave, Wim; Miranda, Antonio Basílio de

2006-03-31

Several databases and computational tools have been created with the aim of organizing, integrating and analyzing the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not allow for massive and/or dynamic comparison of these data. GenoMycDB (http://www.dbbm.fiocruz.br/GenoMycDB) is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its central structure is composed of the results obtained after pair-wise sequence alignments among all the predicted proteins coded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair, providing for each protein sequence the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species/strains can be produced dynamically by user-defined criteria, based on one or multiple sequence similarity parameters. In addition, searches can be restricted according to the predicted subcellular localization of the protein, the DNA strand of the corresponding gene and/or the description of the protein. Massive data search and/or retrieval are available, and different ways of exporting the result are offered. GenoMycDB provides an on-line resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution.
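The kind of user-defined query described above, selecting potential homologs between two genomes by similarity thresholds and optionally by subcellular localization, might look like the following SQLite sketch. The schema, table names, column names, and sample rows are entirely hypothetical and do not reflect the actual GenoMycDB design.

```python
import sqlite3

def build_demo_db():
    """In-memory database with a hypothetical two-table schema and toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE protein (
            id TEXT PRIMARY KEY,
            genome TEXT,
            localization TEXT
        );
        CREATE TABLE alignment (
            query_id TEXT REFERENCES protein(id),
            subject_id TEXT REFERENCES protein(id),
            identity REAL,
            evalue REAL
        );
    """)
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?)", [
        ("p1", "genome_A", "membrane"),
        ("p2", "genome_B", "membrane"),
        ("p3", "genome_B", "cytoplasm"),
    ])
    conn.executemany("INSERT INTO alignment VALUES (?, ?, ?, ?)", [
        ("p1", "p2", 92.0, 1e-80),
        ("p1", "p3", 35.0, 1e-5),
    ])
    return conn

def potential_homologs(conn, genome_a, genome_b, min_identity, max_evalue,
                       localization=None):
    """Aligned pairs between two genomes that pass the similarity thresholds."""
    sql = """
        SELECT a.query_id, a.subject_id, a.identity
        FROM alignment AS a
        JOIN protein AS q ON q.id = a.query_id
        JOIN protein AS s ON s.id = a.subject_id
        WHERE q.genome = ? AND s.genome = ?
          AND a.identity >= ? AND a.evalue <= ?
    """
    params = [genome_a, genome_b, min_identity, max_evalue]
    if localization is not None:
        # Optional restriction on the query protein's predicted localization.
        sql += " AND q.localization = ?"
        params.append(localization)
    return conn.execute(sql, params).fetchall()
```

Storing one row per aligned pair keeps each similarity parameter directly filterable, which is what makes dynamic, multi-criteria tables cheap to produce.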
Effect of sequence-dependent rigidity on plectoneme localization in dsDNA

NASA Astrophysics Data System (ADS)

Medalion, Shlomi; Rabin, Yitzhak

2016-04-01

We use Monte-Carlo simulations to study the effect of variable rigidity on plectoneme formation and localization in supercoiled double-stranded DNA. We show that the presence of soft sequences increases the number of plectoneme branches and that the edges of the branches tend to be localized at these sequences. We propose an experimental approach to test our results in vitro, and discuss the possible role played by plectoneme localization in the search process of transcription factors for their targets (promoter regions) on the bacterial genome.
Molecular characterization of Blastocystis isolates from children and rhesus monkeys in Kathmandu, Nepal.

PubMed

Yoshikawa, Hisao; Wu, Zhiliang; Pandey, Kishor; Pandey, Basu Dev; Sherchand, Jeevan Bahadur; Yanagi, Tetsuo; Kanbara, Hiroji

2009-03-23

To investigate the possible transmission of Blastocystis organisms between local rhesus monkeys and children in Kathmandu, Nepal, we compared the subtype (ST) and sequence of Blastocystis isolates from children with gastrointestinal symptoms and local rhesus monkeys. Twenty and 10 Blastocystis isolates were established from 82 and 10 fecal samples obtained from children and monkeys, respectively. Subtype analysis with seven sequence-tagged site (STS) primers indicated that the prevalence of Blastocystis sp. ST1, ST2 and ST3 was 20%, 20% and 60% in the child isolates, respectively. In contrast to human isolates, ST3 was not found in monkey isolates and the prevalence of ST1 and ST2 was 50% and 70%, respectively, including three mixed STs1 and 2 and one isolate not amplified by any STS primers, respectively. Since Blastocystis sp. ST2 has been reported as the most dominant genotype in the survey of Blastocystis infection among the various monkey species, sequence comparison of the 150bp variable region of the small subunit rRNA (SSU rRNA) gene was conducted among ST2 isolates of humans and monkeys. Sequence alignment of 24 clones developed from ST2 isolates of 4 humans and 4 monkeys showed three distinct subgroups, defined as ST2A, ST2B and ST2C. These three subgroups were shared between the child and monkey isolates. These results suggest that the local rhesus monkeys are a possible source of Blastocystis sp. ST2 infection of humans in Kathmandu.
New species and phylogenetic relationships of the spider genus Coptoprepes using morphological and sequence data (Araneae: Anyphaenidae).

PubMed

Barone, Mariana L; Werenkraut, Victoria; Ramírez, Martín J

2016-10-17

We present evidence from the standard cytochrome c oxidase subunit I (COI) barcoding marker and from new collections, showing that the males and females of C. ecotono Werenkraut & Ramírez were mismatched, and describe the female of that species for the first time. An undescribed male from Chile is assigned to the new species Coptoprepes laudani, together with the female that was previously thought to be C. ecotono. The matching of sexes is justified by a dual cladistic analysis of morphological and sequence data in combination. New locality data and barcoding sequences are provided for other species of Coptoprepes, all endemic to the temperate forests of Chile and adjacent Argentina. Although morphology and sequences are not conclusive on the relationships of Coptoprepes species, the sequence data suggest that the species without a retrolateral tibial apophysis may belong to an independent lineage.
Genome-Wide Identification of Regulatory Sequences Undergoing Accelerated Evolution in the Human Genome

PubMed Central

Dong, Xinran; Wang, Xiao; Zhang, Feng; Tian, Weidong

2016-01-01

Accelerated evolution of regulatory sequences can alter the expression pattern of target genes and cause phenotypic changes. In this study, we used DNase I hypersensitive sites (DHSs) to annotate putative regulatory sequences in the human genome, and conducted a genome-wide analysis of the effects of accelerated evolution on regulatory sequences. Working under the assumption that local ancient repeat elements of DHSs are under neutral evolution, we discovered that ∼0.44% of DHSs are under accelerated evolution (ace-DHSs). We found that ace-DHSs tend to be more active than background DHSs, and are strongly associated with epigenetic marks of active transcription. The target genes of ace-DHSs are significantly enriched in neuron-related functions, and their expression levels are positively selected in the human brain. Thus, these lines of evidence strongly suggest that accelerated evolution of regulatory sequences plays an important role in the evolution of human-specific phenotypes. PMID:27401230
Mapping of aldose reductase gene sequences to human chromosomes 1, 3, 7, 9, 11, and 13

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bateman, J.B.; Kojis, T.; Heinzmann, C.

1993-09-01

Aldose reductase (alditol:NAD(P)+ 1-oxidoreductase; EC 1.1.1.21) (AR) catalyzes the reduction of several aldehydes, including that of glucose, to the corresponding sugar alcohol. Using a complementary DNA clone encoding human AR, the authors mapped the gene sequences to human chromosomes 1, 3, 7, 9, 11, 13, 14, and 18 by somatic cell hybridization. By in situ hybridization analysis, sequences were localized to human chromosomes 1q32-q43, 3p12, 7q31-q35, 9q22, 11p14-p15, and 13q14-q21. As a putative functional AR gene has been mapped to chromosome 7 and a putative pseudogene to chromosome 3, the sequences on the other seven chromosomes may represent other active genes, non-aldose reductase homologous sequences, or pseudogenes. 24 refs., 3 figs., 2 tabs.
Face recognition based on matching of local features on 3D dynamic range sequences

NASA Astrophysics Data System (ADS)

Echeagaray-Patrón, B. A.; Kober, Vitaly

2016-09-01

3D face recognition has attracted attention in the last decade due to improvement of technology of 3D image acquisition and its wide range of applications such as access control, surveillance, human-computer interaction and biometric identification systems. Most research on 3D face recognition has focused on analysis of 3D still data. In this work, a new method for face recognition using dynamic 3D range sequences is proposed. Experimental results are presented and discussed using 3D sequences in the presence of pose variation. The performance of the proposed method is compared with that of conventional face recognition algorithms based on descriptors.
GenomeVIP: a cloud platform for genomic variant discovery and interpretation

PubMed Central

Mashl, R. Jay; Scott, Adam D.; Huang, Kuan-lin; Wyczalkowski, Matthew A.; Yoon, Christopher J.; Niu, Beifang; DeNardo, Erin; Yellapantula, Venkata D.; Handsaker, Robert E.; Chen, Ken; Koboldt, Daniel C.; Ye, Kai; Fenyö, David; Raphael, Benjamin J.; Wendl, Michael C.; Ding, Li

2017-01-01

Identifying genomic variants is a fundamental first step toward the understanding of the role of inherited and acquired variation in disease. The accelerating growth in the corpus of sequencing data that underpins such analysis is making the data-download bottleneck more evident, placing substantial burdens on the research community to keep pace. As a result, the search for alternative approaches to the traditional “download and analyze” paradigm on local computing resources has led to a rapidly growing demand for cloud-computing solutions for genomics analysis. Here, we introduce the Genome Variant Investigation Platform (GenomeVIP), an open-source framework for performing genomics variant discovery and annotation using cloud- or local high-performance computing infrastructure. GenomeVIP orchestrates the analysis of whole-genome and exome sequence data using a set of robust and popular task-specific tools, including VarScan, GATK, Pindel, BreakDancer, Strelka, and Genome STRiP, through a web interface. GenomeVIP has been used for genomic analysis in large-data projects such as the TCGA PanCanAtlas and in other projects, such as the ICGC Pilots, CPTAC, ICGC-TCGA DREAM Challenges, and the 1000 Genomes SV Project. Here, we demonstrate GenomeVIP's ability to provide high-confidence annotated somatic, germline, and de novo variants of potential biological significance using publicly available data sets. PMID:28522612
Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)—A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes

PubMed Central

Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

2017-01-01

Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes. PMID:29250096
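The abstract above describes MSAP-Seq as targeting CCGG sites. As a rough illustration only, the sketch below locates CCGG recognition sites in a DNA string; the function name and the toy sequence are hypothetical and are not part of the MSAP-Seq pipeline. Because CCGG is its own reverse complement, scanning one strand is enough to find sites on both.

```python
def find_ccgg_sites(sequence: str) -> list[int]:
    """Return the 0-based start position of every CCGG site in a DNA string."""
    seq = sequence.upper()
    sites = []
    start = seq.find("CCGG")
    while start != -1:
        sites.append(start)
        # Step forward by one base so that closely spaced sites are not skipped.
        start = seq.find("CCGG", start + 1)
    return sites


# Toy sequence, invented for illustration (not real barley data).
toy_sequence = "ATCCGGTTACCGGCCGGA"
print(find_ccgg_sites(toy_sequence))
```

This only finds candidate sites; it says nothing about their methylation state, which in the method described above is inferred from sequencing reads.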
HPV-QUEST: A highly customized system for automated HPV sequence analysis capable of processing Next Generation sequencing data set.

PubMed

Yin, Li; Yao, Jiqiang; Gardner, Brent P; Chang, Kaifen; Yu, Fahong; Goodenow, Maureen M

2012-01-01

Next Generation sequencing (NGS) applied to human papilloma viruses (HPV) can provide sensitive methods to investigate the molecular epidemiology of multiple type HPV infection. Currently a genotyping system with a comprehensive collection of updated HPV reference sequences and a capacity to handle NGS data sets is lacking. HPV-QUEST was developed as an automated and rapid HPV genotyping system. The web-based HPV-QUEST subtyping algorithm was developed using HTML, PHP, the Perl scripting language, and MySQL as the database backend. HPV-QUEST includes a database of annotated HPV reference sequences with updated nomenclature covering 5 genera, 14 species and 150 mucosal and cutaneous types to genotype blasted query sequences. HPV-QUEST processes up to 10 megabases of sequences within 1 to 2 minutes. Results are reported in HTML, text and Excel formats and display e-value, BLAST score, and local and coverage identities; provide genus, species, type, infection site and risk for the best matched reference HPV sequence; and produce results ready for additional analyses.
Isolation and characterization of full-length putative alcohol dehydrogenase genes from Polygonum minus

NASA Astrophysics Data System (ADS)

Hamid, Nur Athirah Abd; Ismail, Ismanizan

2013-11-01

Polygonum minus, locally known as Kesum, is an aromatic herb with a high secondary metabolite content. Alcohol dehydrogenase (ADH) is an important enzyme that catalyzes the reversible oxidation of alcohols to aldehydes with NAD(P)(H) as a cofactor. The main focus of this research is to identify the ADH gene. Total RNA was extracted from leaves of P. minus treated with 150 μM jasmonic acid. The full-length cDNA sequence of ADH was isolated via rapid amplification of cDNA ends (RACE). Subsequently, in silico analysis was conducted on the full-length cDNA sequence, and PCR was performed on genomic DNA to determine the exon and intron organization. Two ADH sequences, designated PmADH1 and PmADH2, were successfully isolated. Both sequences have an ORF of 801 bp that encodes 266 aa residues. Nucleotide sequence comparison of PmADH1 and PmADH2 indicated that the sequences are highly similar in the ORF region but divergent in the 3' untranslated regions (UTRs). The amino acid sequences differ at residue 107: PmADH1 contains Gly (G), while PmADH2 contains Cys (C). The intron-exon organization of both sequences is also the same, with 3 introns and 4 exons. Based on in silico analysis, both sequences contain the "classical" short-chain alcohol dehydrogenases/reductases ((c) SDRs) conserved domain. The results suggest that both sequences are members of the short-chain alcohol dehydrogenase family.
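The ORF length and residue count reported above are consistent with each other if the 801 bp figure includes the stop codon; that is an assumption here, since the abstract does not say so explicitly. A minimal arithmetic sketch, with a hypothetical helper name:

```python
def orf_to_residues(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length in bp.

    Assumes the ORF is a whole number of codons; if `includes_stop` is true,
    the final codon is a stop codon and encodes no residue.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# 801 bp -> 267 codons -> 266 residues plus one stop codon.
print(orf_to_residues(801))
```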
Description and physical localization of the bovine survival of motor neuron gene (SMN).

PubMed

Pietrowski, D; Goldammer, T; Meinert, S; Schwerin, M; Förster, M

1998-01-01

Proximal spinal muscular atrophy (SMA) is an autosomal recessive disease in humans and other mammals, characterized by degeneration of anterior horn cells of the spinal cord. In humans, the survival of motor neuron gene (SMN) has been recognized as the SMA-determining gene and has been mapped to 5q13. In cattle, SMA is a recurrent, inherited disease that plays an important economic role in breeding programs of Brown Swiss stock. Now we have identified the full-length cDNA sequence of the bovine SMN gene. Molecular analysis and characterization of the sequence documents 85% identity to its human counterpart and three evolutionarily conserved domains in different species. Physical mapping data reveal that bovine SMN is localized to chromosome region 20q12→q13, supporting the conserved synteny of this chromosomal region between humans and cattle.
Enterohemorrhagic Escherichia coli O157 in milk and dairy products from Libya: Isolation and molecular identification by partial sequencing of 16S rDNA

PubMed Central

Garbaj, Aboubaker M.; Awad, Enas M.; Azwai, Salah M.; Abolghait, Said K.; Naas, Hesham T.; Moawad, Ashraf A.; Gammoudi, Fatim T.; Barbieri, Ilaria; Eldaghayes, Ibrahim M.

2016-01-01

Aim: The aim of this work was to isolate and molecularly identify enterohemorrhagic Escherichia coli (EHEC) O157 in milk and dairy products in Libya and, in addition, to assess the accuracy of cultural and biochemical identification compared with molecular identification by partial sequencing of 16S rDNA for the existing isolates. Materials and Methods: A total of 108 samples of raw milk (cow, she-camel, and goat) and locally made dairy products (fermented cow’s milk, Maasora, Ricotta and ice cream) were collected from some regions (Janzour, Tripoli, Kremiya, Tajoura and Tobruk) in Libya. Samples were subjected to microbiological analysis for isolation of E. coli, which was detected by conventional culture and by molecular methods using polymerase chain reaction and partial sequencing of 16S rDNA. Results: Out of 108 samples, only 27 isolates were found to be EHEC O157 based on their cultural characteristics (Tellurite-Cefixime-Sorbitol MacConkey): 3 isolates from cow’s milk (11%), 3 from she-camel’s milk (11%), 2 from goat’s milk (7.4%), 7 from fermented raw milk samples (26%), and 9 (33%) and 3 (11%) from the fresh locally made soft cheeses Maasora and Ricotta, respectively; none of the ice cream samples showed any growth. However, out of these 27 isolates, only 11 were confirmed to be E. coli by partial sequencing of 16S rDNA and the E. coli O157 latex agglutination test. Phylogenetic analysis revealed that the majority of local E. coli isolates were related to the E. coli O157:H7 FRIK944 strain. Conclusion: These results can be used for further studies on EHEC O157 as an emerging foodborne pathogen and its role in human infection in Libya. PMID:27956766
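The per-source percentages in the Results above appear to be fractions of the 27 presumptive isolates rather than of the 108 samples. The sketch below recomputes them from the counts given in the abstract; the dictionary keys and variable names are illustrative only.

```python
# Presumptive EHEC O157 isolates per source, as counted in the abstract.
isolates = {
    "cow's milk": 3,
    "she-camel's milk": 3,
    "goat's milk": 2,
    "fermented raw milk": 7,
    "Maasora": 9,
    "Ricotta": 3,
}

total = sum(isolates.values())  # should equal the 27 isolates reported

# Share of each source among the presumptive isolates, in percent.
shares = {source: round(100 * n / total, 1) for source, n in isolates.items()}
for source, pct in shares.items():
    print(f"{source}: {pct}%")

# Fraction of presumptive isolates confirmed by 16S rDNA sequencing
# and latex agglutination (11 of 27).
confirmed_pct = round(100 * 11 / total, 1)
print(f"confirmed: {confirmed_pct}%")
```

Under this reading, fewer than half of the culture-based identifications were confirmed by the molecular and serological tests, which is the accuracy gap the study set out to assess.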
The Caenorhabditis elegans gene unc-89, required for muscle M-line assembly, encodes a giant modular protein composed of Ig and signal transduction domains

PubMed Central

1996-01-01

Mutations in the Caenorhabditis elegans gene unc-89 result in nematodes having disorganized muscle structure in which thick filaments are not organized into A-bands, and there are no M-lines. Beginning with a partial cDNA from the C. elegans sequencing project, we have cloned and sequenced the unc-89 gene. An unc-89 allele, st515, was found to contain an 84-bp deletion and a 10-bp duplication, resulting in an in-frame stop codon within predicted unc-89 coding sequence. Analysis of the complete coding sequence for unc-89 predicts a novel 6,632 amino acid polypeptide consisting of sequence motifs which have been implicated in protein-protein interactions. UNC-89 begins with 67 residues of unique sequences, SH3, dbl/CDC24, and PH domains, 7 immunoglobulin (Ig) domains, a putative KSP-containing multiphosphorylation domain, and ends with 46 Ig domains. A polyclonal antiserum raised to a portion of unc-89 encoded sequence reacts to a twitchin-sized polypeptide from wild type, but truncated polypeptides from st515 and from the amber allele e2338. By immunofluorescent microscopy, this antiserum localizes to the middle of A-bands, consistent with UNC-89 being a structural component of the M-line. Previous studies indicate that myofilament lattice assembly begins with positional cues laid down in the basement membrane and muscle cell membrane. We propose that the intracellular protein UNC-89 responds to these signals, localizes, and then participates in assembling an M-line. PMID:8603916
New insights into the paleolake sequence of Baumkirchen (Austria): multiple lake phases and a minor ice advance during MIS 4?

NASA Astrophysics Data System (ADS)

Barrett, Samuel; Starnberger, Reinhard; Spötl, Christoph; Brauer, Achim; Tjallingii, Rik; Dulski, Peter; Abfalterer, Christof

2015-04-01

The sequence of pre-LGM lacustrine sediments at Baumkirchen (Austria) provides a key record in Alpine Quaternary stratigraphy. These sediments from within the boundary of the Alps potentially provide unique insights into the regional paleoclimate. Recent drilling revealed at least ~250 m (the base was not reached) of almost entirely mm- to cm-scale lacustrine sediments. The laminated sediments consist of alternations between clayey silt and event layers of medium silt to fine sand. The sequence is interrupted only by a short section of gravel supported in an unlaminated clay-rich matrix. Optically stimulated luminescence dating identifies two distinct sequences: the upper sequence spanning mid-late Marine Isotope Stage (MIS) 3 (~33 to ~45 ka BP), agreeing with existing calibrated radiocarbon ages, and the lower section dating to MIS 4 (~59 to ~73 ka BP). Whether the hiatus is an erosional unconformity, or if the sequences represent two separate lake phases is unclear. Although the precise location of the hiatus is hard to identify, the gravel-rich section lies at the very top of the lower sequence. Pebbles in these gravels are largely angular and contain a significant proportion of non-local, regional lithologies. Such gravels are absent in the remainder of the entire 250 m-thick sequence and hence suggest a unique event rather than, e.g., interfingering of local delta gravel foresets with the basin sediments. The gravels are therefore likely to be ice-rafted debris from icebergs from nearby glaciers calving into the lake. This therefore represents the first sedimentological evidence of a MIS 4 ice advance in the Eastern Alps. X-ray fluorescence analysis (ITRAX core scanning) of event layers indicates a strong change in the geochemical composition from generally K, Zr and Ti-rich layers in the upper sequence to mainly Ca and/or Si-rich layers in the lower sequence. X-ray diffraction analysis shows the Ca and Si signals to be controlled by carbonate (both calcite and dolomite) and quartz, respectively. This suggests a change in dominant sediment source and may indicate a change in catchment or paleolake configuration, re-raising the long outstanding question of how the lake or lakes were dammed.

Identification of a Herbal Powder by Deoxyribonucleic Acid Barcoding and Structural Analyses.

PubMed

Sheth, Bhavisha P; Thaker, Vrinda S

2015-10-01

Authentic identification of plants is essential for exploiting their medicinal properties as well as for stopping adulteration and malpractice in their trade. The aim was to identify a herbal powder obtained from a herbalist in the local vicinity of Rajkot, Gujarat, using deoxyribonucleic acid (DNA) barcoding and molecular tools. DNA was extracted from the herbal powder and from selected Cassia species, followed by polymerase chain reaction (PCR) and sequencing of the rbcL barcode locus. Thereafter the sequences were subjected to National Center for Biotechnology Information (NCBI) basic local alignment search tool (BLAST) analysis, followed by three-dimensional structure determination of the rbcL protein from the herbal powder and from the Cassia species Cassia fistula, Cassia tora and Cassia javanica (sequences obtained in the present study), and Cassia roxburghii and Cassia abbreviata (sequences retrieved from GenBank). Further, multiple and pairwise structural alignments were carried out in order to identify the herbal powder. The nucleotide sequences obtained from the selected species of Cassia were submitted to GenBank (Accession No. JX141397, JX141405, JX141420). The NCBI BLAST analysis of the rbcL protein from the herbal powder showed an equal sequence similarity (with reference to different parameters like E value, maximum identity, total score, query coverage) to C. javanica and C. roxburghii. In order to resolve the ambiguities of the BLAST result, a protein structural approach was implemented. The protein homology models obtained in the present study were submitted to the protein model database (PM0079748-PM0079753). The pairwise structural alignment of the herbal powder (as template) and C. javanica and C. roxburghii (as targets individually) revealed a close similarity of the herbal powder with C. javanica. 
A strategy such as the one used here, integrating DNA barcoding and protein structural analyses, could be adopted as a novel, rapid and economical procedure, especially when protein-coding loci are considered. The herbal powder was identified as Cassia javanica L.
Localization and characterization of X chromosome inversion breakpoints separating Drosophila mojavensis and Drosophila arizonae.

PubMed

Cirulli, Elizabeth T; Noor, Mohamed A F

2007-01-01

Ectopic exchange between transposable elements or other repetitive sequences along a chromosome can produce chromosomal inversions. As a result, genome sequence studies typically find sequence similarity between corresponding inversion breakpoint regions. Here, we identify and investigate the breakpoint regions of the X chromosome inversion distinguishing Drosophila mojavensis and Drosophila arizonae. We localize one inversion breakpoint to 13.7 kb and localize the other to a 1-Mb interval. Using this localization and assuming microsynteny between Drosophila melanogaster and D. arizonae, we pinpoint likely positions of the inversion breakpoints to windows of less than 3000 bp. These breakpoints define the size of the inversion to approximately 11 Mb. However, in contrast to many other studies, we fail to find significant sequence similarity between the 2 breakpoint regions. The localization of these inversion breakpoints will facilitate future genetic and molecular evolutionary studies in this species group, an emerging model system for ecological genetics.
Functional analysis of Pacific oyster (Crassostrea gigas) β-thymosin: Focus on antimicrobial activity.

PubMed

Nam, Bo-Hye; Seo, Jung-Kil; Lee, Min Jeong; Kim, Young-Ok; Kim, Dong-Gyun; An, Cheul Min; Park, Nam Gyu

2015-07-01

An antimicrobial peptide, ∼5 kDa in size, was isolated and purified in its active form from the mantle of the Pacific oyster Crassostrea gigas by C18 reversed-phase high-performance liquid chromatography. Matrix-assisted laser desorption ionisation time-of-flight analysis revealed a mass of 4656.4 Da for the purified, unreduced peptide. A comparison of the N-terminal amino acid sequence of the oyster antimicrobial peptide with deduced amino acid sequences in our local expressed sequence tag (EST) database of C. gigas (unpublished data) revealed that the oyster antimicrobial peptide sequence entirely matched the deduced amino acid sequence of an EST clone (HM-8_A04), which was highly homologous with the β-thymosin of other species. The cDNA possessed a 126-bp open reading frame that encoded a protein of 41 amino acids. To confirm the antimicrobial activity of C. gigas β-thymosin, we overexpressed a recombinant β-thymosin (rcgTβ) using a pET22 expression plasmid in an Escherichia coli system. The antimicrobial activity of rcgTβ was evaluated and demonstrated using a bacterial growth inhibition test in both liquid and solid cultures. Copyright © 2015 Elsevier Ltd. All rights reserved.
Study of cnidarian-algal symbiosis in the "omics" age.

PubMed

Meyer, Eli; Weis, Virginia M

2012-08-01

The symbiotic associations between cnidarians and dinoflagellate algae (Symbiodinium) support productive and diverse ecosystems in coral reefs. Many aspects of this association, including the mechanistic basis of host-symbiont recognition and metabolic interaction, remain poorly understood. The first completed genome sequence for a symbiotic anthozoan is now available (the coral Acropora digitifera), and extensive expressed sequence tag resources are available for a variety of other symbiotic corals and anemones. These resources make it possible to profile gene expression, protein abundance, and protein localization associated with the symbiotic state. Here we review the history of "omics" studies of cnidarian-algal symbiosis and the current availability of sequence resources for corals and anemones, identifying genes putatively involved in symbiosis across 10 anthozoan species. The public availability of candidate symbiosis-associated genes leaves the field of cnidarian-algal symbiosis poised for in-depth comparative studies of sequence diversity and gene expression and for targeted functional studies of genes associated with symbiosis. Reviewing the progress to date suggests directions for future investigations of cnidarian-algal symbiosis that include (i) sequencing of Symbiodinium, (ii) proteomic analysis of the symbiosome membrane complex, (iii) glycomic analysis of Symbiodinium cell surfaces, and (iv) expression profiling of the gastrodermal cells hosting Symbiodinium.
Phylogenetic relations of humans and African apes from DNA sequences in the Psi eta-globin region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miyamoto, M.M.; Slightom, J.L.; Goodman, M.

Sequences from the upstream and downstream flanking DNA regions of the Psi eta-globin locus in Pan troglodytes (common chimpanzee), Gorilla gorilla (gorilla), and Pongo pygmaeus (orangutan, the closest living relative to Homo, Pan, and Gorilla) provided further data for evaluating the phylogenetic relations of humans and African apes. These newly sequenced orthologs (an additional 4.9 kilobase pairs (kbp) for each species) were combined with published Psi eta-gene sequences and then compared to the same orthologous stretch (a continuous 7.1-kbp region) available for humans. Phylogenetic analysis of these nucleotide sequences by the parsimony method indicated (i) that human and chimpanzee are more closely related to each other than either is to gorilla and (ii) that the slowdown in the rate of sequence evolution evident in higher primates is especially pronounced in humans. These results indicate that features unique to African apes (but not to humans) are primitive and that even local molecular clocks should be applied with caution.
Unique nuclear localization of Nile tilapia (Oreochromis niloticus) Neu4 sialidase is regulated by nuclear transport receptor importin α/β.

PubMed

Honda, Akinobu; Chigwechokha, Petros Kingstone; Kamada-Futagami, Yuko; Komatsu, Masaharu; Shiozaki, Kazuhiro

2018-06-01

Sialidase catalyzes the removal of sialic acids from glycoconjugates. Unlike those of Neu1 and Neu3 sialidases, Neu4 enzymatic properties such as substrate specificity and subcellular localization are not well conserved among vertebrates. In fish, only the zebrafish and medaka neu4 genes have been cloned and their polypeptides characterized so far. Thus, characterization of Neu4 from other fish species is necessary to evaluate Neu4 physiological functions. Here, Nile tilapia was chosen for the characterization of the Neu4 polypeptide, considering that it is one of the major cultured fish worldwide and that its genomic sequences are now available. The coding DNA sequence of tilapia Neu4 was 1,497 bp, and the recombinant protein showed broad substrate specificity and an optimal pH of 4.0 for sialidase activity. Neu4 activity was sustained even at neutral and alkaline pH. Interestingly, immunofluorescence analysis revealed that the major subcellular localization of tilapia Neu4 was nuclear, quite distinct from zebrafish Neu4 (ER) and medaka Neu4 (lysosome). Bioinformatic analysis showed the existence of a putative nuclear localization signal (NLS) in tilapia Neu4. In general, importin family proteins are known to bind several proteins via an NLS and transfer them into the nucleus. Therefore, to determine the involvement of the putative NLS in Neu4 nuclear localization, a Neu4 mutant lacking the NLS was constructed and expressed in cultured cells. As a result, NLS deletion significantly diminished the nuclear localization. Furthermore, treatment with importazole, which interrupts the binding of importin β to RanGTP, significantly suppressed Neu4 nuclear localization. In summary, tilapia Neu4 is a unique sialidase localized in the nucleus, and its transport into the nucleus is regulated by importin. Copyright © 2018 Elsevier B.V. and Société Française de Biochimie et Biologie Moléculaire (SFBBM). All rights reserved.
The eukaryotic signal sequence, YGRL, targets the chlamydial inclusion

PubMed Central

Kabeiseman, Emily J.; Cichos, Kyle H.; Moore, Elizabeth R.

2014-01-01

Understanding how host proteins are targeted to pathogen-specified organelles, like the chlamydial inclusion, is fundamentally important to understanding the biogenesis of these unique subcellular compartments and how they maintain autonomy within the cell. Syntaxin 6, which localizes to the chlamydial inclusion, contains a YGRL signal sequence. The YGRL functions to return syntaxin 6 to the trans-Golgi from the plasma membrane, and deletion of the YGRL signal sequence from syntaxin 6 also prevents the protein from localizing to the chlamydial inclusion. YGRL is one of three YXXL (YGRL, YQRL, and YKGL) signal sequences which target proteins to the trans-Golgi. We designed various constructs of eukaryotic proteins to test the specificity and propensity of YXXL sequences to target the inclusion. The YGRL signal sequence redirects proteins (e.g., Tgn38, furin, syntaxin 4) that normally do not localize to the chlamydial inclusion. Further, the requirement of the YGRL signal sequence for syntaxin 6 localization to inclusions formed by different species of Chlamydia is conserved. These data indicate that there is an inherent property of the chlamydial inclusion, which allows it to recognize the YGRL signal sequence. To examine whether this “inherent property” was protein or lipid in nature, we asked if deletion of the YGRL signal sequence from syntaxin 6 altered the ability of the protein to interact with proteins or lipids. Deletion or alteration of the YGRL from syntaxin 6 does not appreciably impact syntaxin 6-protein interactions, but does decrease syntaxin 6-lipid interactions. Intriguingly, data also demonstrate that YKGL or YQRL can successfully substitute for YGRL in localization of syntaxin 6 to the chlamydial inclusion. Importantly and for the first time, we are establishing that a eukaryotic signal sequence targets the chlamydial inclusion. PMID:25309881
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.

PubMed

Daily, Jeff

2016-02-10

Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. A faster intra-sequence local pairwise alignment implementation is described and benchmarked, including new global and semi-global variants. Using a 375-residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 24-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. Rognes's SWIPE optimal database search application is still generally the fastest available at 1.2 to at best 2.4 times faster than Parasail for sequences shorter than 500 amino acids. However, Parasail was faster for longer sequences. For global alignments, Parasail's prefix scan implementation is generally the fastest, faster even than Farrar's 'striped' approach; however, the opal library is faster for single-threaded applications. The software library is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. Applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
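To make the "cell updates" unit in the abstract above concrete, the following is a minimal, unoptimized sketch of Smith-Waterman local alignment scoring in Python. It is not Parasail's code or API: the function names, the linear gap penalty, and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration, and no SIMD or striping is involved.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty).

    Scoring values are illustrative assumptions, not Parasail defaults.
    """
    prev = [0] * (len(b) + 1)          # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Each evaluation of this max() is one "cell update".
            curr[j] = max(0,
                          prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                          prev[j] + gap,       # gap in b
                          curr[j - 1] + gap)   # gap in a
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(query_len, db_len, seconds):
    """Billions of DP cell updates per second for one alignment run."""
    return (query_len * db_len) / (seconds * 1e9)
```

Under these assumed scores, `smith_waterman_score("TTACGTTT", "GGACGGG")` finds the shared `ACG` segment. The `gcups` helper shows how a throughput figure such as the 136 GCUPS quoted above is derived from matrix size and wall-clock time.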
cDNA cloning of the human peroxisomal enoyl-CoA hydratase: 3-Hydroxyacyl-CoA dehydrogenase bifunctional enzyme and localization to chromosome 3q26.3-3q28: A free left Alu arm is inserted in the 3′ noncoding region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hoefler, G.; Forstner, M.; Hulla, W.

1994-01-01

Enoyl-CoA hydratase:3-hydroxyacyl-CoA dehydrogenase bifunctional enzyme is one of the four enzymes of the peroxisomal β-oxidation pathway. Here, the authors report the full-length human cDNA sequence and the localization of the corresponding gene on chromosome 3q26.3-3q28. The cDNA sequence spans 3779 nucleotides with an open reading frame of 2169 nucleotides. The tripeptide SKL at the carboxy terminus, known to serve as a peroxisomal targeting signal, is present. DNA sequence comparison of the coding region showed an 80% homology between human and rat bifunctional enzyme cDNA. The 3′ noncoding sequence contains 117 nucleotides homologous to an Alu repeat. Based on sequence comparison, they propose that these nucleotides are a free left Alu arm with 86% homology to the Alu-J family. RNA analysis shows one band with highest intensity in liver and kidney. This cDNA will allow in-depth studies of molecular defects in patients with defective peroxisomal bifunctional enzyme. Moreover, it will also provide a means for studying the regulation of peroxisomal β-oxidation in humans. 33 refs., 5 figs.
SubCellProt: predicting protein subcellular localization using machine learning approaches.

PubMed

Garg, Prabha; Sharma, Virag; Chaudhari, Pradeep; Roy, Nilanjan

2009-01-01

High-throughput genome sequencing projects continue to churn out enormous amounts of raw sequence data. However, most of this raw sequence data is unannotated and, hence, not very useful. Among the various approaches to decipher the function of a protein, one is to determine its localization. Experimental approaches for proteome annotation including determination of a protein's subcellular localizations are very costly and labor intensive. Besides the available experimental methods, in silico methods present alternative approaches to accomplish this task. Here, we present two machine learning approaches for prediction of the subcellular localization of a protein from the primary sequence information. Two machine learning algorithms, k Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN) were used to classify an unknown protein into one of the 11 subcellular localizations. The final prediction is made on the basis of a consensus of the predictions made by two algorithms and a probability is assigned to it. The results indicate that the primary sequence derived features like amino acid composition, sequence order and physicochemical properties can be used to assign subcellular localization with a fair degree of accuracy. Moreover, with the enhanced accuracy of our approach and the definition of a prediction domain, this method can be used for proteome annotation in a high throughput manner. SubCellProt is available at www.databases.niper.ac.in/SubCellProt.
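As an illustration of the kind of sequence-derived classification described above, here is a minimal k-NN sketch in Python that uses amino acid composition as the only feature. It is not the SubCellProt implementation: the feature set, the Euclidean distance, the vote-fraction "probability", the function names, and the toy training data are all assumptions, and the PNN and consensus steps of the published method are not shown.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Fraction of each standard amino acid in a non-empty sequence."""
    counts = Counter(seq)
    return [counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS]


def knn_predict(train, query_seq, k=3):
    """Majority vote among the k training sequences nearest to the query.

    train is a list of (sequence, label) pairs. Returns the winning label
    and the fraction of the k votes it received (a crude confidence).
    """
    q = composition(query_seq)
    nearest = sorted((math.dist(composition(s), q), label)
                     for s, label in train)[:k]
    votes = Counter(label for _, label in nearest)
    label, n = votes.most_common(1)[0]
    return label, n / k


# Hypothetical toy data: lysine/arginine-rich vs. hydrophobic peptides.
toy_train = [
    ("KKKRRKKR", "nucleus"),
    ("RKRKRRKK", "nucleus"),
    ("LLLVVAIL", "membrane"),
    ("AVLLIVLA", "membrane"),
]
```

With this toy data, `knn_predict(toy_train, "KRKRKKRR")` votes among the three nearest neighbours. A real predictor would need many more training sequences and richer features, such as the sequence order and physicochemical properties mentioned in the abstract.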
Analysis of resistance genes of clinical Pannonibacter phragmitetus strain 31801 by complete genome sequencing.

PubMed

Ming, De-Song; Chen, Qing-Qing; Chen, Xiao-Tin

2018-05-14

To clarify the resistance mechanisms of Pannonibacter phragmitetus 31801, isolated from the blood of a liver abscess patient, at the genomic level, we performed whole genomic sequencing using a PacBio RS II single-molecule real-time long-read sequencer. Bioinformatic analysis of the resulting sequence was then carried out to identify any possible resistance genes. Analyses included Basic Local Alignment Search Tool searches against the Antibiotic Resistance Genes Database, ResFinder analysis of the genome sequence, and Resistance Gene Identifier analysis within the Comprehensive Antibiotic Resistance Database. Prophages, clustered regularly interspaced short palindromic repeats (CRISPR), and other putative virulence factors were also identified using PHAST, CRISPRfinder, and the Virulence Factors Database, respectively. The circular chromosome and single plasmid of P. phragmitetus 31801 contained multiple antibiotic resistance genes, including those coding for three different types of β-lactamase [NPS β-lactamase (EC 3.5.2.6), β-lactamase class C, and a metal-dependent hydrolase of β-lactamase superfamily I]. In addition, genes coding for subunits of several multidrug-resistance efflux pumps were identified, including those targeting macrolides (adeJ, cmeB), tetracycline (acrB, adeAB), fluoroquinolones (acrF, ceoB), and aminoglycosides (acrD, amrB, ceoB, mexY, smeB). However, apart from the tripartite macrolide efflux pump macAB-tolC, the genome did not appear to contain the complete complement of subunit genes required for production of most of the major multidrug-resistance efflux pumps.
Collaborative development for setup, execution, sharing and analytics of complex NMR experiments.

PubMed

Irvine, Alistair G; Slynko, Vadim; Nikolaev, Yaroslav; Senthamarai, Russell R P; Pervushin, Konstantin

2014-02-01

Factory settings of NMR pulse sequences are rarely ideal for every scenario in which they are utilised. The optimisation of NMR experiments has for many years been performed locally, with implementations often specific to an individual spectrometer. Furthermore, these optimised experiments are normally retained solely for the use of an individual laboratory, spectrometer or even single user. Here we introduce a web-based service that provides a database for the deposition, annotation and optimisation of NMR experiments. The application uses a Wiki environment to enable the collaborative development of pulse sequences. It also provides a flexible mechanism to automatically generate NMR experiments from deposited sequences. Multidimensional NMR experiments of proteins and other macromolecules consume significant resources, in terms of both spectrometer time and effort required to analyse the results. Systematic analysis of simulated experiments can enable optimal allocation of NMR resources for structural analysis of proteins. Our web-based application (http://nmrplus.org) provides all the necessary information, includes the auxiliaries (waveforms, decoupling sequences etc.), for analysis of experiments by accurate numerical simulation of multidimensional NMR experiments. The online database of the NMR experiments, together with a systematic evaluation of their sensitivity, provides a framework for selection of the most efficient pulse sequences. The development of such a framework provides a basis for the collaborative optimisation of pulse sequences by the NMR community, with the benefits of this collective effort being available to the whole community. Copyright © 2013 Elsevier Inc. All rights reserved.
A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution.

PubMed

Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme

2013-07-01

The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most of the current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and more importantly do not allow the user to control explicitly the nucleotide distribution such as the GC-content in their sequences. However, the latter is an important criterion for large-scale applications as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable or better than local search methods), seedless (we remove the bias of the seed in local search heuristics) and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology overcomes both local and global approaches for sampling sequences with a specific GC-content and target structure. IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/. Supplementary data are available at Bioinformatics online.
Mycobacterium marinum infections in fish and humans in Israel.

PubMed

Ucko, M; Colorni, A

2005-02-01

Israeli Mycobacterium marinum isolates from humans and fish were compared by direct sequencing of the 16S rRNA and hsp65 genes, restriction mapping, and amplified fragment length polymorphism analysis. Significant molecular differences separated all clinical isolates from the piscine isolates, ruling out the local aquaculture industry as the source of human infections.
myPhyloDB: a local web-server and database for the storage and analysis of metagenomics data

USDA-ARS's Scientific Manuscript database

The advent of next-generation sequencing has resulted in an explosion of metagenomics data associated with microbial communities from a variety of ecosystems. However, no database and/or analytical software is currently available that allows for archival and cross-study comparison of such data. my...
Comparative sequence analysis of Toxoplasma gondii reveals local genomic admixture drives concerted expansion and diversification of secreted pathogenesis determinants

USDA-ARS's Scientific Manuscript database

Toxoplasma gondii is among the most prevalent parasites worldwide, infecting many wild and domestic animals and causing zoonotic infections in humans. T. gondii differs substantially in its broad distribution from closely related parasites that typically have narrow, specialized host ranges. To un...
Genetic diversity, virulence, and Meloidogyne incognita interactions of Fusarium oxysporum isolates causing cotton wilt in Georgia

USDA-ARS's Scientific Manuscript database

Locally severe outbreaks of Fusarium wilt of cotton (Gossypium spp.) in South Georgia raised concerns about the genotypes of the causal pathogen, Fusarium oxysporum f. sp. vasinfectum. Vegetative complementation tests and DNA sequence analysis were used to determine genetic diversity among 492 F. ox...
A generative, probabilistic model of local protein structure.

PubMed

Boomsma, Wouter; Mardia, Kanti V; Taylor, Charles C; Ferkinghoff-Borg, Jesper; Krogh, Anders; Hamelryck, Thomas

2008-07-01

Despite significant progress in recent years, protein structure prediction maintains its status as one of the prime unsolved problems in computational biology. One of the key remaining challenges is an efficient probabilistic exploration of the structural space that correctly reflects the relative conformational stabilities. Here, we present a fully probabilistic, continuous model of local protein structure in atomic detail. The generative model makes efficient conformational sampling possible and provides a framework for the rigorous analysis of local sequence-structure correlations in the native state. Our method represents a significant theoretical and practical improvement over the widely used fragment assembly technique by avoiding the drawbacks associated with a discrete and nonprobabilistic approach.
Multi-scale symbolic transfer entropy analysis of EEG

NASA Astrophysics Data System (ADS)

Yao, Wenpo; Wang, Jun

2017-10-01

From both global and local perspectives, we symbolize two kinds of EEG and analyze their dynamic and asymmetrical information using multi-scale transfer entropy. A multi-scale process with scale factors from 1 to 199 and a step size of 2 is applied to the EEG of healthy people and epileptic patients, and then permutation with an embedding dimension of 3 and a global approach are used to symbolize the sequences. The forward and reverse symbol sequences are taken as the inputs of transfer entropy. The scale factor intervals in which the two kinds of EEG show satisfactory entropy distinctions are (37, 57) for the permutation approach and (65, 85) for the global approach. At a scale factor of 67, the permutation-based transfer entropies of the healthy and epileptic subjects, 0.1137 and 0.1028, differ the most. The corresponding values for the global symbolization are 0.0641 and 0.0601, at a scale factor of 165. The results show that permutation, which takes local information into account, gives better distinction and is more effective in our multi-scale transfer entropy analysis of EEG.
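The first two steps of the pipeline described above, multi-scale processing and permutation symbolization, can be sketched as follows. This is a minimal illustration, not the authors' code: coarse-graining by non-overlapping window averages is assumed as the multi-scale step (the abstract does not specify it), the function names are hypothetical, and the transfer entropy computation itself is not shown.

```python
from itertools import permutations


def coarse_grain(x, scale):
    """Average non-overlapping windows of length `scale` (assumed method)."""
    n = len(x) // scale
    return [sum(x[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def permutation_symbols(x, m=3):
    """Map each length-m window to the index of its ordinal pattern.

    With embedding dimension m = 3 there are 3! = 6 possible symbols.
    Ties are broken by position because sorted() is stable.
    """
    index = {p: k for k, p in enumerate(permutations(range(m)))}
    symbols = []
    for i in range(len(x) - m + 1):
        window = x[i:i + m]
        order = tuple(sorted(range(m), key=lambda j: window[j]))
        symbols.append(index[order])
    return symbols


# Scale factors from 1 to 199 with a step size of 2, as in the abstract.
SCALES = range(1, 200, 2)
```

For each scale in `SCALES`, one would coarse-grain the signal, symbolize it with `permutation_symbols`, and pass the forward and reversed symbol sequences to a transfer entropy estimator.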
Molecular evolution of miraculin-like proteins in soybean Kunitz super-family.

PubMed

Selvakumar, Purushotham; Gahloth, Deepankar; Tomar, Prabhat Pratap Singh; Sharma, Nidhi; Sharma, Ashwani Kumar

2011-12-01

Miraculin-like proteins (MLPs) belong to soybean Kunitz super-family and have been characterized from many plant families like Rutaceae, Solanaceae, Rubiaceae, etc. Many of them possess trypsin inhibitory activity and are involved in plant defense. MLPs exhibit significant sequence identity (~30-95%) to native miraculin protein, also belonging to Kunitz super-family compared with a typical Kunitz family member (~30%). The sequence and structure-function comparison of MLPs with that of a classical Kunitz inhibitor have demonstrated that MLPs have evolved to form a distinct group within Kunitz super-family. Sequence analysis of new genes along with available MLP sequences in the literature revealed three major groups for these proteins. A significant feature of Rutaceae MLP type 2 sequences is the presence of phosphorylation motif. Subtle changes are seen in putative reactive loop residues among different MLPs suggesting altered specificities to specific proteases. In phylogenetic analysis, Rutaceae MLP type 1 and type 2 proteins clustered together on separate branches, whereas native miraculin along with other MLPs formed distinct clusters. Site-specific positive Darwinian selection was observed at many sites in both the groups of Rutaceae MLP sequences with most of the residues undergoing positive selection located in loop regions. The results demonstrate the sequence and thereby the structure-function divergence of MLPs as a distinct group within soybean Kunitz super-family due to biotic and abiotic stresses of local environment.

Using the Relevance Vector Machine Model Combined with Local Phase Quantization to Predict Protein-Protein Interactions from Protein Sequences.

PubMed

An, Ji-Yong; Meng, Fan-Rong; You, Zhu-Hong; Fang, Yu-Hong; Zhao, Yu-Jun; Zhang, Ming

2016-01-01

We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict PPIs from protein sequences. The main improvements are the results of representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using a Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We perform 5-fold cross-validation experiments on Yeast and Human datasets, and we achieve very high accuracies of 92.65% and 97.62%, respectively, which is significantly better than previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method is obviously better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can be an automatic decision support tool for future proteomics research.
A comparative study of ancient environmental DNA to pollen and macrofossils from lake sediments reveals taxonomic overlap and additional plant taxa

NASA Astrophysics Data System (ADS)

Pedersen, Mikkel Winther; Ginolhac, Aurélien; Orlando, Ludovic; Olsen, Jesper; Andersen, Kenneth; Holm, Jakob; Funder, Svend; Willerslev, Eske; Kjær, Kurt H.

2013-09-01

We use 2nd generation sequencing technology on sedimentary ancient DNA (sedaDNA) from a lake in South Greenland to reconstruct the local floristic history around a low-arctic lake and compare the results with those previously obtained from pollen and macrofossils in the same lake. Thirty-eight of thirty-nine samples from the core yielded putative DNA sequences. Using a multiple assignment strategy on the trnL g-h DNA barcode, consisting of two different phylogenetic and one sequence similarity assignment approaches, thirteen families of plants were identified, of which two (Scrophulariaceae and Asparagaceae) are absent from the pollen and macrofossil records. An age model for the sediment based on twelve radiocarbon dates establishes a chronology and shows that the lake record dates back to 10,650 cal yr BP. Our results suggest that sedaDNA analysis from lake sediments, although taxonomically less detailed than pollen and macrofossil analyses can be a complementary tool for establishing the composition of both terrestrial and aquatic local plant communities and a method for identifying additional taxa.
deepTools: a flexible platform for exploring deep-sequencing data.

PubMed

Ramírez, Fidel; Dündar, Friederike; Diehl, Sarah; Grüning, Björn A; Manke, Thomas

2014-07-01

We present a Galaxy based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straight-forward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available at http://deeptools.ie-freiburg.mpg.de and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Population structure of the large Japanese field mouse, Apodemus speciosus (Rodentia: Muridae), in suburban landscape, based on mitochondrial D-loop sequences.

PubMed

Hirota, Tadao; Hirohata, Tetsuo; Mashima, Hiroshi; Satoh, Toshiyuki; Obara, Yoshiaki

2004-11-01

The genetic structure of the large Japanese field mouse populations in the suburban landscape of West Tokyo, Japan was determined using mitochondrial DNA control region sequences. Samples were collected from six habitats linked by forests and a green tract along the Tama River, and from two forests segregated by urban areas from those continuous habitats. Thirty-five haplotypes were detected in 221 animals. Four to eight haplotypes were found within each local population belonging to the continuous landscape. Some haplotypes were shared by two or three adjacent local populations. On the other hand, the two isolated habitats were occupied by one or two indigenous haplotypes. Significant genetic differentiation between all pairs of local populations, except for one pair in the continuous habitats, was found by analysis of molecular variance (AMOVA). The geographical distance between habitats did not explain the large variance of pairwise F(ST)-values among local populations. F(ST)-values between local populations segregated by urban areas were higher than those between local populations in the continuous habitat, regardless of geographical distance. The results of this study demonstrated quantitatively that urban areas inhibit the migration of Apodemus speciosus, whereas a linear green tract along a river functions as a corridor. Moreover, it preserves the metapopulation structure of A. speciosus as well as the corridors in suburban landscape.
Local/non-local regularized image segmentation using graph-cuts: application to dynamic and multispectral MRI.

PubMed

Hanson, Erik A; Lundervold, Arvid

2013-11-01

Multispectral, multichannel, or time series image segmentation is important for image analysis in a wide range of applications. Regularization of the segmentation is commonly performed using local image information, causing the segmented image to be locally smooth or piecewise constant. A new spatial regularization method, incorporating non-local information, was developed and tested. Our spatial regularization method applies to feature space classification in multichannel images such as color images and MR image sequences. The spatial regularization involves local edge properties, region boundary minimization, as well as non-local similarities. The method is implemented in a discrete graph-cut setting allowing fast computations. The method was tested on multidimensional MRI recordings from human kidney and brain in addition to simulated MRI volumes. The proposed method successfully segments regions with both smooth and complex non-smooth shapes with a minimum of user interaction.
Nodal domains of a non-separable problem—the right-angled isosceles triangle

NASA Astrophysics Data System (ADS)

Aronovitch, Amit; Band, Ram; Fajman, David; Gnutzmann, Sven

2012-03-01

We study the nodal set of eigenfunctions of the Laplace operator on the right-angled isosceles triangle. A local analysis of the nodal pattern provides an algorithm for computing the number νn of nodal domains for any eigenfunction. In addition, an exact recursive formula for the number of nodal domains is found to reproduce all existing data. Eventually, we use the recursion formula to analyse a large sequence of nodal counts statistically. Our analysis shows that the distribution of nodal counts for this triangular shape has a much richer structure than the known cases of regular separable shapes or completely irregular shapes. Furthermore, we demonstrate that the nodal count sequence contains information about the periodic orbits of the corresponding classical ray dynamics.
Re-Ranking Sequencing Variants in the Post-GWAS Era for Accurate Causal Variant Identification

PubMed Central

Faye, Laura L.; Machiela, Mitchell J.; Kraft, Peter; Bull, Shelley B.; Sun, Lei

2013-01-01

Next generation sequencing has dramatically increased our ability to localize disease-causing variants by providing base-pair level information at costs increasingly feasible for the large sample sizes required to detect complex-trait associations. Yet, identification of causal variants within an established region of association remains a challenge. Counter-intuitively, certain factors that increase power to detect an associated region can decrease power to localize the causal variant. First, combining GWAS with imputation or low coverage sequencing to achieve the large sample sizes required for high power can have the unintended effect of producing differential genotyping error among SNPs. This tends to bias the relative evidence for association toward better genotyped SNPs. Second, re-use of GWAS data for fine-mapping exploits previous findings to ensure genome-wide significance in GWAS-associated regions. However, using GWAS findings to inform fine-mapping analysis can bias evidence away from the causal SNP toward the tag SNP and SNPs in high LD with the tag. Together these factors can reduce power to localize the causal SNP by more than half. Other strategies commonly employed to increase power to detect association, namely increasing sample size and using higher density genotyping arrays, can, in certain common scenarios, actually exacerbate these effects and further decrease power to localize causal variants. We develop a re-ranking procedure that accounts for these adverse effects and substantially improves the accuracy of causal SNP identification, often doubling the probability that the causal SNP is top-ranked. Application to the NCI BPC3 aggressive prostate cancer GWAS with imputation meta-analysis identified a new top SNP at 2 of 3 associated loci and several additional possible causal SNPs at these loci that may have otherwise been overlooked. This method is simple to implement using R scripts provided on the author's website. PMID:23950724
The 2009 L'Aquila earthquake sequence: technical and scientific activities during the emergency and post-emergency phases

NASA Astrophysics Data System (ADS)

Cardinali, Mauro

2010-05-01

The Central Apennines of Italy is an area characterized by significant seismic activity. In this area, individual earthquakes and prolonged seismic sequences produce a variety of ground effects, including landslides. The L'Aquila area, in the Abruzzo Region, was affected by an earthquake sequence that started in December 2008 and continued for several months. The main shock occurred on April 6, 2009, with local magnitude m = 6.3, and was followed by two separate earthquakes on April 7 and April 9, each with a local magnitude m > 5.0. The main shocks caused 308 fatalities, injured more than 1500 people, and left in excess of 65,000 people homeless. Damage to the cultural heritage was also severe, with tens of churches and historical buildings severely damaged or destroyed. The main shocks and some of the most severe aftershocks triggered landslides, chiefly rock falls and minor rock slides that caused damage to towns, individual houses, and the transportation network. Beginning in the immediate aftermath of the event, and continuing during the emergency and post-emergency phases, we assisted the Italian national Department for Civil Protection in the evaluation of local landslide and hydrological risk conditions. Technical and scientific activities focused on: (i) mapping the location, type, and severity of the main ground effects produced by the earthquake shaking, (ii) evaluating and selecting sites for potential new settlements and individual buildings, including a preliminary assessment of the local geomorphological and hydrological conditions, (iii) evaluating rock fall hazard at individual sites, (iv) monitoring slope and ground deformations, and (v) designing and implementing a prototype system for the forecast of the possible occurrence of rainfall-induced landslides.
To execute these activities, we exploited a wide range of methods, techniques, and technologies, and we performed repeated field surveys, the interpretation of ground and aerial photographs taken at different times, the analysis and processing of optical and SAR satellite images, and the statistical analysis of rainfall measurements and quantitative weather forecasts.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Pestov, Nikolay B., E-mail: korn@mail.ibch.ru; Dmitriev, Ruslan I.; Kostina, Maria B.

Highlights:
▶ Full-length secretory pathway Ca-ATPase (SPCA2) cloned from rat duodenum.
▶ ATP2C2 gene (encoding SPCA2) exists only in genomes of Tetrapoda.
▶ Rat and pig SPCA2 are expressed in intestines, lung and some secretory glands.
▶ Subcellular localization of SPCA2 may depend on tissue type.
▶ In rat duodenum, SPCA2 is localized in plasma membrane-associated compartments.

Abstract: Secretory pathway Ca-ATPases are less well characterized mammalian calcium pumps than plasma membrane Ca-ATPases and sarco-endoplasmic reticulum Ca-ATPases. Here we report analysis of molecular evolution, alternative splicing, tissue-specific expression and subcellular localization of the second isoform of the secretory pathway Ca-ATPase (SPCA2), the product of the ATP2C2 gene. The primary structure of SPCA2 from rat duodenum deduced from the full-length transcript contains 944 amino acid residues, and exhibits 65% sequence identity with known SPCA1. The rat SPCA2 sequence is also highly homologous to putative human protein KIAA0703; however, the latter seems to have an aberrant N-terminus originating from intron 2. The tissue-specificity of SPCA2 expression is different from ubiquitous SPCA1. Rat SPCA2 transcripts were detected predominantly in gastrointestinal tract, lung, trachea, lactating mammary gland, skin and preputial gland. In the newborn pig, the expression profile is very similar with one remarkable exception: porcine bulbourethral gland gave the strongest signal. Upon overexpression in cultured cells, SPCA2 shows an intracellular distribution with remarkable enrichment in Golgi. However, in vivo SPCA2 may be localized in compartments that differ among various tissues: it is intracellular in epidermis, but enriched in plasma membranes of the intestinal epithelium. Analysis of SPCA2 sequences from various vertebrate species argues that the ATP2C2 gene radiated from ATP2C1 (encoding SPCA1) during adaptation of tetrapod ancestors to terrestrial habitats.
Processing Dynamic Image Sequences from a Moving Sensor.

DTIC Science & Technology

1984-02-01

(List-of-figures fragment: Roadsign Image Sequence; Roadsign Sequence with Redundant Features; Roadsign Subimage; Industrial Image Selected Feature Error Values and Local Search Values; Roadsign Image Error Values and Local Search Values; Roadsign Redundant Feature Error Values.)
RetroTector online, a rational tool for analysis of retroviral elements in small and medium size vertebrate genomic sequences

PubMed Central

Sperber, Göran; Lövgren, Anders; Eriksson, Nils-Einar; Benachenhou, Farid; Blomberg, Jonas

2009-01-01

Background: The rapid accumulation of genomic information in databases necessitates rapid and specific algorithms for extracting biologically meaningful information. More or less complete retroviral sequences, also called proviral or endogenous retroviral sequences (ERVs), constitute at least 5% of vertebrate genomes. After infecting the host, these retroviruses have integrated in germ line cells, and have then been carried in genomes for at least several 100 million years. A better understanding of structure and function of these sequences can have profound biological and medical consequences. Methods: RetroTector© (ReTe) is a platform-independent Java program for identification and characterization of proviral sequences in vertebrate genomes. The full ReTe requires a local installation with a MySQL database. Although not overly complicated, the installation may take some time. A "light" version of ReTe (RetroTector online; ROL), which does not require specific installation procedures, is provided via the World Wide Web. Results: ROL was implemented under the Batchelor web interface (A Lövgren et al). It accepts sequences (5 to 10 000 kilobases) by GenBank accession number, file, or FASTA cut-and-paste. Up to ten submissions can be done simultaneously, allowing batch analysis of <= 100 Megabases. Jobs are shown in an IP-number specific list. Results are text files, and can be viewed with the program RetroTectorViewer.jar (at the same site), which has the full graphical capabilities of the basic ReTe program. A detailed analysis of any retroviral sequences found in the submitted sequence is graphically presented, exportable in standard formats. With the current server, a complete analysis of a 1 Megabase sequence is finished in 10 minutes. It is possible to mask nonretroviral repetitive sequences in the submitted sequence, using host genome specific "brooms", which increase specificity. Discussion: Proviral sequences can be hard to recognize, especially if the integration occurred many million years ago. Precise delineation of LTR, gag, pro, pol and env can be difficult, requiring manual work. ROL is a way of simplifying these tasks. Conclusion: ROL provides (1) annotation and presentation of known retroviral sequences and (2) detection of proviral chains in unknown genomic sequences, with up to 100 Mbase per submission. PMID:19534753
RetroTector online, a rational tool for analysis of retroviral elements in small and medium size vertebrate genomic sequences.

PubMed

Sperber, Göran; Lövgren, Anders; Eriksson, Nils-Einar; Benachenhou, Farid; Blomberg, Jonas

2009-06-16

The rapid accumulation of genomic information in databases necessitates rapid and specific algorithms for extracting biologically meaningful information. More or less complete retroviral sequences, also called proviral or endogenous retroviral sequences (ERVs), constitute at least 5% of vertebrate genomes. After infecting the host, these retroviruses have integrated in germ line cells, and have then been carried in genomes for at least several 100 million years. A better understanding of structure and function of these sequences can have profound biological and medical consequences. RetroTector (ReTe) is a platform-independent Java program for identification and characterization of proviral sequences in vertebrate genomes. The full ReTe requires a local installation with a MySQL database. Although not overly complicated, the installation may take some time. A "light" version of ReTe (RetroTector online; ROL), which does not require specific installation procedures, is provided via the World Wide Web. ROL (http://www.fysiologi.neuro.uu.se/jbgs/) was implemented under the Batchelor web interface (A Lövgren et al). It accepts sequences (5 to 10,000 kilobases) by GenBank accession number, file, or FASTA cut-and-paste. Up to ten submissions can be done simultaneously, allowing batch analysis of <= 100 Megabases.
Statistical analysis of native contact formation in the folding of designed model proteins

NASA Astrophysics Data System (ADS)

Tiana, Guido; Broglia, Ricardo A.

2001-02-01

The time evolution of the formation probability of native bonds has been studied for designed sequences which fold fast into the native conformation. From this analysis a clear hierarchy of bonds emerges: (a) local, fast forming highly stable native bonds built by some of the most strongly interacting amino acids of the protein; (b) nonlocal bonds formed late in the folding process, in coincidence with the folding nucleus, and involving essentially the same strongly interacting amino acids already participating in the fast bonds; (c) the rest of the native bonds whose behavior is subordinated, to a large extent, to that of the strong local and nonlocal native contacts.
Localization of the human tripeptidyl peptidase II gene (TPP2) to 13q32-q33 by nonradioactive in situ hybridization and somatic cell hybrids

DOE Office of Scientific and Technical Information (OSTI.GOV)

Martinsson, T.; Vujic, M.; Tomkinson, B.

1993-08-01

The authors have assigned the human tripeptidyl peptidase II (TPP2) gene to chromosome region 13q32-q33 using two different methods. First, a full-length TPP2 cDNA was used as a probe on Southern blots of DNA from a panel of human/rodent somatic cell hybrids. The TPP2 sequences were found to segregate with the human chromosome 13. Second, fluorescence in situ hybridization analysis was performed with the same probe. This analysis supported the chromosome 13 localization and further refined it to region 13q32-q33. 20 refs., 2 figs.
RoboOligo: software for mass spectrometry data to support manual and de novo sequencing of post-transcriptionally modified ribonucleic acids

PubMed Central

Sample, Paul J.; Gaston, Kirk W.; Alfonzo, Juan D.; Limbach, Patrick A.

2015-01-01

Ribosomal ribonucleic acid (RNA), transfer RNA and other biological or synthetic RNA polymers can contain nucleotides that have been modified by the addition of chemical groups. Traditional Sanger sequencing methods cannot establish the chemical nature and sequence of these modified-nucleotide containing oligomers. Mass spectrometry (MS) has become the conventional approach for determining the nucleotide composition, modification status and sequence of modified RNAs. Modified RNAs are analyzed by MS using collision-induced dissociation tandem mass spectrometry (CID MS/MS), which produces a complex dataset of oligomeric fragments that must be interpreted to identify and place modified nucleosides within the RNA sequence. Here we report the development of RoboOligo, an interactive software program for the robust analysis of data generated by CID MS/MS of RNA oligomers. There are three main functions of RoboOligo: (i) automated de novo sequencing via the local search paradigm. (ii) Manual sequencing with real-time spectrum labeling and cumulative intensity scoring. (iii) A hybrid approach, coined ‘variable sequencing’, which combines the user intuition of manual sequencing with the high-throughput sampling of automated de novo sequencing. PMID:25820423
SNPs in putative regulatory regions identified by human mouse comparative sequencing and transcription factor binding site data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Banerjee, Poulabi; Bahlo, Melanie; Schwartz, Jody R.

2002-01-01

Genome wide disease association analysis using SNPs is being explored as a method for dissecting complex genetic traits and a vast number of SNPs have been generated for this purpose. As there are cost and throughput limitations of genotyping large numbers of SNPs and statistical issues regarding the large number of dependent tests on the same data set, to make association analysis practical it has been proposed that SNPs should be prioritized based on likely functional importance. The most easily identifiable functional SNPs are coding SNPs (cSNPs) and accordingly cSNPs have been screened in a number of studies. SNPs in gene regulatory sequences embedded in noncoding DNA are another class of SNPs suggested for prioritization due to their predicted quantitative impact on gene expression. The main challenge in evaluating these SNPs, in contrast to cSNPs, is a lack of robust algorithms and databases for recognizing regulatory sequences in noncoding DNA. Approaches that have been previously used to delineate noncoding sequences with gene regulatory activity include cross-species sequence comparisons and the search for sequences recognized by transcription factors. We combined these two methods to sift through mouse human genomic sequences to identify putative gene regulatory elements and subsequently localized SNPs within these sequences in a 1 Megabase (Mb) region of human chromosome 5q31, orthologous to mouse chromosome 11 containing the Interleukin cluster.
Current use of imaging and electromagnetic source localization procedures in epilepsy surgery centers across Europe.

PubMed

Mouthaan, Brian E; Rados, Matea; Barsi, Péter; Boon, Paul; Carmichael, David W; Carrette, Evelien; Craiu, Dana; Cross, J Helen; Diehl, Beate; Dimova, Petia; Fabo, Daniel; Francione, Stefano; Gaskin, Vladislav; Gil-Nagel, Antonio; Grigoreva, Elena; Guekht, Alla; Hirsch, Edouard; Hecimovic, Hrvoje; Helmstaedter, Christoph; Jung, Julien; Kalviainen, Reetta; Kelemen, Anna; Kimiskidis, Vasilios; Kobulashvili, Teia; Krsek, Pavel; Kuchukhidze, Giorgi; Larsson, Pål G; Leitinger, Markus; Lossius, Morten I; Luzin, Roman; Malmgren, Kristina; Mameniskiene, Ruta; Marusic, Petr; Metin, Baris; Özkara, Cigdem; Pecina, Hrvoje; Quesada, Carlos M; Rugg-Gunn, Fergus; Rydenhag, Bertil; Ryvlin, Philippe; Scholly, Julia; Seeck, Margitta; Staack, Anke M; Steinhoff, Bernhard J; Stepanov, Valentin; Tarta-Arsene, Oana; Trinka, Eugen; Uzan, Mustafa; Vogt, Viola L; Vos, Sjoerd B; Vulliémoz, Serge; Huiskamp, Geertjan; Leijten, Frans S S; Van Eijsden, Pieter; Braun, Kees P J

2016-05-01

In 2014 the European Union-funded E-PILEPSY project was launched to improve awareness of, and accessibility to, epilepsy surgery across Europe. We aimed to investigate the current use of neuroimaging, electromagnetic source localization, and imaging postprocessing procedures in participating centers. A survey on the clinical use of imaging, electromagnetic source localization, and postprocessing methods in epilepsy surgery candidates was distributed among the 25 centers of the consortium. A descriptive analysis was performed, and results were compared to existing guidelines and recommendations. Response rate was 96%. Standard epilepsy magnetic resonance imaging (MRI) protocols are acquired at 3 Tesla by 15 centers and at 1.5 Tesla by 9 centers. Three centers perform 3T MRI only if indicated. Twenty-six different MRI sequences were reported. Six centers follow all guideline-recommended MRI sequences with the proposed slice orientation and slice thickness or voxel size. Additional sequences are used by 22 centers. MRI postprocessing methods are used in 16 centers. Interictal positron emission tomography (PET) is available in 22 centers; all using 18F-fluorodeoxyglucose (FDG). Seventeen centers perform PET postprocessing. Single-photon emission computed tomography (SPECT) is used by 19 centers, of which 15 perform postprocessing. Four centers perform neither PET nor SPECT in children. Seven centers apply magnetoencephalography (MEG) source localization, and nine apply electroencephalography (EEG) source localization. Fourteen combinations of inverse methods and volume conduction models are used. We report a large variation in the presurgical diagnostic workup among epilepsy surgery centers across Europe. This diversity underscores the need for high-quality systematic reviews, evidence-based recommendations, and harmonization of available diagnostic presurgical methods. Wiley Periodicals, Inc. © 2016 International League Against Epilepsy.
Analysis of Genetic Variation and Phylogeny of the Predatory Bug, Pilophorus typicus, in Japan using Mitochondrial Gene Sequences

PubMed Central

Ito, Katsura; Nishikawa, Hiroshi; Shimada, Takuji; Ogawa, Kohei; Minamiya, Yukio; Tomoda, Masafumi; Nakahira, Kengo; Kodama, Rika; Fukuda, Tatsuya; Arakawa, Ryo

2011-01-01

Pilophorus typicus (Distant) (Heteroptera: Miridae) is a predatory bug occurring in East, Southeast, and South Asia. Because the active stages of P. typicus prey on various agricultural pest insects and mites, this species is a candidate insect as an indigenous natural enemy for use in biological control programs. However, the mass releasing of introduced natural enemies into agricultural fields may incur the risk of affecting the genetic integrity of species through hybridization with a local population. To clarify the genetic characteristics of the Japanese populations of P. typicus, two portions of the mitochondrial DNA, the cytochrome oxidase subunit I (COI) (534 bp) and the cytochrome B (cytB) (217 bp) genes, were sequenced for 64 individuals collected from 55 localities in a wide range of Japan. Totals of 18 and 10 haplotypes were identified for the COI and cytB sequences, respectively (25 haplotypes over regions). Phylogenetic analysis using the maximum likelihood method revealed the existence of two genetically distinct groups in P. typicus in Japan. These groups were distributed in different geographic ranges: one occurred mainly from the Pacific coastal areas of the Kii Peninsula, the Shikoku Island, and the Ryukyu Islands; whereas the other occurred from the northern Kyushu district to the Kanto and Hokuriku districts of mainland Japan. However, both haplotypes were found in a single locality of the southern coast of the Shikoku Island. COI phylogeny incorporating other Pilophorus species revealed that these groups were only recently differentiated. Therefore, use of a certain population of P. typicus across its distribution range should be done with caution because genetic hybridization may occur. PMID:21526929
Full-genome sequence and analysis of a novel human rhinovirus strain within a divergent HRV-A clade.

PubMed

Rathe, Jennifer A; Liu, Xinyue; Tallon, Luke J; Gern, James E; Liggett, Stephen B

2010-01-01

Genome sequences of human rhinoviruses (HRV) have primarily been from stocks collected in the 1960s, with genomes and phylogeny of modern HRVs remaining undefined. Here, two modern isolates (hrv-A101 and hrv-A101-v1) collected approximately 8 years apart were sequenced in their entirety. Incorporation into our full-genome HRV alignment with subsequent phylogenetic network inference indicated that these represent a unique HRV-A, localized within a distinct divergent clade. They appear to have resulted from recombination of the hrv-65 and hrv-78 lineages. These results support our contention that there are unrecognized distinct HRV-A strains, and that recombination is evident in currently circulating strains.
A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation

PubMed Central

Eddy, Sean R.

2008-01-01

Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (λ) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty (“Forward” scores), but the expected distribution of Forward scores remains unknown. Here, I conjecture that both expected score distributions have simple, predictable forms when full probabilistic modeling methods are used. For a probabilistic model of local sequence alignment, optimal alignment bit scores (“Viterbi” scores) are Gumbel-distributed with constant λ = log 2, and the high scoring tail of Forward scores is exponential with the same constant λ. Simulation studies support these conjectures over a wide range of profile/sequence comparisons, using 9,318 profile-hidden Markov models from the Pfam database. This enables efficient and accurate determination of expectation values (E-values) for both Viterbi and Forward scores for probabilistic local alignments. PMID:18516236
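The two conjectured distributions can be turned into significance estimates with very little code. The following is a minimal sketch, not the author's implementation: it fixes λ = ln 2 for bit scores as stated in the abstract, while the location parameters `mu` and `tau`, the function names, and the database size `n_targets` are assumptions introduced for illustration (in practice the location parameters would have to be fitted for each model).

```python
import math

# Slope fixed at ln 2 for scores measured in bits, as conjectured in the abstract.
LAMBDA = math.log(2)

def viterbi_pvalue(bit_score: float, mu: float) -> float:
    """P(S >= bit_score) for a Gumbel distribution with location mu (assumed fitted)."""
    # Gumbel survival function: 1 - exp(-exp(-lambda * (x - mu)))
    return -math.expm1(-math.exp(-LAMBDA * (bit_score - mu)))

def forward_pvalue(bit_score: float, tau: float) -> float:
    """Tail probability for Forward scores, exponential above the offset tau (assumed fitted)."""
    # Only the high-scoring tail is claimed to be exponential, so cap at 1 below tau.
    return min(1.0, math.exp(-LAMBDA * (bit_score - tau)))

def evalue(pvalue: float, n_targets: int) -> float:
    """Expected number of hits at least this good in a search of n_targets sequences."""
    return pvalue * n_targets
```

With λ = ln 2, each additional bit of Forward score above `tau` halves the tail probability, which is what makes the E-value calculation cheap once the location parameter is known.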

Towards the Rational Design of a Candidate Vaccine against Pregnancy Associated Malaria: Conserved Sequences of the DBL6ε Domain of VAR2CSA

PubMed Central

Badaut, Cyril; Bertin, Gwladys; Rustico, Tatiana; Fievet, Nadine; Massougbodji, Achille; Gaye, Alioune; Deloron, Philippe

2010-01-01

Background: Placental malaria is a disease linked to the sequestration of Plasmodium falciparum infected red blood cells (IRBC) in the placenta, leading to reduced materno-fetal exchanges and to local inflammation. One of the virulence factors of P. falciparum involved in cytoadherence to chondroitin sulfate A, its placental receptor, is the adhesive protein VAR2CSA. Its localisation on the surface of IRBC makes it accessible to the immune system. VAR2CSA contains six DBL domains. The DBL6ε domain is the most variable. High variability constitutes a means for the parasite to evade the host immune response. The DBL6ε domain could constitute a very attractive basis for a vaccine candidate but its reported variability necessitates, for antigenic characterisations, identifying and classifying commonalities across isolates. Methodology/Principal Findings: Local alignment analysis of the DBL6ε domain revealed that it is not as variable as previously described. Variability is concentrated in seven regions present on the surface of the DBL6ε domain. The main goal of our work is to classify and group variable sequences in a way that will simplify further research to determine dominant epitopes. Firstly, variable sequences were grouped according to their average percent pairwise identity (APPI). Groups comprising many variable sequences sharing low variability were found. Secondly, ELISA experiments following the IgG recognition of a recombinant DBL6ε domain, and of peptides mimicking its seven variable blocks, allowed us to determine an APPI cut-off and to isolate groups represented by a single consensus sequence. Conclusions/Significance: A new sequence approach is used to compare variable regions in sequences that have extensive segmental gene relationship. Using this approach, the VAR2CSA DBL6 domain is composed of 7 variable blocks with limited polymorphism. Each variable block is composed of a limited number of consensus types. Based on peptide based ELISA, variable blocks with 85% or greater sequence identity are expected to be recognized equally well by antibody and can be considered the same consensus type. Therefore, the analysis of the antibody response against the classified small number of sequences should be helpful to determine epitopes. PMID:20585655
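The grouping criterion described above can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the authors' procedure: identity is taken here as the fraction of matching positions between two equal-length, already aligned peptides, and the function names are hypothetical. Only the 85% cut-off comes from the abstract.

```python
from itertools import combinations

def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def appi(seqs: list[str]) -> float:
    """Average percent pairwise identity over all unordered pairs in the group."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)

def same_consensus_type(seqs: list[str], cutoff: float = 85.0) -> bool:
    """True if the group meets the APPI cut-off (85% is the value reported in the abstract)."""
    return appi(seqs) >= cutoff

# Hypothetical 10-residue variable-block peptides, used only to exercise the functions.
block = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHIRV"]
group_appi = appi(block)
```

For the three made-up peptides the pairwise identities are 90%, 80% and 90%, so the group averages about 86.7% and would count as a single consensus type under this simplified definition.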
Minimizing the average distance to a closest leaf in a phylogenetic tree.

PubMed

Matsen, Frederick A; Gallagher, Aaron; McCoy, Connor O

2013-11-01

When performing an analysis on a collection of molecular sequences, it can be convenient to reduce the number of sequences under consideration while maintaining some characteristic of a larger collection of sequences. For example, one may wish to select a subset of high-quality sequences that represent the diversity of a larger collection of sequences. One may also wish to specialize a large database of characterized "reference sequences" to a smaller subset that is as close as possible on average to a collection of "query sequences" of interest. Such a representative subset can be useful whenever one wishes to find a set of reference sequences that is appropriate to use for comparative analysis of environmentally derived sequences, such as for selecting "reference tree" sequences for phylogenetic placement of metagenomic reads. In this article, we formalize these problems in terms of the minimization of the Average Distance to the Closest Leaf (ADCL) and investigate algorithms to perform the relevant minimization. We show that the greedy algorithm is not effective, show that a variant of the Partitioning Around Medoids (PAM) heuristic gets stuck in local minima, and develop an exact dynamic programming approach. Using this exact program we note that the performance of PAM appears to be good for simulated trees, and is faster than the exact algorithm for small trees. On the other hand, the exact program gives solutions for all numbers of leaves less than or equal to the given desired number of leaves, whereas PAM only gives a solution for the prespecified number of leaves. Via application to real data, we show that the ADCL criterion chooses chimeric sequences less often than random subsets, whereas the maximization of phylogenetic diversity chooses them more often than random. These algorithms have been implemented in publicly available software.
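The ADCL objective itself is simple to state in code. The sketch below makes two simplifying assumptions that are not in the abstract: leaves are described by a precomputed pairwise distance matrix rather than a tree, and the exact minimum is found by brute-force enumeration of subsets instead of the dynamic program the authors developed. The function names are hypothetical.

```python
from itertools import combinations

def adcl(dist, chosen):
    """Average, over all leaves, of the distance to the closest chosen leaf."""
    n = len(dist)
    return sum(min(dist[i][j] for j in chosen) for i in range(n)) / n

def exact_min_adcl(dist, k):
    """Exhaustively find the size-k subset with minimal ADCL (feasible for small n only)."""
    n = len(dist)
    best = min(combinations(range(n), k), key=lambda s: adcl(dist, s))
    return best, adcl(dist, best)

# Four leaves placed at positions 0, 1, 10 and 11 on a line, forming two tight pairs.
positions = [0, 1, 10, 11]
dist = [[abs(p - q) for q in positions] for p in positions]
result = exact_min_adcl(dist, 2)
```

Here the best pair of representatives takes one leaf from each tight pair, giving an ADCL of 0.5, whereas keeping both leaves from the same pair gives 4.75.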
Localization, structure and polymorphism of two paralogous Xenopus laevis mitochondrial malate dehydrogenase genes.

PubMed

Tlapakova, Tereza; Krylov, Vladimir; Macha, Jaroslav

2005-01-01

Two paralogous mitochondrial malate dehydrogenase 2 (Mdh2) genes of Xenopus laevis have been cloned and sequenced, revealing 95% identity. Fluorescence in-situ hybridization (FISH) combined with tyramide amplification discriminates both genes; Mdh2a was localized into chromosome q3 and Mdh2b into chromosome q8. One kb cDNA probes detect both genes with 85% accuracy. The remaining signals were on the paralogous counterpart. Introns interrupt coding sequences at the same nucleotide as defined for mouse. Restriction polymorphism has been detected in the first intron of Mdh2a, while the individual variability in intron 6 of Mdh2b gene is represented by an insertion of incomplete retrotransposon L1Xl. Rates of nucleotide substitutions indicate that both genes are under similar evolutionary constraints. X. laevis Mdh2 genes can be used as markers for physical mapping and linkage analysis.
Local Outbreak of Listeria monocytogenes Serotype 4b Sequence Type 6 Due to Contaminated Meat Pâté.

PubMed

Althaus, Denise; Jermini, Marco; Giannini, Petra; Martinetti, Gladys; Reinholz, Danuta; Nüesch-Inderbinen, Magdalena; Lehner, Angelika; Stephan, Roger

2017-04-01

In January and February 2016, five cases of confirmed and two cases of probable infection due to Listeria monocytogenes serotype 4b, sequence type (ST) 6 belonging to a single pulsed-field gel electrophoresis pulsotype pattern were registered in a region of southern Switzerland. L. monocytogenes was detected in blood samples (four cases) and pleural fluid (one case). Furthermore, L. monocytogenes 4b ST6 was detected in a stool sample of an asymptomatic person exposed to a common food. Forthwith, the food safety authority and a local gourmet meat producer reported L. monocytogenes contamination of meat pâté. Analysis of further food and environmental samples from the premises of the producer yielded isolates matching the clinical strains and confirmed the presence of L. monocytogenes 4b ST6 in the mincing machine as the cause of the food contamination.
A Phylogenetic and Phenotypic Analysis of Salmonella enterica Serovar Weltevreden, an Emerging Agent of Diarrheal Disease in Tropical Regions

PubMed Central

Makendi, Carine; Page, Andrew J.; Wren, Brendan W.; Le Thi Phuong, Tu; Clare, Simon; Hale, Christine; Goulding, David; Klemm, Elizabeth J.; Pickard, Derek; Okoro, Chinyere; Hunt, Martin; Thompson, Corinne N.; Phu Huong Lan, Nguyen; Tran Do Hoang, Nhu; Thwaites, Guy E.; Le Hello, Simon; Brisabois, Anne; Weill, François-Xavier; Baker, Stephen; Dougan, Gordon

2016-01-01

Salmonella enterica serovar Weltevreden (S. Weltevreden) is an emerging cause of diarrheal and invasive disease in humans residing in tropical regions. Despite the regional and international emergence of this Salmonella serovar, relatively little is known about its genetic diversity, genomics or virulence potential in model systems. Here we used whole genome sequencing and bioinformatics analyses to define the phylogenetic structure of a diverse global selection of S. Weltevreden. Phylogenetic analysis of more than 100 isolates demonstrated that the population of S. Weltevreden can be segregated into two main phylogenetic clusters, one associated predominantly with continental Southeast Asia and the other more internationally dispersed. Subcluster analysis suggested the local evolution of S. Weltevreden within specific geographical regions. Four of the isolates were sequenced using long read sequencing to produce high quality reference genomes. Phenotypic analysis in Hep-2 cells and in a murine infection model indicated that S. Weltevreden were significantly attenuated in these models compared to the classical S. Typhimurium reference strain SL1344. Our work outlines novel insights into this important emerging pathogen and provides a baseline understanding for future research studies. PMID:26867150
Global and Local Pitch Perception in Children with Developmental Dyslexia

ERIC Educational Resources Information Center

Ziegler, Johannes C.; Pech-Georgel, Catherine; George, Florence; Foxton, Jessica M.

2012-01-01

This study investigated global versus local pitch pattern perception in children with dyslexia aged between 8 and 11 years. Children listened to two consecutive 4-tone pitch sequences while performing a same/different task. On the different trials, sequences either preserved the contour (local condition) or they violated the contour (global…
TranslatomeDB: a comprehensive database and cloud-based analysis platform for translatome sequencing data

PubMed Central

Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie

2018-01-01

Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigations are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database TranslatomeDB (http://www.translatomedb.net/) which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of same species and type, both on transcriptome and translatome levels. Translation indices (translation ratios, elongation velocity index and translational efficiency) can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity. All datasets were analyzed using a unified, robust, accurate and experimentally-verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyses. TranslatomeDB also allows users to upload their own datasets and utilize the identical unified pipeline to analyze their data. We believe that our TranslatomeDB is a comprehensive platform and knowledgebase on translatome and proteome research, freeing biologists from complex searching, analysis and comparison of huge sequencing datasets without needing local computational power. PMID:29106630
ampliMethProfiler: a pipeline for the analysis of CpG methylation profiles of targeted deep bisulfite sequenced amplicons.

PubMed

Scala, Giovanni; Affinito, Ornella; Palumbo, Domenico; Florio, Ermanno; Monticelli, Antonella; Miele, Gennaro; Chiariotti, Lorenzo; Cocozza, Sergio

2016-11-25

CpG sites in an individual molecule may exist in a binary state (methylated or unmethylated) and each individual DNA molecule, containing a certain number of CpGs, is a combination of these states defining an epihaplotype. Classic quantification based approaches to study DNA methylation are intrinsically unable to fully represent the complexity of the underlying methylation substrate. Epihaplotype based approaches, on the other hand, allow methylation profiles of cell populations to be studied at the single molecule level. For such investigations, next-generation sequencing techniques can be used, both for quantitative and for epihaplotype analysis. Currently available tools for methylation analysis lack output formats that explicitly report CpG methylation profiles at the single molecule level, as well as suitable statistical tools for their interpretation. Here we present ampliMethProfiler, a Python-based pipeline for the extraction and statistical epihaplotype analysis of amplicons from targeted deep bisulfite sequencing of multiple DNA regions. The ampliMethProfiler tool provides an easy and user friendly way to extract and analyze the epihaplotype composition of reads from targeted bisulfite sequencing experiments. ampliMethProfiler is written in Python and requires a local installation of BLAST and (optionally) QIIME tools. It can be run on Linux and OS X platforms. The software is open source and freely available at http://amplimethprofiler.sourceforge.net.
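The difference between average-based quantification and epihaplotype analysis can be shown with a toy example. This sketch does not use ampliMethProfiler or reproduce its output formats; it assumes each read has already been reduced to a string of `1` (methylated) and `0` (unmethylated) characters, one per CpG, and the function names are hypothetical.

```python
from collections import Counter

def epihaplotype_counts(reads):
    """Count how many reads carry each distinct binary methylation pattern."""
    return Counter(reads)

def per_cpg_methylation(reads):
    """Fraction of reads methylated at each CpG position (the classic average view)."""
    n_sites = len(reads[0])
    if any(len(r) != n_sites for r in reads):
        raise ValueError("all reads must cover the same number of CpG sites")
    return [sum(r[i] == "1" for r in reads) / len(reads) for i in range(n_sites)]

# Two made-up read populations over four CpG sites.
pop_a = ["1100", "1100", "0011", "0011"]
pop_b = ["1010", "0101", "1001", "0110"]
avg_a = per_cpg_methylation(pop_a)
avg_b = per_cpg_methylation(pop_b)
haps_a = epihaplotype_counts(pop_a)
haps_b = epihaplotype_counts(pop_b)
```

Both populations average 50% methylation at every site, yet the first contains two epihaplotypes and the second four, which is the kind of single-molecule information that per-site averages cannot represent.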
Cloning and expression of a nuclear encoded plastid specific 33 kDa ribonucleoprotein gene (33RNP) from pea that is light stimulated.

PubMed

Reddy, M K; Nair, S; Singh, B N; Mudgil, Y; Tewari, K K; Sopory, S K

2001-01-24

We report the cloning and sequencing of both cDNA and genomic DNA of a 33 kDa chloroplast ribonucleoprotein (33RNP) from pea. The analysis of the predicted amino acid sequence of the cDNA clone revealed that the encoded protein contains two RNA binding domains, including the conserved consensus ribonucleoprotein sequences CS-RNP1 and CS-RNP2, on the C-terminus half and the presence of a putative transit peptide sequence in the N-terminus region. The phylogenetic and multiple sequence alignment analysis of pea chloroplast RNP along with RNPs reported from the other plant sources revealed that the pea 33RNP is very closely related to Nicotiana sylvestris 31RNP and 28RNP and also to 31RNP and 28RNP of Arabidopsis and spinach, respectively. The pea 33RNP was expressed in Escherichia coli and purified to homogeneity. The in vitro import of precursor protein into chloroplasts confirmed that the N-terminus putative transit peptide is a bona fide transit peptide and 33RNP is localized in the chloroplast. The nucleic acid-binding properties of the recombinant protein, as revealed by South-Western analysis, showed that 33RNP has higher binding affinity for poly (U) and oligo dT than for ssDNA and dsDNA. The steady state transcript level was higher in leaves than in roots and the expression of this gene is light stimulated. Sequence analysis of the genomic clone revealed that the gene contains four exons and three introns. We have also isolated and analyzed the 5' flanking region of the pea 33RNP gene.
Protein Sub-Nuclear Localization Based on Effective Fusion Representations and Dimension Reduction Algorithm LDA.

PubMed

Wang, Shunfang; Liu, Shuhui

2015-12-19

An effective representation of a protein sequence plays a crucial role in protein sub-nuclear localization. The existing representations, such as dipeptide composition (DipC), pseudo-amino acid composition (PseAAC) and position specific scoring matrix (PSSM), are insufficient to represent protein sequence due to their single perspectives. Thus, this paper proposes two fusion feature representations of DipPSSM and PseAAPSSM to integrate PSSM with DipC and PseAAC, respectively. When constructing each fusion representation, we introduce the balance factors to value the importance of its components. The optimal values of the balance factors are sought by genetic algorithm. Due to the high dimensionality of the proposed representations, linear discriminant analysis (LDA) is used to find its important low dimensional structure, which is essential for classification and location prediction. The numerical experiments on two public datasets with KNN classifier and cross-validation tests showed that in terms of the common indexes of sensitivity, specificity, accuracy and MCC, the proposed fusing representations outperform the traditional representations in protein sub-nuclear localization, and the representation treated by LDA outperforms the untreated one.
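One of the component representations, dipeptide composition, and the idea of a balance factor can be sketched briefly. The DipC function follows the standard definition (frequencies of the 400 ordered amino acid pairs). The `fuse` function is only an assumed form of the weighting, a scaled concatenation with a single balance factor, since the abstract does not give the formula; the PSSM features, the genetic algorithm search and the LDA step are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids, in a fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def dipeptide_composition(seq: str) -> list[float]:
    """400-dimensional vector of dipeptide frequencies; non-standard residues are skipped."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]

def fuse(features_a, features_b, balance: float) -> list[float]:
    """Assumed fusion: concatenate the two vectors, scaled by balance and 1 - balance."""
    return [balance * x for x in features_a] + [(1.0 - balance) * x for x in features_b]

dipc = dipeptide_composition("ACAC")
```

For the toy sequence `ACAC` the three overlapping dipeptides are AC, CA and AC, so the vector has 2/3 at AC, 1/3 at CA and zeros elsewhere.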
Walleye dermal sarcoma virus Orf B functions through receptor for activated C kinase (RACK1) and protein kinase C

DOE Office of Scientific and Technical Information (OSTI.GOV)

Daniels, Candelaria C.; Rovnak, Joel; Quackenbush, Sandra L.

2008-06-05

Walleye dermal sarcoma virus is a complex retrovirus that is associated with walleye dermal sarcomas that are seasonal in nature. Fall developing tumors contain low levels of spliced accessory gene transcripts A and B, suggesting a role for the encoded proteins, Orf A and Orf B, in oncogenesis. In explanted tumor cells the 35 kDa Orf B accessory protein is localized to the cell periphery in structures similar to focal adhesions and along actin stress fibers. Similar localization was observed in mammalian cells. The cellular protein, receptor for activated C kinase 1 (RACK1), bound Orf B in yeast two-hybrid assays and in cell culture. Sequence analysis of walleye RACK1 demonstrated high conservation to other known RACK1 sequences. RACK1 binds to activated protein kinase C (PKC). Orf B associates with PKCα, which is constitutively activated and localized at the membrane. Activated PKC promoted cell survival, proliferation, and increased cell viability in Orf B-expressing cells.
Protein Sub-Nuclear Localization Based on Effective Fusion Representations and Dimension Reduction Algorithm LDA

PubMed Central

Wang, Shunfang; Liu, Shuhui

2015-01-01

An effective representation of a protein sequence plays a crucial role in protein sub-nuclear localization. The existing representations, such as dipeptide composition (DipC), pseudo-amino acid composition (PseAAC) and position specific scoring matrix (PSSM), are insufficient to represent protein sequence due to their single perspectives. Thus, this paper proposes two fusion feature representations of DipPSSM and PseAAPSSM to integrate PSSM with DipC and PseAAC, respectively. When constructing each fusion representation, we introduce the balance factors to value the importance of its components. The optimal values of the balance factors are sought by genetic algorithm. Due to the high dimensionality of the proposed representations, linear discriminant analysis (LDA) is used to find its important low dimensional structure, which is essential for classification and location prediction. The numerical experiments on two public datasets with KNN classifier and cross-validation tests showed that in terms of the common indexes of sensitivity, specificity, accuracy and MCC, the proposed fusing representations outperform the traditional representations in protein sub-nuclear localization, and the representation treated by LDA outperforms the untreated one. PMID:26703574
Assignment of the human PAX4 gene to chromosome band 7q32 by fluorescence in situ hybridization.

PubMed

Tamura, T; Izumikawa, Y; Kishino, T; Soejima, H; Jinno, Y; Niikawa, N

1994-01-01

Of the nine known members of a human paired box-containing gene family (Pax), only PAX4 has not been precisely localized. We screened a cosmid library of human genomic DNA using polymerase chain reaction products for PAX4 as a probe and isolated three positive cosmid clones. Sequence analysis revealed that at least two of them had exon-like sequences and showed extensive homology to Pax-4 in the mouse. These two cosmid clones were mapped to human chromosome band 7q32 by fluorescence in situ hybridization.
RNAbrowse: RNA-Seq de novo assembly results browser.

PubMed

Mariette, Jérôme; Noirot, Céline; Nabihoudine, Ibounyamine; Bardou, Philippe; Hoede, Claire; Djari, Anis; Cabau, Cédric; Klopp, Christophe

2014-01-01

Transcriptome analysis based on a de novo assembly of next generation RNA sequences is now performed routinely in many laboratories. The generated results, including contig sequences, quantification figures, functional annotations and variation discovery outputs are usually bulky and quite diverse. This article presents a user oriented storage and visualisation environment permitting to explore the data in a top-down manner, going from general graphical views to all possible details. The software package is based on biomart, easy to install and populate with local data. The software package is available under the GNU General Public License (GPL) at http://bioinfo.genotoul.fr/RNAbrowse.
Facile rhenium-peptide conjugate synthesis using a one-pot derived Re(CO)3 reagent.

PubMed

Chanawanno, Kullapa; Kondeti, Vinay; Caporoso, Joel; Paruchuri, Sailaja; Leeper, Thomas C; Herrick, Richard S; Ziegler, Christopher J

2016-03-21

We have synthesized two Re(CO)3-modified lysine complexes (1 and 2), where the metal is attached to the amino acid at the Nε position, via a one-pot Schiff base formation reaction. These compounds can be used in the solid phase synthesis of peptides, and to date we have produced four conjugate systems incorporating neurotensin, bombesin, luteinizing hormone releasing hormone, and a nuclear localization sequence. We observed uptake into human umbilical vascular endothelial cells as well as differential uptake depending on peptide sequence identity, as characterized by fluorescence and rhenium elemental analysis.
A generic assay for whole-genome amplification and deep sequencing of enterovirus A71

PubMed Central

Tan, Le Van; Tuyen, Nguyen Thi Kim; Thanh, Tran Tan; Ngan, Tran Thuy; Van, Hoang Minh Tu; Sabanathan, Saraswathy; Van, Tran Thi My; Thanh, Le Thi My; Nguyet, Lam Anh; Geoghegan, Jemma L.; Ong, Kien Chai; Perera, David; Hang, Vu Thi Ty; Ny, Nguyen Thi Han; Anh, Nguyen To; Ha, Do Quang; Qui, Phan Tu; Viet, Do Chau; Tuan, Ha Manh; Wong, Kum Thong; Holmes, Edward C.; Chau, Nguyen Van Vinh; Thwaites, Guy; van Doorn, H. Rogier

2015-01-01

Enterovirus A71 (EV-A71) has emerged as the most important cause of large outbreaks of severe and sometimes fatal hand, foot and mouth disease (HFMD) across the Asia-Pacific region. EV-A71 outbreaks have been associated with (sub)genogroup switches, sometimes accompanied by recombination events. Understanding EV-A71 population dynamics is therefore essential for understanding this emerging infection, and may provide pivotal information for vaccine development. Despite the public health burden of EV-A71, relatively few EV-A71 complete-genome sequences are available for analysis and from limited geographical localities. The availability of an efficient procedure for whole-genome sequencing would stimulate effort to generate more viral sequence data. Herein, we report for the first time the development of a next-generation sequencing based protocol for whole-genome sequencing of EV-A71 directly from clinical specimens. We were able to sequence viruses of subgenogroup C4 and B5, while RNA from culture materials of diverse EV-A71 subgenogroups belonging to both genogroup B and C was successfully amplified. The nature of intra-host genetic diversity was explored in 22 clinical samples, revealing 107 positions carrying minor variants (ranging from 0 to 15 variants per sample). Our analysis of EV-A71 strains sampled in 2013 showed that they all belonged to subgenogroup B5, representing the first report of this subgenogroup in Vietnam. In conclusion, we have successfully developed a high-throughput next-generation sequencing-based assay for whole-genome sequencing of EV-A71 from clinical samples. PMID:25704598
Identification and functional characterization of a novel bipartite nuclear localization sequence in ARID1A

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bateman, Nicholas W.; The John P. Murtha Cancer Center, Walter Reed National Military Medical Center, 8901 Wisconsin Avenue, Bethesda 20889, MD; Shoji, Yutaka

2016-01-01

AT-rich interactive domain-containing protein 1A (ARID1A) is a recently identified nuclear tumor suppressor frequently altered in solid tumor malignancies. We have identified a bipartite-like nuclear localization sequence (NLS) that contributes to nuclear import of ARID1A not previously described. We functionally confirm activity using GFP constructs fused with wild-type or mutant NLS sequences. We further show that cyto-nuclear localized, bipartite NLS mutant ARID1A exhibits greater stability than nuclear-localized, wild-type ARID1A. Identification of this undescribed functional NLS within ARID1A contributes vital insights to rationalize the impact of ARID1A missense mutations observed in patient tumors. - Highlights: • We have identified a bipartite nuclear localization sequence (NLS) in ARID1A. • Confirmation of the NLS was performed using GFP constructs. • NLS mutant ARID1A exhibits greater stability than wild-type ARID1A.
Music-Elicited Emotion Identification Using Optical Flow Analysis of Human Face

NASA Astrophysics Data System (ADS)

Kniaz, V. V.; Smirnova, Z. N.

2015-05-01

Human emotion identification from image sequences is highly demanded nowadays. The range of possible applications can vary from an automatic smile shutter function of consumer grade digital cameras to Biofied Building technologies, which enables communication between building space and residents. The highly perceptual nature of human emotions leads to the complexity of their classification and identification. The main question arises from the subjective quality of emotional classification of events that elicit human emotions. A variety of methods for formal classification of emotions were developed in musical psychology. This work is focused on identification of human emotions evoked by musical pieces using human face tracking and optical flow analysis. Facial feature tracking algorithm used for facial feature speed and position estimation is presented. Facial features were extracted from each image sequence using human face tracking with local binary patterns (LBP) features. Accurate relative speeds of facial features were estimated using optical flow analysis. Obtained relative positions and speeds were used as the output facial emotion vector. The algorithm was tested using original software and recorded image sequences. The proposed technique proves to give a robust identification of human emotions elicited by musical pieces. The estimated models could be used for human emotion identification from image sequences in such fields as emotion based musical background or mood dependent radio.
Cloning and sequence analysis of the meso-diaminopimelate decarboxylase gene from Bacillus methanolicus MGA3 and comparison to other decarboxylase genes.

PubMed Central

Mills, D A; Flickinger, M C

1993-01-01

The lysA gene of Bacillus methanolicus MGA3 was cloned by complementation of an auxotrophic Escherichia coli lysA22 mutant with a genomic library of B. methanolicus MGA3 chromosomal DNA. Subcloning localized the B. methanolicus MGA3 lysA gene into a 2.3-kb SmaI-SstI fragment. Sequence analysis of the 2.3-kb fragment indicated an open reading frame encoding a protein of 48,223 Da, which was similar to the meso-diaminopimelate (DAP) decarboxylase amino acid sequences of Bacillus subtilis (62%) and Corynebacterium glutamicum (40%). Amino acid sequence analysis indicated several regions of conservation among bacterial DAP decarboxylases, eukaryotic ornithine decarboxylases, and arginine decarboxylases, suggesting a common structural arrangement for positioning of substrate and the cofactor pyridoxal 5'-phosphate. The B. methanolicus MGA3 DAP decarboxylase was shown to be a dimer (M(r) 86,000) with a subunit molecular mass of approximately 50,000 Da. This decarboxylase is inhibited by lysine (Ki = 0.93 mM) with a Km of 0.8 mM for DAP. The inhibition pattern suggests that the activity of this enzyme in lysine-overproducing strains of B. methanolicus MGA3 may limit lysine synthesis. PMID:8215365
Cloning and sequence analysis of the meso-diaminopimelate decarboxylase gene from Bacillus methanolicus MGA3 and comparison to other decarboxylase genes.

PubMed

Mills, D A; Flickinger, M C

1993-09-01

The lysA gene of Bacillus methanolicus MGA3 was cloned by complementation of an auxotrophic Escherichia coli lysA22 mutant with a genomic library of B. methanolicus MGA3 chromosomal DNA. Subcloning localized the B. methanolicus MGA3 lysA gene into a 2.3-kb SmaI-SstI fragment. Sequence analysis of the 2.3-kb fragment indicated an open reading frame encoding a protein of 48,223 Da, which was similar to the meso-diaminopimelate (DAP) decarboxylase amino acid sequences of Bacillus subtilis (62%) and Corynebacterium glutamicum (40%). Amino acid sequence analysis indicated several regions of conservation among bacterial DAP decarboxylases, eukaryotic ornithine decarboxylases, and arginine decarboxylases, suggesting a common structural arrangement for positioning of substrate and the cofactor pyridoxal 5'-phosphate. The B. methanolicus MGA3 DAP decarboxylase was shown to be a dimer (M(r) 86,000) with a subunit molecular mass of approximately 50,000 Da. This decarboxylase is inhibited by lysine (Ki = 0.93 mM) with a Km of 0.8 mM for DAP. The inhibition pattern suggests that the activity of this enzyme in lysine-overproducing strains of B. methanolicus MGA3 may limit lysine synthesis.

Small-target leak detection for a closed vessel via infrared image sequences

NASA Astrophysics Data System (ADS)

Zhao, Ling; Yang, Hongjiu

2017-03-01

This paper focuses on a leak diagnosis and localization method based on infrared image sequences. The method addresses two problems in leak detection: a high probability of false warnings and the negative effect of marginal information. An experimental model is established for leak diagnosis and localization on infrared image sequences. A differential background prediction based on a kernel regression method is presented to eliminate the negative effect of marginal information on the test vessel. A pipeline filter based on layered voting is designed to reduce the probability of false warnings for leak points. A synthesized leak diagnosis and localization algorithm is proposed based on infrared image sequences. Experimental results show the effectiveness and potential of the developed techniques.
A Robust Crowdsourcing-Based Indoor Localization System.

PubMed

Zhou, Baoding; Li, Qingquan; Mao, Qingzhou; Tu, Wei

2017-04-14

WiFi fingerprinting-based indoor localization has been widely used due to its simplicity and can be implemented on the smartphones. The major drawback of WiFi fingerprinting is that the radio map construction is very labor-intensive and time-consuming. Another drawback of WiFi fingerprinting is the Received Signal Strength (RSS) variance problem, caused by environmental changes and device diversity. RSS variance severely degrades the localization accuracy. In this paper, we propose a robust crowdsourcing-based indoor localization system (RCILS). RCILS can automatically construct the radio map using crowdsourcing data collected by smartphones. RCILS abstracts the indoor map as the semantics graph in which the edges are the possible user paths and the vertexes are the location where users may take special activities. RCILS extracts the activity sequence contained in the trajectories by activity detection and pedestrian dead-reckoning. Based on the semantics graph and activity sequence, crowdsourcing trajectories can be located and a radio map is constructed based on the localization results. For the RSS variance problem, RCILS uses the trajectory fingerprint model for indoor localization. During online localization, RCILS obtains an RSS sequence and realizes localization by matching the RSS sequence with the radio map. To evaluate RCILS, we apply RCILS in an office building. Experiment results demonstrate the efficiency and robustness of RCILS.
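The online step described above, matching an observed RSS sequence against the radio map, can be sketched as follows. This is only an illustration under simple assumptions: the radio map is a dictionary from location labels to per-access-point RSS fingerprints, candidate paths come from some graph of walkable routes, and the cost is a summed Euclidean distance. All names and values are hypothetical, and this is not the RCILS trajectory fingerprint model itself.

```python
import math


def fingerprint_distance(a, b):
    """Euclidean distance between two RSS fingerprints (one value per access point)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_trajectory(rss_sequence, radio_map, paths):
    """Return (path, cost) for the candidate path that best matches the RSS sequence.

    rss_sequence: list of observed fingerprints, in walking order.
    radio_map:    dict mapping a location label to its stored fingerprint.
    paths:        iterable of candidate location sequences (hypothetical input,
                  e.g. enumerated from a graph of possible user paths).
    """
    best_path, best_cost = None, math.inf
    for path in paths:
        if len(path) != len(rss_sequence):
            continue  # this sketch only compares sequences of equal length
        cost = sum(
            fingerprint_distance(rss, radio_map[loc])
            for rss, loc in zip(rss_sequence, path)
        )
        if cost < best_cost:
            best_path, best_cost = path, cost
    return best_path, best_cost


if __name__ == "__main__":
    # Made-up radio map with two access points (values in dBm).
    radio_map = {"door": [-40, -70], "corridor": [-55, -60], "desk": [-70, -45]}
    paths = [("door", "corridor", "desk"), ("desk", "corridor", "door")]
    observed = [[-42, -69], [-54, -61], [-71, -44]]
    print(match_trajectory(observed, radio_map, paths))
```

Matching a whole sequence rather than a single reading is what makes this kind of approach less sensitive to one noisy RSS sample.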
A Robust Crowdsourcing-Based Indoor Localization System

PubMed Central

Zhou, Baoding; Li, Qingquan; Mao, Qingzhou; Tu, Wei

2017-01-01

WiFi fingerprinting-based indoor localization has been widely used due to its simplicity and can be implemented on the smartphones. The major drawback of WiFi fingerprinting is that the radio map construction is very labor-intensive and time-consuming. Another drawback of WiFi fingerprinting is the Received Signal Strength (RSS) variance problem, caused by environmental changes and device diversity. RSS variance severely degrades the localization accuracy. In this paper, we propose a robust crowdsourcing-based indoor localization system (RCILS). RCILS can automatically construct the radio map using crowdsourcing data collected by smartphones. RCILS abstracts the indoor map as the semantics graph in which the edges are the possible user paths and the vertexes are the location where users may take special activities. RCILS extracts the activity sequence contained in the trajectories by activity detection and pedestrian dead-reckoning. Based on the semantics graph and activity sequence, crowdsourcing trajectories can be located and a radio map is constructed based on the localization results. For the RSS variance problem, RCILS uses the trajectory fingerprint model for indoor localization. During online localization, RCILS obtains an RSS sequence and realizes localization by matching the RSS sequence with the radio map. To evaluate RCILS, we apply RCILS in an office building. Experiment results demonstrate the efficiency and robustness of RCILS. PMID:28420108
Mycobacterium marinum Infections in Fish and Humans in Israel

PubMed Central

Ucko, M.; Colorni, A.

2005-01-01

Israeli Mycobacterium marinum isolates from humans and fish were compared by direct sequencing of the 16S rRNA and hsp65 genes, restriction mapping, and amplified fragment length polymorphism analysis. Significant molecular differences separated all clinical isolates from the piscine isolates, ruling out the local aquaculture industry as the source of human infections. PMID:15695698
A space-efficient algorithm for local similarities.

PubMed

Huang, X Q; Hardison, R C; Miller, W

1990-10-01

Existing dynamic-programming algorithms for identifying similar regions of two sequences require time and space proportional to the product of the sequence lengths. Often this space requirement is more limiting than the time requirement. We describe a dynamic-programming local-similarity algorithm that needs only space proportional to the sum of the sequence lengths. The method can also find repeats within a single long sequence. To illustrate the algorithm's potential, we discuss comparison of a 73,360 nucleotide sequence containing the human beta-like globin gene cluster and a corresponding 44,594 nucleotide sequence for rabbit, a problem well beyond the capabilities of other dynamic-programming software.
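To make the space argument concrete, here is a minimal sketch of a local-similarity score computed with only two rows of the dynamic-programming matrix. It is the standard Smith-Waterman recurrence with a linear gap penalty and returns the best score only. It is not the algorithm of the paper, which also recovers the alignments themselves in linear space; the scoring parameters are arbitrary assumptions.

```python
def local_similarity_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score of a and b using O(len(b)) memory.

    Only the previous and current rows of the DP matrix are kept, so memory
    grows with one sequence length instead of with the product of both.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]  # column 0: empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            score = max(0, diag, up, left)  # 0 restarts the local alignment
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(local_similarity_score("TTACGTTT", "GGACGGG"))
```

Running time is still proportional to the product of the sequence lengths; only the memory footprint shrinks.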
Factoring local sequence composition in motif significance analysis.

PubMed

Ng, Patrick; Keich, Uri

2008-01-01

We recently introduced a biologically realistic and reliable significance analysis of the output of a popular class of motif finders. In this paper we further improve our significance analysis by incorporating local base composition information. Relying on realistic biological data simulation, as well as on FDR analysis applied to real data, we show that our method is significantly better than the increasingly popular practice of using the normal approximation to estimate the significance of a finder's output. Finally we turn to leveraging our reliable significance analysis to improve the actual motif finding task. Specifically, endowing a variant of the Gibbs Sampler with our improved significance analysis we demonstrate that de novo finders can perform better than has been perceived. Significantly, our new variant outperforms all the finders reviewed in a recently published comprehensive analysis of the Harbison genome-wide binding location data. Interestingly, many of these finders incorporate additional information such as nucleosome positioning and the significance of binding data.
DraGnET: Software for storing, managing and analyzing annotated draft genome sequence data

PubMed Central

2010-01-01

Background: New "next generation" DNA sequencing technologies offer individual researchers the ability to rapidly generate large amounts of genome sequence data at dramatically reduced costs. As a result, a need has arisen for new software tools for storage, management and analysis of genome sequence data. Although bioinformatic tools are available for the analysis and management of genome sequences, limitations still remain. For example, restrictions on the submission of data and use of these tools may be imposed, thereby making them unsuitable for sequencing projects that need to remain in-house or proprietary during their initial stages. Furthermore, the availability and use of next generation sequencing in industrial, governmental and academic environments requires biologists to have access to computational support for the curation and analysis of the data generated; however, this type of support is not always immediately available. Results: To address these limitations, we have developed DraGnET (Draft Genome Evaluation Tool). DraGnET is an open source web application which allows researchers, with no experience in programming and database management, to set up their own in-house projects for storing, retrieving, organizing and managing annotated draft and complete genome sequence data. The software provides a web interface for the use of BLAST, allowing users to perform preliminary comparative analysis among multiple genomes. We demonstrate the utility of DraGnET for performing comparative genomics on closely related bacterial strains. Furthermore, DraGnET can be further developed to incorporate additional tools for more sophisticated analyses. Conclusions: DraGnET is designed for use either by individual researchers or as a collaborative tool available through Internet (or Intranet) deployment. For genome projects that require genome sequencing data to initially remain proprietary, DraGnET provides the means for researchers to keep their data in-house for analysis using local programs or until it is made publicly available, at which point it may be uploaded to additional analysis software applications. The DraGnET home page is available at http://www.dragnet.cvm.iastate.edu and includes example files for examining the functionalities, a link for downloading the DraGnET setup package and a link to the DraGnET source code hosted with full documentation on SourceForge. PMID:20175920
Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues

PubMed Central

Lee, Je Hyuk; Daugharthy, Evan R.; Scheiman, Jonathan; Kalhor, Reza; Ferrante, Thomas C.; Terry, Richard; Turczyk, Brian M.; Yang, Joyce L.; Lee, Ho Suk; Aach, John; Zhang, Kun; Church, George M.

2014-01-01

RNA sequencing measures the quantitative change in gene expression over the whole transcriptome, but it lacks spatial context. On the other hand, in situ hybridization provides the location of gene expression, but only for a small number of genes. Here we detail a protocol for genome-wide profiling of gene expression in situ in fixed cells and tissues, in which RNA is converted into cross-linked cDNA amplicons and sequenced manually on a confocal microscope. Unlike traditional RNA-seq our method enriches for context-specific transcripts over house-keeping and/or structural RNA, and it preserves the tissue architecture for RNA localization studies. Our protocol is written for researchers experienced in cell microscopy with minimal computing skills. Library construction and sequencing can be completed within 14 d, with image analysis requiring an additional 2 d. PMID:25675209
Spatiotemporal attention operator using isotropic contrast and regional homogeneity

NASA Astrophysics Data System (ADS)

Palenichka, Roman; Lakhssassi, Ahmed; Zaremba, Marek

2011-04-01

A multiscale operator for spatiotemporal isotropic attention is proposed to reliably extract attention points during image sequence analysis. Its consecutive local maxima indicate attention points as the centers of image fragments of variable size with high intensity contrast, region homogeneity, regional shape saliency, and temporal change presence. The scale-adaptive estimation of temporal change (motion) and its aggregation with the regional shape saliency contribute to the accurate determination of attention points in image sequences. Multilocation descriptors of an image sequence are extracted at the attention points in the form of a set of multidimensional descriptor vectors. A fast recursive implementation is also proposed to make the operator's computational complexity independent from the spatial scale size, which is the window size in the spatial averaging filter. Experiments on the accuracy of attention-point detection have proved the operator consistency and its high potential for multiscale feature extraction from image sequences.
PredictProtein—an open resource for online prediction of protein structural and functional features

PubMed Central

Yachdav, Guy; Kloppmann, Edda; Kajan, Laszlo; Hecht, Maximilian; Goldberg, Tatyana; Hamp, Tobias; Hönigschmid, Peter; Schafferhans, Andrea; Roos, Manfred; Bernhofer, Michael; Richter, Lothar; Ashkenazy, Haim; Punta, Marco; Schlessinger, Avner; Bromberg, Yana; Schneider, Reinhard; Vriend, Gerrit; Sander, Chris; Ben-Tal, Nir; Rost, Burkhard

2014-01-01

PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein–protein binding sites (ISIS2), protein–polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics. To this end, the PredictProtein results are presented as both text and a series of intuitive, interactive and visually appealing figures. The web server and sources are available at http://ppopen.rostlab.org. PMID:24799431
Genome-Wide Identification of Regulatory Sequences Undergoing Accelerated Evolution in the Human Genome.

PubMed

Dong, Xinran; Wang, Xiao; Zhang, Feng; Tian, Weidong

2016-10-01

Accelerated evolution of regulatory sequences can alter the expression pattern of target genes and cause phenotypic changes. In this study, we used DNase I hypersensitive sites (DHSs) to annotate putative regulatory sequences in the human genome, and conducted a genome-wide analysis of the effects of accelerated evolution on regulatory sequences. Working under the assumption that local ancient repeat elements of DHSs are under neutral evolution, we discovered that ∼0.44% of DHSs are under accelerated evolution (ace-DHSs). We found that ace-DHSs tend to be more active than background DHSs, and are strongly associated with epigenetic marks of active transcription. The target genes of ace-DHSs are significantly enriched in neuron-related functions, and their expression levels are positively selected in the human brain. Thus, these lines of evidence strongly suggest that accelerated evolution of regulatory sequences plays an important role in the evolution of human-specific phenotypes. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Finding similar nucleotide sequences using network BLAST searches.

PubMed

Ladunga, Istvan

2009-06-01

The Basic Local Alignment Search Tool (BLAST) is a keystone of bioinformatics due to its performance and user-friendliness. Beginner and intermediate users will learn how to design and submit blastn and Megablast searches on the Web pages at the National Center for Biotechnology Information. We map nucleic acid sequences to genomes, find identical or similar mRNA, expressed sequence tag, and noncoding RNA sequences, and run Megablast searches, which are much faster than blastn. Understanding results is assisted by taxonomy reports, genomic views, and multiple alignments. We interpret expected frequency thresholds, biological significance, and statistical significance. Weak hits provide no evidence, but hints for further analyses. We find genes that may code for homologous proteins by translated BLAST. We reduce false positives by filtering out low-complexity regions. Parsed BLAST results can be integrated into analysis pipelines. Links in the output connect to Entrez, PUBMED, structural, sequence, interaction, and expression databases. This facilitates integration with a wide spectrum of biological knowledge.
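The remark that parsed BLAST results can be integrated into analysis pipelines can be illustrated with a short sketch. It assumes the hits were exported in the standard 12-column tab-separated tabular format (the default column order of BLAST+ `-outfmt 6`). The function name, the E-value cutoff and the sample rows are made up for illustration and are not taken from the article.

```python
# Default column order of the 12-column BLAST tabular format.
BLAST_FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def filter_blast_hits(tabular_text, max_evalue=1e-5):
    """Parse tabular BLAST output and keep hits with E-value <= max_evalue."""
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        values = line.split("\t")
        if len(values) < len(BLAST_FIELDS):
            continue  # skip malformed rows
        hit = dict(zip(BLAST_FIELDS, values))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return hits


if __name__ == "__main__":
    # Hypothetical rows; identifiers and numbers are invented.
    sample = (
        "query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t51\t250\t2e-50\t350\n"
        "query1\tsubjB\t71.0\t40\t10\t1\t5\t44\t300\t339\t0.5\t22.3\n"
    )
    for hit in filter_blast_hits(sample):
        print(hit["sseqid"], hit["evalue"])
```

The cutoff is a judgment call: as the abstract notes, weak hits are not evidence but may still be useful hints for further analysis.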
Linkage localization of X-linked Charcot-Marie-Tooth disease

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bergoffen, J.; Trofatter, J.; Haines, J.L.

1993-02-01

Charcot-Marie-Tooth disease (CMT), also known as hereditary motor and sensory neuropathy, is a heterogeneous group of slowly progressive, degenerative disorders of peripheral nerve. X-linked CMT (CMTX) (McKusick 302800), a subdivision of type I, or demyelinating, CMT is an X-linked dominant condition with variable penetrance. Previous linkage analysis using RFLPs demonstrated linkage to markers on the proximal long and short arms of the X chromosome, with the more likely localization on the proximal long arm of the X chromosome. Available variable simple-sequence repeats (VSSRs) broaden the possibilities for linkage analysis. This paper presents new linkage data and recombination analysis derived from work with four VSSR markers - AR, PGKP1, DXS453, and DXYS1X - in addition to analysis using RFLP markers described elsewhere. These studies localize the CMTX gene to the proximal Xq segment between PGKP1 (Xq11.2-12) and DXS72 (Xq21.1), with a combined maximum multipoint lod score of 15.3 at DXS453 (θ = 0). 32 refs., 3 figs., 2 tabs.
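For readers unfamiliar with lod scores, the sketch below computes a simple two-point lod score: the base-10 logarithm of the likelihood of the data at recombination fraction θ divided by the likelihood under free recombination (θ = 0.5). It assumes phase-known, fully informative meioses, which is a strong simplification. The study above reports a multipoint score, which this sketch does not reproduce, and the counts used are invented.

```python
import math


def two_point_lod(recombinants, meioses, theta):
    """Two-point lod score for phase-known, fully informative meioses.

    lod(theta) = log10( theta^k * (1 - theta)^(n - k) / 0.5^n )
    where k is the number of recombinants among n meioses.
    """
    likelihood = (theta ** recombinants) * ((1 - theta) ** (meioses - recombinants))
    null_likelihood = 0.5 ** meioses
    if likelihood == 0:
        return float("-inf")  # the data are impossible at this theta
    return math.log10(likelihood / null_likelihood)


if __name__ == "__main__":
    # Hypothetical family data: 10 informative meioses, no recombinants.
    for theta in (0.0, 0.1, 0.2, 0.5):
        print(theta, round(two_point_lod(0, 10, theta), 3))
```

With no recombinants, each informative meiosis adds log10(2), about 0.3, to the lod score at θ = 0, which is why large pedigrees are needed to reach high scores.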
Mapping a nucleolar targeting sequence of an RNA binding nucleolar protein, Nop25

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fujiwara, Takashi; Suzuki, Shunji; Kanno, Motoko

2006-06-10

Nop25 is a putative RNA binding nucleolar protein associated with rRNA transcription. The present study was undertaken to determine the mechanism of Nop25 localization in the nucleolus. Deletion experiments of Nop25 amino acid sequence showed Nop25 to contain a nuclear targeting sequence in the N-terminal and a nucleolar targeting sequence in the C-terminal. By expressing derivative peptides from the C-terminal as GFP-fusion proteins in the cells, a lysine and arginine residue-enriched peptide (KRKHPRRAQDSTKKPPSATRTSKTQRRRR) allowed a GFP-fusion protein to be transported and fully retained in the nucleolus. When the peptide was fused with cMyc epitope and expressed in the cells, amore » cMyc epitope was then detected in the nucleolus. Nop25 did not localize in the nucleolus by deletion of the peptide from Nop25. Furthermore, deletion of a subdomain (KRKHPRRAQ) in the peptide or amino acid substitution of lysine and arginine residues in the subdomain resulted in the loss of Nop25 nucleolar localization. These results suggest that the lysine and arginine residue-enriched peptide is the most prominent nucleolar targeting sequence of Nop25 and that the long stretch of basic residues might play an important role in the nucleolar localization of Nop25. Although Nop25 contained putative SUMOylation, phosphorylation and glycosylation sites, the amino acid substitution in these sites had no effect on the nucleolar localization, thus suggesting that these post-translational modifications did not contribute to the localization of Nop25 in the nucleolus. The treatment of the cells, which expressed a GFP-fusion protein with a nucleolar targeting sequence of Nop25, with RNase A resulted in a complete dislocation of the protein from the nucleolus. 
These data suggested that the nucleolar targeting sequence might therefore play an important role in the binding of Nop25 to RNA molecules and that the RNA binding of Nop25 might be essential for the nucleolar localization of Nop25.
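The description of the targeting peptide as "lysine and arginine residue-enriched" can be checked directly from the sequence quoted in the abstract. The following is a minimal sketch; the function name is hypothetical, and the calculation says nothing about localization itself, only about composition.

```python
from collections import Counter

# Peptide and subdomain exactly as quoted in the abstract.
PEPTIDE = "KRKHPRRAQDSTKKPPSATRTSKTQRRRR"
SUBDOMAIN = "KRKHPRRAQ"


def basic_residue_stats(seq: str) -> dict:
    """Count lysine (K) and arginine (R) residues and their combined fraction."""
    counts = Counter(seq.upper())
    basic = counts["K"] + counts["R"]
    return {
        "length": len(seq),
        "K": counts["K"],
        "R": counts["R"],
        "basic_fraction": basic / len(seq),
    }


if __name__ == "__main__":
    for name, seq in (("peptide", PEPTIDE), ("subdomain", SUBDOMAIN)):
        print(name, basic_residue_stats(seq))
```

For the full peptide this gives 5 lysines and 8 arginines in 29 residues, so a little under half of the positions are K or R.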
Evidence for the Concerted Evolution between Short Linear Protein Motifs and Their Flanking Regions

PubMed Central

Chica, Claudia; Diella, Francesca; Gibson, Toby J.

2009-01-01

Background: Linear motifs are short modules of protein sequences that play a crucial role in mediating and regulating many protein–protein interactions. The function of linear motifs strongly depends on the context, e.g. functional instances mainly occur inside flexible regions that are accessible for interaction. Sometimes linear motifs appear as isolated islands of conservation in multiple sequence alignments. However, they also occur in larger blocks of sequence conservation, suggesting an active role for the neighbouring amino acids. Results: The evolution of regions flanking 116 functional linear motif instances was studied. The conservation of the amino acid sequence and order/disorder tendency of those regions was related to presence/absence of the instance. For the majority of the analysed instances, the pairs of sequences conserving the linear motif were also observed to maintain a similar local structural tendency and/or to have higher local sequence conservation when compared to pairs of sequences where one is missing the linear motif. Furthermore, those instances have a higher chance to co–evolve with the neighbouring residues in comparison to the distant ones. Those findings are supported by examples where the regulation of the linear motif–mediated interaction has been shown to depend on the modifications (e.g. phosphorylation) at neighbouring positions or is thought to benefit from the binding versatility of disordered regions. Conclusion: The results suggest that flanking regions are relevant for linear motif–mediated interactions, both at the structural and sequence level. More interestingly, they indicate that the prediction of linear motif instances can be enriched with contextual information by performing a sequence analysis similar to the one presented here. This can facilitate the understanding of the role of these predicted instances in determining the protein function inside the broader context of the cellular network where they arise. 
PMID:19584925
Assessing the genetic diversity of Cu resistance in mine tailings through high-throughput recovery of full-length copA genes

PubMed Central

Li, Xiaofang; Zhu, Yong-Guan; Shaban, Babak; Bruxner, Timothy J. C.; Bond, Philip L.; Huang, Longbin

2015-01-01

Characterizing the genetic diversity of microbial copper (Cu) resistance at the community level remains challenging, mainly due to the polymorphism of the core functional gene copA. In this study, a local BLASTN method using a copA database built in this study was developed to recover full-length putative copA sequences from an assembled tailings metagenome; these sequences were then screened for potentially functioning CopA using conserved metal-binding motifs, inferred by evolutionary trace analysis of CopA sequences from known Cu resistant microorganisms. In total, 99 putative copA sequences were recovered from the tailings metagenome, out of which 70 were found with high potential to be functioning in Cu resistance. Phylogenetic analysis of selected copA sequences detected in the tailings metagenome showed that topology of the copA phylogeny is largely congruent with that of the 16S-based phylogeny of the tailings microbial community obtained in our previous study, indicating that the development of copA diversity in the tailings might be mainly through vertical descent with few lateral gene transfer events. The method established here can be used to explore copA (and potentially other metal resistance genes) diversity in any metagenome and has the potential to exhaust the full-length gene sequences for downstream analyses. PMID:26286020
A Comprehensive Approach to Sequence-oriented IsomiR annotation (CASMIR): demonstration with IsomiR profiling in colorectal neoplasia.

PubMed

Wu, Chung Wah; Evans, Jared M; Huang, Shengbing; Mahoney, Douglas W; Dukek, Brian A; Taylor, William R; Yab, Tracy C; Smyrk, Thomas C; Jen, Jin; Kisiel, John B; Ahlquist, David A

2018-05-25

MicroRNA (miRNA) profiling is an important step in studying biological associations and identifying marker candidates. miRNA exists in isoforms, called isomiRs, which may exhibit distinct properties. With conventional profiling methods, limitations in assay and analysis platforms may compromise isomiR interrogation. We introduce a comprehensive approach to sequence-oriented isomiR annotation (CASMIR) to allow unbiased identification of global isomiRs from small RNA sequencing data. In this approach, small RNA reads are maintained as independent sequences instead of being summarized under miRNA names. IsomiR features are identified through step-wise local alignment against canonical forms and precursor sequences. Through customizing the reference database, CASMIR is applicable to isomiR annotation across species. To demonstrate its application, we investigated isomiR profiles in normal and neoplastic human colorectal epithelia. We also ran miRDeep2, a popular miRNA analysis algorithm to validate isomiRs annotated by CASMIR. With CASMIR, specific and biologically relevant isomiR patterns could be identified. We note that specific isomiRs are often more abundant than their canonical forms. We identify isomiRs that are commonly up-regulated in both colorectal cancer and advanced adenoma, and illustrate advantages in targeting isomiRs as potential biomarkers over canonical forms. Studying miRNAs at the isomiR level could reveal new insight into miRNA biology and inform assay design for specific isomiRs. CASMIR facilitates comprehensive annotation of isomiR features in small RNA sequencing data for isomiR profiling and differential expression analysis.
Increased Sensitivity of Diagnostic Mutation Detection by Re-analysis Incorporating Local Reassembly of Sequence Reads.

PubMed

Watson, Christopher M; Camm, Nick; Crinnion, Laura A; Clokie, Samuel; Robinson, Rachel L; Adlard, Julian; Charlton, Ruth; Markham, Alexander F; Carr, Ian M; Bonthron, David T

2017-12-01

Diagnostic genetic testing programmes based on next-generation DNA sequencing have resulted in the accrual of large datasets of targeted raw sequence data. Most diagnostic laboratories process these data through an automated variant-calling pipeline. Validation of the chosen analytical methods typically depends on confirming the detection of known sequence variants. Despite improvements in short-read alignment methods, current pipelines are known to be comparatively poor at detecting large insertion/deletion mutations. We performed clinical validation of a local reassembly tool, ABRA (assembly-based realigner), through retrospective reanalysis of a cohort of more than 2000 hereditary cancer cases. ABRA enabled detection of a 96-bp deletion, 4-bp insertion mutation in PMS2 that had been initially identified using a comparative read-depth approach. We applied an updated pipeline incorporating ABRA to the entire cohort of 2000 cases and identified one previously undetected pathogenic variant, a 23-bp duplication in PTEN. We demonstrate the effect of read length on the ability to detect insertion/deletion variants by comparing HiSeq2500 (2 × 101-bp) and NextSeq500 (2 × 151-bp) sequence data for a range of variants and thereby show that the limitations of shorter read lengths can be mitigated using appropriate informatics tools. This work highlights the need for ongoing development of diagnostic pipelines to maximize test sensitivity. We also draw attention to the large differences in computational infrastructure required to perform day-to-day versus large-scale reprocessing tasks.
Organization, chromosomal localization and promoter analysis of the gene encoding human acidic fibroblast growth factor intracellular binding protein.

PubMed Central

Kolpakova, E; Frengen, E; Stokke, T; Olsnes, S

2000-01-01

Acidic fibroblast growth factor (aFGF) intracellular binding protein (FIBP) is a protein found mainly in the nucleus that might be involved in the intracellular function of aFGF. Here we present a comparative analysis of the deduced amino acid sequences of human, murine and Drosophila FIBP analogues and demonstrate that FIBP is an evolutionarily conserved protein. The human gene spans more than 5 kb, comprising ten exons and nine introns, and maps to chromosome 11q13.1. Two slightly different splice variants found in different tissues were isolated and characterized. Sequence analysis of the region surrounding the translation start revealed a CpG island, a classical feature of widely expressed genes. Functional studies of the promoter region with a luciferase reporter system suggested a strong transcriptional activity residing within 600 bp of the 5' flanking region. PMID:11104667
[Polymorphic loci and polymorphism analysis of short tandem repeats within XNP gene].

PubMed

Liu, Qi-Ji; Gong, Yao-Qin; Guo, Chen-Hong; Chen, Bing-Xi; Li, Jiang-Xia; Guo, Yi-Shou

2002-01-01

To select polymorphic short tandem repeat markers within X-linked nuclear protein (XNP) gene, genomic clones which contain XNP gene were recognized by homologous analysis with XNP cDNA. By comparing the cDNA with genomic DNA, non-exonic sequences were identified, and short tandem repeats were selected from non-exonic sequences by using BCM search Launcher. Polymorphisms of the short tandem repeats in Chinese population were evaluated by PCR amplification and PAGE. Five short tandem repeats were identified from XNP gene, two of which were polymorphic. Four and 11 alleles were observed in Chinese population for XNPSTR1 and XNPSTR4, respectively. Heterozygosities were 47% for XNPSTR1 and 70% for XNPSTR4. XNPSTR1 and XNPSTR4 localized within 3' end and intron 10, respectively. Two polymorphic short tandem repeats have been identified within XNP gene and will be useful for linkage analysis and gene diagnosis of XNP gene.
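The abstract reports heterozygosities but not the underlying allele frequencies, and it does not say whether the values are observed or expected. As an illustration only, the sketch below computes expected heterozygosity (one minus the sum of squared allele frequencies) from a set of allele frequencies. The function name and the example frequencies are hypothetical and are not taken from the study.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2) for allele frequencies p_i."""
    freqs = list(freqs)
    if any(p < 0 for p in freqs):
        raise ValueError("allele frequencies must be non-negative")
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


if __name__ == "__main__":
    # Hypothetical four-allele marker dominated by one common allele.
    print(expected_heterozygosity([0.7, 0.1, 0.1, 0.1]))
    # Hypothetical four-allele marker with equally common alleles.
    print(expected_heterozygosity([0.25, 0.25, 0.25, 0.25]))
```

The comparison shows why a marker with more alleles at more even frequencies is more informative for linkage analysis: heterozygosity rises as the frequencies even out.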

Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing

PubMed Central

Manske, Magnus; Miotto, Olivo; Campino, Susana; Auburn, Sarah; Almagro-Garcia, Jacob; Maslen, Gareth; O’Brien, Jack; Djimde, Abdoulaye; Doumbo, Ogobara; Zongo, Issaka; Ouedraogo, Jean-Bosco; Michon, Pascal; Mueller, Ivo; Siba, Peter; Nzila, Alexis; Borrmann, Steffen; Kiara, Steven M.; Marsh, Kevin; Jiang, Hongying; Su, Xin-Zhuan; Amaratunga, Chanaki; Fairhurst, Rick; Socheat, Duong; Nosten, Francois; Imwong, Mallika; White, Nicholas J.; Sanders, Mandy; Anastasi, Elisa; Alcock, Dan; Drury, Eleanor; Oyola, Samuel; Quail, Michael A.; Turner, Daniel J.; Rubio, Valentin Ruano; Jyothi, Dushyanth; Amenga-Etego, Lucas; Hubbart, Christina; Jeffreys, Anna; Rowlands, Kate; Sutherland, Colin; Roper, Cally; Mangano, Valentina; Modiano, David; Tan, John C.; Ferdig, Michael T.; Amambua-Ngwa, Alfred; Conway, David J.; Takala-Harrison, Shannon; Plowe, Christopher V.; Rayner, Julian C.; Rockett, Kirk A.; Clark, Taane G.; Newbold, Chris I.; Berriman, Matthew; MacInnis, Bronwyn; Kwiatkowski, Dominic P.

2013-01-01

Malaria elimination strategies require surveillance of the parasite population for genetic changes that demand a public health response, such as new forms of drug resistance [1,2]. Here we describe methods for large-scale analysis of genetic variation in Plasmodium falciparum by deep sequencing of parasite DNA obtained from the blood of patients with malaria, either directly or after short term culture. Analysis of 86,158 exonic SNPs that passed genotyping quality control in 227 samples from Africa, Asia and Oceania provides genome-wide estimates of allele frequency distribution, population structure and linkage disequilibrium. By comparing the genetic diversity of individual infections with that of the local parasite population, we derive a metric of within-host diversity that is related to the level of inbreeding in the population. An open-access web application has been established for exploration of regional differences in allele frequency and of highly differentiated loci in the P. falciparum genome. PMID:22722859
Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline.

PubMed

Reid, Jeffrey G; Carroll, Andrew; Veeraraghavan, Narayanan; Dahdouli, Mahmoud; Sundquist, Andreas; English, Adam; Bainbridge, Matthew; White, Simon; Salerno, William; Buhay, Christian; Yu, Fuli; Muzny, Donna; Daly, Richard; Duyk, Geoff; Gibbs, Richard A; Boerwinkle, Eric

2014-01-29

Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results. To address these challenges, we have developed the Mercury analysis pipeline and deployed it in local hardware and the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts. By taking advantage of cloud computing and with Mercury implemented on the DNAnexus platform, we have demonstrated a powerful combination of a robust and fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples.
Approaching the taxonomic affiliation of unidentified sequences in public databases--an example from the mycorrhizal fungi.

PubMed

Nilsson, R Henrik; Kristiansson, Erik; Ryberg, Martin; Larsson, Karl-Henrik

2005-07-18

During the last few years, DNA sequence analysis has become one of the primary means of taxonomic identification of species, particularly so for species that are minute or otherwise lack distinct, readily obtainable morphological characters. Although the number of sequences available for comparison in public databases such as GenBank increases exponentially, only a minuscule fraction of all organisms have been sequenced, leaving taxon sampling a momentous problem for sequence-based taxonomic identification. When querying GenBank with a set of unidentified sequences, a considerable proportion typically lack fully identified matches, forming an ever-mounting pile of sequences that the researcher will have to monitor manually in the hope that new, clarifying sequences have been submitted by other researchers. To alleviate these concerns, a project to automatically monitor select unidentified sequences in GenBank for taxonomic progress through repeated local BLAST searches was initiated. Mycorrhizal fungi--a field where species identification often is prohibitively complex--and the much used ITS locus were chosen as test bed. A Perl script package called emerencia is presented. On a regular basis, it downloads select sequences from GenBank, separates the identified sequences from those insufficiently identified, and performs BLAST searches between these two datasets, storing all results in an SQL database. On the accompanying web-service http://emerencia.math.chalmers.se, users can monitor the taxonomic progress of insufficiently identified sequences over time, either through active searches or by signing up for e-mail notification upon disclosure of better matches. Other search categories, such as listing all insufficiently identified sequences (and their present best fully identified matches) publication-wise, are also available. 
The ever-increasing use of DNA sequences for identification purposes largely falls back on the assumption that public sequence databases contain a thorough sampling of taxonomically well-annotated sequences. Taxonomy, held by some to be an old-fashioned trade, has accordingly never been more important. emerencia does not automate the taxonomic process, but it does allow researchers to focus their efforts elsewhere than countless manual BLAST runs and arduous sieving of BLAST hit lists. The emerencia system is available on an open source basis for local installation with any organism and gene group as targets.
Sequencing of the amylopullulanase (apu) gene of Thermoanaerobacter ethanolicus 39E, and identification of the active site by site-directed mutagenesis.

PubMed

Mathupala, S P; Lowe, S E; Podkovyrov, S M; Zeikus, J G

1993-08-05

The complete nucleotide sequence of the gene encoding the dual active amylopullulanase of Thermoanaerobacter ethanolicus 39E (formerly Clostridium thermohydrosulfuricum) was determined. The structural gene (apu) contained a single open reading frame 4443 base pairs in length, corresponding to 1481 amino acids, with an estimated molecular weight of 162,780. Analysis of the deduced sequence of apu with sequences of alpha-amylases and alpha-1,6 debranching enzymes enabled the identification of four conserved regions putatively involved in substrate binding and in catalysis. The conserved regions were localized within a 2.9-kilobase pair gene fragment, which encoded a M(r) 100,000 protein that maintained the dual activities and thermostability of the native enzyme. The catalytic residues of amylopullulanase were tentatively identified by using hydrophobic cluster analysis for comparison of amino acid sequences of amylopullulanase and other amylolytic enzymes. Asp597, Glu626, and Asp703 were individually modified to their respective amide form, or the alternate acid form, and in all cases both alpha-amylase and pullulanase activities were lost, suggesting the possible involvement of 3 residues in a catalytic triad, and the presence of a putative single catalytic site within the enzyme. These findings substantiate amylopullulanase as a new type of amylosaccharidase.
De novo sequencing and analysis of the transcriptome of Panax ginseng in the leaf-expansion period.

PubMed

Liu, Shichao; Wang, Siming; Liu, Meichen; Yang, Fei; Zhang, Hui; Liu, Shiyang; Wang, Qun; Zhao, Yu

2016-08-01

Panax ginseng, a traditional Chinese medicine, is used worldwide for its variety of health benefits and its treatment efficacy. However, it is difficult to cultivate due to its vulnerability to environmental stresses. The present study provided the first report, to the best of our knowledge, of transcriptome analysis of ginseng at the leaf‑expansion stage. Using the Illumina sequencing platform, >40,000,000 high‑quality paired‑end reads were obtained and assembled into 100,533 unique sequences. When the sequences were searched against the publicly available National Center for Biotechnology Information protein database using The Basic Local Alignment Search Tool, 61,599 sequences exhibited similarity to known proteins. Functional annotation and classification, including use of the Gene Ontology, Clusters of Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes databases, revealed that the activated genes in ginseng were predominantly ribonuclease‑like storage genes, environmental stress genes, pathogenesis-related genes and other antioxidant genes. A number of candidate genes in environmental stress‑associated pathways were also identified. These novel data provide useful information on the growth and development stages of ginseng, and serve as an important public information platform for further understanding of the molecular mechanisms and functional genomics of ginseng.
Reference-free comparative genomics of 174 chloroplasts.

PubMed

Kua, Chai-Shian; Ruan, Jue; Harting, John; Ye, Cheng-Xi; Helmus, Matthew R; Yu, Jun; Cannon, Charles H

2012-01-01

Direct analysis of unassembled genomic data could greatly increase the power of short read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ~18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication. 
Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions.
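The "tip" and "group" classes described above can be illustrated at the level of individual kmers. The sketch below is a toy version under stated assumptions: it applies the labels to kmers rather than to assembled contigs, ignores reverse complements, and adds a third "core" class (kmers present in every genome) that the abstract does not name. All function names and the example sequences are hypothetical.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set:
    """Return the set of distinct kmers of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_kmers(genomes: dict, k: int) -> dict:
    """Group kmers by how many genomes contain them.

    tip   = found in exactly one genome
    group = found in more than one genome but not all
    core  = found in every genome (a label added here for illustration)
    """
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)

    classes = {"tip": {}, "group": {}, "core": {}}
    for kmer, names in owners.items():
        if len(names) == 1:
            classes["tip"][kmer] = names
        elif len(names) < len(genomes):
            classes["group"][kmer] = names
        else:
            classes["core"][kmer] = names
    return classes


if __name__ == "__main__":
    # Toy sequences, not real chloroplast data.
    toy = {"A": "ACGTAC", "B": "ACGTTT", "C": "ACGGGA"}
    for label, members in classify_kmers(toy, 3).items():
        print(label, sorted(members))
```

In a real analysis the kmer length would be much larger and counting would be done with a dedicated kmer counter, but the set logic is the same.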
Reference-Free Comparative Genomics of 174 Chloroplasts

PubMed Central

Kua, Chai-Shian; Ruan, Jue; Harting, John; Ye, Cheng-Xi; Helmus, Matthew R.; Yu, Jun; Cannon, Charles H.

2012-01-01

Direct analysis of unassembled genomic data could greatly increase the power of short read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ∼18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication. 
Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions. PMID:23185288
Implementation of a nonlinear concrete cracking algorithm in NASTRAN

NASA Technical Reports Server (NTRS)

Herting, D. N.; Herendeen, D. L.; Hoesly, R. L.; Chang, H.

1976-01-01

A computer code for the analysis of reinforced concrete structures was developed using NASTRAN as a basis. Nonlinear iteration procedures were developed for obtaining solutions with a wide variety of loading sequences. A direct access file system was used to save results at each load step to restart within the solution module for further analysis. A multi-nested looping capability was implemented to control the iterations and change the loads. The basis for the analysis is a set of multi-layer plate elements which allow local definition of materials and cracking properties.
Cloning and sequence analysis demonstrate the chromate reduction ability of a novel chromate reductase gene from Serratia sp.

PubMed

Deng, Peng; Tan, Xiaoqing; Wu, Ying; Bai, Qunhua; Jia, Yan; Xiao, Hong

2015-03-01

The ChrT gene encodes a chromate reductase enzyme which catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), according to the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high efficiency TAIL-PCR, while the full-length gene of ChrT was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high sequence similarity to the FMN reductase genes of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have an 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the ability of the gene for chromate reduction. The results of the present study provide a basis for further studies on ChrT gene expression and protein function.
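The relationship between the 567 bp ORF and the 188-residue protein reported above follows from simple codon arithmetic, sketched below. The assumption that the quoted ORF length includes the stop codon is ours; it is consistent with the numbers but is not stated in the abstract. The function name is hypothetical.

```python
def protein_length_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of orf_bp base pairs.

    Each codon is 3 bp; if the ORF length includes the stop codon,
    that codon does not contribute a residue.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    residues = protein_length_from_orf(567)
    print(residues)
    # Average residue mass implied by the reported 20.4 kDa, in daltons.
    print(20400 / residues)
```

The implied average of roughly 108 to 109 Da per residue is close to the commonly used rule of thumb of about 110 Da per amino acid, so the reported length and molecular weight are mutually consistent.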
Cloning and sequence analysis demonstrate the chromate reduction ability of a novel chromate reductase gene from Serratia sp

PubMed Central

DENG, PENG; TAN, XIAOQING; WU, YING; BAI, QUNHUA; JIA, YAN; XIAO, HONG

2015-01-01

The ChrT gene encodes a chromate reductase enzyme which catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), according to the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high efficiency TAIL-PCR, while the full-length gene of ChrT was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high sequence similarity to the FMN reductase genes of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have an 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the ability of the gene for chromate reduction. The results of the present study provide a basis for further studies on ChrT gene expression and protein function. PMID:25667630
The Treacher Collins syndrome (TCOF1) gene product, treacle, is targeted to the nucleolus by signals in its C-terminus.

PubMed

Winokur, S T; Shiang, R

1998-11-01

The TCOF1 gene product, treacle, responsible for the craniofacial disorder Treacher Collins syndrome, has been predicted to be a member of a class of nucleolar phosphoproteins based on its primary amino acid sequence. Treacle is a low complexity protein with ten repeating units of acidic and basic residues, each of which contains a large number of putative casein kinase 2 and protein kinase C phosphorylation sites. In addition, the C-terminus of treacle contains multiple putative nuclear localization signals. The overall structure of treacle, as well as sequence similarity to several nucleolar phosphoproteins, predicts that treacle is a member of this class of proteins. Using green fluorescent protein fusion constructs with the full-length and deleted domains of the murine homolog of treacle, we demonstrate that the cellular localization of treacle is nucleolar. This localization is mediated by the last 41 residues of the C-terminus (residues 1262-1302). At least two functional nuclear localization signals have been identified in the protein, one between residues 1176 and 1270 and the second within the last 32 residues of the protein (1271-1302). The nucleolar localization signal is disrupted by two constructs that split the C-terminal region between residues 1270 and 1271. This study provides the first direct analysis of treacle and demonstrates that the protein involved in TCOF1 is a nucleolar protein.
Development and verification of global/local analysis techniques for laminated composites

NASA Technical Reports Server (NTRS)

Thompson, Danniella Muheim; Griffin, O. Hayden, Jr.

1991-01-01

A two-dimensional to three-dimensional global/local finite element approach was developed, verified, and applied to a laminated composite plate of finite width and length containing a central circular hole. The resulting stress fields for axial compression loads were examined for several symmetric stacking sequences and hole sizes. Verification was based on comparison of the displacements and the stress fields with accepted trends from previous free edge investigations and with a complete three-dimensional finite element solution of the plate. The laminates in the compression study included symmetric cross-ply, angle-ply and quasi-isotropic stacking sequences. The entire plate was selected as the global model and analyzed with two-dimensional finite elements. Displacements along a region identified as the global/local interface were applied in a kinematically consistent fashion to independent three-dimensional local models. Local areas of interest in the plate included a portion of the straight free edge near the hole, and the immediate area around the hole. Interlaminar stress results obtained from the global/local analyses compare well with previously reported trends, and some new conclusions about interlaminar stress fields in plates with different laminate orientations and hole sizes are presented for compressive loading. The effectiveness of the global/local procedure in reducing the computational effort required to solve these problems is clearly demonstrated by comparing the computer time required to formulate and solve the linear, static systems of equations that result for the global and local analyses with that required for a complete three-dimensional formulation for a cross-ply laminate. Specific processors used during the analyses are described in general terms. The application of this global/local technique is not limited to a particular software system, and it was developed and described in as general a manner as possible.
Partial Gene Sequencing of CYP1A, Vitellogenin, and Metallothionein in Mosquitofish Gambusia yucatana and Gambusia sexradiata.

PubMed

Vázquez-Euán, Roberto; Escalante-Herrera, Karla S; Rodríguez-Fuentes, Gabriela

2017-01-01

Ground characteristics in the Yucatan Peninsula make recovery and treatment of wastewater very expensive. This situation has contributed to an increase of pollutants in the aquifer. Unfortunately, studies related to the effects of those pollutants in native organisms are scarce. The aim of this work was to obtain partial sequences of widely known genes used as biomarkers of pollutant effect in Gambusia yucatana and Gambusia sexradiata. The studied genes were: cytochrome P450 1A (CYP1A); vitellogenin (VTG); metallothionein (MT), and two housekeeping genes, 18S and β-actin. From reported sequences of Gambusia affinis, primers were designed and amplification was done in the local Gambusia species exposed for 48 h to gasoline (100 µL/L, stirred for 24 h pre-exposure). Preliminary results revealed partial sequences of all genes with an approximate average length of 200 bp. BLAST analysis of found sequences indicated a minimum of 97% identity with reported sequences for G. affinis or Gambusia holbrooki showing great similarity.
Bi-PROF

PubMed Central

Gries, Jasmin; Schumacher, Dirk; Arand, Julia; Lutsik, Pavlo; Markelova, Maria Rivera; Fichtner, Iduna; Walter, Jörn; Sers, Christine; Tierling, Sascha

2013-01-01

The use of next generation sequencing has expanded our view on whole mammalian methylome patterns. In particular, it provides a genome-wide insight of local DNA methylation diversity at single nucleotide level and enables the examination of single chromosome sequence sections at a sufficient statistical power. We describe a bisulfite-based sequence profiling pipeline, Bi-PROF, which is based on the 454 GS-FLX Titanium technology and makes it possible to obtain up to one million sequence stretches at single base pair resolution without laborious subcloning. To illustrate the performance of the experimental workflow connected to a bioinformatics program pipeline (BiQ Analyzer HT) we present a test analysis set of 68 different epigenetic marker regions (amplicons) in five individual patient-derived xenograft tissue samples of colorectal cancer and one healthy colon epithelium sample as a control. After the 454 GS-FLX Titanium run, sequence read processing and sample decoding, the obtained alignments are quality controlled and statistically evaluated. Comprehensive methylation pattern interpretation (profiling) assessed by analyzing 10^2-10^4 sequence reads per amplicon allows an unprecedented deep view on pattern formation and methylation marker heterogeneity in tissues affected by complex diseases like cancer. PMID:23803588
Mumps virus F gene and HN gene sequencing as a molecular tool to study mumps virus transmission.

PubMed

Gouma, Sigrid; Cremer, Jeroen; Parkkali, Saara; Veldhuijzen, Irene; van Binnendijk, Rob S; Koopmans, Marion P G

2016-11-01

Various mumps outbreaks have occurred in the Netherlands since 2004, particularly among persons who had received 2 doses of measles, mumps, and rubella (MMR) vaccination. Genomic typing of pathogens can be used to track outbreaks, but the established genotyping of mumps virus based on the small hydrophobic (SH) gene sequences did not provide sufficient resolution. Therefore, we expanded the sequencing to include fusion (F) gene and haemagglutinin-neuraminidase (HN) gene sequences in addition to the SH gene sequences from 109 mumps virus genotype G strains obtained between 2004 and mid 2015 in the Netherlands. When the molecular information from these 3 genes was combined, we were able to identify separate mumps virus clusters and track mumps virus transmission. The analyses suggested that multiple mumps virus introductions occurred in the Netherlands between 2004 and 2015 resulting in several mumps outbreaks throughout this period, whereas during some local outbreaks the molecular data pointed towards endemic circulation. Combined analysis of epidemiological data and sequence data collected in 2015 showed good support for the phylogenetic clustering. Copyright © 2016 Elsevier B.V. All rights reserved.
Coordinate action of distinct sequence elements localizes checkpoint kinase Hsl1 to the septin collar at the bud neck in Saccharomyces cerevisiae

PubMed Central

Finnigan, Gregory C.; Sterling, Sarah M.; Duvalyan, Angela; Liao, Elizabeth N.; Sargsyan, Aspram; Garcia, Galo; Nogales, Eva; Thorner, Jeremy

2016-01-01

Passage through the eukaryotic cell cycle requires processes that are tightly regulated both spatially and temporally. Surveillance mechanisms (checkpoints) exert quality control and impose order on the timing and organization of downstream events by impeding cell cycle progression until the necessary components are available and undamaged and have acted in the proper sequence. In budding yeast, a checkpoint exists that does not allow timely execution of the G2/M transition unless and until a collar of septin filaments has properly assembled at the bud neck, which is the site where subsequent cytokinesis will occur. An essential component of this checkpoint is the large (1518-residue) protein kinase Hsl1, which localizes to the bud neck only if the septin collar has been correctly formed. Hsl1 reportedly interacts with particular septins; however, the precise molecular determinants in Hsl1 responsible for its recruitment to this cellular location during G2 have not been elucidated. We performed a comprehensive mutational dissection and accompanying image analysis to identify the sequence elements within Hsl1 responsible for its localization to the septins at the bud neck. Unexpectedly, we found that this targeting is multipartite. A segment of the central region of Hsl1 (residues 611–950), composed of two tandem, semiredundant but distinct septin-associating elements, is necessary and sufficient for binding to septin filaments both in vitro and in vivo. However, in addition to 611–950, efficient localization of Hsl1 to the septin collar in the cell obligatorily requires generalized targeting to the cytosolic face of the plasma membrane, a function normally provided by the C-terminal phosphatidylserine-binding KA1 domain (residues 1379–1518) in Hsl1 but that can be replaced by other, heterologous phosphatidylserine-binding sequences. PMID:27193302
An algorithm for automated detection, localization and measurement of local calcium signals from camera-based imaging.

PubMed

Ellefsen, Kyle L; Settle, Brett; Parker, Ian; Smith, Ian F

2014-09-01

Local Ca(2+) transients such as puffs and sparks form the building blocks of cellular Ca(2+) signaling in numerous cell types. They have traditionally been studied by linescan confocal microscopy, but advances in TIRF microscopy together with improved electron-multiplied CCD (EMCCD) cameras now enable rapid (>500 frames s(-1)) imaging of subcellular Ca(2+) signals with high spatial resolution in two dimensions. This approach yields vastly more information (ca. 1 Gb min(-1)) than linescan imaging, rendering visual identification and analysis of the imaged local events both laborious and subject to user bias. Here we describe a routine to rapidly automate identification and analysis of local Ca(2+) events. This features an intuitive graphical user interface and runs under Matlab and the open-source Python software. The underlying algorithm features spatial and temporal noise filtering to reliably detect even small events in the presence of noisy and fluctuating baselines; localizes sites of Ca(2+) release with sub-pixel resolution; facilitates user review and editing of data; and outputs time-sequences of fluorescence ratio signals for identified event sites along with Excel-compatible tables listing amplitudes and kinetics of events. Copyright © 2014 Elsevier Ltd. All rights reserved.
Nested PCR detection and phylogenetic analysis of Babesia bovis and Babesia bigemina in cattle from Peri-urban localities in Gauteng Province, South Africa.

PubMed

Mtshali, Phillip Senzo; Tsotetsi, Ana Mbokeleng; Thekisoe, Matlhahane Molifi Oriel; Mtshali, Moses Sibusiso

2014-01-01

Babesia bovis and Babesia bigemina are tick-borne hemoparasites causing babesiosis in cattle worldwide. This study was aimed at providing information about the occurrence and geographical distribution of B. bovis and B. bigemina species in cattle from Gauteng province, South Africa. A total of 268 blood samples collected from apparently healthy animals in 14 different peri-urban localities were tested using previously established nested PCR assays for the detection of B. bovis and B. bigemina species-specific genes encoding rhoptry-associated protein 1 (RAP-1) and SpeI-AvaI restriction fragment, respectively. Nested PCR assays revealed that the overall prevalence was 35.5% (95% confidence interval [CI]=± 5.73) and 76.1% (95% CI=± 5.11) for B. bovis and B. bigemina, respectively. PCR results were corroborated by sequencing amplicons of randomly selected samples. The neighbor-joining trees were constructed to study the phylogenetic relationship between B. bovis and B. bigemina sequences of randomly selected isolates. Analysis of phylogram inferred with B. bovis RAP-1 sequences indicated a close relationship between our isolates and GenBank strains. On the other hand, a tree constructed with B. bigemina gp45 sequences revealed a high degree of polymorphism among the B. bigemina isolates investigated in this study. Taken together, the results presented in this work indicate the high incidence of Babesia parasites in cattle from previously uncharacterised peri-urban areas of the Gauteng province. These findings suggest that effective preventative and control measures are essential to curtail the spread of Babesia infections among cattle populations in Gauteng.
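The reported prevalence intervals can be checked with a short calculation. The sketch below assumes the "±" values are half-widths of a normal-approximation (Wald) 95% interval computed from the 268 samples; the abstract does not name the method, so this is an assumption, and the function name is illustrative only.

```python
import math


def wald_half_width(prevalence, n, z=1.96):
    """Half-width of the normal-approximation (Wald) confidence interval.

    Assumption: the abstract's "+/-" figures were obtained this way.
    """
    return z * math.sqrt(prevalence * (1.0 - prevalence) / n)


n_samples = 268  # blood samples tested, as stated in the abstract
for species, prevalence in (("B. bovis", 0.355), ("B. bigemina", 0.761)):
    half = wald_half_width(prevalence, n_samples)
    print(f"{species}: {100 * prevalence:.1f}% +/- {100 * half:.2f}")
```

Under that assumption the half-widths come out at about 5.73 and 5.11 percentage points, in agreement with the figures quoted in the abstract.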
Phylo-mLogo: an interactive and hierarchical multiple-logo visualization tool for alignment of many sequences

PubMed Central

Shih, Arthur Chun-Chieh; Lee, DT; Peng, Chin-Lin; Wu, Yu-Wei

2007-01-01

Background: When aligning several hundreds or thousands of sequences, such as epidemic virus sequences or homologous/orthologous sequences of some big gene families, to reconstruct the epidemiological history or their phylogenies, how to analyze and visualize the alignment results of many sequences has become a new challenge for computational biologists. Although there are several tools available for visualization of very long sequence alignments, few of them are applicable to the alignments of many sequences. Results: A multiple-logo alignment visualization tool, called Phylo-mLogo, is presented in this paper. Phylo-mLogo calculates the variabilities and homogeneities of alignment sequences by base frequencies or entropies. Different from the traditional representations of sequence logos, Phylo-mLogo not only displays the global logo patterns of the whole alignment of multiple sequences, but also demonstrates their local homologous logos for each clade hierarchically. In addition, Phylo-mLogo also allows the user to focus only on the analysis of some important, structurally or functionally constrained sites in the alignment selected by the user or by built-in automatic calculation. Conclusion: With Phylo-mLogo, the user can symbolically and hierarchically visualize hundreds of aligned sequences simultaneously and easily check the changes of their amino acid sites when analyzing many homologous/orthologous or influenza virus sequences. More information on Phylo-mLogo can be found at URL . PMID:17319966
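To make the entropy-based variability measure concrete, here is a minimal sketch of per-column Shannon entropy over an alignment. It is not Phylo-mLogo's code; the function names and the toy alignment are hypothetical, and gap characters are simply treated as another symbol.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def alignment_entropies(sequences):
    """Entropy of every column; all sequences must have equal length."""
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical toy alignment: 4 sequences, 4 columns.
toy_alignment = ["ACGT", "ACGA", "ACTA", "ATTA"]
for position, bits in enumerate(alignment_entropies(toy_alignment), start=1):
    print(f"column {position}: {bits:.3f} bits")
```

A fully conserved column scores 0 bits, and a column split evenly between two residues scores 1 bit, so low values flag homogeneous sites and high values flag variable ones. Running the same function on the subset of sequences in one clade would give a clade-level profile.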
Identification and expression analysis of BoMF25, a novel polygalacturonase gene involved in pollen development of Brassica oleracea.

PubMed

Lyu, Meiling; Liang, Ying; Yu, Youjian; Ma, Zhiming; Song, Limin; Yue, Xiaoyan; Cao, Jiashu

2015-06-01

BoMF25 acts on pollen wall. Polygalacturonase (PG) is a pectin-digesting enzyme involved in numerous plant developmental processes and is described to be of critical importance for pollen wall development. In the present study, a PG gene, BoMF25, was isolated from Brassica oleracea. BoMF25 is the homologous gene of At4g35670, a PG gene in Arabidopsis thaliana with a high expression level at the tricellular pollen stage. Collinear analysis revealed that the orthologous gene of BoMF25 in Brassica campestris (syn. B. rapa) genome was probably lost because of genome deletion and reshuffling. Sequence analysis indicated that BoMF25 contained four classical conserved domains (I, II, III, and IV) of PG protein. Homology and phylogenetic analyses showed that BoMF25 was clustered in Clade F. The putative promoter sequence, containing classical cis-acting elements and pollen-specific motifs, could drive green fluorescence protein expression in onion epidermal cells. Quantitative RT-PCR analysis suggested that BoMF25 was mainly expressed in the anther at the late stage of pollen development. In situ hybridization analysis also indicated that the strong and specific expression signal of BoMF25 existed in pollen grains at the mature pollen stage. Subcellular localization showed that the fluorescence signal was observed in the cell wall of onion epidermal cells, which suggested that BoMF25 may be a secreted protein localized in the pollen wall.

Factor IX[sub Madrid 2]: A deletion/insertion in Factor IX gene which abolishes the sequence of the donor junction at the exon IV-intron d splice site

DOE Office of Scientific and Technical Information (OSTI.GOV)

Solera, J.; Magallon, M.; Martin-Villar, J.

1992-02-01

DNA from a patient with severe hemophilia B was evaluated by RFLP analysis, producing results which suggested the existence of a partial deletion within the factor IX gene. The deletion was further localized and characterized by PCR amplification and sequencing. The altered allele has a 4,442-bp deletion which removes both the donor splice site located at the 5′ end of intron d and the two last coding nucleotides located at the 3′ end of exon IV in the normal factor IX gene; this fragment has been inserted in inverted orientation. Two homologous sequences have been discovered at the ends of the deleted DNA fragment.
Molecular epidemiology of Plum pox virus in Japan.

PubMed

Maejima, Kensaku; Himeno, Misako; Komatsu, Ken; Takinami, Yusuke; Hashimoto, Masayoshi; Takahashi, Shuichiro; Yamaji, Yasuyuki; Oshima, Kenro; Namba, Shigetou

2011-05-01

For a molecular epidemiological study based on complete genome sequences, 37 Plum pox virus (PPV) isolates were collected from the Kanto region in Japan. Pair-wise analyses revealed that all 37 Japanese isolates belong to the PPV-D strain, with low genetic diversity (less than 0.8%). In phylogenetic analysis of the PPV-D strain based on complete nucleotide sequences, the relationships of the PPV-D strain were reconstructed with high resolution: at the global level, the American, Canadian, and Japanese isolates formed their own distinct monophyletic clusters, suggesting that the routes of viral entry into these countries were independent; at the local level, the actual transmission histories of PPV were precisely reconstructed with high bootstrap support. This is the first description of the molecular epidemiology of PPV based on complete genome sequences.
Confirmation of the "protein-traffic-hypothesis" and the "protein-localization-hypothesis" using the diabetes-mellitus-type-1-knock-in and transgenic-murine-models and the trepitope sequences.

PubMed

Arneth, Borros

2012-10-01

As possible mechanisms to explain the emergence of autoimmune diseases, the current author has suggested in earlier papers two new pathways: the "protein localization hypothesis" and the "protein traffic hypothesis". The "protein localization hypothesis" states that an autoimmune disease develops if a protein accumulates in a compartment that did not previously contain that protein. Similarly, the "protein traffic hypothesis" states that a sudden error within the transport of a certain protein leads to the emergence of an autoimmune disease. The current article discusses the usefulness of the different commercially available transgenic murine models of diabetes mellitus type 1 to confirm the aforementioned hypotheses. This discussion shows that several transgenic murine models of diabetes mellitus type 1 are in line with and confirm the aforementioned hypotheses. Furthermore, these hypotheses are additionally in line with the occurrence of several newly discovered protein sequences, the so-called trepitope sequences. These sequences modulate the immune response to certain proteins. The current study analyzed to what extent the hypotheses are supported by the occurrence of these new sequences. The occurrence of the trepitope sequences thereby provides additional evidence supporting the aforementioned hypotheses. Both the "protein localization hypothesis" and the "protein traffic hypothesis" have the potential to lead to new causal therapy concepts. The "protein localization hypothesis" and the "protein traffic hypothesis" provide conceptual explanations for the diabetes mouse models as well as for the newly discovered trepitope sequences. Copyright © 2012 Elsevier Ltd. All rights reserved.
Epidemiological study of phylogenetic transmission clusters in a local HIV-1 epidemic reveals distinct differences between subtype B and non-B infections.

PubMed

Chalmet, Kristen; Staelens, Delfien; Blot, Stijn; Dinakis, Sylvie; Pelgrom, Jolanda; Plum, Jean; Vogelaers, Dirk; Vandekerckhove, Linos; Verhofstede, Chris

2010-09-07

The number of HIV-1 infected individuals in the Western world continues to rise. More in-depth understanding of regional HIV-1 epidemics is necessary for the optimal design and adequate use of future prevention strategies. The use of a combination of phylogenetic analysis of HIV sequences, with data on patients' demographics, infection route, clinical information and laboratory results, will allow a better characterization of individuals responsible for local transmission. Baseline HIV-1 pol sequences, obtained through routine drug-resistance testing, from 506 patients, newly diagnosed between 2001 and 2009, were used to construct phylogenetic trees and identify transmission-clusters. Patients' demographics, laboratory and clinical data, were retrieved anonymously. Statistical analysis was performed to identify subtype-specific and transmission-cluster-specific characteristics. Multivariate analysis showed significant differences between the 59.7% of individuals with subtype B infection and the 40.3% non-B infected individuals, with regard to route of transmission, origin, infection with Chlamydia (p = 0.01) and infection with Hepatitis C virus (p = 0.017). More and larger transmission-clusters were identified among the subtype B infections (p < 0.001). Overall, in multivariate analysis, clustering was significantly associated with Caucasian origin, infection through homosexual contact and younger age (all p < 0.001). Bivariate analysis additionally showed a correlation between clustering and syphilis (p < 0.001), higher CD4 counts (p = 0.002), Chlamydia infection (p = 0.013) and primary HIV (p = 0.017). Combination of phylogenetics with demographic information, laboratory and clinical data, revealed that HIV-1 subtype B infected Caucasian men-who-have-sex-with-men with high prevalence of sexually transmitted diseases, account for the majority of local HIV-transmissions. This finding elucidates observed epidemiological trends through molecular analysis, and justifies sustained focus in prevention on this high risk group.
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

DOE PAGES

Daily, Jeffrey A.

2016-02-10

Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375 residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ‘striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
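For readers unfamiliar with the recurrence that such libraries accelerate, the following is a plain, unvectorized sketch of Smith-Waterman local alignment scoring. It is not parasail's API or implementation: the function name, the linear gap penalty, and the scoring values are assumptions chosen for brevity, whereas production tools typically use substitution matrices and affine gaps.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty).

    Illustrative scoring values only; each inner-loop iteration is one
    'cell update' in the sense of the GCUPS throughput measure.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

The table has len(a) × len(b) cells, so dividing that product by the wall-clock time in seconds (and by 10^9) gives a GCUPS figure comparable in kind, though certainly not in magnitude, to the one reported above.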
Genome-wide characterization of centromeric satellites from multiple mammalian genomes.

PubMed

Alkan, Can; Cardone, Maria Francesca; Catacchio, Claudia Rita; Antonacci, Francesca; O'Brien, Stephen J; Ryder, Oliver A; Purgato, Stefania; Zoli, Monica; Della Valle, Giuliano; Eichler, Evan E; Ventura, Mario

2011-01-01

Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced, but the characterization of the corresponding centromeric DNA has lagged behind. Here, we present a computational method (RepeatNet) to systematically identify higher-order repeat structures from unassembled whole-genome shotgun sequence and test whether these sequence elements correspond to functional centromeric sequences. We analyzed genome datasets from six species of mammals representing the diversity of the mammalian lineage, namely, horse, dog, elephant, armadillo, opossum, and platypus. We define candidate monomer satellite repeats and demonstrate centromeric localization for five of the six genomes. Our analysis revealed the greatest diversity of centromeric sequences in horse and dog in contrast to elephant and armadillo, which showed high-centromeric sequence homogeneity. We could not isolate centromeric sequences within the platypus genome, suggesting that centromeres in platypus are not enriched in satellite DNA. Our method can be applied to the characterization of thousands of other vertebrate genomes anticipated for sequencing in the near future, providing an important tool for annotation of centromeres.
End Joining-Mediated Gene Expression in Mammalian Cells Using PCR-Amplified DNA Constructs that Contain Terminator in Front of Promoter.

PubMed

Nakamura, Mikiko; Suzuki, Ayako; Akada, Junko; Tomiyoshi, Keisuke; Hoshida, Hisashi; Akada, Rinji

2015-12-01

Mammalian gene expression constructs are generally prepared in a plasmid vector, in which a promoter and terminator are located upstream and downstream of a protein-coding sequence, respectively. In this study, we found that front terminator constructs (DNA constructs containing a terminator upstream of a promoter rather than downstream of a coding region) could sufficiently express proteins as a result of end joining of the introduced DNA fragment. By taking advantage of front terminator constructs, FLAG substitutions and deletions were generated using mutagenesis primers to identify amino acids specifically recognized by commercial FLAG antibodies. A minimal epitope sequence for polyclonal FLAG antibody recognition was also identified. In addition, we analyzed the sequence of a C-terminal Ser-Lys-Leu peroxisome localization signal, and identified the key residues necessary for peroxisome targeting. Moreover, front terminator constructs of hepatitis B surface antigen were used for deletion analysis, leading to the identification of regions required for the particle formation. Collectively, these results indicate that front terminator constructs allow for easy manipulations of C-terminal protein-coding sequences, and suggest that direct gene expression with PCR-amplified DNA is useful for high-throughput protein analysis in mammalian cells.
Complete Sequence and Analysis of Coconut Palm (Cocos nucifera) Mitochondrial Genome.

PubMed

Aljohi, Hasan Awad; Liu, Wanfei; Lin, Qiang; Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O; Alawad, Abdullah O; Al-Sadi, Abdullah M; Hu, Songnian; Yu, Jun

2016-01-01

Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in the tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653 bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower than that of the nuclear genome and the neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provide the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants.
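Two of the summary statistics quoted here, GC content and the transition/transversion (Ti/Tv) ratio, are simple to compute. The sketch below is illustrative only: the function names and toy sequences are hypothetical, it assumes non-empty input, and it compares two pre-aligned sequences position by position rather than reading variant calls as a real pipeline would.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of seq."""
    called = [base for base in seq.upper() if base in "ACGT"]
    return sum(base in "GC" for base in called) / len(called)


def ti_tv_ratio(reference, variant):
    """Transitions divided by transversions between two aligned sequences."""
    transitions = transversions = 0
    for ref, alt in zip(reference.upper(), variant.upper()):
        if ref == alt or ref not in "ACGT" or alt not in "ACGT":
            continue  # skip identical sites, gaps and ambiguity codes
        if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1
    return transitions / transversions if transversions else float("inf")


print(gc_content("GGCCAATT"))       # toy sequence, half G/C
print(ti_tv_ratio("AACC", "GTTA"))  # two transitions, two transversions
```

If all twelve possible substitutions were equally likely, transversions would outnumber transitions two to one, so a ratio near 0.5 is the usual reading of a "neutral expectation" under that simplest model; whether the authors used this baseline is not stated in the abstract.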
Chromosomal Organization and Sequence Diversity of Genes Encoding Lachrymatory Factor Synthase in Allium cepa L.

PubMed Central

Masamura, Noriya; McCallum, John; Khrustaleva, Ludmila; Kenel, Fernand; Pither-Joyce, Meegham; Shono, Jinji; Suzuki, Go; Mukai, Yasuhiko; Yamauchi, Naoki; Shigyo, Masayoshi

2012-01-01

Lachrymatory factor synthase (LFS) catalyzes the formation of lachrymatory factor, one of the most distinctive traits of bulb onion (Allium cepa L.). Therefore, we used LFS as a model for a functional gene in a huge genome, and we examined the chromosomal organization of LFS in A. cepa by multiple approaches. The first-level analysis completed the chromosomal assignment of LFS gene to chromosome 5 of A. cepa via the use of a complete set of A. fistulosum–shallot (A. cepa L. Aggregatum group) monosomic addition lines. Subsequent use of an F2 mapping population from the interspecific cross A. cepa × A. roylei confirmed the assignment of an LFS locus to this chromosome. Sequence comparison of two BAC clones bearing LFS genes, LFS amplicons from diverse germplasm, and expressed sequences from a doubled haploid line revealed variation consistent with duplicated LFS genes. Furthermore, the BAC-FISH study using the two BAC clones as a probe showed that LFS genes are localized in the proximal region of the long arm of the chromosome. These results suggested that LFS in A. cepa is transcribed from at least two loci and that they are localized on chromosome 5. PMID:22690373
Molecular characterization of baculovirus Bombyx mori nucleopolyhedrovirus polyhedron mutants.

PubMed

Katsuma, S; Noguchi, Y; Shimada, T; Nagata, M; Kobayashi, M; Maeda, S

1999-01-01

Four newly isolated and two previously isolated polyhedron mutants of Bombyx mori nucleopolyhedrovirus (BmNPV) were studied. Two polyhedron deficient mutants, #126 and #136, produced small uncrystallized particles of polyhedrin in the nuclei and cytoplasm of infected cells. Mutant #211 produced a large number of variably sized polyhedra in the nucleus and #220 produced a few large cuboidal polyhedra in the nucleus. Mutant #24 and #128 were previously isolated BmNPV mutants. Mutant #24 could not produce polyhedrin mRNA and polyhedra produced by mutant #128 lacked oral infectivity. Nucleotide sequence analysis indicated that five mutants (#126, #136, #211, #220 and #128) had amino acid substitutions in polyhedrin and mutant #24 had a point mutation only in the promoter region of the polyhedrin gene. Cotransfection experiments showed that the altered phenotypes were due to the mutations found in the polyhedrin gene regions. In mutants #126 and #136, amino acid sequences of the nuclear localization signal of polyhedrin were identical to those of wild-type BmNPV, suggesting that this sequence was necessary but not sufficient for nuclear localization of polyhedrin. Electron microscopic observation revealed that fewer occluded virions were contained in polyhedra of #128 and #220.
DNA barcoding for effective biodiversity assessment of a hyperdiverse arthropod group: the ants of Madagascar

PubMed Central

Smith, M. Alex; Fisher, Brian L; Hebert, Paul D.N

2005-01-01

The role of DNA barcoding as a tool to accelerate the inventory and analysis of diversity for hyperdiverse arthropods is tested using ants in Madagascar. We demonstrate how DNA barcoding helps address the failure of current inventory methods to rapidly respond to pressing biodiversity needs, specifically in the assessment of richness and turnover across landscapes with hyperdiverse taxa. In a comparison of inventories at four localities in northern Madagascar, patterns of richness were not significantly different when richness was determined using morphological taxonomy (morphospecies) or sequence divergence thresholds (Molecular Operational Taxonomic Unit(s); MOTU). However, sequence-based methods tended to yield greater richness and significantly lower indices of similarity than morphological taxonomy. MOTU determined using our molecular technique were a remarkably local phenomenon—indicative of highly restricted dispersal and/or long-term isolation. In cases where molecular and morphological methods differed in their assignment of individuals to categories, the morphological estimate was always more conservative than the molecular estimate. In those cases where morphospecies descriptions collapsed distinct molecular groups, sequence divergences of 16% (on average) were contained within the same morphospecies. Such high divergences highlight taxa for further detailed genetic, morphological, life history, and behavioral studies. PMID:16214741
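Grouping specimens into MOTU by a sequence divergence threshold can be sketched as follows. The abstract gives neither the threshold nor the clustering rule, so both are assumptions here: the sketch uses uncorrected p-distance on equal-length aligned barcodes and single-linkage grouping, and the specimen names, sequences, and the 10% cutoff are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def motu_clusters(barcodes, threshold):
    """Single-linkage groups: specimens within `threshold` share a MOTU."""
    names = list(barcodes)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if p_distance(barcodes[first], barcodes[second]) <= threshold:
                parent[find(first)] = find(second)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical toy barcodes (real COI barcodes are hundreds of bases long).
toy = {"ant1": "ACGTACGTAC", "ant2": "ACGTACGTAT", "ant3": "TTTTACGGGG"}
print(motu_clusters(toy, threshold=0.10))
```

Richness at a locality is then the number of groups, and lowering the threshold can only split groups, which is one way to see why sequence-based counts tend to exceed morphospecies counts.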
Identification of hemoglobin variants by top-down mass spectrometry using selected diagnostic product ions.

PubMed

Coelho Graça, Didia; Hartmer, Ralf; Jabs, Wolfgang; Beris, Photis; Clerici, Lorella; Stoermer, Carsten; Samii, Kaveh; Hochstrasser, Denis; Tsybin, Yury O; Scherl, Alexander; Lescuyer, Pierre

2015-04-01

Hemoglobin disorder diagnosis is a complex procedure combining several analytical steps. Due to the lack of specificity of the currently used protein analysis methods, the identification of uncommon hemoglobin variants (proteoforms) can be difficult. The aim of this work was to develop a mass spectrometry-based approach to quickly identify mutated protein sequences within globin chain variants. To reach this goal, a top-down electron transfer dissociation mass spectrometry method was developed for hemoglobin β chain analysis. A diagnostic product ion list was established with a color code strategy that allows a mutation in the hemoglobin β chain sequence to be localized quickly and specifically. The method was applied to the analysis of rare hemoglobin β chain variants and an (A)γ-β fusion protein. The results showed that the developed data analysis process allows fast and reliable interpretation of top-down electron transfer dissociation mass spectra by nonexpert users in the clinical area.
Global versus local linear beat-to-beat analysis of the relationship between arterial pressure and pulse transit time during dynamic exercise.

PubMed

Porta, A; Gasperi, C; Nollo, G; Lucini, D; Pizzinelli, P; Antolini, R; Pagani, M

2006-04-01

Global linear analysis has been traditionally performed to verify the relationship between pulse transit time (PTT) and systolic arterial pressure (SAP) at the level of their spontaneous beat-to-beat variabilities: PTT and SAP have been plotted in the plane (PTT,SAP) and a significant linear correlation has been found. However, this relationship is weak and in specific individuals cannot be found. This result prevents the utilization of the SAP-PTT relationship to derive arterial pressure changes from PTT measures on an individual basis. We propose a local linear approach to study the SAP-PTT relationship. This approach is based on the definition of short SAP-PTT sequences characterized by SAP increase (decrease) and PTT decrease (increase) and on their search in the SAP and PTT beat-to-beat series. This local approach was applied to PTT and SAP series derived from 13 healthy humans during incremental supine dynamic exercise (at 10, 20 and 30% of the nominal individual maximum effort) and compared to the global approach. While the global approach failed in some subjects, local analysis allowed the extraction of the gain of the SAP-PTT relationship in all subjects both at rest and during exercise. When both local and global analyses were successful, the local SAP-PTT gain was more negative than the global one, likely as a result of noise reduction.
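The search for short SAP-PTT sequences described above can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' algorithm: the minimum run length of three beats, the definition of the gain as the regression slope of PTT on SAP within each run, the averaging over runs, the function names and the toy series are all hypothetical.

```python
def find_local_sequences(sap, ptt, min_beats=3):
    """Return (start, end) index pairs (end exclusive) of runs in which SAP
    moves in one direction and PTT in the opposite direction on every beat."""
    if len(sap) != len(ptt):
        raise ValueError("SAP and PTT series must have the same length")
    runs = []
    start, direction = 0, 0  # direction: sign of the SAP change in the current run
    for i in range(1, len(sap)):
        d_sap = sap[i] - sap[i - 1]
        d_ptt = ptt[i] - ptt[i - 1]
        sign = (d_sap > 0) - (d_sap < 0)
        opposite = d_sap * d_ptt < 0
        if opposite and direction in (0, sign):
            direction = sign  # extend (or begin) the current run
            continue
        # The run ends at beat i-1; keep it if it is long enough.
        if direction != 0 and i - start >= min_beats:
            runs.append((start, i))
        # A direction reversal starts a new run at the previous beat.
        start, direction = (i - 1, sign) if opposite else (i, 0)
    if direction != 0 and len(sap) - start >= min_beats:
        runs.append((start, len(sap)))
    return runs


def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def local_gain(sap, ptt, min_beats=3):
    """Mean slope over all detected runs, or None if no run is found."""
    runs = find_local_sequences(sap, ptt, min_beats)
    if not runs:
        return None
    slopes = [slope(sap[a:b], ptt[a:b]) for a, b in runs]
    return sum(slopes) / len(slopes)


# Toy beat-to-beat series (invented values; SAP in mmHg, PTT in ms).
sap = [120, 122, 125, 124, 121, 119, 119, 121]
ptt = [250, 248, 245, 246, 249, 251, 251, 252]
runs = find_local_sequences(sap, ptt)
gain = local_gain(sap, ptt)
print(runs, gain)
```

Because only beats where SAP and PTT move in opposite directions contribute, unrelated fluctuations are excluded from the slope estimate, which is one way to read the noise reduction mentioned in the abstract.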
Differential expression of the virulence-associated protein p57 and characterization of its duplicated gene msa in virulent and attenuated strains of Renibacterium salmoninarum

USGS Publications Warehouse

O'Farrell, C. L.; Strom, M.S.

1999-01-01

Virulence mechanisms utilized by the salmonid fish pathogen Renibacterium salmoninarum are poorly understood. One potential virulence factor is p57 (also designated MSA for major soluble antigen), an abundant 57 kDa soluble protein that is predominantly localized on the bacterial cell surface with significant levels released into the extracellular milieu. Previous studies of an attenuated strain, MT 239, indicated that it differs from virulent strains in the amount of surface-associated p57. In this report, we show overall expression of p57 in R. salmoninarum MT 239 is considerably reduced as compared to a virulent strain, ATCC 33209. The amount of cell-associated p57 is decreased while the level of p57 in the culture supernatant is nearly equivalent between the strains. To determine if the lowered amount of cell-associated p57 was due to a sequence defect in p57, a genetic comparison was performed. Two copies of the gene encoding p57 (msa1 and msa2) were found in 33209 and MT 239, as well as in several other virulent isolates. Both copies from 33209 and MT 239 were cloned and sequenced and found to be identical to each other, and identical between the 2 strains. A comparison of msa1 and msa2 within each strain showed that their sequences diverge 40 base pairs 5' to the open reading frame, while sequences 3' to the open reading frame are essentially identical for at least 225 base pairs. Northern blot analysis showed no difference in steady state levels of msa mRNA between the 2 strains. These data suggest that while cell-surface localization of p57 may be important for R. salmoninarum virulence, the differences in localization and total p57 expression between 33209 and MT 239 are not due to differences in msa sequence or differences in steady state transcript levels.
Cingulin Contains Globular and Coiled-Coil Domains and Interacts with Zo-1, Zo-2, Zo-3, and Myosin

PubMed Central

Cordenonsi, Michelangelo; D'Atri, Fabio; Hammar, Eva; Parry, David A.D.; Kendrick-Jones, John; Shore, David; Citi, Sandra

1999-01-01

We characterized the sequence and protein interactions of cingulin, an Mr 140–160-kD phosphoprotein localized on the cytoplasmic surface of epithelial tight junctions (TJ). The derived amino acid sequence of a full-length Xenopus laevis cingulin cDNA shows globular head (residues 1–439) and tail (1,326–1,368) domains and a central α-helical rod domain (440–1,325). Sequence analysis, electron microscopy, and pull-down assays indicate that the cingulin rod is responsible for the formation of coiled-coil parallel dimers, which can further aggregate through intermolecular interactions. Pull-down assays from epithelial, insect cell, and reticulocyte lysates show that an NH2-terminal fragment of cingulin (1–378) interacts in vitro with ZO-1 (Kd ∼5 nM), ZO-2, ZO-3, myosin, and AF-6, but not with symplekin, and a COOH-terminal fragment (377–1,368) interacts with myosin and ZO-3. ZO-1 and ZO-2 immunoprecipitates contain cingulin, suggesting in vivo interactions. Full-length cingulin, but not NH2-terminal and COOH-terminal fragments, colocalizes with endogenous cingulin in transfected MDCK cells, indicating that sequences within both head and rod domains are required for TJ localization. We propose that cingulin is a functionally important component of TJ, linking the submembrane plaque domain of TJ to the actomyosin cytoskeleton. PMID:10613913
Novel non-parametric models to estimate evolutionary rates and divergence times from heterochronous sequence data.

PubMed

Fourment, Mathieu; Holmes, Edward C

2014-07-24

Early methods for estimating divergence times from gene sequence data relied on the assumption of a molecular clock. More sophisticated methods were created to model rate variation and used auto-correlation of rates, local clocks, or the so called "uncorrelated relaxed clock" where substitution rates are assumed to be drawn from a parametric distribution. In the case of Bayesian inference methods the impact of the prior on branching times is not clearly understood, and if the amount of data is limited the posterior could be strongly influenced by the prior. We develop a maximum likelihood method--Physher--that uses local or discrete clocks to estimate evolutionary rates and divergence times from heterochronous sequence data. Using two empirical data sets we show that our discrete clock estimates are similar to those obtained by other methods, and that Physher outperformed some methods in the estimation of the root age of an influenza virus data set. A simulation analysis suggests that Physher can outperform a Bayesian method when the real topology contains two long branches below the root node, even when evolution is strongly clock-like. These results suggest it is advisable to use a variety of methods to estimate evolutionary rates and divergence times from heterochronous sequence data. Physher and the associated data sets used here are available online at http://code.google.com/p/physher/.
Identification and functional characterization of a novel bipartite nuclear localization sequence in ARID1A.

PubMed

Bateman, Nicholas W; Shoji, Yutaka; Conrads, Kelly A; Stroop, Kevin D; Hamilton, Chad A; Darcy, Kathleen M; Maxwell, George L; Risinger, John I; Conrads, Thomas P

2016-01-01

AT-rich interactive domain-containing protein 1A (ARID1A) is a recently identified nuclear tumor suppressor frequently altered in solid tumor malignancies. We have identified a bipartite-like nuclear localization sequence (NLS) that contributes to nuclear import of ARID1A not previously described. We functionally confirm activity using GFP constructs fused with wild-type or mutant NLS sequences. We further show that cyto-nuclear localized, bipartite NLS mutant ARID1A exhibits greater stability than nuclear-localized, wild-type ARID1A. Identification of this undescribed functional NLS within ARID1A contributes vital insights to rationalize the impact of ARID1A missense mutations observed in patient tumors. Copyright © 2015 Elsevier Inc. All rights reserved.
StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase.

PubMed

Zemla, Adam T; Lang, Dorothy M; Kostova, Tanya; Andino, Raul; Ecale Zhou, Carol L

2011-06-02

Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory--still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV (structure-alignment sequence variability), a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus, and we demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique, or that share structural similarity with proteins that would be considered distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local structural alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. 
StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position. StralSV is provided as a web service at http://proteinmodel.org/AS2TS/STRALSV/.
Characterization of the telomere complex, TERF1 and TERF2 genes in muntjac species with fusion karyotypes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hartmann, Nils; Scherthan, Harry

The telomere binding proteins TRF1 and TRF2 maintain and protect chromosome ends and confer karyotypic stability. Chromosome evolution in the genus Muntiacus is characterized by numerous tandem (end-to-end) fusions. To study TRF1 and TRF2 telomere binding proteins in Muntiacus species, we isolated and characterized the TERF1 and -2 genes from Indian muntjac (Muntiacus muntjak vaginalis; 2n = 6 female) and from Chinese muntjac (Muntiacus reevesi; 2n = 46). Expression analysis revealed that both genes are ubiquitously expressed and sequence analysis identified several transcript variants of both TERF genes. Control experiments disclosed a novel testis-specific splice variant of TERF1 in human testes. Amino acid sequence comparisons demonstrate that Muntiacus TRF1 and in particular TRF2 are highly conserved between muntjac and human. In vivo TRF2-GFP and immuno-staining studies in muntjac cell lines revealed telomeric TRF2 localization, while deletion of the DNA binding domain abrogated this localization, suggesting muntjac TRF2 represents a functional telomere protein. Finally, expression analysis of a set of telomere-related genes revealed their presence in muntjac fibroblasts and testis tissue, which suggests the presence of a conserved telomere complex in muntjacs. However, a deviation from the common theme was noted for the TERT gene, encoding the catalytic subunit of telomerase; TERT expression could not be detected in Indian or Chinese muntjac cDNA or genomic DNA using a series of conserved primers, while TRAP assay revealed functional telomerase in Chinese muntjac testis tissues. This suggests muntjacs may harbor a diverged telomerase sequence.

TranslatomeDB: a comprehensive database and cloud-based analysis platform for translatome sequencing data.

PubMed

Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie; Zhang, Gong

2018-01-04

Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigations are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database TranslatomeDB (http://www.translatomedb.net/) which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of the same species and type, both on transcriptome and translatome levels. The translation indices (translation ratios, elongation velocity index and translational efficiency) can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity, respectively. All datasets were analyzed using a unified, robust, accurate and experimentally-verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyses. TranslatomeDB also allows users to upload their own datasets and utilize the identical unified pipeline to analyze their data. We believe that our TranslatomeDB is a comprehensive platform and knowledgebase on translatome and proteome research, releasing biologists from complex searching, analyzing and comparing of huge sequencing data without needing local computational power. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
RNAbrowse: RNA-Seq De Novo Assembly Results Browser

PubMed Central

Mariette, Jérôme; Noirot, Céline; Nabihoudine, Ibounyamine; Bardou, Philippe; Hoede, Claire; Djari, Anis; Cabau, Cédric; Klopp, Christophe

2014-01-01

Transcriptome analysis based on a de novo assembly of next generation RNA sequences is now performed routinely in many laboratories. The generated results, including contig sequences, quantification figures, functional annotations and variation discovery outputs, are usually bulky and quite diverse. This article presents a user-oriented storage and visualisation environment that allows the data to be explored in a top-down manner, going from general graphical views to all possible details. The software package is based on biomart and is easy to install and populate with local data. The software package is available under the GNU General Public License (GPL) at http://bioinfo.genotoul.fr/RNAbrowse. PMID:24823498
High-resolution mapping of the 11q13 amplicon and identification of a gene, TAOS1, that is amplified and overexpressed in oral cancer cells

PubMed Central

Huang, Xin; Gollin, Susanne M.; Raja, Siva; Godfrey, Tony E.

2002-01-01

Amplification of chromosomal band 11q13 is a common event in human cancer. It has been reported in about 45% of head and neck carcinomas and in other cancers including esophageal, breast, liver, lung, and bladder cancer. To understand the mechanism of 11q13 amplification and to identify the potential oncogene(s) driving it, we have fine-mapped the structure of the amplicon in oral squamous cell carcinoma cell lines and localized the proximal and distal breakpoints. A 5-Mb physical map of the region has been prepared from which sequence is available. We quantified copy number of sequence-tagged site markers at 42–550 kb intervals along the length of the amplicon and defined the amplicon core and breakpoints by using TaqMan-based quantitative microsatellite analysis. The core of the amplicon maps to a 1.5-Mb region. The proximal breakpoint localizes to two intervals between sequence-tagged site markers, 550 kb and 160 kb in size, and the distal breakpoint maps to a 250 kb interval. The cyclin D1 gene maps to the amplicon core, as do two new expressed sequence tag clusters. We have analyzed one of these expressed sequence tag clusters and now report that it contains a previously uncharacterized gene, TAOS1 (tumor amplified and overexpressed sequence 1), which is both amplified and overexpressed in oral cancer cells. The data suggest that TAOS1 may be an amplification-dependent candidate oncogene with a role in the development and/or progression of human tumors, including oral squamous cell carcinomas. The approach described here should be useful for characterizing amplified genomic regions in a wide variety of tumors. PMID:12172009
Simulating protein folding initiation sites using an alpha-carbon-only knowledge-based force field

PubMed Central

Buck, Patrick M.; Bystroff, Christopher

2015-01-01

Protein folding is a hierarchical process where structure forms locally first, then globally. Some short sequence segments initiate folding through strong structural preferences that are independent of their three-dimensional context in proteins. We have constructed a knowledge-based force field in which the energy functions are conditional on local sequence patterns, as expressed in the hidden Markov model for local structure (HMMSTR). Carbon-alpha force field (CALF) builds sequence specific statistical potentials based on database frequencies for α-carbon virtual bond opening and dihedral angles, pairwise contacts and hydrogen bond donor-acceptor pairs, and simulates folding via Brownian dynamics. We introduce hydrogen bond donor and acceptor potentials as α-carbon probability fields that are conditional on the predicted local sequence. Constant temperature simulations were carried out using 27 peptides selected as putative folding initiation sites, each 12 residues in length, representing several different local structure motifs. Each 0.6 μs trajectory was clustered based on structure. Simulation convergence or representativeness was assessed by subdividing trajectories and comparing clusters. For 21 of the 27 sequences, the largest cluster made up more than half of the total trajectory. Of these 21 sequences, 14 had cluster centers that were at most 2.6 Å root mean square deviation (RMSD) from their native structure in the corresponding full-length protein. To assess the adequacy of the energy function on nonlocal interactions, 11 full length native structures were relaxed using Brownian dynamics simulations. Equilibrated structures deviated from their native states but retained their overall topology and compactness. A simple potential that folds proteins locally and stabilizes proteins globally may enable a more realistic understanding of hierarchical folding pathways. PMID:19137613
HUNT: launch of a full-length cDNA database from the Helix Research Institute.

PubMed

Yudate, H T; Suwa, M; Irie, R; Matsui, H; Nishikawa, T; Nakamura, Y; Yamaguchi, D; Peng, Z Z; Yamamoto, T; Nagai, K; Hayashi, K; Otsuki, T; Sugiyama, T; Ota, T; Suzuki, Y; Sugano, S; Isogai, T; Masuho, Y

2001-01-01

The Helix Research Institute (HRI) in Japan is releasing 4356 HUman Novel Transcripts and related information in the newly established HUNT database. The institute is a joint research project principally funded by the Japanese Ministry of International Trade and Industry, and the clones were sequenced in the governmental New Energy and Industrial Technology Development Organization (NEDO) Human cDNA Sequencing Project. The HUNT database contains an extensive amount of annotation from advanced analysis and represents an essential bioinformatics contribution towards understanding of the gene function. The HRI human cDNA clones were obtained from full-length enriched cDNA libraries constructed with the oligo-capping method and have resulted in novel full-length cDNA sequences. A large fraction has little similarity to any proteins of known function and to obtain clues about possible function we have developed original analysis procedures. Any putative function deduced here can be validated or refuted by complementary analysis results. The user can also extract information from specific categories like PROSITE patterns, PFAM domains, PSORT localization, transmembrane helices and clones with GENIUS structure assignments. The HUNT database can be accessed at http://www.hri.co.jp/HUNT.
Wavelet analysis of frequency chaos game signal: a time-frequency signature of the C. elegans DNA.

PubMed

Messaoudi, Imen; Oueslati, Afef Elloumi; Lachiri, Zied

2014-12-01

Challenging tasks are encountered in the field of bioinformatics. The choice of the genomic sequence's mapping technique is one of the most fastidious tasks. It shows that a judicious choice would serve in examining periodic patterns distribution that concord with the underlying structure of genomes. Despite that, searching for a coding technique that can highlight all the information contained in the DNA has not yet attracted the attention it deserves. In this paper, we propose a new mapping technique based on the chaos game theory that we call the frequency chaos game signal (FCGS). The particularity of the FCGS coding resides in exploiting the statistical properties of the genomic sequence itself. This may reflect important structural and organizational features of DNA. To prove the usefulness of the FCGS approach in the detection of different local periodic patterns, we use the wavelet analysis because it provides access to information that can be obscured by other time-frequency methods such as the Fourier analysis. Thus, we apply the continuous wavelet transform (CWT) with the complex Morlet wavelet as a mother wavelet function. Scalograms that relate to the organism Caenorhabditis elegans (C. elegans) exhibit a multitude of periodic organization of specific DNA sequences.
Dual-echo ASL based assessment of motor networks: a feasibility study

NASA Astrophysics Data System (ADS)

Storti, Silvia Francesca; Boscolo Galazzo, Ilaria; Pizzini, Francesca B.; Menegaz, Gloria

2018-04-01

Objective. Dual-echo arterial spin labeling (DE-ASL) technique has been recently proposed for the simultaneous acquisition of ASL and blood-oxygenation-level-dependent (BOLD)-functional magnetic resonance imaging (fMRI) data. The assessment of this technique in detecting functional connectivity at rest or during motor and motor imagery tasks is still unexplored both per-se and in comparison with conventional methods. The purpose is to quantify the sensitivity of the DE-ASL sequence with respect to the conventional fMRI sequence (cvBOLD) in detecting brain activations, and to assess and compare the relevance of node features in decoding the network structure. Approach. Thirteen volunteers were scanned acquiring a pseudo-continuous DE-ASL sequence from which the concomitant BOLD (ccBOLD) signal can be extracted simultaneously with the ASL signal. The approach consists of two steps: (i) model-based analyses for assessing brain activations at individual and group levels, followed by statistical analysis for comparing the activation elicited by the three sequences under two conditions (motor and motor imagery), respectively; (ii) brain connectivity graph-theoretical analysis for assessing and comparing the network models properties. Main results. Our results suggest that cvBOLD and ccBOLD have comparable sensitivity in detecting the regions involved in the active task, whereas ASL offers a higher degree of co-localization with smaller activation volumes. The connectivity results and the comparative analysis of node features across sequences revealed that there are no strong changes between rest and tasks and that the differences between the sequences are limited to few connections. Significance. Considering the comparable sensitivity of the ccBOLD and cvBOLD sequences in detecting activated brain regions, the results demonstrate that DE-ASL can be successfully applied in functional studies, allowing both ASL and BOLD information to be obtained within a single sequence. Further, DE-ASL is a powerful technique for research and clinical applications, allowing quantitative comparisons as well as the characterization of functional connectivity.
Evolution of a global regulator: Lrp in four orders of γ-Proteobacteria.

PubMed

Unoarumhi, Yvette; Blumenthal, Robert M; Matson, Jyl S

2016-05-20

Bacterial global regulators each regulate the expression of several hundred genes. In Escherichia coli, the top seven global regulators together control over half of all genes. Leucine-responsive regulatory protein (Lrp) is one of these top seven global regulators. Lrp orthologs are very widely distributed, among both Bacteria and Archaea. Surprisingly, even within the phylum γ-Proteobacteria (which includes E. coli), Lrp is a global regulator in some orders and a local regulator in others. This raises questions about the evolution of Lrp and, more broadly, of global regulators. We examined Lrp sequences from four bacterial orders of the γ-Proteobacteria using phylogenetic and Logo analyses. The orders studied were Enterobacteriales and Vibrionales, in which Lrp plays a global role in tested species; Pasteurellales, in which Lrp is a local regulator in the tested species; and Alteromonadales, an order closely related to the other three but in which Lrp has not yet been studied. For comparison, we analyzed the Lrp paralog AsnC, which in all tested cases is a local regulator. The Lrp and AsnC phylogenetic clusters each divided, as expected, into subclusters representing the Enterobacteriales, Vibrionales, and Pasteurellales. However, the Alteromonadales did not yield coherent clusters for either Lrp or AsnC. Logo analysis revealed signatures associated with globally- vs. locally-acting Lrp orthologs, providing testable hypotheses for which portions of Lrp are responsible for a global vs. local role. These candidate regions include both ends of the Lrp polypeptide but not, interestingly, the highly-conserved helix-turn-helix motif responsible for DNA sequence specificity. Lrp and AsnC have conserved sequence signatures that allow their unambiguous annotation, at least in γ-Proteobacteria. Among Lrp orthologs, specific residues correlated with global vs. local regulatory roles, and can now be tested to determine which are functionally relevant and which simply reflect divergence. In the Alteromonadales, it appears that there are different subgroups of Lrp orthologs, one of which may act globally while the other may act locally. These results suggest experiments to improve our understanding of the evolution of bacterial global regulators.
Random digital encryption secure communication system

NASA Technical Reports Server (NTRS)

Doland, G. D. (Inventor)

1982-01-01

The design of a secure communication system is described. A product code, formed from two pseudorandom sequences of digital bits, is used to encipher or scramble data prior to transmission. The two pseudorandom sequences are periodically changed at intervals before they have had time to repeat. One of the two sequences is transmitted continuously with the scrambled data for synchronization. In the receiver portion of the system, the incoming signal is compared with one of two locally generated pseudorandom sequences until correspondence between the sequences is obtained. At this time, the two locally generated sequences are formed into a product code which deciphers the data from the incoming signal. Provision is made to ensure synchronization of the transmitting and receiving portions of the system.
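The scrambling scheme described above can be illustrated with a small sketch. It is not the patented design: the use of linear feedback shift registers (LFSRs) as the pseudorandom generators, the register sizes, taps and seeds, the treatment of the "product" of two bit sequences as a bitwise XOR, and the brute-force synchronization search are all assumptions made for illustration.

```python
def lfsr_bits(seed, taps, nbits, count):
    """Generate `count` bits from a Fibonacci LFSR (hypothetical generator).

    `taps` are state bit positions XORed together to form the feedback bit.
    """
    state = seed & ((1 << nbits) - 1)
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")
    out = []
    for _ in range(count):
        out.append(state & 1)
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        state = (state >> 1) | (fb << (nbits - 1))
    return out


def product_code(seq_a, seq_b):
    """Combine two bit sequences into one key stream (XOR assumed as the product)."""
    return [a ^ b for a, b in zip(seq_a, seq_b)]


def scramble(bits, key):
    """XOR data with the key stream; applying it twice restores the data."""
    return [d ^ k for d, k in zip(bits, key)]


def find_offset(local_seq, received_window):
    """Slide the received window along the local sequence until it matches."""
    w = len(received_window)
    for off in range(len(local_seq) - w + 1):
        if local_seq[off:off + w] == received_window:
            return off
    return None


# Both sequences are kept shorter than their repeat periods (127 and 31 bits),
# mirroring the requirement that sequences change before they repeat.
N = 24
seq_a = lfsr_bits(0b1011001, taps=(0, 1), nbits=7, count=N)
seq_b = lfsr_bits(0b10011, taps=(0, 2), nbits=5, count=N)
key = product_code(seq_a, seq_b)

data = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1]
cipher = scramble(data, key)
recovered = scramble(cipher, key)

# Receiver side: locate a transmitted slice of sequence A in the local copy.
offset = find_offset(seq_a, seq_a[5:17])
print(recovered == data, offset)
```

Here the receiver regenerates both sequences locally and uses the transmitted one only to find the correct phase, after which the same product code deciphers the data.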
On the convergence of local approximations to pseudodifferential operators with applications

NASA Technical Reports Server (NTRS)

Hagstrom, Thomas

1994-01-01

We consider the approximation of a class of pseudodifferential operators by sequences of operators which can be expressed as compositions of differential operators and their inverses. We show that the error in such approximations can be bounded in terms of the L(1) error in approximating a convolution kernel, and use this fact to develop convergence results. Our main result is a finite time convergence analysis of the Engquist-Majda Padé approximants to the square root of the d'Alembertian. We also show that no spatially local approximation to this operator can be convergent uniformly in time. We propose some temporally local but spatially nonlocal operators with better long time behavior. These are based on Laguerre and exponential series.
Analysis of Duck Hepatitis B Virus Reverse Transcription Indicates a Common Mechanism for the Two Template Switches during Plus-Strand DNA Synthesis

PubMed Central

Havert, Michael B.; Ji, Lin; Loeb, Daniel D.

2002-01-01

The synthesis of the hepadnavirus relaxed circular DNA genome requires two template switches, primer translocation and circularization, during plus-strand DNA synthesis. Repeated sequences serve as donor and acceptor templates for these template switches, with direct repeat 1 (DR1) and DR2 for primer translocation and 5′r and 3′r for circularization. These donor and acceptor sequences are at, or near, the ends of the minus-strand DNA. Analysis of plus-strand DNA synthesis of duck hepatitis B virus (DHBV) has indicated that there are at least three other cis-acting sequences that make contributions during the synthesis of relaxed circular DNA. These sequences, 5E, M, and 3E, are located near the 5′ end, the middle, and the 3′ end of minus-strand DNA, respectively. The mechanism by which these sequences contribute to the synthesis of plus-strand DNA was unclear. Our aim was to better understand the mechanism by which 5E and M act. We localized the DHBV 5E element to a short sequence of approximately 30 nucleotides that is 100 nucleotides 3′ of DR2 on minus-strand DNA. We found that the new 5E mutants were partially defective for primer translocation/utilization at DR2. They were also invariably defective for circularization. In addition, examination of several new DHBV M variants indicated that they too were defective for primer translocation/utilization and circularization. Thus, this analysis indicated that 5E and M play roles in both primer translocation/utilization and circularization. In conjunction with earlier findings that 3E functions in both template switches, our findings indicate that the processes of primer translocation and circularization share a common underlying mechanism. PMID:11861843
High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA.

PubMed

Chandrananda, Dineika; Thorne, Natalie P; Bahlo, Melanie

2015-06-17

High-throughput sequencing of cell-free DNA fragments found in human plasma has been used to non-invasively detect fetal aneuploidy, monitor organ transplants and investigate tumor DNA. However, many biological properties of this extracellular genetic material remain unknown. Research that further characterizes circulating DNA could substantially increase its diagnostic value by allowing the application of more sophisticated bioinformatics tools that lead to an improved signal-to-noise ratio in the sequencing data. In this study, we investigate various features of cell-free DNA in plasma using deep-sequencing data from two pregnant women (>70X, >50X) and compare them with matched cellular DNA. We utilize a descriptive approach to examine how the biological cleavage of cell-free DNA affects different sequence signatures such as fragment lengths, sequence motifs at fragment ends and the distribution of cleavage sites along the genome. We show that the size distributions of these cell-free DNA molecules are dependent on their autosomal and mitochondrial origin as well as the genomic location within chromosomes. DNA mapping to particular microsatellites and alpha repeat elements displays unique size signatures. By correlating the mapping locations of these fragments with ENCODE annotation of chromatin organization, we show that cell-free fragments occur in clusters along the genome, localize to nucleosomal arrays and are preferentially cleaved at linker regions. Our work further demonstrates that cell-free autosomal DNA cleavage is sequence dependent. The region spanning up to 10 positions on either side of the DNA cleavage site shows a consistent pattern of preference for specific nucleotides. This sequence motif is present in cleavage sites localized to nucleosomal cores and linker regions but is absent in nucleosome-free mitochondrial DNA. These background signals in cell-free DNA sequencing data stem from the non-random biological cleavage of these fragments.
This sequence structure can be harnessed to improve bioinformatics algorithms, in particular for CNV and structural variant detection. Descriptive measures for cell-free DNA features developed here could also be used in biomarker analysis to monitor the changes that occur during different pathological conditions.
NG6: Integrated next generation sequencing storage and processing environment.

PubMed

Mariette, Jérôme; Escudié, Frédéric; Allias, Nicolas; Salin, Gérald; Noirot, Céline; Thomas, Sylvain; Klopp, Christophe

2012-09-09

Next generation sequencing platforms are now well established in sequencing centres and some laboratories. Upcoming smaller scale machines such as the 454 junior from Roche or the MiSeq from Illumina will increase the number of laboratories hosting a sequencer. In such a context, it is important to provide these teams with an easily manageable environment to store and process the produced reads. We describe a user-friendly information system able to manage large sets of sequencing data. It includes, on one hand, a workflow environment already containing pipelines adapted to different input formats (sff, fasta, fastq and qseq), different sequencers (Roche 454, Illumina HiSeq) and various analyses (quality control, assembly, alignment, diversity studies, etc.) and, on the other hand, a secured web site giving access to the results. The connected user will be able to download raw and processed data and browse through the analysis result statistics. The provided workflows can easily be modified or extended and new ones can be added. Ergatis is used as a workflow building, running and monitoring system. The analyses can be run locally or in a cluster environment using Sun Grid Engine. NG6 is a complete information system designed to answer the needs of a sequencing platform. It provides a user-friendly interface to process, store and download high-throughput sequencing data.
Adaptive Local Realignment of Protein Sequences.

PubMed

DeBlasio, Dan; Kececioglu, John

2018-06-11

While mutation rates can vary markedly over the residues of a protein, multiple sequence alignment tools typically use the same values for their scoring-function parameters across a protein's entire length. We present a new approach, called adaptive local realignment, that in contrast automatically adapts to the diversity of mutation rates along protein sequences. This builds upon a recent technique known as parameter advising, which finds global parameter settings for an aligner, to now adaptively find local settings. Our approach in essence identifies local regions with low estimated accuracy, constructs a set of candidate realignments using a carefully-chosen collection of parameter settings, and replaces the region if a realignment has higher estimated accuracy. This new method of local parameter advising, when combined with prior methods for global advising, boosts alignment accuracy as much as 26% over the best default setting on hard-to-align protein benchmarks, and by 6.4% over global advising alone. Adaptive local realignment has been implemented within the Opal aligner using the Facet accuracy estimator.
SW#db: GPU-Accelerated Exact Sequence Similarity Database Search.

PubMed

Korpar, Matija; Šošić, Martin; Blažeka, Dino; Šikić, Mile

2015-01-01

In recent years we have witnessed a growth in sequencing yield, the number of samples sequenced, and as a result, the growth of publicly maintained sequence databases. The increase in available data has put high requirements on protein similarity search algorithms with two ever-opposite goals: how to keep the running times acceptable while maintaining a high-enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, alignments of a query to the whole database are usually too slow. Therefore, the majority of the protein similarity search methods apply heuristics to reduce the number of possible candidate sequences in the database before doing the exact local alignment. However, there is still a need for the alignment of a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times, as a standalone tool, are comparable to the running times of BLAST, it is primarily intended to be used for the exact local alignment phase in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and was 4-5 times faster than SSEARCH, 6-25 times faster than CUDASW++ and more than 20 times faster than SSW at the time of writing, using multiple queries on Swiss-prot and Uniref90 databases.
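To make the quadratic cost mentioned in the abstract above concrete, here is a minimal sketch of Smith-Waterman scoring. It is not the SW#db implementation: the function name, the match/mismatch values and the linear gap penalty are illustrative assumptions, whereas protein search tools normally use a substitution matrix and affine gap penalties.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two strings.

    Assumed scoring (illustrative only): fixed match/mismatch values
    and a linear gap penalty. Runs in O(len(query) * len(target)) time
    and keeps only two rows of the dynamic programming matrix.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align query[i-1] with target[j-1]
                prev[j] + gap,      # gap in target
                curr[j - 1] + gap,  # gap in query
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Every cell of the matrix is visited once, which is why aligning one query against a whole database is slow without either a heuristic prefilter or the kind of parallelization the abstract describes.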
A novel species-specific tandem repeat DNA family from Sinapis arvensis: detection of telomere-like sequences.

PubMed

Kapila, R; Das, S; Srivastava, P S; Lakshmikumaran, M

1996-08-01

DNA sequences representing a tandemly repeated DNA family of the Sinapis arvensis genome were cloned and characterized. The 700-bp tandem repeat family is represented by two clones, pSA35 and pSA52, which are 697 and 709 bp in length, respectively. Dot matrix analysis of the sequences indicates the presence of repeated elements within each monomeric unit. Sequence analysis of the repetitive region of clones pSA35 and pSA52 shows that there are several copies of a 7-bp repeat element organized in tandem. The consensus sequence of this repeat element is 5'-TTTAGGG-3'. These elements are highly mutated and the difference in length between the two clones is due to different copy numbers of these elements. The repetitive region of clone pSA35 has 26 copies of the element TTTAGGG, whereas clone pSA52 has 28 copies. The repetitive region in both clones is flanked on either side by inverted repeats that may be footprints of a transposition event. Sequence comparison indicates that the element TTTAGGG is identical to telomeric repeats present in Arabidopsis, maize, tomato, and other plants. However, Bal31 digestion kinetics indicates non-telomeric localization of the 700-bp tandem repeats. The clones represent a novel repeat family as (i) they contain telomere-like motifs as subrepeats within each unit; and (ii) they do not hybridize to related crucifers and are species-specific in nature.
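As an illustration of how copies of a short subrepeat such as TTTAGGG might be tallied in a cloned monomer, the sketch below counts exact occurrences and the longest uninterrupted tandem run. The function names and the toy sequence in the usage note are hypothetical. Because the abstract describes the elements as highly mutated, exact matching would undercount them; the authors' own counting procedure is not given here.

```python
import re


def count_motif(seq, motif="TTTAGGG"):
    """Count non-overlapping exact occurrences of motif in seq."""
    return len(re.findall(re.escape(motif), seq.upper()))


def longest_tandem_run(seq, motif="TTTAGGG"):
    """Largest number of exact motif copies found back to back."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq.upper())
    return max((len(run) // len(motif) for run in runs), default=0)
```

For a made-up sequence such as `"ACGT" + "TTTAGGG" * 3 + "CC" + "TTTAGGG"`, `count_motif` reports four copies and `longest_tandem_run` reports a run of three.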
Isolation, expression, and chromosomal localization of the human mitochondrial capsule selenoprotein gene (MCSP)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aho, Hanne; Schwemmer, M.; Tessmann, D.

1996-03-01

The mitochondrial capsule selenoprotein (MCS) (HGMW-approved symbol MCSP) is one of three proteins that are important for the maintenance and stabilization of the crescent structure of the sperm mitochondria. We describe here the isolation of a cDNA, the exon-intron organization, the expression, and the chromosomal localization of the human MCS gene. Nucleotide sequence analysis of the human and mouse MCS cDNAs reveals that the 5′- and 3′-untranslated sequences are more conserved (71%) than the coding sequences (59%). The open reading frame encodes a 116-amino-acid protein and lacks the UGA codons, which have been reported to encode the selenocysteines in the N-terminal of the deduced mouse protein. The deduced human protein shows a low degree of amino acid sequence identity to the mouse protein (39%). The most striking homology lies in the dicysteine motifs. Northern and Southern zooblot analyses reveal that the MCS gene in human, baboon, and bovine is more conserved than its counterparts in mouse and rat. The single intron in the human MCS gene is approximately 6 kb and interrupts the 5′-untranslated region at a position equivalent to that in the mouse and rat genes. Northern blot and in situ hybridization experiments demonstrate that the expression of the human MCS gene is restricted to haploid spermatids. The human gene was assigned to q21 of chromosome 1. 30 refs., 9 figs.
The amino acid motif L/IIxxFE defines a novel actin-binding sequence in PDZ-RhoGEF

PubMed Central

Banerjee, Jayashree; Fischer, Christopher C.; Wedegaertner, Philip B.

2009-01-01

PDZ-RhoGEF is a member of the regulator of G protein signaling (RGS) domain-containing RhoGEFs (RGS-RhoGEFs) that link activated heterotrimeric G protein α subunits of the G12 family to activation of the small GTPase RhoA. Unique among the RGS-RhoGEFs, PDZ-RhoGEF contains a short sequence that localizes the protein to the actin cytoskeleton. In this report, we demonstrate that the actin-binding domain, located between amino acids 561–585, directly binds to F-actin in vitro. Extensive mutagenesis identifies isoleucine 568, isoleucine 569, phenylalanine 572, and glutamic acid 573 as necessary for binding to actin and for co-localization with the actin cytoskeleton in cells. These results define a novel actin-binding sequence in PDZ-RhoGEF with a critical amino acid motif of IIxxFE. Moreover, sequence analysis identifies a similar actin-binding motif in the N-terminus of the RhoGEF frabin, and, as with PDZ-RhoGEF, mutagenesis and actin interaction experiments demonstrate a motif of LIxxFE, consisting of the key amino acids leucine 23, isoleucine 24, phenylalanine 27, and glutamic acid 28. Taken together, results with PDZ-RhoGEF and frabin identify a novel actin binding sequence. Lastly, inducible dimerization of the actin-binding region of PDZ-RhoGEF revealed a dimerization-dependent actin bundling activity in vitro. PDZ-RhoGEF exists in cells as a dimer, raising the possibility that PDZ-RhoGEF could influence actin structure independent of its ability to activate RhoA. PMID:19618964
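The L/IIxxFE motif reported above can be written as a regular expression, which gives a quick way to scan other protein sequences for candidate sites. This is only a sketch: the function name is hypothetical, and a pattern match is no evidence of actin binding, which the authors established by mutagenesis and binding experiments.

```python
import re

# L or I, then I, then any two residues, then F and E.
# The lookahead lets overlapping candidates be reported as well.
ACTIN_MOTIF = re.compile(r"(?=([LI]I..FE))")


def find_actin_motif(protein):
    """Return (1-based start position, matched residues) for each hit."""
    return [(m.start() + 1, m.group(1))
            for m in ACTIN_MOTIF.finditer(protein.upper())]
```

On the made-up peptide `"MKLIAAFEG"` this returns `[(3, "LIAAFE")]`; real use would start from the full PDZ-RhoGEF or frabin sequence, neither of which is reproduced here.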
Entropic Profiler – detection of conservation in genomes using information theory

PubMed Central

Fernandes, Francisco; Freitas, Ana T; Almeida, Jonas S; Vinga, Susana

2009-01-01

Background In the last decades, with the successive availability of whole genome sequences, many research efforts have been made to mathematically model DNA. Entropic Profiles (EP) were proposed recently as a new measure of continuous entropy of genome sequences. EP represent local information plots related to DNA randomness and are based on information theory and statistical concepts. They express the weighed relative abundance of motifs for each position in genomes. Their study is very relevant because under or over-representation segments are often associated with significant biological meaning. Findings The Entropic Profiler application here presented is a new tool designed to detect and extract under and over-represented DNA segments in genomes by using EP. It allows its computation in a very efficient way by recurring to improved algorithms and data structures, which include modified suffix trees. Available through a web interface and as downloadable source code, it allows to study positions and to search for motifs inside the whole sequence or within a specified range. DNA sequences can be entered from different sources, including FASTA files, pre-loaded examples or resuming a previously saved work. Besides the EP value plots, p-values and z-scores for each motif are also computed, along with the Chaos Game Representation of the sequence. Conclusion EP are directly related with the statistical significance of motifs and can be considered as a new method to extract and classify significant regions in genomes and estimate local scales in DNA. The present implementation establishes an efficient and useful tool for whole genome analysis. PMID:19416538
Comparative analysis and visualization of multiple collinear genomes

PubMed Central

2012-01-01

Background Genome browsers are a common tool used by biologists to visualize genomic features including genes, polymorphisms, and many others. However, existing genome browsers and visualization tools are not well-suited to perform meaningful comparative analysis among a large number of genomes. With the increasing quantity and availability of genomic data, there is an increased burden to provide useful visualization and analysis tools for comparison of multiple collinear genomes such as the large panels of model organisms which are the basis for much of the current genetic research. Results We have developed a novel web-based tool for visualizing and analyzing multiple collinear genomes. Our tool illustrates genome-sequence similarity through a mosaic of intervals representing local phylogeny, subspecific origin, and haplotype identity. Comparative analysis is facilitated through reordering and clustering of tracks, which can vary throughout the genome. In addition, we provide local phylogenetic trees as an alternate visualization to assess local variations. Conclusions Unlike previous genome browsers and viewers, ours allows for simultaneous and comparative analysis. Our browser provides intuitive selection and interactive navigation about features of interest. Dynamic visualizations adjust to scale and data content making analysis at variable resolutions and of multiple data sets more informative. We demonstrate our genome browser for an extensive set of genomic data sets composed of almost 200 distinct mouse laboratory strains. PMID:22536897

A novel mutation in PRPF31, causative of autosomal dominant retinitis pigmentosa, using the BGISEQ-500 sequencer.

PubMed

Zheng, Yu; Wang, Hai-Lin; Li, Jian-Kang; Xu, Li; Tellier, Laurent; Li, Xiao-Lin; Huang, Xiao-Yan; Li, Wei; Niu, Tong-Tong; Yang, Huan-Ming; Zhang, Jian-Guo; Liu, Dong-Ning

2018-01-01

To study the genes responsible for retinitis pigmentosa. A total of 15 Chinese families with retinitis pigmentosa, containing 94 sporadically afflicted cases, were recruited. The targeted sequences were captured using the Target_Eye_365_V3 chip and sequenced using the BGISEQ-500 sequencer, according to the manufacturer's instructions. Data were aligned to UCSC Genome Browser build hg19, using the Burrows-Wheeler Aligner MEM algorithm. Local realignment was performed with the Genome Analysis Toolkit (GATK v.3.3.0) IndelRealigner, and variants were called with the Genome Analysis Toolkit HaplotypeCaller, without any use of imputation. Variants were filtered against a panel derived from 1000 Genomes Project, 1000G_ASN, ESP6500, ExAC and dbSNP138. In all members of Family ONE and Family TWO with available DNA samples, the genetic variant was validated using Sanger sequencing. A novel, pathogenic variant of retinitis pigmentosa, c.357_358delAA (p.Ser119SerfsX5), was identified in PRPF31 in 2 of 15 autosomal-dominant retinitis pigmentosa (ADRP) families, as well as in one sporadic case. Sanger sequencing was performed upon probands, as well as upon other family members. This novel, pathogenic genotype co-segregated with the retinitis pigmentosa phenotype in these two families. ADRP is a subtype of retinitis pigmentosa, defined by its genotype, which accounts for 20%-40% of retinitis pigmentosa patients. Our study thus expands the spectrum of PRPF31 mutations known to occur in ADRP, and provides further demonstration of the applicability of the BGISEQ-500 sequencer for genomics research.
Analysis of genetic diversity of Tunisian pistachio (Pistacia vera L.) using sequence-related amplified polymorphism (SRAP) markers.

PubMed

Guenni, K; Aouadi, M; Chatti, K; Salhi-Hannachi, A

2016-10-17

Sequence-related amplified polymorphism (SRAP) markers preferentially amplify open reading frames and were used to study the genetic diversity of Tunisian pistachio. In the present study, 43 Pistacia vera accessions were screened using seven SRAP primer pairs. A total of 78 markers was revealed (95.12%) with an average polymorphic information content of 0.850. The results suggest that there is strong genetic differentiation, which characterizes the local resources (GST = 0.307). High gene flow (Nm = 1.127) among groups was explained by the exchange of plant material among regions. Analysis of molecular variance revealed significant differences within groups and showed that 73.88% of the total genetic diversity occurred within groups, whereas the remaining 26.12% occurred among groups. Bayesian clustering and principal component analysis identified three pools, El Guettar, Pollenizers, and the rest of the pistachios belonging to the Gabès, Kasserine, and Sfax localities. Bayesian analysis revealed that El Guettar and male genotypes were assigned with more than 80% probability. The BayeScan method proposed that locus 59 (F13-R9) could be used in the development of sex-linked SCAR markers from SRAP since it is a commonly detected locus in comparisons involving the Pollenizers group. This is the first application of SRAP markers for the assessment of genetic diversity in Tunisian germplasm of P. vera. Such information will be useful to define conservation strategies and improvement programs for this species.
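The differentiation and gene flow figures quoted above are linked by a widely used estimator, Nm = 0.5 (1 - GST) / GST. The abstract does not say which formula or software the authors used, so the sketch below is an assumption offered only as a plausibility check; the function name is hypothetical.

```python
def gene_flow_from_gst(gst):
    """Estimate gene flow Nm from GST as 0.5 * (1 - GST) / GST.

    Assumption: this common estimator is used for illustration; the
    abstract does not state how Nm was actually computed.
    """
    if not 0 < gst <= 1:
        raise ValueError("GST must be in the interval (0, 1]")
    return 0.5 * (1 - gst) / gst
```

With the reported GST of 0.307 this gives roughly 1.129, close to the reported Nm of 1.127; the small gap would be consistent with GST having been rounded before publication.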
Structural analysis, plastid localization, and expression of the biotin carboxylase subunit of acetyl-coenzyme A carboxylase from tobacco.

PubMed

Shorrosh, B S; Roesler, K R; Shintani, D; van de Loo, F J; Ohlrogge, J B

1995-06-01

Acetyl-coenzyme A carboxylase (ACCase, EC 6.4.1.2) catalyzes the synthesis of malonyl-coenzyme A, which is utilized in the plastid for de novo fatty acid synthesis and outside the plastid for a variety of reactions, including the synthesis of very long chain fatty acids and flavonoids. Recent evidence for both multifunctional and multisubunit ACCase isozymes in dicot plants has been obtained. We describe here the isolation of a tobacco (Nicotiana tabacum L. cv bright yellow 2 [NT1]) cDNA clone (E3) that encodes a 58.4-kD protein that shares 80% sequence similarity and 65% identity with the Anabaena biotin carboxylase subunit of ACCase. Similar to other biotin carboxylase subunits of acetyl-CoA carboxylase, the E3-encoded protein contains a putative ATP-binding motif but lacks a biotin-binding site (methionine-lysine-methionine or methionine-lysine-leucine). The deduced protein sequence contains a putative transit peptide whose function was confirmed by its ability to direct in vitro chloroplast uptake. The subcellular localization of this biotin carboxylase has also been confirmed to be plastidial by western blot analysis of pea (Pisum sativum), alfalfa (Medicago sativa L.), and castor (Ricinus communis L.) plastid preparations. Northern blot analysis indicates that the plastid biotin carboxylase transcripts are expressed at severalfold higher levels in castor seeds than in leaves.
Structural analysis, plastid localization, and expression of the biotin carboxylase subunit of acetyl-coenzyme A carboxylase from tobacco.

PubMed Central

Shorrosh, B S; Roesler, K R; Shintani, D; van de Loo, F J; Ohlrogge, J B

1995-01-01

Acetyl-coenzyme A carboxylase (ACCase, EC 6.4.1.2) catalyzes the synthesis of malonyl-coenzyme A, which is utilized in the plastid for de novo fatty acid synthesis and outside the plastid for a variety of reactions, including the synthesis of very long chain fatty acids and flavonoids. Recent evidence for both multifunctional and multisubunit ACCase isozymes in dicot plants has been obtained. We describe here the isolation of a tobacco (Nicotiana tabacum L. cv bright yellow 2 [NT1]) cDNA clone (E3) that encodes a 58.4-kD protein that shares 80% sequence similarity and 65% identity with the Anabaena biotin carboxylase subunit of ACCase. Similar to other biotin carboxylase subunits of acetyl-CoA carboxylase, the E3-encoded protein contains a putative ATP-binding motif but lacks a biotin-binding site (methionine-lysine-methionine or methionine-lysine-leucine). The deduced protein sequence contains a putative transit peptide whose function was confirmed by its ability to direct in vitro chloroplast uptake. The subcellular localization of this biotin carboxylase has also been confirmed to be plastidial by western blot analysis of pea (Pisum sativum), alfalfa (Medicago sativa L.), and castor (Ricinus communis L.) plastid preparations. Northern blot analysis indicates that the plastid biotin carboxylase transcripts are expressed at severalfold higher levels in castor seeds than in leaves. PMID:7610168
Sensor Web Interoperability Testbed Results Incorporating Earth Observation Satellites

NASA Technical Reports Server (NTRS)

Frye, Stuart; Mandl, Daniel J.; Alameh, Nadine; Bambacus, Myra; Cappelaere, Pat; Falke, Stefan; Derezinski, Linda; Zhao, Piesheng

2007-01-01

This paper describes an Earth Observation Sensor Web scenario based on the Open Geospatial Consortium's Sensor Web Enablement and Web Services interoperability standards. The scenario demonstrates the application of standards in describing, discovering, accessing and tasking satellites and ground-based sensor installations in a sequence of analysis activities that deliver information required by decision makers in response to national, regional or local emergencies.
Comprehensive analysis of the dynamic structure of nuclear localization signals.

PubMed

Yamagishi, Ryosuke; Okuyama, Takahide; Oba, Shuntaro; Shimada, Jiro; Chaen, Shigeru; Kaneko, Hiroki

2015-12-01

Most transcription and epigenetic factors in eukaryotic cells have nuclear localization signals (NLSs) and are transported to the nucleus by nuclear transport proteins. Understanding the features of NLSs and the mechanisms of nuclear transport might help understand gene expression regulation, somatic cell reprogramming, thus leading to the treatment of diseases associated with abnormal gene expression. Although many studies analyzed the amino acid sequence of NLSs, few studies investigated their three-dimensional structure. Therefore, we conducted a statistical investigation of the dynamic structure of NLSs by extracting the conformation of these sequences from proteins examined by X-ray crystallography and using a quantity defined as conformational determination rate (a ratio between the number of amino acids determining the conformation and the number of all amino acids included in a certain region). We found that determining the conformation of NLSs is more difficult than determining the conformation of other regions and that NLSs may tend to form more heteropolymers than monomers. Therefore, these findings strongly suggest that NLSs are intrinsically disordered regions.
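The conformational determination rate defined above is a simple ratio, sketched below for one region of one structure. The interpretation is an assumption: a residue is counted as determined when it has coordinates in the crystal structure, and the function and parameter names are hypothetical.

```python
def conformational_determination_rate(observed_residues, start, end):
    """Fraction of residues in [start, end] whose conformation was determined.

    observed_residues: residue numbers that have coordinates in the
    structure (assumed meaning of "determined" for this sketch).
    start, end: inclusive residue numbers bounding the region, e.g. an NLS.
    """
    if end < start:
        raise ValueError("end must not be smaller than start")
    observed = set(observed_residues)
    region = range(start, end + 1)
    return sum(1 for residue in region if residue in observed) / len(region)
```

A low value for an NLS compared with the rest of the chain would point in the direction the abstract describes, namely that the NLS is harder to resolve and may be disordered.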
The unusually large Plasmodium telomerase reverse-transcriptase localizes in a discrete compartment associated with the nucleolus

PubMed Central

Figueiredo, Luisa M.; Rocha, Eduardo P. C.; Mancio-Silva, Liliana; Prevost, Christine; Hernandez-Verdun, Danièle; Scherf, Artur

2005-01-01

Telomerase replicates chromosome ends, a function necessary for maintaining genome integrity. We have identified the gene that encodes the catalytic reverse transcriptase (RT) component of this enzyme in the malaria parasite Plasmodium falciparum (PfTERT) as well as the orthologous genes from two rodent and one simian malaria species. PfTERT is predicted to encode a basic protein that contains the major sequence motifs previously identified in known telomerase RTs (TERTs). At ∼2500 amino acids, PfTERT is three times larger than other characterized TERTs. We observed remarkable sequence diversity between TERT proteins of different Plasmodial species, with conserved domains alternating with hypervariable regions. Immunofluorescence analysis revealed that PfTERT is expressed in asexual blood stage parasites that have begun DNA synthesis. Surprisingly, rather than at telomere clusters, PfTERT typically localizes into a discrete nuclear compartment. We further demonstrate that this compartment is associated with the nucleolus, hereby defined for the first time in P. falciparum. PMID:15722485
Local Geometry and Evolutionary Conservation of Protein Surfaces Reveal the Multiple Recognition Patches in Protein-Protein Interactions

PubMed Central

Laine, Elodie; Carbone, Alessandra

2015-01-01

Protein-protein interactions (PPIs) are essential to all biological processes and they represent increasingly important therapeutic targets. Here, we present a new method for accurately predicting protein-protein interfaces, understanding their properties, origins and binding to multiple partners. Contrary to machine learning approaches, our method combines in a rational and very straightforward way three sequence- and structure-based descriptors of protein residues: evolutionary conservation, physico-chemical properties and local geometry. The implemented strategy yields very precise predictions for a wide range of protein-protein interfaces and discriminates them from small-molecule binding sites. Beyond its predictive power, the approach permits to dissect interaction surfaces and unravel their complexity. We show how the analysis of the predicted patches can foster new strategies for PPIs modulation and interaction surface redesign. The approach is implemented in JET2, an automated tool based on the Joint Evolutionary Trees (JET) method for sequence-based protein interface prediction. JET2 is freely available at www.lcqb.upmc.fr/JET2. PMID:26690684
Role of local sequence in the folding of cellular retinoic acid binding protein I: structural propensities of reverse turns.

PubMed

Rotondi, Kenneth S; Gierasch, Lila M

2003-07-08

The experiments described here explore the role of local sequence in the folding of cellular retinoic acid binding protein I (CRABP I). This is a 136-residue, 10-stranded, antiparallel beta-barrel protein with seven beta-hairpins and is a member of the intracellular lipid binding protein (iLBP) family. The relative roles of local and global sequence information in governing the folding of this class of proteins are not well-understood. In question is whether the beta-turns are locally defined by short-range interactions within their sequences, and are thus able to play an active role in reducing the conformational space available to the folding chain, or whether the turns are passive, relying upon global forces to form. Short (six- and seven-residue) peptides corresponding to the seven CRABP I turns were analyzed by circular dichroism and NMR for their tendencies to take up the conformations they adopt in the context of the native protein. The results indicate that two of the peptides, encompassing turns III and IV in CRABP I, have a strong intrinsic bias to form native turns. Intriguingly, these turns are on linked hairpins in CRABP I and represent the best-conserved turns in the iLBP family. These results suggest that local sequence may play an important role in narrowing the conformational ensemble of CRABP I during folding.
Localized one-dimensional single voxel magnetic resonance spectroscopy without J coupling modulations.

PubMed

Lin, Yanqin; Lin, Liangjie; Wei, Zhiliang; Zhong, Jianhui; Chen, Zhong

2016-12-01

To acquire single voxel localized one-dimensional 1H magnetic resonance spectroscopy (MRS) without J coupling modulations, free from amplitude and phase distortions. A pulse sequence, named PRESSIR, is developed for volume localized MRS without J modulations at arbitrary echo time (TE). The J coupling evolution is suppressed by the J-refocused module that uses a 90° pulse at the midpoint of a double spin echo. The localization performance of the PRESSIR sequence was tested with a two-compartment phantom. The proposed sequence shows similar voxel localization accuracy as PRESS. Both PRESSIR and PRESS sequences were performed on MRS brain phantom and pig brain tissue. PRESS spectra suffer from amplitude and phase distortions due to J modulations, especially under moderate and long TEs, while PRESSIR spectra are almost free from distortions. The PRESSIR sequence proposed herein enables the acquisition of single voxel in-phase MRS within a single scan. It allows an enhanced signal intensity of J coupling metabolites and reduces undesired broad resonances with short T2s while suppressing J modulations. Moreover, it provides an approach for direct measurement of nonoverlapping J coupling peaks and of transverse relaxation times T2s. Magn Reson Med 76:1661-1667, 2016. © 2015 International Society for Magnetic Resonance in Medicine.
RT-PCR and sequence analysis of the full-length fusion protein of Canine Distemper Virus from domestic dogs.

PubMed

Romanutti, Carina; Gallo Calderón, Marina; Keller, Leticia; Mattion, Nora; La Torre, José

2016-02-01

During 2007-2014, 84 out of 236 (35.6%) samples from domestic dogs submitted to our laboratory for diagnostic purposes were positive for Canine Distemper Virus (CDV), as analyzed by RT-PCR amplification of a fragment of the nucleoprotein gene. Fifty-nine of them (70.2%) were from dogs that had been vaccinated against CDV. The full-length gene encoding the Fusion (F) protein of fifteen isolates was sequenced and compared with that of those of other CDVs, including wild-type and vaccine strains. Phylogenetic analysis using the F gene full-length sequences grouped all the Argentinean CDV strains in the SA2 clade. Sequence identity with the Onderstepoort vaccine strain was 89.0-90.6%, and the highest divergence was found in the 135 amino acids corresponding to the F protein signal-peptide, Fsp (64.4-66.7% identity). In contrast, this region was highly conserved among the local strains (94.1-100% identity). One extra putative N-glycosylation site was identified in the F gene of CDV Argentinean strains with respect to the vaccine strain. The present report is the first to analyze full-length F protein sequences of CDV strains circulating in Argentina, and contributes to the knowledge of molecular epidemiology of CDV, which may help in understanding future disease outbreaks. Copyright © 2015 Elsevier B.V. All rights reserved.
Rising prevalence of non-B HIV-1 subtypes in North Carolina and evidence for local onward transmission.

PubMed

Dennis, Ann M; Hué, Stephane; Learner, Emily; Sebastian, Joseph; Miller, William C; Eron, Joseph J

2017-01-01

HIV-1 diversity is increasing in North American and European cohorts which may have public health implications. However, little is known about non-B subtype diversity in the southern United States, despite the region being the epicenter of the nation's epidemic. We characterized HIV-1 diversity and transmission clusters to identify the extent to which non-B strains are transmitted locally. We conducted cross-sectional analyses of HIV-1 partial pol sequences collected from 1997 to 2014 from adults accessing routine clinical care in North Carolina (NC). Subtypes were evaluated using COMET and phylogenetic analysis. Putative transmission clusters were identified using maximum-likelihood trees. Clusters involving non-B strains were confirmed and their dates of origin were estimated using Bayesian phylogenetics. Data were combined with demographic information collected at the time of sample collection and country of origin for a subset of patients. Among 24,972 sequences from 15,246 persons, the non-B subtype prevalence increased from 0% to 3.46% over the study period. Of 325 persons with non-B subtypes, diversity was high with over 15 pure subtypes and recombinants; subtype C (28.9%) and CRF02_AG (24.0%) were most common. While identification of transmission clusters was lower for persons with non-B versus B subtypes, several local transmission clusters (≥3 persons) involving non-B subtypes were identified and all were presumably due to heterosexual transmission. Prevalence of non-B subtype diversity remains low in NC but a statistically significant rise was identified over time which likely reflects multiple importation. However, the combined phylogenetic clustering analysis reveals evidence for local onward transmission. Detection of these non-B clusters suggests heterosexual transmission and may guide diagnostic and prevention interventions.
Structural mechanics of DNA wrapping in the nucleosome.

PubMed

Battistini, Federica; Hunter, Christopher A; Gardiner, Eleanor J; Packer, Martin J

2010-02-19

Experimental X-ray crystal structures and a database of calculated structural parameters of DNA octamers were used in combination to analyse the mechanics of DNA bending in the nucleosome core complex. The 1kx5 X-ray crystal structure of the nucleosome core complex was used to determine the relationship between local structure at the base-step level and the global superhelical conformation observed for nucleosome-bound DNA. The superhelix is characterised by a large curvature (597 degrees) in one plane and very little curvature (10 degrees) in the orthogonal plane. Analysis of the curvature at the level of 10-step segments shows that there is a uniform curvature of 30 degrees per helical turn throughout most of the structure but that there are two sharper kinks of 50 degrees at +/-2 helical turns from the central dyad base pair. The curvature is due almost entirely to the base-step parameter roll. There are large periodic variations in roll, which are in phase with the helical twist and account for 500 degrees of the total curvature. Although variations in the other base-step parameters perturb the local path of the DNA, they make minimal contributions to the total curvature. This implies that DNA bending in the nucleosome is achieved using the roll-slide-twist degree of freedom previously identified as the major degree of freedom in naked DNA oligomers. The energetics of bending into a nucleosome-bound conformation were therefore analysed using a database of structural parameters that we have previously developed for naked DNA oligomers. The minimum energy roll, the roll flexibility force constant and the maximum and minimum accessible roll values were obtained for each base step in the relevant octanucleotide context to account for the effects of conformational coupling that vary with sequence context. 
The distribution of base-step roll values and corresponding strain energy required to bend DNA into the nucleosome-bound conformation defined by the 1kx5 structure were obtained by applying a constant bending moment. When a single bending moment was applied to the entire sequence, the local details of the calculated structure did not match the experiment. However, when local 10-step bending moments were applied separately, the calculated structure showed excellent agreement with experiment. This implies that the protein applies variable bending forces along the DNA to maintain the superhelical path required for nucleosome wrapping. In particular, the 50 degrees kinks are constraints imposed by the protein rather than a feature of the 1kx5 DNA sequence. The kinks coincide with a relatively flexible region of the sequence, and this is probably a prerequisite for high-affinity nucleosome binding, but the bending strain energy is significantly higher at these points than for the rest of the sequence. In the most rigid regions of the sequence, a higher strain energy is also required to achieve the standard 30 degrees curvature per helical turn. We conclude that matching of the DNA sequence to the local roll periodicity required to achieve bending, together with the increased flexibility required at the kinks, determines the sequence selectivity of DNA wrapping in the nucleosome. 2009 Elsevier Ltd. All rights reserved.
Protein functional features are reflected in the patterns of mRNA translation speed.

PubMed

López, Daniel; Pazos, Florencio

2015-07-09

The degeneracy of the genetic code makes it possible for the same amino acid string to be coded by different messenger RNA (mRNA) sequences. These "synonymous mRNAs" may differ largely in a number of aspects related to their overall translational efficiency, such as secondary structure content and availability of the encoded transfer RNAs (tRNAs). Consequently, they may render different yields of the translated polypeptides. These mRNA features related to translation efficiency also play a role locally, resulting in a non-uniform translation speed along the mRNA, which has been previously related to some protein structural features and also used to explain some dramatic effects of "silent" single-nucleotide polymorphisms (SNPs). In this work we perform the first large scale analysis of the relationship between three experimental proxies of mRNA local translation efficiency and the local features of the corresponding encoded proteins. We found that a number of protein functional and structural features are reflected in the patterns of ribosome occupancy, secondary structure and tRNA availability along the mRNA. One or more of these proxies of translation speed have distinctive patterns around the mRNA regions coding for certain protein local features. In some cases the three patterns follow a similar trend. We also show specific examples where these patterns of translation speed point to the protein's important structural and functional features. This supports the idea that the genome not only codes the protein functional features as sequences of amino acids, but also as subtle patterns of mRNA properties which, probably through local effects on the translation speed, have some consequence on the final polypeptide. These results open the possibility of predicting a protein's functional regions based on a single genomic sequence, and have implications for heterologous protein expression and fine-tuning protein function.
A genome-specific repetitive DNA sequence from Oryza eichingeri: characterization, localization, and introgression to O. sativa.

PubMed

Yan, H. H.; Liu, G. Q.; Cheng, Z. K.; Li, X. B.; Liu, G. Z.; Min, S. K.; Zhu, L.H.

2002-02-01

In the course of transferring the brown planthopper resistance from a diploid, CC-genome wild rice species, Oryza eichingeri (IRGC acc. 105159 and 105163), to the cultivated rice variety 02428, we have isolated many alien addition and introgression lines. The O. eichingeri chromatin in some of these lines has previously been identified using genomic in situ hybridization and molecular-marker analysis. Here we cloned a tandemly repetitive DNA sequence from O. eichingeri IRGC acc. 105163, and detected it in 25 introgression lines. This repetitive DNA sequence showed high specificity to the rice CC genome, but was absent from all the four tetraploid species with BBCC or CCDD genomes. The monomer in this repetitive DNA sequence is 325-366 bp long, with a copy number of about 5,000 per 1 C of the O. eichingeri genome, showing 88% homology to a repetitive DNA sequence isolated from Oryza officinalis (2n=2x=24, CC). Fluorescent in situ hybridization revealed 11 signals distributed over eight O. eichingeri chromosomes, mostly in terminal or subterminal regions.
Using GenBank.

PubMed

Wheeler, David

2007-01-01

GenBank® is a comprehensive database of publicly available DNA sequences for more than 205,000 named organisms and for more than 60,000 within the embryophyta, obtained through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Daily data exchange with the European Molecular Biology Laboratory (EMBL) in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through the National Center for Biotechnology Information (NCBI) retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases with taxonomy, genome, mapping, protein structure, and domain information and the biomedical journal literature through PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available through FTP. GenBank usage scenarios ranging from local analyses of the data available through FTP to online analyses supported by the NCBI Web-based tools are discussed. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at http://www.ncbi.nlm.nih.gov.
A multiple-alignment based primer design algorithm for genetically highly variable DNA targets

PubMed Central

2013-01-01

Background: Primer design for highly variable DNA sequences is difficult, and experimental success requires attention to many interacting constraints. The advent of next-generation sequencing methods allows the investigation of rare variants otherwise hidden deep in large populations, but requires attention to population diversity and primer localization in relatively conserved regions, in addition to recognized constraints typically considered in primer design. Results: Design constraints include degenerate sites to maximize population coverage, matching of melting temperatures, optimizing de novo sequence length, finding optimal bio-barcodes to allow efficient downstream analyses, and minimizing risk of dimerization. To facilitate primer design addressing these and other constraints, we created a novel computer program (PrimerDesign) that automates this complex procedure. We show its powers and limitations and give examples of successful designs for the analysis of HIV-1 populations. Conclusions: PrimerDesign is useful for researchers who want to design DNA primers and probes for analyzing highly variable DNA populations. It can be used to design primers for PCR, RT-PCR, Sanger sequencing, next-generation sequencing, and other experimental protocols targeting highly variable DNA samples. PMID:23965160
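Two of the constraints named in this abstract, degenerate sites and melting-temperature matching, can be illustrated with a minimal Python sketch. This is not the PrimerDesign program: the function names, the 5-degree tolerance, and the use of the simple Wallace rule (Tm = 2(A+T) + 4(G+C), a rough estimate for short oligonucleotides) are assumptions chosen for illustration only.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

def wallace_tm(seq):
    """Rough melting temperature in deg C by the Wallace rule (illustrative only)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def tm_range(primer):
    """Lowest and highest Wallace Tm over all variants of a degenerate primer."""
    tms = [wallace_tm(s) for s in expand_degenerate(primer)]
    return min(tms), max(tms)

def tm_matched(forward, reverse, max_diff=5):
    """True if all variants of both primers fall within max_diff deg C (hypothetical tolerance)."""
    f_lo, f_hi = tm_range(forward)
    r_lo, r_hi = tm_range(reverse)
    return max(f_hi, r_hi) - min(f_lo, r_lo) <= max_diff
```

Expanding every variant grows exponentially with the number of degenerate sites, so a real tool would likely bound the degeneracy or use a nearest-neighbor thermodynamic model instead of this rule of thumb.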
Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder.

PubMed

Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E

2015-01-01

Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".
Role of the Box C/D Motif in Localization of Small Nucleolar RNAs to Coiled Bodies and Nucleoli

PubMed Central

Narayanan, Aarthi; Speckmann, Wayne; Terns, Rebecca; Terns, Michael P.

1999-01-01

Small nucleolar RNAs (snoRNAs) are a large family of eukaryotic RNAs that function within the nucleolus in the biogenesis of ribosomes. One major class of snoRNAs is the box C/D snoRNAs named for their conserved box C and box D sequence elements. We have investigated the involvement of cis-acting sequences and intranuclear structures in the localization of box C/D snoRNAs to the nucleolus by assaying the intranuclear distribution of fluorescently labeled U3, U8, and U14 snoRNAs injected into Xenopus oocyte nuclei. Analysis of an extensive panel of U3 RNA variants showed that the box C/D motif, comprised of box C′, box D, and the 3′ terminal stem of U3, is necessary and sufficient for the nucleolar localization of U3 snoRNA. Disruption of the elements of the box C/D motif of U8 and U14 snoRNAs also prevented nucleolar localization, indicating that all box C/D snoRNAs use a common nucleolar-targeting mechanism. Finally, we found that wild-type box C/D snoRNAs transiently associate with coiled bodies before they localize to nucleoli and that variant RNAs that lack an intact box C/D motif are detained within coiled bodies. These results suggest that coiled bodies play a role in the biogenesis and/or intranuclear transport of box C/D snoRNAs. PMID:10397754
Phylogeography and origin of Chinese domestic chicken.

PubMed

Wu, Y P; Huo, J H; Xie, J F; Liu, L X; Wei, Q P; Xie, M G; Kang, Z F; Ji, H Y; Ma, Y H

2014-04-01

The loss of local chicken breeds as a result of replacement with cosmopolitan breeds indicates the need for conservation measures to protect the future of local genetic stocks. The aim of this study is to describe the patterns of polymorphism of the hypervariable control region of mitochondrial DNA (HVR1) in domestic chicken in China's Jiangxi province to investigate genetic diversity, genetic structure and phylo-dynamics. To this end, we sequenced the mtDNA HVR1 in 231 chickens including 22 individuals which belonged to previously published sequences. A neighbor-joining tree revealed that these samples clustered into five lineages (Lineages A, B, C, E and G). The highest haplotype diversity and nucleotide diversity were both found in the Anyi tile-like gray breed. We estimated that the most recent common ancestor of the local chicken existed approximately 16 million years ago. The mismatch distribution analysis showed two major peaks at positions 4 and 9, while the neutrality test (Tajima's D = -2.19, p < 0.05) and Fu's F-statistics (-8.59, p < 0.05) revealed a significant departure from the neutrality assumption. These results support the idea that domestication of chickens facilitated population increases. Results of a global AMOVA indicated that there was no obvious geographic structure among the local chicken breeds analyzed in this study. The data obtained in this study will assist future conservation management of local breeds and also reveal intriguing implications for the history of human population movements and commerce.
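The two diversity measures mentioned in this abstract can be sketched in a few lines of Python. The definitions below are the textbook ones (Nei's haplotype diversity and the average number of pairwise differences per site); the function names and the toy alignment are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations

def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))

def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site across aligned, equal-length sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / length

# Toy aligned sequences (hypothetical, for illustration only).
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
h = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
```

The sketch assumes gap-free sequences of equal length; real HVR1 alignments would need gaps and missing data handled before counting differences.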

myPhyloDB: a local web server for the storage and analysis of metagenomic data.

PubMed

Manter, Daniel K; Korsa, Matthew; Tebbe, Caleb; Delgado, Jorge A

2016-01-01

myPhyloDB v.1.1.2 is a user-friendly personal database with a browser-interface designed to facilitate the storage, processing, analysis, and distribution of microbial community populations (e.g. 16S metagenomics data). myPhyloDB archives raw sequencing files, and allows for easy selection of project(s)/sample(s) of any combination from all available data in the database. The data processing capabilities of myPhyloDB are also flexible enough to allow the upload and storage of pre-processed data, or use the built-in Mothur pipeline to automate the processing of raw sequencing data. myPhyloDB provides several analytical (e.g. analysis of covariance, t-tests, linear regression, differential abundance (DESeq2), and principal coordinates analysis (PCoA)) and normalization (rarefaction, DESeq2, and proportion) tools for the comparative analysis of taxonomic abundance, species richness and species diversity for projects of various types (e.g. human-associated, human gut microbiome, air, soil, and water) for any taxonomic level(s) desired. Finally, since myPhyloDB is a local web-server, users can quickly distribute data between colleagues and end-users by simply granting others access to their personal myPhyloDB database. myPhyloDB is available at http://www.ars.usda.gov/services/software/download.htm?softwareid=472 and more information along with tutorials can be found on our website http://www.myphylodb.org. Database URL: http://www.myphylodb.org. Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the United States.
Sequence of Radiotherapy and Chemotherapy in Breast Cancer After Breast-Conserving Surgery

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jobsen, Jan J., E-mail: J.Jobsen@mst.nl; Palen, Job van der; Department of Research Methodology, Measurement and Data Analysis, Faculty of Behavioural Science, University of Twente

2012-04-01

Purpose: The optimal sequence of radiotherapy and chemotherapy in breast-conserving therapy is unknown. Methods and Materials: From 1983 through 2007, a total of 641 patients with 653 instances of breast-conserving therapy (BCT) received both chemotherapy and radiotherapy and are the basis of this analysis. Patients were divided into three groups. Groups A and B comprised patients treated before 2005, with Group A receiving radiotherapy first and Group B chemotherapy first. Group C consisted of patients treated from 2005 onward, when we had a fixed sequence of radiotherapy first, followed by chemotherapy. Results: Local control did not show any differences among the three groups. For distant metastasis, no difference was shown between Groups A and B. Group C, when compared with Group A, showed, on univariate and multivariate analyses, a significantly better distant metastasis-free survival. The same was noted for disease-free survival. With respect to disease-specific survival, no differences were shown on multivariate analysis among the three groups. Conclusion: Radiotherapy, as an integral part of the primary treatment of BCT, should be administered first, followed by adjuvant chemotherapy.
ACLAME: a CLAssification of Mobile genetic Elements, update 2010.

PubMed

Leplae, Raphaël; Lima-Mendez, Gipsi; Toussaint, Ariane

2010-01-01

The ACLAME database is dedicated to the collection, analysis and classification of sequenced mobile genetic elements (MGEs, in particular phages and plasmids). In addition to providing information on the MGEs content, classifications are available at various levels of organization. At the gene/protein level, families group similar sequences that are expected to share the same function. Families of four or more proteins are manually assigned with a functional annotation using the GeneOntology and the locally developed ontology MeGO dedicated to MGEs. At the genome level, evolutionary cohesive modules group sets of protein families shared among MGEs. At the population level, networks display the reticulate evolutionary relationships among MGEs. To increase the coverage of the phage sequence space, ACLAME version 0.4 incorporates 760 high-quality predicted prophages selected from the Prophinder database. Most of the data can be downloaded from the freely accessible ACLAME web site (http://aclame.ulb.ac.be). The BLAST interface for querying the database has been extended and numerous tools for in-depth analysis of the results have been added.
The complete genome sequence and genetic analysis of ΦCA82 a novel uncultured microphage from the turkey gastrointestinal system

PubMed Central

2011-01-01

The genomic DNA sequence of a novel enteric uncultured microphage, ΦCA82 from a turkey gastrointestinal system was determined utilizing metagenomics techniques. The entire circular, single-stranded nucleotide sequence of the genome was 5,514 nucleotides. The ΦCA82 genome is quite different from other microviruses as indicated by comparisons of nucleotide similarity, predicted protein similarity, and functional classifications. Only three genes showed significant similarity to microviral proteins as determined by local alignments using BLAST analysis. ORF1 encoded a predicted phage F capsid protein that was phylogenetically most similar to the Microviridae ΦMH2K member's major coat protein. The ΦCA82 genome also encoded a predicted minor capsid protein (ORF2) and putative replication initiation protein (ORF3) most similar to the microviral bacteriophage SpV4. The distant evolutionary relationship of ΦCA82 suggests that the divergence of this novel turkey microvirus from other microviruses may reflect unique evolutionary pressures encountered within the turkey gastrointestinal system. PMID:21714899
Worldwide prevalence of lentivirus infection in wild feline species: epidemiologic and phylogenetic aspects.

PubMed

Olmsted, R A; Langley, R; Roelke, M E; Goeken, R M; Adger-Johnson, D; Goff, J P; Albert, J P; Packer, C; Laurenson, M K; Caro, T M

1992-10-01

The natural occurrence of lentiviruses closely related to feline immunodeficiency virus (FIV) in nondomestic felid species is shown here to be worldwide. Cross-reactive antibodies to FIV were common in several free-ranging populations of large cats, including East African lions and cheetahs of the Serengeti ecosystem and in puma (also called cougar or mountain lion) populations throughout North America. Infectious puma lentivirus (PLV) was isolated from several Florida panthers, a severely endangered relict puma subspecies inhabiting the Big Cypress Swamp and Everglades ecosystems in southern Florida. Phylogenetic analysis of PLV genomic sequences from disparate geographic isolates revealed appreciable divergence from domestic cat FIV sequences as well as between PLV sequences found in different North American locales. The level of sequence divergence between PLV and FIV was greater than the level of divergence between human and certain simian immunodeficiency viruses, suggesting that the transmission of FIV between feline species is infrequent and parallels in time the emergence of HIV from simian ancestors.
Variation in genotype and higher virulence of a strain of Sporothrix schenckii causing disseminated cutaneous sporotrichosis.

PubMed

Zhang, Zhenying; Liu, Xiaoming; Lv, Xuelian; Lin, Jingrong

2011-12-01

Sporotrichosis is usually a localized, lymphocutaneous disease, and its disseminated type is rarely reported. The main objective of this study was to identify specific DNA sequence variation and virulence of a strain of Sporothrix schenckii isolated from the lesion of disseminated cutaneous sporotrichosis. We confirmed this strain to be S. schenckii by β-tubulin and chitin synthase gene sequence analysis in addition to the routine mycological and partial ITS and NTS sequencing. We found a 10-bp deletion in the ribosomal NTS region of this strain, in reference to the sequence of control strains isolated from fixed cutaneous sporotrichosis. After inoculation into immunosuppressed mice, this strain caused more extensive systemic involvement and showed stronger virulence than the control strain isolated from a fixed cutaneous sporotrichosis. Our study thus suggests that different clinical manifestations of sporotrichosis may be associated with variation in genotype and virulence of the strain, independent of effects due to the immune status of the host.
Cloning and Characterization of an Outer Membrane Protein of Vibrio vulnificus Required for Heme Utilization: Regulation of Expression and Determination of the Gene Sequence

PubMed Central

Litwin, Christine M.; Byrne, Burke L.

1998-01-01

Vibrio vulnificus is a halophilic, marine pathogen that has been associated with septicemia and serious wound infections in patients with iron overload and preexisting liver disease. For V. vulnificus, the ability to acquire iron from the host has been shown to correlate with virulence. V. vulnificus is able to use host iron sources such as hemoglobin and heme. We previously constructed a fur mutant of V. vulnificus which constitutively expresses at least two iron-regulated outer membrane proteins, of 72 and 77 kDa. The N-terminal amino acid sequence of the 77-kDa protein purified from the V. vulnificus fur mutant had 67% homology with the first 15 amino acids of the mature protein of the Vibrio cholerae heme receptor, HutA. In this report, we describe the cloning, DNA sequence, mutagenesis, and analysis of transcriptional regulation of the structural gene for HupA, the heme receptor of V. vulnificus. DNA sequencing of hupA demonstrated a single open reading frame of 712 amino acids that was 50% identical and 66% similar to the sequence of V. cholerae HutA and similar to those of other TonB-dependent outer membrane receptors. Primer extension analysis localized one promoter for the V. vulnificus hupA gene. Analysis of the promoter region of V. vulnificus hupA showed a sequence homologous to the consensus Fur box. Northern blot analysis showed that the transcript was strongly regulated by iron. An internal deletion in the V. vulnificus hupA gene, done by using marker exchange, resulted in the loss of expression of the 77-kDa protein and the loss of the ability to use hemin or hemoglobin as a source of iron. The hupA deletion mutant of V. vulnificus will be helpful in future studies of the role of heme iron in V. vulnificus pathogenesis. PMID:9632577
Nuclear targeting of the maize R protein requires two nuclear localization sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shieh, M.W.; Raikhel, N.V.; Wessler, S.R.

1993-02-01

Previous genetic and structural evidence indicates that the maize R gene encodes a nuclear transcriptional activating factor. In-frame carboxyl- and amino-terminal fusions of the R gene to the reporter gene encoding β-glucuronidase (GUS) were sufficient to direct GUS to the nucleus of the transiently transformed onion (Allium cepa) epidermal cells. Further analysis of chimeric constructs containing regions of the R gene fused to the GUS cDNA revealed three specific nuclear localization sequences (NLSs) that were capable of redirecting the GUS protein to the nucleus. Amino-terminal NLS-A (amino acids 100-109, GDRRAAPARP) contained several arginine residues; a similar localization signal is found in only a few viral proteins. The medial NLS-M (amino acids 419-428, MSERKRREKL) is a simian virus 40 large T antigen-type NLS, and the carboxyl-terminal NLS-C (amino acids 598-610, MISESLRKAIGKR) is a mating type α2 type. NLSs M and C are independently sufficient to direct the GUS protein to the nucleus when it is fused at the amino terminus of GUS, whereas NLS-A fused to GUS partitioned between the nucleus and cytoplasm. Similar partitioning was observed when localization signals NLS-A and NLS-C were independently fused to the carboxy-terminal portion of GUS. A sequential deletion of the localization signals indicated that the amino-terminal and carboxyl-terminal fusions of R and GUS were redirected to the nucleus only when both NLS-A and -M, or NLS-C and -M, were present. These results indicate that multiple localization signals are necessary for nuclear targeting of this protein. The conservation of the localization signals within the alleles of R and similar proteins from other organisms is also discussed. 45 refs., 6 figs.
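The three NLS peptides quoted in this abstract can be checked for internal consistency and basic-residue content with a short Python sketch. The peptide strings and residue ranges come from the abstract; the dictionary layout, function names, and the choice to count only lysine (K) and arginine (R) as basic are assumptions made for illustration, not the authors' analysis.

```python
# Name -> (first residue, last residue, peptide), as quoted in the abstract.
NLS = {
    "NLS-A": (100, 109, "GDRRAAPARP"),
    "NLS-M": (419, 428, "MSERKRREKL"),
    "NLS-C": (598, 610, "MISESLRKAIGKR"),
}

def basic_count(peptide):
    """Number of lysine (K) and arginine (R) residues in a peptide."""
    return sum(peptide.count(residue) for residue in "KR")

def summarize(nls):
    """Check that each peptide length matches its residue range and report basic content."""
    summary = {}
    for name, (start, end, peptide) in nls.items():
        summary[name] = {
            "length_ok": len(peptide) == end - start + 1,
            "basic": basic_count(peptide),
            "basic_fraction": round(basic_count(peptide) / len(peptide), 2),
        }
    return summary

report = summarize(NLS)
```

Basic-residue content alone does not define an NLS; it is shown here only because classical signals such as the SV40 large T antigen type are rich in lysine and arginine.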
Assembly-history dynamics of a pitcher-plant protozoan community in experimental microcosms.

PubMed

Kadowaki, Kohmei; Inouye, Brian D; Miller, Thomas E

2012-01-01

History drives community assembly through differences both in density (density effects) and in the sequence in which species arrive (sequence effects). Density effects arise from predictable population dynamics, which are free of history, but sequence effects are due to a density-free mechanism, arising solely from the order and timing of immigration events. Few studies have determined how components of immigration history (timing, number of individuals, frequency) alter local dynamics to determine community assembly, beyond addressing when immigration history produces historically contingent assembly. We varied density and sequence effects independently in a two-way factorial design to follow community assembly in a three-species aquatic protozoan community. A superior competitor, Colpoda steinii, mediated alternative community states; early arrival or high introduction density allowed this species to outcompete or suppress the other competitors (Poterioochromonas malhamensis and Eimeriidae gen. sp.). Multivariate analysis showed that density effects caused greater variation in community states, whereas sequence effects altered the mean community composition. A significant interaction between density and sequence effects suggests that we should refine our understanding of priority effects. These results highlight a practical need to understand not only the "ingredients" (species) in ecological communities but their "recipes" as well.
High-resolution community profiling of arbuscular mycorrhizal fungi.

PubMed

Schlaeppi, Klaus; Bender, S Franz; Mascher, Fabio; Russo, Giancarlo; Patrignani, Andrea; Camenzind, Tessa; Hempel, Stefan; Rillig, Matthias C; van der Heijden, Marcel G A

2016-11-01

Community analyses of arbuscular mycorrhizal fungi (AMF) using ribosomal small subunit (SSU) or internal transcribed spacer (ITS) DNA sequences often suffer from low resolution or coverage. We developed a novel sequencing based approach for a highly resolving and specific profiling of AMF communities. We took advantage of previously established AMF-specific PCR primers that amplify a c. 1.5-kb long fragment covering parts of SSU, ITS and parts of the large ribosomal subunit (LSU), and we sequenced the resulting amplicons with single molecule real-time (SMRT) sequencing. The method was applicable to soil and root samples, detected all major AMF families and successfully discriminated closely related AMF species, which would not be discernible using SSU sequences. In inoculation tests we could trace the introduced AMF inoculum at the molecular level. One of the introduced strains almost replaced the local strain(s), revealing that AMF inoculation can have a profound impact on the native community. The methodology presented offers researchers a powerful new tool for AMF community analysis because it unifies improved specificity and enhanced resolution, whereas the drawback of medium sequencing throughput appears of lesser importance for low-diversity groups such as AMF. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
The primary structure of the thymidine kinase gene of fish lymphocystis disease virus.

PubMed

Schnitzler, P; Handermann, M; Szépe, O; Darai, G

1991-06-01

The DNA nucleotide sequence of the thymidine kinase (TK) gene of fish lymphocystis disease virus (FLDV), which has been localized between the coordinates 0.678 and 0.688 of the viral genome, was determined. The analysis of the DNA nucleotide sequence located between the recognition sites of HindIII (0.669 map unit; nucleotide position 1) and AccI (nucleotide position 2032) revealed the presence of an open reading frame of 954 bp on the lower strand of this region between nucleotide positions 1868 (ATG) and 915 (TAA). It encodes a protein of 318 amino acid residues. The evolutionary relationships of the TK gene of FLDV to the other known TK genes were investigated using the method of progressive sequence alignment. These analyses revealed a high degree of diversity between the protein sequence of the FLDV TK gene and the amino acid composition of other TKs tested. However, significant conservation was detected in several regions of amino acid residues of the FLDV TK protein when compared to the amino acid sequences of TKs of African swine fever virus, fowlpox virus, shope fibroma virus, and vaccinia virus and to the amino acid sequences of the cellular cytoplasmic TK of chicken, mouse, and man.
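The coordinates quoted in this abstract can be checked with a little arithmetic. The sketch below is only an illustration: the helper names `orf_span` and `reverse_complement` are hypothetical and not from the paper, and the coordinates are assumed to be inclusive. It confirms that positions 1868 to 915 span 954 bp, or 318 codons, and shows how a start codon on the lower strand appears when read on the upper strand.

```python
# Illustrative helpers (hypothetical names); coordinates are taken from the abstract
# and are assumed to be inclusive.

def orf_span(start: int, end: int) -> tuple[int, int]:
    """Return (length in bp, number of codons) for an inclusive coordinate pair."""
    length = abs(start - end) + 1
    if length % 3 != 0:
        raise ValueError("span is not a whole number of codons")
    return length, length // 3


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string, i.e. read the opposite strand 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


length_bp, codons = orf_span(1868, 915)
print(length_bp, codons)          # 954 318
print(reverse_complement("CAT"))  # ATG: a lower-strand start codon reads CAT on the upper strand
```

Because the gene lies on the lower strand, the start coordinate (1868) is larger than the stop coordinate (915); the helper takes the absolute difference so that either orientation works.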
Efficient and dynamic nuclear localization of green fluorescent protein via RNA binding

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kitamura, Akira; Nakayama, Yusaku; Kinjo, Masataka, E-mail: kinjo@sci.hokudai.ac.jp

2015-07-31

Classical nuclear localization signal (NLS) sequences have been used for artificial localization of green fluorescent protein (GFP) in the nucleus as a positioning marker or for measurement of the nuclear-cytoplasmic shuttling rate in living cells. However, the detailed mechanism of nuclear retention of GFP-NLS remains unclear. Here, we show that a candidate mechanism for the strong nuclear retention of GFP-NLS is via the RNA-binding ability of the NLS sequence. GFP tagged with a classical NLS derived from Simian virus 40 (GFP-NLS^SV40) localized not only in the nucleoplasm, but also to the nucleolus, the nuclear subdomain in which ribosome biogenesis takes place. GFP-NLS^SV40 in the nucleolus was mobile, and intriguingly, the diffusion coefficient, which indicates the speed of diffusing molecules, was 1.5-fold slower than in the nucleoplasm. Fluorescence correlation spectroscopy (FCS) analysis showed that GFP-NLS^SV40 formed oligomers via RNA binding, the estimated molecular weight of which was larger than the limit for passive nuclear export into the cytoplasm. These findings suggest that the nuclear localization of GFP-NLS^SV40 likely results from oligomerization mediated via RNA binding. The analytical technique used here can be applied for elucidating the details of other nuclear localization mechanisms, including those of several types of nuclear proteins. In addition, GFP-NLS^SV40 can be used as an excellent marker for studying both the nucleoplasm and nucleolus in living cells. - Highlights: • Nuclear localization signal-tagged GFP (GFP-NLS) showed clear nuclear localization. • The GFP-NLS dynamically localized not only in the nucleoplasm, but also to the nucleolus. • The nuclear localization of GFP-NLS results from transient oligomerization mediated via RNA binding. • Our NLS-tagging procedure is ideal for use in artificial sequestration of proteins in the nucleus.
Identification of a nuclear localization sequence in the polyomavirus capsid protein VP2

NASA Technical Reports Server (NTRS)

Chang, D.; Haynes, J. I. 2nd; Brady, J. N.; Consigli, R. A.; Spooner, B. S. (Principal Investigator)

1992-01-01

A nuclear localization signal (NLS) has been identified in the C-terminal (Glu307-Glu-Asp-Gly-Pro-Gln-Lys-Lys-Lys-Arg-Arg-Leu318) amino acid sequence of the polyomavirus minor capsid protein VP2. The importance of this amino acid sequence for nuclear transport of newly synthesized VP2 was demonstrated by a genetic "subtractive" study using the constructs pSG5VP2 (expressing full-length VP2) and pSG5 delta 3VP2 (expressing truncated VP2, lacking amino acids Glu307-Leu318). These constructs were transfected into COS-7 cells, and the intracellular localization of the VP2 protein was determined by indirect immunofluorescence. These studies revealed that the full-length VP2 was localized in the nucleus, while the truncated VP2 protein was localized in the cytoplasm and not transported to the nucleus. A biochemical "additive" approach was also used to determine whether this sequence could target nonnuclear proteins to the nucleus. A synthetic peptide identical to VP2 amino acids Glu307-Leu318 was cross-linked to the nonnuclear proteins bovine serum albumin (BSA) or immunoglobulin G (IgG). The conjugates were then labeled with fluorescein isothiocyanate and microinjected into the cytoplasm of NIH 3T6 cells. Both conjugates localized in the nucleus of the microinjected cells, whereas unconjugated BSA and IgG remained in the cytoplasm. Taken together, these genetic subtractive and biochemical additive approaches have identified the C-terminal sequence of polyoma-virus VP2 (containing amino acids Glu307-Leu318) as the NLS of this protein.
Enhanced Methods for Local Ancestry Assignment in Sequenced Admixed Individuals

PubMed Central

Brown, Robert; Pasaniuc, Bogdan

2014-01-01

Inferring the ancestry at each locus in the genome of recently admixed individuals (e.g., Latino Americans) plays a major role in medical and population genetic inferences, ranging from finding disease-risk loci, to inferring recombination rates, to mapping missing contigs in the human genome. Although many methods for local ancestry inference have been proposed, most are designed for use with genotyping arrays and fail to make use of the full spectrum of data available from sequencing. In addition, current haplotype-based approaches are very computationally demanding, requiring large computational time for moderately large sample sizes. Here we present new methods for local ancestry inference that leverage continent-specific variants (CSVs) to attain increased performance over existing approaches in sequenced admixed genomes. A key feature of our approach is that it incorporates the admixed genomes themselves jointly with public datasets, such as 1000 Genomes, to improve the accuracy of CSV calling. We use simulations to show that our approach attains accuracy similar to widely used computationally intensive haplotype-based approaches with large decreases in runtime. Most importantly, we show that our method recovers comparable local ancestries, as the 1000 Genomes consensus local ancestry calls in the real admixed individuals from the 1000 Genomes Project. We extend our approach to account for low-coverage sequencing and show that accurate local ancestry inference can be attained at low sequencing coverage. Finally, we generalize CSVs to sub-continental population-specific variants (sCSVs) and show that in some cases it is possible to determine the sub-continental ancestry for short chromosomal segments on the basis of sCSVs. PMID:24743331
Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning

PubMed Central

Xu, Shengzhi; Cheng, Xiang; Li, Zhengyi; Xiong, Li

2016-01-01

In this paper, we study the problem of mining frequent sequences under the rigorous differential privacy model. We explore the possibility of designing a differentially private frequent sequence mining (FSM) algorithm which can achieve both high data utility and a high degree of privacy. We found, in differentially private FSM, the amount of required noise is proportionate to the number of candidate sequences. If we could effectively reduce the number of unpromising candidate sequences, the utility and privacy tradeoff can be significantly improved. To this end, by leveraging a sampling-based candidate pruning technique, we propose a novel differentially private FSM algorithm, which is referred to as PFS2. The core of our algorithm is to utilize sample databases to further prune the candidate sequences generated based on the downward closure property. In particular, we use the noisy local support of candidate sequences in the sample databases to estimate which sequences are potentially frequent. To improve the accuracy of such private estimations, a sequence shrinking method is proposed to enforce the length constraint on the sample databases. Moreover, to decrease the probability of misestimating frequent sequences as infrequent, a threshold relaxation method is proposed to relax the user-specified threshold for the sample databases. Through formal privacy analysis, we show that our PFS2 algorithm is ε-differentially private. Extensive experiments on real datasets illustrate that our PFS2 algorithm can privately find frequent sequences with high accuracy. PMID:26973430
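To make the idea of a "noisy support" concrete, here is a minimal sketch of the standard Laplace mechanism applied to a sequence support count. It is not the PFS2 algorithm itself: the function names, the toy database and the sensitivity of 1 (one record added or removed changes a count by at most 1) are assumptions made for illustration. Under basic sequential composition, splitting a budget epsilon evenly over k candidate sequences leaves epsilon/k per count, so the noise scale grows with k, which is in line with the abstract's remark that the required noise is proportionate to the number of candidates.

```python
import random


def contains(sequence, pattern):
    """True if pattern occurs in sequence as a (not necessarily contiguous) subsequence."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(database, pattern):
    """Number of sequences in the database that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_support(database, pattern, epsilon, rng):
    """Support count plus Laplace noise, assuming sensitivity 1 (illustrative assumption)."""
    return support(database, pattern) + laplace_noise(1.0 / epsilon, rng)


# Toy database (hypothetical data)
db = [["a", "b", "c"], ["a", "c"], ["b", "a"]]
rng = random.Random(0)
print(support(db, ["a", "c"]))  # 2
print(noisy_support(db, ["a", "c"], epsilon=0.5, rng=rng))
```

A smaller epsilon gives stronger privacy but noisier counts, which is why pruning unpromising candidates before spending the budget can improve utility.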
Heuristics for multiobjective multiple sequence alignment.

PubMed

Abbasi, Maryam; Paquete, Luís; Pereira, Francisco B

2016-07-15

Aligning multiple sequences arises in many tasks in Bioinformatics. However, the alignments produced by the current software packages are highly dependent on the parameter settings, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may introduce an undesirable bias in further steps of the analysis and lead to overly simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment. We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments, which are representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments. The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method in terms of correctly aligned pairs ratio and columns correctly aligned ratio with respect to reference alignments. Experimental results show that our approaches can obtain better results than TCoffee and Clustal Omega in terms of the first ratio.
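The two-objective view described in this abstract can be illustrated with a small sketch. The code below is not the authors' heuristic: the +1/-1 match/mismatch scoring is a stand-in for a real substitution matrix, and the function names and toy alignments are hypothetical. It computes a sum-of-pairs substitution score and a gap count for each candidate alignment, then keeps only the non-dominated candidates (higher score is better, fewer gaps is better).

```python
from itertools import combinations


def objectives(alignment, match=1, mismatch=-1):
    """Return (sum-of-pairs substitution score, number of gap characters).

    The match/mismatch values are an illustrative assumption, not a real
    substitution matrix.
    """
    score = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a != "-" and b != "-":
                score += match if a == b else mismatch
    gaps = sum(row.count("-") for row in alignment)
    return score, gaps


def pareto_front(candidates):
    """Keep alignments that no other candidate beats on both objectives."""
    scored = [(objectives(a), a) for a in candidates]
    front = []
    for (s, g), a in scored:
        dominated = any(
            s2 >= s and g2 <= g and (s2 > s or g2 < g)
            for (s2, g2), _ in scored
        )
        if not dominated:
            front.append(((s, g), a))
    return front


# Three toy alignments of the same two sequences (hypothetical data)
candidates = [
    ["ACGT", "AGTT"],      # no gaps, two mismatches
    ["ACGT-", "A-GTT"],    # two gaps, three matches
    ["ACGT--", "--AGTT"],  # four gaps, two mismatches
]
for (score, gaps), rows in pareto_front(candidates):
    print(score, gaps, rows)
```

The first two alignments survive because neither is better on both objectives; the third is dominated and dropped. A practitioner would then choose among the surviving trade-offs.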
StralSV: assessment of sequence variability within similar 3D structures and application to polio RNA-dependent RNA polymerase

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zemla, A; Lang, D; Kostova, T

2010-11-29

Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory - still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could overcome these difficulties and facilitate the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures. Here we present StralSV, a new algorithm for detecting closely related structure fragments and quantifying residue frequency from tight local structure alignments. We apply StralSV in a study of the RNA-dependent RNA polymerase of poliovirus and demonstrate that the algorithm can be used to determine regions of the protein that are relatively unique or that shared structural similarity with structures that are distantly related. By quantifying residue frequencies among many residue-residue pairs extracted from local alignments, one can infer potential structural or functional importance of specific residues that are determined to be highly conserved or that deviate from a consensus. We further demonstrate that considerable detailed structural and phylogenetic information can be derived from StralSV analyses. StralSV is a new structure-based algorithm for identifying and aligning structure fragments that have similarity to a reference protein. StralSV analysis can be used to quantify residue-residue correspondences and identify residues that may be of particular structural or functional importance, as well as unusual or unexpected residues at a given sequence position.
ATLAS (Automatic Tool for Local Assembly Structures) - A Comprehensive Infrastructure for Assembly, Annotation, and Genomic Binning of Metagenomic and Metatranscriptomic Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

White, Richard A.; Brown, Joseph M.; Colby, Sean M.

ATLAS (Automatic Tool for Local Assembly Structures) is a comprehensive multiomics data analysis pipeline that is massively parallel and scalable. ATLAS contains a modular analysis pipeline for assembly, annotation, quantification and genome binning of metagenomics and metatranscriptomics data and a framework for reference metaproteomic database construction. ATLAS transforms raw sequence data into functional and taxonomic data at the microbial population level and provides genome-centric resolution through genome binning. ATLAS provides robust taxonomy based on majority voting of protein coding open reading frames rolled-up at the contig level using modified lowest common ancestor (LCA) analysis. ATLAS is user-friendly, easy to install through bioconda, maintained as open source on GitHub, and implemented in Snakemake for modular, customizable workflows.
Comparative molecular cytogenetics of major repetitive sequence families of three Dendrobium species (Orchidaceae) from Bangladesh

PubMed Central

Begum, Rabeya; Alam, Sheikh Shamimul; Menzel, Gerhard; Schmidt, Thomas

2009-01-01

Background and Aims: Dendrobium species show tremendous morphological diversity and have broad geographical distribution. As repetitive sequence analysis is a useful tool to investigate the evolution of chromosomes and genomes, the aim of the present study was the characterization of repetitive sequences from Dendrobium moschatum for comparative molecular and cytogenetic studies in the related species Dendrobium aphyllum, Dendrobium aggregatum and representatives from other orchid genera. Methods: In order to isolate highly repetitive sequences, a c0t-1 DNA plasmid library was established. Repeats were sequenced and used as probes for Southern hybridization. Sequence divergence was analysed using bioinformatic tools. Repetitive sequences were localized along orchid chromosomes by fluorescence in situ hybridization (FISH). Key Results: Characterization of the c0t-1 library resulted in the detection of repetitive sequences including the (GA)n dinucleotide DmoO11, numerous Arabidopsis-like telomeric repeats and the highly amplified dispersed repeat DmoF14. The DmoF14 repeat is conserved in six Dendrobium species but diversified in representative species of three other orchid genera. FISH analyses showed the genome-wide distribution of DmoF14 in D. moschatum, D. aphyllum and D. aggregatum. Hybridization with the telomeric repeats demonstrated Arabidopsis-like telomeres at the chromosome ends of Dendrobium species. However, FISH using the telomeric probe revealed two pairs of chromosomes with strong intercalary signals in D. aphyllum. FISH showed the terminal position of 5S and 18S–5·8S–25S rRNA genes and a characteristic number of rDNA sites in the three Dendrobium species. Conclusions: The repeated sequences isolated from D. moschatum c0t-1 DNA constitute major DNA families of the D. moschatum, D. aphyllum and D. aggregatum genomes with DmoF14 representing an ancient component of orchid genomes. Large intercalary telomere-like arrays suggest chromosomal rearrangements in D. aphyllum while the number and localization of rRNA genes as well as the species-specific distribution pattern of an abundant microsatellite reflect the genomic diversity of the three Dendrobium species. PMID:19635741
Molecular Characterization and Phylogenetic Analysis of Pseudomonas aeruginosa Isolates Recovered from Greek Aquatic Habitats Implementing the Double-Locus Sequence Typing Scheme.

PubMed

Pappa, Olga; Beloukas, Apostolos; Vantarakis, Apostolos; Mavridou, Athena; Kefala, Anastasia-Maria; Galanis, Alex

2017-07-01

The recently described double-locus sequence typing (DLST) scheme was implemented to characterize in depth the genetic profiles of 52 resistant environmental Pseudomonas aeruginosa isolates deriving from aquatic habitats of Greece. The DLST scheme was able not only to assign an already known allelic profile to the majority of the isolates but also to recognize two new ones (ms217-190, ms217-191) with high discriminatory power. A third locus (oprD) was also used for molecular typing and was found to be fundamental for the phylogenetic analysis of environmental isolates, given the resulting increase in discrimination between the isolates. Additionally, the circulation of acquired resistance mechanisms in the aquatic habitats, according to their genetic profiles, proved to be more extensive. Hereby, we suggest that the combination of DLST and oprD typing can discriminate phenotypically and genetically related environmental P. aeruginosa isolates, providing reliable phylogenetic analysis at a local level.

Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline

PubMed Central

2014-01-01

Background Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results. Results To address these challenges, we have developed the Mercury analysis pipeline and deployed it in local hardware and the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts. Conclusions By taking advantage of cloud computing and with Mercury implemented on the DNAnexus platform, we have demonstrated a powerful combination of a robust and fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples. PMID:24475911
[Comparative analysis of variable regions in the genomes of variola virus].

PubMed

Babkin, I V; Nepomniashchikh, T S; Maksiutov, R A; Gutorov, V V; Babkina, I N; Shchelkunov, S N

2008-01-01

Nucleotide sequences of two extended segments of the terminal variable regions in the variola virus genome were determined. The size of the left segment was 13.5 kbp and of the right, 10.5 kbp. In total, over 540 kbp were sequenced for 22 variola virus strains. The phylogenetic analysis conducted and the data published earlier allowed us to find the interrelations between 70 variola virus isolates, the character of their clustering, and the degree of intergroup and intragroup variation of the clusters of variola virus strains. The most polymorphic loci of the genome segments studied were determined. It was demonstrated that these loci are localized either to noncoding genome regions or to the regions of destroyed open reading frames characteristic of the ancestor virus. These loci are promising for development of a strategy for genotyping variola virus strains. Analysis of recombination using various methods demonstrated that, with a single exception, no statistically significant recombination events were detectable in the genomes of the variola virus strains studied.
Phylogenetic analysis and victim contact tracing of rabies virus from humans and dogs in Bali, Indonesia.

PubMed

Mahardika, G N K; Dibia, N; Budayanti, N S; Susilawathi, N M; Subrata, K; Darwinata, A E; Wignall, F S; Richt, J A; Valdivia-Granda, W A; Sudewi, A A R

2014-06-01

The emergence of human and animal rabies in Bali since November 2008 has attracted local, national and international interest. The potential origin and time of introduction of rabies virus to Bali is described. The nucleoprotein (N) gene of rabies virus from dog brain and human clinical specimens was sequenced using an automated DNA sequencer. Phylogenetic inference with Bayesian Markov Chain Monte Carlo (MCMC) analysis using the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) v. 1.7.5 software confirmed that the outbreak of rabies in Bali was caused by an Indonesian lineage virus following a single introduction. The ancestor of Bali viruses was the descendant of a virus from Kalimantan. Contact tracing showed that the event most likely occurred in early 2008. The introduction of rabies into a large unvaccinated dog population in Bali clearly demonstrates the risk of disease transmission for government agencies and should lead to an increased preparedness and efforts for sustained risk reduction to prevent such events from occurring in future.
Identification, genetic localization, and allelic diversity of selectively amplified microsatellite polymorphic loci in lettuce and wild relatives (Lactuca spp.).

PubMed

Witsenboer, H; Michelmore, R W; Vogel, J

1997-12-01

Selectively amplified microsatellite polymorphic locus (SAMPL) analysis is a method of amplifying microsatellite loci using generic PCR primers. SAMPL analysis uses one AFLP primer in combination with a primer complementary to microsatellite sequences. SAMPL primers based on compound microsatellite sequences provided the clearest amplification patterns. We explored the potential of SAMPL analysis in lettuce to detect PCR-based codominant microsatellite markers. Fifty-eight SAMPLs were identified and placed on the genetic map. Seventeen were codominant. SAMPLs were dispersed with RFLP markers on 11 of the 12 main linkage groups in lettuce, indicating that they have a similar genomic distribution. Some but not all fragments amplified by SAMPL analysis were confirmed to contain microsatellite sequences by Southern hybridization. Forty-five cultivars of lettuce and five wild species of Lactuca were analyzed to determine the allelic diversity for codominant SAMPLs. From 3 to 11 putative alleles were found for each SAMPL; 2-6 alleles were found within Lactuca sativa and 1-3 alleles were found among the crisphead genotypes, the most genetically homogeneous plant type of L. sativa. This allelic diversity is greater than that found for RFLP markers. Numerous new alleles were observed in the wild species; however, there were frequent null alleles. Therefore, SAMPL analysis is more applicable to intraspecific than to interspecific comparisons. A phenetic analysis based on SAMPLs resulted in a dendrogram similar to those based on RFLP and AFLP markers.
The REP2 Repeats of the Genome of Neisseria meningitidis Are Associated with Genes Coordinately Regulated during Bacterial Cell Interaction

PubMed Central

Morelle, Sandrine; Carbonnelle, Etienne; Nassif, Xavier

2003-01-01

Interaction with host cells is essential in meningococcal pathogenesis, especially at the blood-brain barrier. This step is likely to involve a common regulatory pathway allowing coordinate regulation of genes necessary for the interaction with endothelial cells. The analysis of the genomic sequence of Neisseria meningitidis Z2491 revealed the presence of many repeats. One of these, designated REP2, contains a −24/−12 type promoter and a ribosome binding site 5 to 13 bp before an ATG. In addition, most of these REP2 sequences are located immediately upstream of an ORF. Among these REP2-associated genes are pilC1 and crgA, described as being involved in steps essential for the interaction of N. meningitidis with host cells. Furthermore, the REP2 sequences located upstream of pilC1 and crgA correspond to the previously identified promoters known to be induced during the initial localized adhesion of N. meningitidis with human cells. This characteristic led us to hypothesize that at least some of the REP2-associated genes were upregulated under the same circumstances as pilC1 and crgA. Real-time quantitative PCR demonstrated that the expression of 14 out of 16 REP2-associated genes was upregulated during the initial localized adhesion of N. meningitidis. Taken together, these data suggest that these repeats control a set of genes necessary for the efficient interaction of this pathogen with host cells. Subsequent mutational analysis was performed to address the role of these genes during meningococcus-cell interaction. PMID:12670987
Music Structure Analysis from Acoustic Signals

NASA Astrophysics Data System (ADS)

Dannenberg, Roger B.; Goto, Masataka

Music is full of structure, including sections, sequences of distinct musical textures, and the repetition of phrases or entire sections. The analysis of music audio relies upon feature vectors that convey information about music texture or pitch content. Texture generally refers to the average spectral shape and statistical fluctuation, often reflecting the set of sounding instruments, e.g., strings, vocal, or drums. Pitch content reflects melody and harmony, which is often independent of texture. Structure is found in several ways. Segment boundaries can be detected by observing marked changes in locally averaged texture.
Millstone: software for multiplex microbial genome analysis and engineering

DOE Office of Scientific and Technical Information (OSTI.GOV)

Goodman, Daniel B.; Kuznetsov, Gleb; Lajoie, Marc J.

Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. Here, we describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Millstone: software for multiplex microbial genome analysis and engineering.

PubMed

Goodman, Daniel B; Kuznetsov, Gleb; Lajoie, Marc J; Ahern, Brian W; Napolitano, Michael G; Chen, Kevin Y; Chen, Changping; Church, George M

2017-05-25

Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. We describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Millstone: software for multiplex microbial genome analysis and engineering

DOE PAGES

Goodman, Daniel B.; Kuznetsov, Gleb; Lajoie, Marc J.; ...

2017-05-25

Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. Here, we describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Stability of sequences generated by nonlinear differential systems. [for analysis of glider jet aircraft motion]

NASA Technical Reports Server (NTRS)

Brown, R. L.

1979-01-01

A local stability analysis is presented for both the analytic and numerical solutions of the initial value problem for a system of ordinary differential equations. It is shown that, using a proper choice of Liapunov function, a connected region of stable initial values of both the analytic solution and the one-leg k-step numerical solution can be approximated. Attention is given to the example of the two-dimensional problem involving the stability of the longitudinal equations of motion of a gliding jet aircraft.
High-resolution earthquake relocation in the Fort Worth and Permian Basins using regional seismic stations

NASA Astrophysics Data System (ADS)

Ogwari, P.; DeShon, H. R.; Hornbach, M.

2017-12-01

Post-2008 earthquake rate increases in the Central United States have been associated with large-scale subsurface disposal of waste fluids from oil and gas operations. Various earthquake sequences in the Fort Worth and Permian basins began in the absence of seismic stations at local distances to record and accurately locate hypocenters. Most typically, the initial earthquakes have been located using regional seismic network stations (>100 km epicentral distance) and global 1D velocity models, which usually results in large location uncertainty, especially in depth, does not resolve magnitude <2.5 events, and does not constrain the geometry of the activated fault(s). Here, we present a method to better resolve earthquake occurrence and location using matched filters and regional relative location when local data become available. We use the local distance data for high-resolution earthquake location, identifying earthquake templates and accurate source-station raypath velocities for the Pg and Lg phases at regional stations. A matched-filter analysis is then applied to seismograms recorded at US network stations and at adopted TA stations that record the earthquakes before and during the local network deployment period. Positive detections are declared based on manual review of associated P and S arrivals on local stations. We apply hierarchical clustering to distinguish earthquakes that are both spatially clustered and spatially separated. Finally, we conduct relative earthquake and earthquake cluster location using regional station differential times. Initial analysis applied to the 2008-2009 DFW airport sequence in north Texas results in time-continuous imaging of epicenters extending into 2014. Seventeen earthquakes in the USGS earthquake catalog scattered across a 10 km² area near DFW airport are relocated onto a single fault using these approaches. These techniques will also be applied toward imaging recent earthquakes in the Permian Basin near Pecos, TX.
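The matched-filter step in the abstract above can be illustrated with a minimal sketch. The abstract does not state which detection statistic or threshold was used, so the normalized cross-correlation, the function names, and the 0.8 threshold below are all assumptions chosen for illustration; the authors additionally confirm detections by manual review, which no code here replaces.

```python
import math


def normalized_cross_correlation(template, trace):
    """Correlation coefficient of `template` against every lag of `trace`."""
    n = len(template)
    t_mean = sum(template) / n
    t_dev = [x - t_mean for x in template]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    cc = []
    for lag in range(len(trace) - n + 1):
        window = trace[lag:lag + n]
        w_mean = sum(window) / n
        w_dev = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(d * d for d in w_dev))
        if t_norm == 0.0 or w_norm == 0.0:
            # A flat window carries no waveform information.
            cc.append(0.0)
        else:
            dot = sum(a * b for a, b in zip(t_dev, w_dev))
            cc.append(dot / (t_norm * w_norm))
    return cc


def detect(template, trace, threshold=0.8):
    """Return the lags whose correlation meets a (hypothetical) threshold."""
    cc = normalized_cross_correlation(template, trace)
    return [lag for lag, c in enumerate(cc) if c >= threshold]


# Toy data: the template appears once at unit amplitude and once doubled.
template = [0.0, 1.0, -1.0, 0.5]
trace = [0.0] * 5 + template + [0.0] * 5 + [2 * x for x in template] + [0.0] * 3
candidate_lags = detect(template, trace)
```

Because the coefficient is normalized, the doubled-amplitude copy correlates as strongly as the unit-amplitude one, which is why a template from a larger event can flag smaller repeats of the same waveform.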
Crystal structure of Bacillus subtilis YdaF protein: a putative ribosomal N-acetyltransferase.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brunzelle, J. S.; Wu, R.; Korolev, S. V.

2004-12-01

Comparative sequence analysis suggests that the ydaF gene encodes a protein (YdaF) that functions as an N-acetyltransferase, more specifically, a ribosomal N-acetyltransferase. Sequence analysis using the basic local alignment search tool (BLAST) suggests that YdaF belongs to a large family of proteins (199 proteins found in 88 unique species of bacteria, archaea, and eukaryotes). YdaF also belongs to COG1670, which includes the Escherichia coli RimL protein that is known to acetylate ribosomal protein L12. N-acetylation (NAT) has been found in all kingdoms. NAT enzymes catalyze the transfer of an acetyl group from acetyl-CoA (AcCoA) to a primary amino group. For example, NATs can acetylate the N-terminal α-amino group, the ε-amino group of lysine residues, aminoglycoside antibiotics, spermine/spermidine, or arylalkylamines such as serotonin. The crystal structure of the alleged ribosomal NAT protein, YdaF, from Bacillus subtilis presented here was determined as a part of the Midwest Center for Structural Genomics. The structure maintains the conserved tertiary structure of other known NATs and a high sequence similarity in the presumed AcCoA binding pocket in spite of a very low overall level of sequence identity to other NATs of known structure.
Clonality and serotypes of Streptococcus mutans among children by multilocus sequence typing

PubMed Central

Momeni, Stephanie S.; Whiddon, Jennifer; Cheon, Kyounga; Moser, Stephen A.; Childers, Noel K.

2015-01-01

Studies using multilocus sequence typing (MLST) have demonstrated that Streptococcus mutans isolates are genetically diverse. Our laboratory previously demonstrated clonality of S. mutans using MLST but could not discount the possibility of sampling bias. In this study, the clonality of randomly selected S. mutans plaque isolates from African American children was examined using MLST. Serotype and presence of collagen-binding proteins (CBP) cnm/cbm were also assessed. One hundred S. mutans isolates were randomly selected for MLST analysis. Sequence analysis was performed and phylogenetic trees were generated using START2 and MEGA. Thirty-four sequence types (ST) were identified, of which 27 were unique to this population. Seventy-five percent of the isolates clustered into 16 clonal groups. Serotypes observed were c (n=84), e (n=3), and k (n=11). The prevalence of S. mutans isolates of serotype k was notably high at 17.5%. All isolates were cnm/cbm negative. The clonality of S. mutans demonstrated in this study illustrates the importance of localized population studies and is consistent with transmission. The prevalence of serotype k, a recently proposed systemic pathogen, observed in this study is higher than reported in most populations, and this is the first report of S. mutans serotype k in a US population. PMID:26443288
Oligonucleotide gap-fill ligation for mutation detection and sequencing in situ

PubMed Central

Mignardi, Marco; Mezger, Anja; Qian, Xiaoyan; La Fleur, Linnea; Botling, Johan; Larsson, Chatarina; Nilsson, Mats

2015-01-01

In clinical diagnostics a great need exists for targeted in situ multiplex nucleic acid analysis as the mutational status can offer guidance for effective treatment. One well-established method uses padlock probes for mutation detection and multiplex expression analysis directly in cells and tissues. Here, we use oligonucleotide gap-fill ligation to further increase specificity and to capture molecular substrates for in situ sequencing. Short oligonucleotides are joined at both ends of a padlock gap probe by two ligation events and are then locally amplified by target-primed rolling circle amplification (RCA) preserving spatial information. We demonstrate the specific detection of the A3243G mutation of mitochondrial DNA and we successfully characterize a single nucleotide variant in the ACTB mRNA in cells by in situ sequencing of RCA products generated by padlock gap-fill ligation. To demonstrate the clinical applicability of our assay, we show specific detection of a point mutation in the EGFR gene in fresh frozen and formalin-fixed, paraffin-embedded (FFPE) lung cancer samples and confirm the detected mutation by in situ sequencing. This approach presents several advantages over conventional padlock probes allowing simpler assay design for multiplexed mutation detection to screen for the presence of mutations in clinically relevant mutational hotspots directly in situ. PMID:26240388
Complete Sequence and Analysis of Coconut Palm (Cocos nucifera) Mitochondrial Genome

PubMed Central

Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O.; Alawad, Abdullah O.; Al-Sadi, Abdullah M.; Hu, Songnian; Yu, Jun

2016-01-01

Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in the tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653 bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower than that of the nuclear genome and the neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provide the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants. PMID:27736909
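The Ti/Tv ratio mentioned in the abstract above is the number of transitions (purine to purine or pyrimidine to pyrimidine substitutions) divided by the number of transversions. The sketch below only illustrates that definition on two pre-aligned sequences; the abstract does not describe the variant-calling procedure, and the function name and toy sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ti_tv_counts(ref, alt):
    """Count transitions and transversions between two aligned sequences.

    Positions that are identical, gapped, or ambiguous (not A/C/G/T)
    are skipped.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for r, a in zip(ref.upper(), alt.upper()):
        if r == a or r not in "ACGT" or a not in "ACGT":
            continue
        pair = {r, a}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    return transitions, transversions


# Toy alignment (not coconut data): two transitions and one transversion.
ti, tv = ti_tv_counts("ACGTACGT", "GCGTCCAT")
ratio = ti / tv if tv else float("inf")
```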
High-throughput sequence alignment using Graphics Processing Units

PubMed Central

Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

2007-01-01

Background: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion: MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356
Off-platform Silurian sequences in the Ambler River quadrangle: A section in Geologic studies in Alaska by the U.S. Geological Survey during 1987

USGS Publications Warehouse

Dumoulin, Julie A.; Harris, Anita G.

1988-01-01

Lithofacies changes in coeval upper Paleozoic rocks have been used to unravel the tectonic history of northern Alaska (for example, Mayfield and others, 1983). Conodont biostratigraphy and detailed petrologic studies are now revealing facies differences in lower Paleozoic rocks that can also be used to constrain their tectono-sedimentary framework (Dumoulin and Harris, 1987). A basic element of basin analysis is the discrimination of shallow-water shelf and platform sequences from deeper water slope and basinal deposits. This report documents several new localities of deeper water, off-platform Silurian deposits in the Ambler River quadrangle and briefly outlines some of their paleogeographic implications.
Machine learning to design integral membrane channelrhodopsins for efficient eukaryotic expression and plasma membrane localization.

PubMed

Bedbrook, Claire N; Yang, Kevin K; Rice, Austin J; Gradinaru, Viviana; Arnold, Frances H

2017-10-01

There is growing interest in studying and engineering integral membrane proteins (MPs) that play key roles in sensing and regulating cellular response to diverse external signals. A MP must be expressed, correctly inserted and folded in a lipid bilayer, and trafficked to the proper cellular location in order to function. The sequence and structural determinants of these processes are complex and highly constrained. Here we describe a predictive, machine-learning approach that captures this complexity to facilitate successful MP engineering and design. Machine learning on carefully-chosen training sequences made by structure-guided SCHEMA recombination has enabled us to accurately predict the rare sequences in a diverse library of channelrhodopsins (ChRs) that express and localize to the plasma membrane of mammalian cells. These light-gated channel proteins of microbial origin are of interest for neuroscience applications, where expression and localization to the plasma membrane is a prerequisite for function. We trained Gaussian process (GP) classification and regression models with expression and localization data from 218 ChR chimeras chosen from a 118,098-variant library designed by SCHEMA recombination of three parent ChRs. We use these GP models to identify ChRs that express and localize well and show that our models can elucidate sequence and structure elements important for these processes. We also used the predictive models to convert a naturally occurring ChR incapable of mammalian localization into one that localizes well.
Machine learning to design integral membrane channelrhodopsins for efficient eukaryotic expression and plasma membrane localization

PubMed Central

Rice, Austin J.; Gradinaru, Viviana; Arnold, Frances H.

2017-01-01

There is growing interest in studying and engineering integral membrane proteins (MPs) that play key roles in sensing and regulating cellular response to diverse external signals. A MP must be expressed, correctly inserted and folded in a lipid bilayer, and trafficked to the proper cellular location in order to function. The sequence and structural determinants of these processes are complex and highly constrained. Here we describe a predictive, machine-learning approach that captures this complexity to facilitate successful MP engineering and design. Machine learning on carefully-chosen training sequences made by structure-guided SCHEMA recombination has enabled us to accurately predict the rare sequences in a diverse library of channelrhodopsins (ChRs) that express and localize to the plasma membrane of mammalian cells. These light-gated channel proteins of microbial origin are of interest for neuroscience applications, where expression and localization to the plasma membrane is a prerequisite for function. We trained Gaussian process (GP) classification and regression models with expression and localization data from 218 ChR chimeras chosen from a 118,098-variant library designed by SCHEMA recombination of three parent ChRs. We use these GP models to identify ChRs that express and localize well and show that our models can elucidate sequence and structure elements important for these processes. We also used the predictive models to convert a naturally occurring ChR incapable of mammalian localization into one that localizes well. PMID:29059183
SONAR: A High-Throughput Pipeline for Inferring Antibody Ontogenies from Longitudinal Sequencing of B Cell Transcripts.

PubMed

Schramm, Chaim A; Sheng, Zizhang; Zhang, Zhenhai; Mascola, John R; Kwong, Peter D; Shapiro, Lawrence

2016-01-01

The rapid advance of massively parallel or next-generation sequencing technologies has made possible the characterization of B cell receptor repertoires in ever greater detail, and these developments have triggered a proliferation of software tools for processing and annotating these data. Of especial interest, however, is the capability to track the development of specific antibody lineages across time, which remains beyond the scope of most current programs. We have previously reported on the use of techniques such as inter- and intradonor analysis and CDR3 tracing to identify transcripts related to an antibody of interest. Here, we present Software for the Ontogenic aNalysis of Antibody Repertoires (SONAR), capable of automating both general repertoire analysis and specialized techniques for investigating specific lineages. SONAR annotates next-generation sequencing data, identifies transcripts in a lineage of interest, and tracks lineage development across multiple time points. SONAR also generates figures, such as identity-divergence plots and longitudinal phylogenetic "birthday" trees, and provides interfaces to other programs such as DNAML and BEAST. SONAR can be downloaded as a ready-to-run Docker image or manually installed on a local machine. In the latter case, it can also be configured to take advantage of a high-performance computing cluster for the most computationally intensive steps, if available. In summary, this software provides a useful new tool for the processing of large next-generation sequencing datasets and the ontogenic analysis of neutralizing antibody lineages. SONAR can be found at https://github.com/scharch/SONAR, and the Docker image can be obtained from https://hub.docker.com/r/scharch/sonar/.

Imaging Local Ca2+ Signals in Cultured Mammalian Cells

PubMed Central

Lock, Jeffrey T.; Ellefsen, Kyle L.; Settle, Bret; Parker, Ian; Smith, Ian F.

2015-01-01

Cytosolic Ca2+ ions regulate numerous aspects of cellular activity in almost all cell types, controlling processes as wide-ranging as gene transcription, electrical excitability and cell proliferation. The diversity and specificity of Ca2+ signaling derives from mechanisms by which Ca2+ signals are generated to act over different time and spatial scales, ranging from cell-wide oscillations and waves occurring over the periods of minutes to local transient Ca2+ microdomains (Ca2+ puffs) lasting milliseconds. Recent advances in electron multiplied CCD (EMCCD) cameras now allow for imaging of local Ca2+ signals with a 128 x 128 pixel spatial resolution at rates of >500 frames sec-1 (fps). This approach is highly parallel and enables the simultaneous monitoring of hundreds of channels or puff sites in a single experiment. However, the vast amounts of data generated (ca. 1 Gb per min) render visual identification and analysis of local Ca2+ events impracticable. Here we describe and demonstrate the procedures for the acquisition, detection, and analysis of local IP3-mediated Ca2+ signals in intact mammalian cells loaded with Ca2+ indicators using both wide-field epi-fluorescence (WF) and total internal reflection fluorescence (TIRF) microscopy. Furthermore, we describe an algorithm developed within the open-source software environment Python that automates the identification and analysis of these local Ca2+ signals. The algorithm localizes sites of Ca2+ release with sub-pixel resolution; allows user review of data; and outputs time sequences of fluorescence ratio signals together with amplitude and kinetic data in an Excel-compatible table. PMID:25867132
Cytomegalovirus Basic Phosphoprotein (pUL32) Binds to Capsids In Vitro through Its Amino One-Third

PubMed Central

Baxter, Michael K.; Gibson, Wade

2001-01-01

The cytomegalovirus (CMV) basic phosphoprotein (BPP) is a component of the tegument. It remains with the nucleocapsid fraction under conditions that remove most other tegument proteins from the virion, suggesting a direct and perhaps tight interaction with the capsid. As a step toward localizing this protein within the molecular structure of the virion and understanding its function during infection, we have investigated the BPP-capsid interaction. In this report we present evidence that the BPP interacts selectively, through its amino one-third, with CMV capsids. Radiolabeled simian CMV (SCMV) BPP, synthesized in vitro, bound to SCMV B-capsids, and C-capsids to a lesser extent, following incubation with either isolated capsids or lysates of infected cells. Human CMV (HCMV) BPP (pUL32) also bound to SCMV capsids, and SCMV BPP likewise bound to HCMV capsids, indicating that the sequence(s) involved is conserved between the two proteins. Analysis of SCMV BPP truncation mutants localized the capsid-binding region to the amino one-third of the molecule—the portion of BPP showing the greatest sequence conservation between the SCMV and HCMV homologs. This general approach may have utility in studying the interactions of other proteins with conformation-dependent binding sites. PMID:11435566
Molecular characterization of infectious pancreatic necrosis virus strains isolated from the three types of salmonids farmed in Chile.

PubMed

Manríquez, René A; Vera, Tamara; Villalba, Melina V; Mancilla, Alejandra; Vakharia, Vikram N; Yañez, Alejandro J; Cárcamo, Juan G

2017-01-31

The infectious pancreatic necrosis virus (IPNV) causes significant economic losses in Chilean salmon farming. For effective sanitary management, the IPNV strains present in Chile need to be fully studied, characterized, and constantly updated at the molecular level. In this study, 36 Chilean IPNV isolates collected over 6 years (2006-2011) from Salmo salar, Oncorhynchus mykiss, and Oncorhynchus kisutch were genotypically characterized. Salmonid samples were obtained from freshwater, estuary, and seawater sources from central, southern, and the extreme-south of Chile (35° to 53°S). Sequence analysis of the VP2 gene classified 10 IPNV isolates as genogroup 1 and 26 as genogroup 5. Analyses indicated a preferential, but not obligate, relationship between genogroup 5 isolates and S. salar infection. Fifteen genogroup 5 and nine genogroup 1 isolates presented VP2 gene residues associated with high virulence (i.e. Thr, Ala, and Thr at positions 217, 221, and 247, respectively). Four genogroup 5 isolates presented an oddly long VP5 deduced amino acid sequence (29.6 kDa). Analysis of the VP2 amino acid motifs associated with clinical and subclinical infections identified the clinical fingerprint in only genogroup 5 isolates; in contrast, the genogroup 1 isolates presented sequences predominantly associated with the subclinical fingerprint. Predictive analysis of VP5 showed an absence of transmembrane domains and plasma membrane tropism signals. WebLogo analysis of the VP5 BH domains revealed high identities with the marine birnavirus Y-6 and Japanese IPNV strain E1-S. Sequence analysis for putative 25 kDa proteins, coded by the ORF between VP2 and VP4, exhibited three putative nuclear localization sequences and signals of mitochondrial tropism in two isolates. This study provides important advances in updating the characterizations of IPNV strains present in Chile. 
The results from this study will help in identifying epidemiological links and generating specific biotechnological tools for controlling IPNV outbreaks in Chilean salmon farming.
Clinical applicability and cost of a 46-gene panel for genomic analysis of solid tumours: Retrospective validation and prospective audit in the UK National Health Service.

PubMed

Hamblin, Angela; Wordsworth, Sarah; Fermont, Jilles M; Page, Suzanne; Kaur, Kulvinder; Camps, Carme; Kaisaki, Pamela; Gupta, Avinash; Talbot, Denis; Middleton, Mark; Henderson, Shirley; Cutts, Anthony; Vavoulis, Dimitrios V; Housby, Nick; Tomlinson, Ian; Taylor, Jenny C; Schuh, Anna

2017-02-01

Single gene tests to predict whether cancers respond to specific targeted therapies are performed increasingly often. Advances in sequencing technology, collectively referred to as next generation sequencing (NGS), mean the entire cancer genome or parts of it can now be sequenced at speed with increased depth and sensitivity. However, translation of NGS into routine cancer care has been slow. Healthcare stakeholders are unclear about the clinical utility of NGS and are concerned it could be an expensive addition to cancer diagnostics, rather than an affordable alternative to single gene testing. We validated a 46-gene hotspot cancer panel assay allowing multiple gene testing from small diagnostic biopsies. From 1 January 2013 to 31 December 2013, solid tumour samples (including non-small-cell lung carcinoma [NSCLC], colorectal carcinoma, and melanoma) were sequenced in the context of the UK National Health Service from 351 consecutively submitted prospective cases for which treating clinicians thought the patient had potential to benefit from more extensive genetic analysis. Following histological assessment, tumour-rich regions of formalin-fixed paraffin-embedded (FFPE) sections underwent macrodissection, DNA extraction, NGS, and analysis using a pipeline centred on Torrent Suite software. With a median turnaround time of seven working days, an integrated clinical report was produced indicating the variants detected, including those with potential diagnostic, prognostic, therapeutic, or clinical trial entry implications. Accompanying phenotypic data were collected, and a detailed cost analysis of the panel compared with single gene testing was undertaken to assess affordability for routine patient care. Panel sequencing was successful for 97% (342/351) of tumour samples in the prospective cohort and showed 100% concordance with known mutations (detected using cobas assays). At least one mutation was identified in 87% (296/342) of tumours. 
A locally actionable mutation (i.e., available targeted treatment or clinical trial) was identified in 122/351 patients (35%). Forty patients received targeted treatment, in 22/40 (55%) cases solely due to use of the panel. Examination of published data on the potential efficacy of targeted therapies showed theoretically actionable mutations (i.e., mutations for which targeted treatment was potentially appropriate) in 66% (71/107) and 39% (41/105) of melanoma and NSCLC patients, respectively. At a cost of £339 (US$449) per patient, the panel was less expensive locally than performing more than two or three single gene tests. Study limitations include the use of FFPE samples, which do not always provide high-quality DNA, and the use of "real world" data: submission of cases for sequencing did not always follow clinical guidelines, meaning that when mutations were detected, patients were not always eligible for targeted treatments on clinical grounds. This study demonstrates that more extensive tumour sequencing can identify mutations that could improve clinical decision-making in routine cancer care, potentially improving patient outcomes, at an affordable level for healthcare providers.
Application of Inter-Simple Sequence Repeat Markers in the Analysis of Populations of the Chagas Disease Vector Triatoma infestans (Hemiptera, Reduviidae)

PubMed Central

Pérez de Rosas, Alicia R.; Restelli, María F.; Fernández, Cintia J.; Blariza, María J.; García, Beatriz A.

2017-01-01

Here we apply inter-simple sequence repeat (ISSR) markers to explore the fine-scale genetic structure and dispersal in populations of Triatoma infestans. Five selected primers from 30 primers were used to amplify ISSRs by polymerase chain reaction. A total of 90 polymorphic bands were detected across 134 individuals captured from 11 peridomestic sites from the locality of San Martín (Capayán Department, Catamarca Province, Argentina). Significant levels of genetic differentiation suggest limited gene flow among sampling sites. Spatial autocorrelation analysis confirms that dispersal occurs on the scale of ∼469 m, suggesting that insecticide spraying should be extended at least within a radius of ∼500 m around the infested area. Moreover, Bayesian clustering algorithms indicated genetic exchange among different sites analyzed, supporting the hypothesis of an important role of peridomestic structures in the process of reinfestation. PMID:28115670
Markerless video analysis for movement quantification in pediatric epilepsy monitoring.

PubMed

Lu, Haiping; Eng, How-Lung; Mandal, Bappaditya; Chan, Derrick W S; Ng, Yen-Ling

2011-01-01

This paper proposes a markerless video analytic system for quantifying body part movements in pediatric epilepsy monitoring. The system utilizes colored pajamas worn by a patient in bed to extract body part movement trajectories, from which various features can be obtained for seizure detection and analysis. Hence, it is non-intrusive and it requires no sensor/marker to be attached to the patient's body. It takes raw video sequences as input and a simple user-initialization indicates the body parts to be examined. In background/foreground modeling, Gaussian mixture models are employed in conjunction with HSV-based modeling. Body part detection follows a coarse-to-fine paradigm with graph-cut-based segmentation. Finally, body part parameters are estimated with domain knowledge guidance. Experimental studies are reported on sequences captured in an Epilepsy Monitoring Unit at a local hospital. The results demonstrate the feasibility of the proposed system in pediatric epilepsy monitoring and seizure detection.
Shaping up the protein folding funnel by local interaction: lesson from a structure prediction study.

PubMed

Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji

2006-02-28

Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of "chimera proteins." In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape.
Shaping up the protein folding funnel by local interaction: Lesson from a structure prediction study

PubMed Central

Chikenji, George; Fujitsuka, Yoshimi; Takada, Shoji

2006-01-01

Predicting protein tertiary structure by folding-like simulations is one of the most stringent tests of how much we understand the principle of protein folding. Currently, the most successful method for folding-based structure prediction is the fragment assembly (FA) method. Here, we address why the FA method is so successful and its lesson for the folding problem. To do so, using the FA method, we designed a structure prediction test of “chimera proteins.” In the chimera proteins, local structural preference is specific to the target sequences, whereas nonlocal interactions are only sequence-independent compaction forces. We find that these chimera proteins can find the native folds of the intact sequences with high probability indicating dominant roles of the local interactions. We further explore roles of local structural preference by exact calculation of the HP lattice model of proteins. From these results, we suggest principles of protein folding: For small proteins, compact structures that are fully compatible with local structural preference are few, one of which is the native fold. These local biases shape up the funnel-like energy landscape. PMID:16488978
New powerful statistics for alignment-free sequence comparison under a pattern transfer model.

PubMed

Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S; Sun, Fengzhu

2011-09-07

Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D*2 and D(s)2 showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D*2 and D(s)2 by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. Copyright © 2011 Elsevier Ltd. All rights reserved.
New Powerful Statistics for Alignment-free Sequence Comparison Under a Pattern Transfer Model

PubMed Central

Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S.; Sun, Fengzhu

2011-01-01

Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D2∗ and D2s showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D2∗ and D2s by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. PMID:21723298
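For readers unfamiliar with the D2 statistic named in the two records above, a minimal sketch may help: D2 is the sum, over all words (k-mers) of a fixed length k, of the product of that word's counts in the two sequences. The function names and toy sequences are illustrative only. The centered variants and the new local-pair statistics described in the abstracts are not implemented here.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """D2 statistic: sum over words w of count_a(w) * count_b(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for words absent from seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())


# Toy sequences: the 2-mers AC, CG, GT occur twice in the first sequence
# and once in the second, so each contributes 2 * 1 to the sum.
shared = d2("ACGTACGT", "ACGT", 2)
unrelated = d2("AAAA", "CCCC", 2)
```

Counting words rather than aligning residues is what makes the comparison alignment-free: the cost is linear in the sequence lengths for a fixed k.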
Global and local pitch perception in children with developmental dyslexia.

PubMed

Ziegler, Johannes C; Pech-Georgel, Catherine; George, Florence; Foxton, Jessica M

2012-03-01

This study investigated global versus local pitch pattern perception in children with dyslexia aged between 8 and 11 years. Children listened to two consecutive 4-tone pitch sequences while performing a same/different task. On the different trials, sequences either preserved the contour (local condition) or they violated the contour (global condition). Compared to normally developing children, dyslexics showed robust pitch perception deficits in the local but not the global condition. This finding was replicated in a simple pitch direction task, which minimizes sequencing and short term memory. Results are consistent with a left-hemisphere deficit in dyslexia because local pitch changes are supposedly processed by the left hemisphere, whereas global pitch changes are processed by the right hemisphere. The present data suggest a link between impaired pitch processing and abnormal phonological development in children with dyslexia, which makes pitch pattern processing a potent tool for early diagnosis and remediation of dyslexia. Copyright © 2011 Elsevier Inc. All rights reserved.
Microbial diversity in degraded and non-degraded petroleum samples and comparison across oil reservoirs at local and global scales.

PubMed

Sierra-Garcia, Isabel Natalia; Dellagnezze, Bruna M; Santos, Viviane P; Chaves B, Michel R; Capilla, Ramsés; Santos Neto, Eugenio V; Gray, Neil; Oliveira, Valeria M

2017-01-01

Microorganisms have shown their ability to colonize extreme environments including deep subsurface petroleum reservoirs. Physicochemical parameters may vary greatly among petroleum reservoirs worldwide and so do the microbial communities inhabiting these different environments. The present work aimed at the characterization of the microbiota in biodegraded and non-degraded petroleum samples from three Brazilian reservoirs and the comparison of microbial community diversity across oil reservoirs at local and global scales using 16S rRNA clone libraries. The analysis of 620 16S rRNA bacterial and archaeal sequences obtained from Brazilian oil samples revealed 42 bacterial OTUs and 21 archaeal OTUs. The bacterial community from the degraded oil was more diverse than the non-degraded samples. Non-degraded oil samples were overwhelmingly dominated by gammaproteobacterial sequences with a predominance of the genera Marinobacter and Marinobacterium. Comparisons of microbial diversity among oil reservoirs worldwide suggested an apparent correlation of prokaryotic communities with reservoir temperature and depth and no influence of geographic distance among reservoirs. The detailed analysis of the phylogenetic diversity across reservoirs allowed us to define a core microbiome encompassing three bacterial classes (Gammaproteobacteria, Clostridia, and Bacteroidia) and one archaeal class (Methanomicrobia) ubiquitous in petroleum reservoirs and presumably owning the abilities to sustain life in these environments.
Coincidence of synteny breakpoints with malignancy-related deletions on human chromosome 3

PubMed Central

Kost-Alimova, Maria; Kiss, Hajnalka; Fedorova, Ludmila; Yang, Ying; Dumanski, Jan P.; Klein, George; Imreh, Stefan

2003-01-01

We have found previously that during tumor growth intact human chromosome 3 transferred into tumor cells regularly loses certain 3p regions, among them the ≈1.4-Mb common eliminated region 1 (CER1) at 3p21.3. Fluorescence in situ hybridization analysis of 12 mouse orthologous loci revealed that CER1 splits into two segments in mouse and therefore contains a murine/human conservation breakpoint region (CBR). Several breaks occurred in tumors within the region surrounding the CBR, and this sequence has features that characterize unstable chromosomal regions: deletions in yeast artificial chromosome clones, late replication, gene and segment duplications, and pseudogene insertions. Sequence analysis of the entire 3p12-22 revealed that other cancer-associated deletions (regions eliminated from monochromosomal hybrids carrying an intact chromosome 3 during tumor growth and homozygous deletions found in human tumors) colocalized nonrandomly with murine/human CBRs and were characterized by an increased number of local gene duplications and murine/human conservation mismatches (single genes that do not match into the conserved chromosomal segment). The CBR within CER1 contains a simple tandem TATAGA repeat capable of forming a 40-bp-long secondary hairpin-like structure. This repeat is nonrandomly localized within the other tumor-associated deletions and in the vicinity of 3p12-22 CBRs. PMID:12738884
Coordinate action of distinct sequence elements localizes checkpoint kinase Hsl1 to the septin collar at the bud neck in Saccharomyces cerevisiae.

PubMed

Finnigan, Gregory C; Sterling, Sarah M; Duvalyan, Angela; Liao, Elizabeth N; Sargsyan, Aspram; Garcia, Galo; Nogales, Eva; Thorner, Jeremy

2016-07-15

Passage through the eukaryotic cell cycle requires processes that are tightly regulated both spatially and temporally. Surveillance mechanisms (checkpoints) exert quality control and impose order on the timing and organization of downstream events by impeding cell cycle progression until the necessary components are available and undamaged and have acted in the proper sequence. In budding yeast, a checkpoint exists that does not allow timely execution of the G2/M transition unless and until a collar of septin filaments has properly assembled at the bud neck, which is the site where subsequent cytokinesis will occur. An essential component of this checkpoint is the large (1518-residue) protein kinase Hsl1, which localizes to the bud neck only if the septin collar has been correctly formed. Hsl1 reportedly interacts with particular septins; however, the precise molecular determinants in Hsl1 responsible for its recruitment to this cellular location during G2 have not been elucidated. We performed a comprehensive mutational dissection and accompanying image analysis to identify the sequence elements within Hsl1 responsible for its localization to the septins at the bud neck. Unexpectedly, we found that this targeting is multipartite. A segment of the central region of Hsl1 (residues 611-950), composed of two tandem, semiredundant but distinct septin-associating elements, is necessary and sufficient for binding to septin filaments both in vitro and in vivo. However, in addition to 611-950, efficient localization of Hsl1 to the septin collar in the cell obligatorily requires generalized targeting to the cytosolic face of the plasma membrane, a function normally provided by the C-terminal phosphatidylserine-binding KA1 domain (residues 1379-1518) in Hsl1 but that can be replaced by other, heterologous phosphatidylserine-binding sequences. © 2016 Finnigan et al. This article is distributed by The American Society for Cell Biology under license from the author(s). 
Two months after publication it is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).
SCALCE: boosting sequence compression algorithms using locally consistent encoding.

PubMed

Hach, Faraz; Numanagic, Ibrahim; Alkan, Can; Sahinalp, S Cenk

2012-12-01

The high throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for the computational infrastructure. Data management, storage and analysis have become major logistical obstacles for those adopting the new platforms. The requirement for large investment for this purpose almost signalled the end of the Sequence Read Archive hosted at the National Center for Biotechnology Information (NCBI), which holds most of the sequence data generated worldwide. Currently, most HTS data are compressed through general purpose algorithms such as gzip. These algorithms are not designed for compressing data generated by the HTS platforms; for example, they do not take advantage of the specific nature of genomic sequence data, that is, limited alphabet size and high similarity among reads. Fast and efficient compression algorithms designed specifically for HTS data should be able to address some of the issues in data management, storage and communication. Such algorithms would also help with analysis provided they offer additional capabilities such as random access to any read and indexing for efficient sequence similarity search. Here we present SCALCE, a 'boosting' scheme based on the Locally Consistent Parsing technique, which reorganizes the reads in a way that results in a higher compression speed and compression rate, independent of the compression algorithm in use and without using a reference genome. Our tests indicate that SCALCE can improve the compression rate achieved through gzip by a factor of 4.19 when the goal is to compress the reads alone. In fact, on SCALCE reordered reads, gzip running time can improve by a factor of 15.06 on a standard PC with a single core and 6 GB memory. Interestingly even the running time of SCALCE + gzip improves that of gzip alone by a factor of 2.09. When compared with the recently published BEETL, which aims to sort the (inverted) reads in lexicographic order for improving bzip2, SCALCE + gzip provides up to 2.01 times better compression while improving the running time by a factor of 5.17. SCALCE also provides the option to compress the quality scores as well as the read names, in addition to the reads themselves. This is achieved by compressing the quality scores through order-3 Arithmetic Coding (AC) and the read names through gzip, using the reordering SCALCE provides on the reads. This way, in comparison with gzip compression of the unordered FASTQ files (including reads, read names and quality scores), SCALCE (together with gzip and arithmetic encoding) can provide up to 3.34 improvement in the compression rate and 1.26 improvement in running time. Our algorithm, SCALCE (Sequence Compression Algorithm using Locally Consistent Encoding), is implemented in C++ with both gzip and bzip2 compression options. It also supports multithreading when the gzip option is selected and the pigz binary is available. Availability: http://scalce.sourceforge.net. Contact: fhach@cs.sfu.ca or cenk@cs.sfu.ca. Supplementary data are available at Bioinformatics online.
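The general idea of reordering reads so that similar ones sit next to each other before a general-purpose compressor runs can be sketched as follows. This is not SCALCE's Locally Consistent Parsing: as a simpler stand-in, each read is keyed by its lexicographically smallest k-mer, and the names `core_key`, `reorder_reads` and `gzip_size` are hypothetical helpers introduced only for this example.

```python
import gzip


def core_key(read, k=4):
    """Lexicographically smallest k-mer of the read (a stand-in for a 'core')."""
    if len(read) < k:
        return read
    return min(read[i:i + k] for i in range(len(read) - k + 1))


def reorder_reads(reads, k=4):
    """Place reads sharing the same core next to each other."""
    return sorted(reads, key=lambda r: (core_key(r, k), r))


def gzip_size(reads):
    """Size in bytes of the gzip-compressed, newline-joined reads."""
    return len(gzip.compress("\n".join(reads).encode("ascii")))
```

Comparing `gzip_size(reads)` with `gzip_size(reorder_reads(reads))` on a real read set would show whether grouping helps for that data; no particular gain is implied by this sketch, and the original read order is lost unless it is stored separately.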
Shallow phylogeographic structuring of Vimba vimba across Europe suggests two distinct refugia during the last glaciation.

PubMed

Hänfling, B; Dümpelmann, C; Bogutskaya, N G; Brandl, R; Brändle, M

2009-12-01

Genetic variation and geographical structuring of vimba Vimba vimba were analysed across 26 sites (80 individuals) by means of mtDNA sequences (cyt b gene, mitochondrial control region) to localize hypothesized glacial refugia and to reconstruct postglacial recolonisation routes. Although genetic diversity among sequenced individuals was low, a combined analysis of the two sequenced fragments revealed a western (central and northern Europe: Danube, Elbe and lakes of Sweden) and an eastern clade (eastern Europe: Dnieper-South Bug, Don, Neman). Furthermore, a number of divergent ancestral haplotypes distributed around the Black and Caspian Seas became apparent. Mismatch analyses supported a sudden expansion model for the populations of the western clade between 50 and 10 000 bp. Overall, the study provides strong evidence for a northward and westward expansion of V. vimba from two refugial regions located in the Danubian drainage and the northern Pontic regions respectively.
Complete mitogenome sequencing and phylogenetic analysis of PaLi yak (Bos grunniens).

PubMed

Bao, Pengjia; Guo, Xian; Pei, Jie; Liang, Chunnian; Ding, Xuezhi; Min, Chu; Wang, Hongbo; Wu, Xiaoyun; Yan, Ping

2016-11-01

PaLi yak is an important local breed in China; as a year-round grazing animal, it plays an important role in the economy and for native herdsmen. The complete mitochondrial DNA of the PaLi yak was sequenced in this study; its total length is 16,324 bp, and it contains 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes and a non-coding control region (D-loop region). The gene order and composition are similar to those of most other vertebrates. The base contents are 33.72% A, 25.80% C, 13.21% G and 27.27% T; A + T (60.99%) is higher than G + C (39.01%). Phylogenetic relationships were analyzed using the complete mitogenome sequence, and the results showed that the genetic relationship between yak and cattle is distinct. This information provides useful data for further study of the protection of genetic resources and the taxonomy of Bovinae.
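The base-content figures reported above are internally consistent (33.72 + 27.27 = 60.99 for A + T, and 25.80 + 13.21 = 39.01 for G + C). A minimal sketch of how such percentages could be computed from a sequence string is given below; the function names are illustrative, and ambiguous bases such as N are simply ignored, which is an assumption of this example rather than something stated in the abstract.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, C, G and T among the unambiguous bases in seq."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def at_gc_content(seq):
    """Return (A + T percentage, G + C percentage)."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"], comp["C"] + comp["G"]
```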
STELLAR: fast and exact local alignments

PubMed Central

2011-01-01

Background: Large-scale comparison of genomic sequences requires reliable tools for the search of local alignments. Practical local aligners are in general fast, but heuristic, and hence sometimes miss significant matches. Results: We present here the local pairwise aligner STELLAR that has full sensitivity for ε-alignments, i.e. guarantees to report all local alignments of a given minimal length and maximal error rate. The aligner is composed of two steps, filtering and verification. We apply the SWIFT algorithm for lossless filtering, and have developed a new verification strategy that we prove to be exact. Our results on simulated and real genomic data confirm and quantify the conjecture that heuristic tools like BLAST or BLAT miss a large percentage of significant local alignments. Conclusions: STELLAR is very practical and fast on very long sequences, which makes it a suitable new tool for finding local alignments between genomic sequences under the edit distance model. Binaries are freely available for Linux, Windows, and Mac OS X at http://www.seqan.de/projects/stellar. The source code is freely distributed with the SeqAn C++ library version 1.3 and later at http://www.seqan.de. PMID:22151882
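The acceptance criterion described in this abstract (a minimal length and a maximal error rate) can be made concrete with a small check on an already computed alignment. The sketch below assumes the alignment is given as two gapped rows of equal length, that every mismatch or gap column counts as one error, and that the error rate is errors divided by the number of columns; the function name and parameters are hypothetical and this is not STELLAR's verification algorithm.

```python
def is_epsilon_match(row_a, row_b, min_length, max_error_rate):
    """Check a gapped pairwise alignment against length and error-rate bounds."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    columns = len(row_a)
    if columns == 0:
        return False
    # A column is an error when the two rows differ (mismatch or gap).
    errors = sum(1 for x, y in zip(row_a, row_b) if x != y)
    return columns >= min_length and errors / columns <= max_error_rate
```

For example, a 10-column alignment with one mismatch and one gap has an error rate of 0.2, so it passes with a bound of 0.2 and fails with a bound of 0.1.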
Analysis of nucleotide diversity among alleles of the major bacterial blight resistance gene Xa27 in cultivars of rice (Oryza sativa) and its wild relatives.

PubMed

Bimolata, Waikhom; Kumar, Anirudh; Sundaram, Raman Meenakshi; Laha, Gouri Shankar; Qureshi, Insaf Ahmed; Reddy, Gajjala Ashok; Ghazi, Irfan Ahmad

2013-08-01

Xa27 is one of the important R-genes, effective against bacterial blight disease of rice caused by Xanthomonas oryzae pv. oryzae (Xoo). Using natural populations of Oryza, we analyzed the sequence variation in the functionally important domains of Xa27 across the Oryza species. DNA sequences of Xa27 alleles from 27 rice accessions revealed higher nucleotide diversity among the reported R-genes of rice. Sequence polymorphism analysis revealed synonymous and non-synonymous mutations in addition to a number of InDels in non-coding regions of the gene. High sequence variation was observed in the promoter region including the 5'UTR, with π = 0.00916 and θw = 0.01785. Comparative analysis of the identified Xa27 alleles with that of IRBB27 and IR24 indicated the operation of both positive selection (Ka/Ks > 1) and neutral selection (Ka/Ks ≈ 0). The genetic distances of alleles of the gene from Oryza nivara were nearer to IRBB27 as compared to IR24. We also found the presence of conserved and null UPT (upregulated by transcriptional activator) box in the isolated alleles. Considerable amino acid polymorphism was localized in the trans-membrane domain, for which the functional significance is yet to be elucidated. However, the absence of a functional UPT box in all the alleles except IRBB27 suggests the maintenance of a single resistant allele throughout the natural population.
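The two diversity measures quoted above have standard textbook definitions that can be sketched briefly: π is the average number of pairwise differences per site, and Watterson's θ is the number of segregating sites divided by the harmonic number a_n = 1 + 1/2 + ... + 1/(n-1) and by the alignment length. The code below assumes a gap-free alignment of equal-length sequences and uses illustrative function names; it is not the software used in the study.

```python
from itertools import combinations


def _check_alignment(seqs):
    """Require at least two non-empty sequences of identical length."""
    if len(seqs) < 2 or not seqs[0] or len({len(s) for s in seqs}) != 1:
        raise ValueError("need >= 2 non-empty aligned sequences of equal length")


def nucleotide_diversity(seqs):
    """pi: mean pairwise differences per site."""
    _check_alignment(seqs)
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * len(seqs[0]))


def watterson_theta(seqs):
    """theta_W per site: segregating sites / (a_n * length)."""
    _check_alignment(seqs)
    segregating = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    a_n = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating / (a_n * len(seqs[0]))
```

On the toy alignment `["ACGT", "ACGA", "ACGT"]` both estimators give 1/6, since there is one segregating site among four and two of the three pairs differ at it.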
First Report of a Fatal Case Associated with EV-D68 Infection in Hong Kong and Emergence of an Interclade Recombinant in China Revealed by Genome Analysis.

PubMed

Yip, Cyril C Y; Lo, Janice Y C; Sridhar, Siddharth; Lung, David C; Luk, Shik; Chan, Kwok-Hung; Chan, Jasper F W; Cheng, Vincent C C; Woo, Patrick C Y; Yuen, Kwok-Yung; Lau, Susanna K P

2017-05-16

A fatal case associated with enterovirus D68 (EV-D68) infection affecting a 10-year-old boy was reported in Hong Kong in 2014. To examine if a new strain has emerged in Hong Kong, we sequenced the partial genome of the EV-D68 strain identified from the fatal case and the complete VP1, and partial 5'UTR and 2C sequences of nine additional EV-D68 strains isolated from patients in Hong Kong. Sequence analysis indicated that a cluster of strains including the previously recognized A2 strains should belong to a separate clade, clade D, which is further divided into subclades D1 and D2. Among the 10 EV-D68 strains, 7 (including the fatal case) belonged to the previously described, newly emerged subclade B3, 2 belonged to subclade B1, and 1 belonged to subclade D1. Three EV-D68 strains, each from subclades B1, B3, and D1, were selected for complete genome sequencing and recombination analysis. While no evidence of recombination was noted among local strains, interclade recombination was identified in subclade D2 strains detected in mainland China in 2008 with VP2 acquired from clade A. This study supports the reclassification of subclade A2 into clade D1, and demonstrates interclade recombination between clades A and D2 in EV-D68 strains from China.

Niche specialization of terrestrial archaeal ammonia oxidizers.

PubMed

Gubry-Rangin, Cécile; Hai, Brigitte; Quince, Christopher; Engel, Marion; Thomson, Bruce C; James, Phillip; Schloter, Michael; Griffiths, Robert I; Prosser, James I; Nicol, Graeme W

2011-12-27

Soil pH is a major determinant of microbial ecosystem processes and potentially a major driver of evolution, adaptation, and diversity of ammonia oxidizers, which control soil nitrification. Archaea are major components of soil microbial communities and contribute significantly to ammonia oxidation in some soils. To determine whether pH drives evolutionary adaptation and community structure of soil archaeal ammonia oxidizers, sequences of amoA, a key functional gene of ammonia oxidation, were examined in soils at global, regional, and local scales. Globally distributed database sequences clustered into 18 well-supported phylogenetic lineages that dominated specific soil pH ranges classified as acidic (pH <5), acido-neutral (5 ≤ pH <7), or alkalinophilic (pH ≥ 7). To determine whether patterns were reproduced at regional and local scales, amoA gene fragments were amplified from DNA extracted from 47 soils in the United Kingdom (pH 3.5-8.7), including a pH-gradient formed by seven soils at a single site (pH 4.5-7.5). High-throughput sequencing and analysis of amoA gene fragments identified an additional, previously undiscovered phylogenetic lineage and revealed similar pH-associated distribution patterns at global, regional, and local scales, which were most evident for the five most abundant clusters. Archaeal amoA abundance and diversity increased with soil pH, which was the only physicochemical characteristic measured that significantly influenced community structure. These results suggest evolution based on specific adaptations to soil pH and niche specialization, resulting in a global distribution of archaeal lineages that have important consequences for soil ecosystem function and nitrogen cycling.
Transgenes in Mexican maize: molecular evidence and methodological considerations for GMO detection in landrace populations

PubMed Central

PIÑEYRO-NELSON, A; VAN HEERWAARDEN, J; PERALES, H R; SERRATOS-HERNÁNDEZ, J A; RANGEL, A; HUFFORD, M B; GEPTS, P; GARAY-ARROYO, A; RIVERA-BUSTAMANTE, R; ÁLVAREZ-BUYLLA, E R

2009-01-01

A possible consequence of planting genetically modified organisms (GMOs) in centres of crop origin is unintended gene flow into traditional landraces. In 2001, a study reported the presence of the transgenic 35S promoter in maize landraces sampled in 2000 from the Sierra Juarez of Oaxaca, Mexico. Analysis of a large sample taken from the same region in 2003 and 2004 could not confirm the existence of transgenes, thereby casting doubt on the earlier results. These two studies were based on different sampling and analytical procedures and are thus hard to compare. Here, we present new molecular data for this region that confirm the presence of transgenes in three of 23 localities sampled in 2001. Transgene sequences were not detected in samples taken in 2002 from nine localities, while directed samples taken in 2004 from two of the positive 2001 localities were again found to contain transgenic sequences. These findings suggest the persistence or re-introduction of transgenes up until 2004 in this area. We address variability in recombinant sequence detection by analyzing the consistency of current molecular assays. We also present theoretical results on the limitations of estimating the probability of transgene detection in samples taken from landraces. The inclusion of a limited number of female gametes and, more importantly, aggregated transgene distributions may significantly lower detection probabilities. Our analytical and sampling considerations help explain discrepancies among different detection efforts, including the one presented here, and provide considerations for the establishment of monitoring protocols to detect the presence of transgenes among structured populations of landraces. PMID:19143938
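The sampling limitation mentioned in this abstract can be illustrated with the simplest possible model, which is an assumption of this sketch and not the authors' analysis: if each sampled seed independently carries a transgene with frequency f, the chance that a sample of n seeds contains at least one is 1 - (1 - f)^n. Aggregated (non-independent) transgene distributions, which the abstract highlights, would lower detection probabilities relative to this idealised figure. Function names are hypothetical.

```python
import math


def detection_probability(frequency, sample_size):
    """P(at least one positive) under independent sampling: 1 - (1 - f)^n."""
    if not 0.0 <= frequency <= 1.0 or sample_size < 0:
        raise ValueError("frequency must be in [0, 1] and sample_size >= 0")
    return 1.0 - (1.0 - frequency) ** sample_size


def required_sample_size(frequency, target_probability):
    """Smallest n with detection probability >= target, same independence model."""
    if not 0.0 < frequency < 1.0 or not 0.0 < target_probability < 1.0:
        raise ValueError("frequency and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target_probability) / math.log(1.0 - frequency))
```

Under this model a transgene present at 1% frequency needs 299 independently sampled seeds for a 95% chance of being detected at least once.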
Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

DOE PAGES

Yim, Won Cheol; Cushman, John C.

2017-07-22

Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs perform searches very rapidly, they have the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible and used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.
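A query sequence distribution approach of the kind described here amounts to splitting a multi-FASTA query file into chunks that can be searched independently and then concatenating the results. The sketch below shows only the splitting step, with hypothetical helper names and a round-robin assignment chosen for simplicity; how DCBLAST itself partitions queries and submits jobs is not specified in the abstract and is not reproduced here.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_queries(records, n_chunks):
    """Deal records round-robin into at most n_chunks non-empty lists."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    chunks = [records[i::n_chunks] for i in range(n_chunks)]
    return [c for c in chunks if c]
```

Each returned chunk would then be written to its own file and handed to a separate search job. Round-robin dealing balances the number of queries per chunk but not their total length, which matters when query sizes vary widely.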
Partial characterization of normal and Haemophilus influenzae-infected mucosal complementary DNA libraries in chinchilla middle ear mucosa.

PubMed

Kerschner, Joseph E; Erdos, Geza; Hu, Fen Ze; Burrows, Amy; Cioffi, Joseph; Khampang, Pawjai; Dahlgren, Margaret; Hayes, Jay; Keefe, Randy; Janto, Benjamin; Post, J Christopher; Ehrlich, Garth D

2010-04-01

We sought to construct and partially characterize complementary DNA (cDNA) libraries prepared from the middle ear mucosa (MEM) of chinchillas to better understand pathogenic aspects of infection and inflammation, particularly with respect to leukotriene biogenesis and response. Chinchilla MEM was harvested from controls and after middle ear inoculation with nontypeable Haemophilus influenzae. RNA was extracted to generate cDNA libraries. Randomly selected clones were subjected to sequence analysis to characterize the libraries and to provide DNA sequence for phylogenetic analyses. Reverse transcription-polymerase chain reaction of the RNA pools was used to generate cDNA sequences corresponding to genes associated with leukotriene biosynthesis and metabolism. Sequence analysis of 921 randomly selected clones from the uninfected MEM cDNA library produced approximately 250,000 nucleotides of almost entirely novel sequence data. Searches of the GenBank database with the Basic Local Alignment Search Tool provided for identification of 515 unique genes expressed in the MEM and not previously described in chinchillas. In almost all cases, the chinchilla cDNA sequences displayed much greater homology to human or other primate genes than with rodent species. Genes associated with leukotriene metabolism were present in both normal and infected MEM. Based on both phylogenetic comparisons and gene expression similarities with humans, chinchilla MEM appears to be an excellent model for the study of middle ear inflammation and infection. The higher degree of sequence similarity between chinchillas and humans compared to chinchillas and rodents was unexpected. The cDNA libraries from normal and infected chinchilla MEM will serve as useful molecular tools in the study of otitis media and should yield important information with respect to middle ear pathogenesis.
Partial Characterization of Normal and Haemophilus influenzae–Infected Mucosal Complementary DNA Libraries in Chinchilla Middle Ear Mucosa

PubMed Central

Kerschner, Joseph E.; Erdos, Geza; Hu, Fen Ze; Burrows, Amy; Cioffi, Joseph; Khampang, Pawjai; Dahlgren, Margaret; Hayes, Jay; Keefe, Randy; Janto, Benjamin; Post, J. Christopher; Ehrlich, Garth D.

2010-01-01

Objectives: We sought to construct and partially characterize complementary DNA (cDNA) libraries prepared from the middle ear mucosa (MEM) of chinchillas to better understand pathogenic aspects of infection and inflammation, particularly with respect to leukotriene biogenesis and response. Methods: Chinchilla MEM was harvested from controls and after middle ear inoculation with nontypeable Haemophilus influenzae. RNA was extracted to generate cDNA libraries. Randomly selected clones were subjected to sequence analysis to characterize the libraries and to provide DNA sequence for phylogenetic analyses. Reverse transcription–polymerase chain reaction of the RNA pools was used to generate cDNA sequences corresponding to genes associated with leukotriene biosynthesis and metabolism. Results: Sequence analysis of 921 randomly selected clones from the uninfected MEM cDNA library produced approximately 250,000 nucleotides of almost entirely novel sequence data. Searches of the GenBank database with the Basic Local Alignment Search Tool provided for identification of 515 unique genes expressed in the MEM and not previously described in chinchillas. In almost all cases, the chinchilla cDNA sequences displayed much greater homology to human or other primate genes than with rodent species. Genes associated with leukotriene metabolism were present in both normal and infected MEM. Conclusions: Based on both phylogenetic comparisons and gene expression similarities with humans, chinchilla MEM appears to be an excellent model for the study of middle ear inflammation and infection. The higher degree of sequence similarity between chinchillas and humans compared to chinchillas and rodents was unexpected. The cDNA libraries from normal and infected chinchilla MEM will serve as useful molecular tools in the study of otitis media and should yield important information with respect to middle ear pathogenesis. PMID:20433028
A novel mutation in PRPF31, causative of autosomal dominant retinitis pigmentosa, using the BGISEQ-500 sequencer

PubMed Central

Zheng, Yu; Wang, Hai-Lin; Li, Jian-Kang; Xu, Li; Tellier, Laurent; Li, Xiao-Lin; Huang, Xiao-Yan; Li, Wei; Niu, Tong-Tong; Yang, Huan-Ming; Zhang, Jian-Guo; Liu, Dong-Ning

2018-01-01

AIM: To study the genes responsible for retinitis pigmentosa. METHODS: A total of 15 Chinese families with retinitis pigmentosa, containing 94 sporadically afflicted cases, were recruited. The targeted sequences were captured using the Target_Eye_365_V3 chip and sequenced using the BGISEQ-500 sequencer, according to the manufacturer's instructions. Data were aligned to UCSC Genome Browser build hg19, using the Burrows-Wheeler Aligner MEM algorithm. Local realignment was performed with the Genome Analysis Toolkit (GATK v.3.3.0) IndelRealigner, and variants were called with the Genome Analysis Toolkit HaplotypeCaller, without any use of imputation. Variants were filtered against a panel derived from 1000 Genomes Project, 1000G_ASN, ESP6500, ExAC and dbSNP138. In all members of Family ONE and Family TWO with available DNA samples, the genetic variant was validated using Sanger sequencing. RESULTS: A novel, pathogenic variant of retinitis pigmentosa, c.357_358delAA (p.Ser119SerfsX5), was identified in PRPF31 in 2 of 15 autosomal-dominant retinitis pigmentosa (ADRP) families, as well as in one sporadic case. Sanger sequencing was performed upon probands, as well as upon other family members. This novel, pathogenic genotype co-segregated with the retinitis pigmentosa phenotype in these two families. CONCLUSION: ADRP is a subtype of retinitis pigmentosa, defined by its genotype, which accounts for 20%-40% of the retinitis pigmentosa patients. Our study thus expands the spectrum of PRPF31 mutations known to occur in ADRP, and provides further demonstration of the applicability of the BGISEQ-500 sequencer for genomics research. PMID:29375987
First Report on Circulation of Echinococcus ortleppi in the one Humped Camel (Camelus dromedaries), Sudan

PubMed Central

2013-01-01

Background: Echinococcus granulosus (EG) complex, the cause of cystic echinococcosis (CE), infects humans and several other animal species worldwide and hence the disease is of public health importance. Ten genetic variants, or genotypes designated as (G1-G10), are distributed worldwide based on genetic diversity. The objective of this study was to provide some sequence data and phylogeny of EG isolates recovered from the Sudanese one-humped camel (Camelus dromedaries). Fifty samples of hydatid cysts were collected from the one-humped camels (Camelus dromedaries) at Taboul slaughter house, central Sudan. DNAs were extracted from protoscolices and/or associated germinal layers of hydatid cysts using a commercial kit. The mitochondrial NADH dehydrogenase subunit 1 (NADH1) gene and the cytochrome C oxidase subunit 1 (cox1) gene were used as targets for polymerase chain reaction (PCR) amplification. The PCR products were purified and partial sequences were generated. Sequences were further examined by sequence analysis and subsequent phylogeny to compare these sequences to those from known strains of EG circulating globally. Results: The identity of the PCR products was confirmed as NADH1 and cox1 nucleotide sequences using the Basic Local Alignment Search Tool (BLAST) of NCBI (National Center for Biotechnology Information, Bethesda, MD). The phylogenetic analysis showed that 98% (n = 49) of the isolates clustered with Echinococcus canadensis genotype 6 (G6), whereas only one isolate (2%) clustered with Echinococcus ortleppi (G5). Conclusions: This investigation expands on the existing sequence data generated from EG isolates recovered from camel in the Sudan. The circulation of the cattle genotype (G5) in the one-humped camel is reported here for the first time. PMID:23800362
First report on circulation of Echinococcus ortleppi in the one humped camel (Camelus dromedaries), Sudan.

PubMed

Ahmed, Mohamed E; Eltom, Kamal H; Musa, Nasreen O; Ali, Ibtisam A; Elamin, Fatima M; Grobusch, Martin P; Aradaib, Imadeldin E

2013-06-25

Echinococcus granulosus (EG) complex, the cause of cystic echinococcosis (CE), infects humans and several other animal species worldwide and hence the disease is of public health importance. Ten genetic variants, or genotypes designated as (G1-G10), are distributed worldwide based on genetic diversity. The objective of this study was to provide some sequence data and phylogeny of EG isolates recovered from the Sudanese one-humped camel (Camelus dromedaries). Fifty samples of hydatid cysts were collected from the one-humped camels (Camelus dromedaries) at Taboul slaughter house, central Sudan. DNAs were extracted from protoscolices and/or associated germinal layers of hydatid cysts using a commercial kit. The mitochondrial NADH dehydrogenase subunit 1 (NADH1) gene and the cytochrome C oxidase subunit 1 (cox1) gene were used as targets for polymerase chain reaction (PCR) amplification. The PCR products were purified and partial sequences were generated. Sequences were further examined by sequence analysis and subsequent phylogeny to compare these sequences to those from known strains of EG circulating globally. The identity of the PCR products was confirmed as NADH1 and cox1 nucleotide sequences using the Basic Local Alignment Search Tool (BLAST) of NCBI (National Center for Biotechnology Information, Bethesda, MD). The phylogenetic analysis showed that 98% (n = 49) of the isolates clustered with Echinococcus canadensis genotype 6 (G6), whereas only one isolate (2%) clustered with Echinococcus ortleppi (G5). This investigation expands on the existing sequence data generated from EG isolates recovered from camel in the Sudan. The circulation of the cattle genotype (G5) in the one-humped camel is reported here for the first time.
Localization of the human β-catenin gene (CTNNB1) to 3p21: A region implicated in tumor development

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kraus, C.; Liehr, T.; Ballhausen, G.

1994-09-01

The human β-catenin locus (CTNNB1) was mapped by in situ fluorescence analysis to band p21 on the short arm of chromosome 3, a region frequently affected by somatic alterations in a variety of tumors. PCR primers for the genomic amplification of β-catenin sequences were selected on the basis of homology to exon 4 of the Drosophila armadillo gene. Analysis of a panel of somatic cell hybrids confirmed the localization of β-catenin on human chromosome 3. Furthermore, exclusion mapping of three hybrids carrying defined fragments of the short arm of human chromosome 3 allowed us to determine the position of the CTNNB1 locus close to the marker D3S2 in 3p21. 22 refs., 3 figs.
Importation and co-circulation of multiple serotypes of dengue virus in Sarawak, Malaysia.

PubMed

Holmes, Edward C; Tio, Phaik-Hooi; Perera, David; Muhi, Jamail; Cardosa, Jane

2009-07-01

Although dengue is a common disease in South-East Asia, there is a marked absence of virological data from the Malaysian state of Sarawak located on the island of Borneo. From 1997 to 2002 we noted the co-circulation of DENV-2, DENV-3 and DENV-4 in Sarawak. To determine the origins of these Sarawak viruses we obtained the complete E gene sequences of 21 isolates. A phylogenetic analysis revealed multiple entries of DENV-2 and DENV-4 into Sarawak, such that multiple lineages co-circulate, yet with little exportation from Sarawak. Notably, all viral isolates were most closely related to those circulating in different localities in South-East Asia. In sum, our analysis reveals a frequent traffic of DENV in South-East Asia, with Sarawak representing a local sink population.
Unique mitochondrial localization of arginase 1 and 2 in hepatocytes of air-breathing walking catfish, Clarias batrachus and their differential expression patterns under hyper-ammonia stress.

PubMed

Banerjee, Bodhisattwa; Koner, Debaprasad; Lal, Priyanka; Saha, Nirmalendu

2017-07-30

Arginase (ARG) catalyzes the final step of the ornithine-urea cycle (OUC), leading to a conversion of L-arginine to L-ornithine and urea. Several isoforms of ARG have been reported in vertebrates, out of which the two predominant isoforms are the cytosolic ARG1 and the mitochondrial ARG2. The air-breathing walking catfish (Clarias batrachus) is frequently challenged by different environmental insults such as hyper-ammonia, dehydration and osmotic stresses in its natural habitats throughout the year. The present study investigated the active presence of ARG1 and ARG2 isoforms in hepatocytes along with the unique localization of both isoforms inside the mitochondria, and also their specific expression patterns under hyper-ammonia stress (5 mM NH4Cl) in isolated hepatocytes of walking catfish. Initially, full length sequences of both arg1 and arg2 genes were obtained by RACE-PCR. Studies on molecular characterization demonstrated the presence of all the conserved amino acids required for stability and activity of the binuclear metal center in both isoforms. Phylogenetic analysis of the amino acid sequences of ARG isoforms showed a differentiation of ARG1 and ARG2 into two distinct clusters with their respective isoforms from other species. Most interestingly, both isoforms of ARG in hepatocytes were found to be localized inside the mitochondria, as evidenced by the presence of a mitochondrial target peptide (mTP) at the N-terminus of the derived amino acid sequences and the exclusive localization of ARG activity in the mitochondrial fraction. This was additionally confirmed by Western blot analysis of ARGs in mitochondrial and cytosolic fractions, and by immunocytochemical analysis in isolated hepatocytes. 
Although the possible reasons for the presence of both ARG isoforms inside the mitochondria are not clearly understood, perhaps this mitochondrial localization of ARG is functionally advantageous in this catfish for the synthesis of N-acetyl-l-glutamate, the allosteric regulator of the first OUC enzyme, carbamoyl phosphate synthetase III, and for supplying ornithine required for citrulline synthesis intramitochondrially. Furthermore, the ammonia stress, due to exposure to high external ammonia, led to greater synthesis of urea-N probably as a consequence of induction of ureogenesis, as evidenced by a larger accumulation of urea-N in hepatocytes and higher secretion in culture media parallel to the increased concentration of ammonia-N in hepatocytes. Ammonia stress also led to specific coordinated patterns of induction of both the arg genes in isolated hepatocytes of walking catfish. Copyright © 2017 Elsevier B.V. All rights reserved.
Genetic variability in Melipona quinquefasciata (Hymenoptera, Apidae, Meliponini) from northeastern Brazil determined using the first internal transcribed spacer (ITS1).

PubMed

Pereira, J O P; Freitas, B M; Jorge, D M M; Torres, D C; Soares, C E A; Grangeiro, T B

2009-01-01

Melipona quinquefasciata is a ground-nesting South American stingless bee whose geographic distribution was believed to comprise only the central and southern states of Brazil. We obtained partial sequences (about 500-570 bp) of first internal transcribed spacer (ITS1) nuclear ribosomal DNA from Melipona specimens putatively identified as M. quinquefasciata collected from different localities in northeastern Brazil. To confirm the taxonomic identity of the northeastern samples, specimens from the state of Goiás (Central region of Brazil) were included for comparison. All sequences were deposited in GenBank (accession numbers EU073751-EU073759). The mean nucleotide divergence (excluding sites with insertions/deletions) in the ITS1 sequences was only 1.4%, ranging from 0 to 4.1%. When the sites with insertions/deletions were also taken into account, sequence divergences varied from 0 to 5.3%. In all pairwise comparisons, the ITS1 sequence from the specimens collected in Goiás was most divergent compared to the ITS1 sequences of the bees from the other locations. However, neighbor-joining phylogenetic analysis showed that all ITS1 sequences from northeastern specimens along with the sample of Goiás were resolved in a single clade with a bootstrap support of 100%. The ITS1 sequencing data thus support the occurrence of M. quinquefasciata in northeast Brazil.
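The two divergence figures quoted above (with and without insertion/deletion sites) differ only in how alignment columns containing a gap are treated. The sketch below shows that distinction on pre-aligned sequences; it is an uncorrected p-distance written for illustration, and the function names and the `count_gaps` flag are assumptions, not the authors' procedure.

```python
from itertools import combinations
from typing import Sequence


def divergence(a: str, b: str, count_gaps: bool = False) -> float:
    """Proportion of differing sites between two aligned sequences.

    With count_gaps=False, any column holding a gap ('-') in either
    sequence is skipped. With count_gaps=True, a gap opposite a
    nucleotide counts as a difference; gap/gap columns are still skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        if (x == "-" or y == "-") and not count_gaps:
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


def mean_divergence(seqs: Sequence[str], count_gaps: bool = False) -> float:
    """Mean of the pairwise divergences over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(divergence(a, b, count_gaps) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    aligned = ["ACGT-A", "ACTTGA", "ACGTGA"]  # toy alignment, not real ITS1 data
    print(mean_divergence(aligned), mean_divergence(aligned, count_gaps=True))
```

Counting indel sites can only add differences to a pair, which is consistent with the wider range reported when those sites are included.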
A dictionary based informational genome analysis

PubMed Central

2012-01-01

Background In the post-genomic era several methods of computational genomics are emerging to understand how the whole information is structured within genomes. Literature of last five years accounts for several alignment-free methods, arisen as alternative metrics for dissimilarity of biological sequences. Among the others, recent approaches are based on empirical frequencies of DNA k-mers in whole genomes. Results Any set of words (factors) occurring in a genome provides a genomic dictionary. About sixty genomes were analyzed by means of informational indexes based on genomic dictionaries, where a systemic view replaces a local sequence analysis. A software prototype applying a methodology here outlined carried out some computations on genomic data. We computed informational indexes, built the genomic dictionaries with different sizes, along with frequency distributions. The software performed three main tasks: computation of informational indexes, storage of these in a database, index analysis and visualization. The validation was done by investigating genomes of various organisms. A systematic analysis of genomic repeats of several lengths, which is of vivid interest in biology (for example to compute excessively represented functional sequences, such as promoters), was discussed, and suggested a method to define synthetic genetic networks. Conclusions We introduced a methodology based on dictionaries, and an efficient motif-finding software application for comparative genomics. This approach could be extended along many investigation lines, namely exported in other contexts of computational genomics, as a basis for discrimination of genomic pathologies. PMID:22985068
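As a rough illustration of the dictionary idea, the sketch below builds the set of k-mers occurring in a sequence together with their empirical frequencies, and derives two simple quantities from it. The choice of Shannon entropy as the "informational index" and the function names are assumptions made here; the abstract does not specify which indexes the authors' prototype computes.

```python
import math
from collections import Counter
from typing import List


def kmer_dictionary(genome: str, k: int) -> Counter:
    """Count every overlapping word of length k occurring in the genome."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def empirical_entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of the empirical k-mer distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def repeats(counts: Counter) -> List[str]:
    """Words that occur more than once, i.e. repeats of length k."""
    return sorted(word for word, c in counts.items() if c > 1)


if __name__ == "__main__":
    d = kmer_dictionary("ACGTACGT", 2)  # toy sequence
    print(len(d), repeats(d), round(empirical_entropy(d), 3))
```

Varying `k` gives dictionaries of different sizes, which is one way to read the abstract's "genomic dictionaries with different sizes, along with frequency distributions".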
Whole genome sequencing options for bacterial strain typing and epidemiologic analysis based on single nucleotide polymorphism versus gene-by-gene-based approaches.

PubMed

Schürch, A C; Arredondo-Alonso, S; Willems, R J L; Goering, R V

2018-04-01

Whole genome sequence (WGS)-based strain typing finds increasing use in the epidemiologic analysis of bacterial pathogens in both public health as well as more localized infection control settings. This minireview describes methodologic approaches that have been explored for WGS-based epidemiologic analysis and considers the challenges and pitfalls of data interpretation. Personal collection of relevant publications. When applying WGS to study the molecular epidemiology of bacterial pathogens, genomic variability between strains is translated into measures of distance by determining single nucleotide polymorphisms in core genome alignments or by indexing allelic variation in hundreds to thousands of core genes, assigning types to unique allelic profiles. Interpreting isolate relatedness from these distances is highly organism specific, and attempts to establish species-specific cutoffs are unlikely to be generally applicable. In cases where single nucleotide polymorphism or core gene typing do not provide the resolution necessary for accurate assessment of the epidemiology of bacterial pathogens, inclusion of accessory gene or plasmid sequences may provide the additional required discrimination. As with all epidemiologic analysis, realizing the full potential of the revolutionary advances in WGS-based approaches requires understanding and dealing with issues related to the fundamental steps of data generation and interpretation. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
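The two distance measures contrasted in the abstract can be sketched side by side. This is a simplified illustration under stated assumptions: the inputs are taken to be an already-built core genome alignment and already-assigned allele numbers, the function names are hypothetical, and missing data are simply ignored, which is only one of several possible conventions.

```python
from typing import Dict, Optional

BASES = set("ACGT")


def snp_distance(a: str, b: str) -> int:
    """Number of core-alignment columns where two isolates differ.

    Columns with a gap or ambiguous base in either isolate are ignored.
    """
    if len(a) != len(b):
        raise ValueError("core genome alignment rows must have equal length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in BASES and y in BASES and x != y
    )


def allele_distance(
    p: Dict[str, Optional[int]], q: Dict[str, Optional[int]]
) -> int:
    """Number of shared core loci whose allele numbers differ.

    A locus counts only if it is called (not None) in both profiles.
    """
    return sum(
        1
        for locus in p.keys() & q.keys()
        if p[locus] is not None and q[locus] is not None and p[locus] != q[locus]
    )


if __name__ == "__main__":
    # Toy data: three SNPs that all fall within one gene.
    print(snp_distance("ACGTACGT", "ACTTAGGA"))
    print(allele_distance({"geneA": 1, "geneB": 4}, {"geneA": 2, "geneB": 4}))
```

The toy example shows why the two scales are not interchangeable: several SNPs inside a single gene change only one allele, so a gene-by-gene distance can be much smaller than the SNP distance for the same pair of isolates.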
Analysis of complex repeat sequences within the spinal muscular atrophy (SMA) candidate region in 5q13

DOE Office of Scientific and Technical Information (OSTI.GOV)

Davies, K.E.; Morrison, K.E.; Daniels, R.I.

1994-09-01

We previously reported that the 400 kb interval flanked by the polymorphic loci D5S435 and D5S557 contains blocks of a chromosome 5 specific repeat. This interval also defines the SMA candidate region by genetic analysis of recombinant families. A YAC contig of 2-3 Mb encompassing this area has been constructed and a 5.5 kb conserved fragment, isolated from a YAC end clone within the above interval, was used to obtain cDNAs from both fetal and adult brain libraries. We describe the identification of cDNAs with stretches of high DNA sequence homology to exons of β-glucuronidase on human chromosome 7. The cDNAs map both to the candidate region and to an area of 5p using FISH and deletion hybrid analysis. Hybridization to bacteriophage and cosmid clones from the YACs localizes the β-glucuronidase related sequences within the 400 kb region of the YAC contig. The cDNAs show a polymorphic pattern on hybridization to genomic BamH1 fragments in the size range of 10-250 kb. Further analysis using YAC fragmentation vectors is being used to determine how these β-glucuronidase related cDNAs are distributed within 5q13. Dinucleotide repeats within the region are being investigated to determine linkage disequilibrium with the disease locus.
Complete Amino Acid Sequence of a Copper/Zinc-Superoxide Dismutase from Ginger Rhizome.

PubMed

Nishiyama, Yuki; Fukamizo, Tamo; Yoneda, Kazunari; Araki, Tomohiro

2017-04-01

Superoxide dismutase (SOD) is an antioxidant enzyme protecting cells from oxidative stress. Ginger (Zingiber officinale) is known for its antioxidant properties, however, there are no data on SODs from ginger rhizomes. In this study, we purified SOD from the rhizome of Z. officinale (Zo-SOD) and determined its complete amino acid sequence using N terminal sequencing, amino acid analysis, and de novo sequencing by tandem mass spectrometry. Zo-SOD consists of 151 amino acids with two signature Cu/Zn-SOD motifs and has high similarity to other plant Cu/Zn-SODs. Multiple sequence alignment showed that Cu/Zn-binding residues and cysteines forming a disulfide bond, which are highly conserved in Cu/Zn-SODs, are also present in Zo-SOD. Phylogenetic analysis revealed that plant Cu/Zn-SODs clustered into distinct chloroplastic, cytoplasmic, and intermediate groups. Among them, only chloroplastic enzymes carried amino acid substitutions in the region functionally important for enzymatic activity, suggesting that chloroplastic SODs may have a function distinct from those of SODs localized in other subcellular compartments. The nucleotide sequence of the Zo-SOD coding region was obtained by reverse-translation, and the gene was synthesized, cloned, and expressed. The recombinant Zo-SOD demonstrated pH stability in the range of 5-10, which is similar to other reported Cu/Zn-SODs, and thermal stability in the range of 10-60 °C, which is higher than that for most plant Cu/Zn-SODs but lower compared to the enzyme from a Z. officinale relative Curcuma aromatica.
SPATIALLY RESOLVED STAR FORMATION MAIN SEQUENCE OF GALAXIES IN THE CALIFA SURVEY

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cano-Díaz, M.; Sánchez, S. F.; Zibetti, S.

2016-04-20

The “main sequence of galaxies”, defined in terms of the total star formation rate ψ versus the total stellar mass M*, is a well-studied tight relation that has been observed at several wavelengths and at different redshifts. All earlier studies have derived this relation from integrated properties of galaxies. We recover the same relation from an analysis of spatially resolved properties, with integral field spectroscopic (IFS) observations of 306 galaxies from the CALIFA survey. We consider the SFR surface density in units of log(M⊙ yr⁻¹ Kpc⁻²) and the stellar mass surface density in units of log(M⊙ Kpc⁻²) in individual spaxels that probe spatial scales of 0.5–1.5 Kpc. This local relation exhibits a high degree of correlation with small scatter (σ = 0.23 dex), irrespective of the dominant ionization source of the host galaxy or its integrated stellar mass. We highlight (i) the integrated star formation main sequence formed by galaxies whose dominant ionization process is related to star formation, for which we find a slope of 0.81 ± 0.02; (ii) for the spatially resolved relation obtained with the spaxel analysis, we find a slope of 0.72 ± 0.04; and (iii) for the integrated main sequence, we also identified a sequence formed by galaxies that are dominated by an old stellar population, which we have called the retired galaxies sequence.
A cross-polarization based rotating-frame separated-local-field NMR experiment under ultrafast MAS conditions

NASA Astrophysics Data System (ADS)

Zhang, Rongchun; Damron, Joshua; Vosegaard, Thomas; Ramamoorthy, Ayyalusamy

2015-01-01

Rotating-frame separated-local-field solid-state NMR experiments measure highly resolved heteronuclear dipolar couplings which, in turn, provide valuable interatomic distances for structural and dynamic studies of molecules in the solid-state. Though many different rotating-frame SLF sequences have been put forth, recent advances in ultrafast MAS technology have considerably simplified pulse sequence requirements due to the suppression of proton-proton dipolar interactions. In this study we revisit a simple two-dimensional 1H-13C dipolar coupling/chemical shift correlation experiment using 13C detected cross-polarization with a variable contact time (CPVC) and systematically study the conditions for its optimal performance at 60 kHz MAS. In addition, we demonstrate the feasibility of a proton-detected version of the CPVC experiment. The theoretical analysis of the CPVC pulse sequence under different Hartmann-Hahn matching conditions confirms that it performs optimally under the ZQ (ω1H − ω1C = ±ωr) condition for polarization transfer. The limits of the cross polarization process are explored and precisely defined as a function of offset and Hartmann-Hahn mismatch via spin dynamics simulation and experiments on a powder sample of uniformly 13C-labeled L-isoleucine. Our results show that the performance of the CPVC sequence and subsequent determination of 1H-13C dipolar couplings are insensitive to 1H/13C frequency offset when high RF fields are used on both RF channels. Conversely, the CPVC sequence is quite sensitive to the Hartmann-Hahn mismatch, particularly for systems with weak heteronuclear dipolar couplings. We demonstrate the use of the CPVC based SLF experiment as a tool to identify different carbon groups, and hope to motivate the exploration of more sophisticated 1H detected avenues for ultrafast MAS.
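The zero-quantum (ZQ) Hartmann-Hahn condition named in the abstract is an arithmetic relation between the two RF field strengths and the spinning rate, so a small worked example may help. Only the 60 kHz MAS rate comes from the abstract; the 100 kHz proton field, the 42 kHz carbon field and the function names are hypothetical values chosen for illustration.

```python
from typing import List


def zq_matching_fields(rf_1h_khz: float, mas_khz: float) -> List[float]:
    """13C RF field strengths (kHz) that satisfy w1H - w1C = +/- wr."""
    candidates = (rf_1h_khz - mas_khz, rf_1h_khz + mas_khz)
    return [f for f in candidates if f > 0]  # a field strength must be positive


def hh_mismatch(rf_1h_khz: float, rf_13c_khz: float, mas_khz: float) -> float:
    """Distance (kHz) from the nearer of the two ZQ matching conditions."""
    diff = rf_1h_khz - rf_13c_khz
    return min(abs(diff - mas_khz), abs(diff + mas_khz))


if __name__ == "__main__":
    mas = 60.0     # kHz, spinning rate stated in the abstract
    rf_1h = 100.0  # kHz, hypothetical proton RF field
    print(zq_matching_fields(rf_1h, mas))  # carbon fields on the ZQ condition
    print(hh_mismatch(rf_1h, 42.0, mas))   # mismatch for a 42 kHz carbon field
```

A nonzero value from `hh_mismatch` corresponds to the Hartmann-Hahn mismatch to which the abstract reports the experiment is sensitive.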

Variations in Nuclear Localization Strategies Among Pol X Family Enzymes.

PubMed

Kirby, Thomas W; Pedersen, Lars C; Gabel, Scott A; Gassman, Natalie R; London, Robert E

2018-06-22

Despite the essential roles of pol X family enzymes in DNA repair, information about the structural basis of their nuclear import is limited. Recent studies revealed the unexpected presence of a functional NLS in DNA polymerase β, indicating the importance of active nuclear targeting, even for enzymes likely to leak into and out of the nucleus. The current studies further explore the active nuclear transport of these enzymes by identifying and structurally characterizing the functional NLS sequences in the three remaining human pol X enzymes: terminal deoxynucleotidyl transferase (TdT), DNA polymerase μ (pol μ), and DNA polymerase λ (pol λ). NLS identifications are based on Importin α (Impα) binding affinity determined by fluorescence polarization of fluorescein-labeled NLS peptides, X-ray crystallographic analysis of the Impα∆IBB•NLS complexes, and fluorescence-based subcellular localization studies. All three polymerases use NLS sequences located near their N-terminus; TdT and pol μ utilize monopartite NLS sequences, while pol λ utilizes a bipartite sequence, unique among the pol X family members. The pol μ NLS has relatively weak measured affinity for Impα, due in part to its proximity to the N-terminus that limits non-specific interactions of flanking residues preceding the NLS. However, this effect is partially mitigated by an N-terminal sequence unsupportive of Met1 removal by methionine aminopeptidase, leading to a 3-fold increase in affinity when the N-terminal methionine is present. Nuclear targeting is unique to each pol X family enzyme with variations dependent on the structure and unique functional role of each polymerase. This article is protected by copyright. All rights reserved.
Molecular cloning and characterization of satellite DNA sequences from constitutive heterochromatin of the habu snake (Protobothrops flavoviridis, Viperidae) and the Burmese python (Python bivittatus, Pythonidae).

PubMed

Matsubara, Kazumi; Uno, Yoshinobu; Srikulnath, Kornsorn; Seki, Risako; Nishida, Chizuko; Matsuda, Yoichi

2015-12-01

Highly repetitive DNA sequences of the centromeric heterochromatin provide valuable molecular cytogenetic markers for the investigation of genomic compartmentalization in the macrochromosomes and microchromosomes of sauropsids. Here, the relationship between centromeric heterochromatin and karyotype evolution was examined using cloned repetitive DNA sequences from two snake species, the habu snake (Protobothrops flavoviridis, Crotalinae, Viperidae) and Burmese python (Python bivittatus, Pythonidae). Three satellite DNA (stDNA) families were isolated from the heterochromatin of these snakes: 168-bp PFL-MspI from P. flavoviridis and 196-bp PBI-DdeI and 174-bp PBI-MspI from P. bivittatus. The PFL-MspI and PBI-DdeI sequences were localized to the centromeric regions of most chromosomes in the respective species, suggesting that the two sequences were the major components of the centromeric heterochromatin in these organisms. The PBI-MspI sequence was localized to the pericentromeric region of four chromosome pairs. The PFL-MspI and the PBI-DdeI sequences were conserved only in the genome of closely related species, Gloydius blomhoffii (Crotalinae) and Python molurus, respectively, although their locations on the chromosomes were slightly different. In contrast, the PBI-MspI sequence was also in the genomes of P. molurus and Boa constrictor (Boidae), and additionally localized to the centromeric regions of eight chromosome pairs in B. constrictor, suggesting that this sequence originated in the genome of a common ancestor of Pythonidae and Boidae, approximately 86 million years ago. The three stDNA sequences showed no genomic compartmentalization between the macrochromosomes and microchromosomes, suggesting that homogenization of the centromeric and/or pericentromeric stDNA sequences occurred in the macrochromosomes and microchromosomes of these snakes.
TABLE D - WMO AND LOCAL (NCEP) DESCRIPTORS AS WELL AS THOSE AWAITING

Science.gov Websites

sequences common to satellite observations None 3 05 Meteorological or hydrological sequences common to Vertical sounding sequences (conventional data) None 3 10 Vertical sounding sequences (satellite data) None (satellite data) None 3 13 Sequences common to image data None 3 14 Reserved None 3 15 Oceanographic report
Whole genome characterization of human influenza A(H1N1)pdm09 viruses isolated from Kenya during the 2009 pandemic.

PubMed

Gachara, George; Symekher, Samuel; Otieno, Michael; Magana, Japheth; Opot, Benjamin; Bulimo, Wallace

2016-06-01

An influenza pandemic caused by a novel influenza virus A(H1N1)pdm09 spread worldwide in 2009 and is estimated to have caused between 151,700 and 575,400 deaths globally. While whole genome data on new virus enables a deeper insight in the pathogenesis, epidemiology, and drug sensitivities of the circulating viruses, there are relatively limited complete genetic sequences available for this virus from African countries. We describe herein the full genome analysis of influenza A(H1N1)pdm09 viruses isolated in Kenya between June 2009 and August 2010. A total of 40 influenza A(H1N1)pdm09 viruses isolated during the pandemic were selected. The segments from each isolate were amplified and directly sequenced. The resulting sequences of individual gene segments were concatenated and used for subsequent analysis. These were used to infer phylogenetic relationships and also to reconstruct the time of most recent ancestor, time of introduction into the country, rates of substitution and to estimate a time-resolved phylogeny. The Kenyan complete genome sequences clustered with globally distributed clade 2 and clade 7 sequences but local clade 2 viruses did not circulate beyond the introductory foci while clade 7 viruses disseminated country wide. The time of the most recent common ancestor was estimated between April and June 2009, and distinct clusters circulated during the pandemic. The complete genome had an estimated rate of nucleotide substitution of 4.9×10⁻³ substitutions/site/year and greater diversity in surface expressed proteins was observed. We show that two clades of influenza A(H1N1)pdm09 virus were introduced into Kenya from the UK and the pandemic was sustained as a result of importations. Several closely related but distinct clusters co-circulated locally during the peak pandemic phase but only one cluster dominated in the late phase of the pandemic suggesting that it possessed greater adaptability. Copyright © 2016 Elsevier B.V. All rights reserved.
Application of next generation sequencing toward sensitive detection of enteric viruses isolated from celery samples as an example of produce.

PubMed

Yang, Zhihui; Mammel, Mark; Papafragkou, Efstathia; Hida, Kaoru; Elkins, Christopher A; Kulka, Michael

2017-11-16

Next generation sequencing (NGS) holds promise as a single application for both detection and sequence identification of foodborne viruses; however, technical challenges remain due to anticipated low quantities of virus in contaminated food. In this study, with a focus on data analysis using several bioinformatics tools, we applied NGS toward amplification-independent detection and identification of norovirus at low copy (<10³ copies) or within multiple strains from produce. Celery samples were inoculated with human norovirus (stool suspension) either as a single norovirus strain, a mixture of strains (GII.4 and GII.6), or a mixture of different species (hepatitis A virus and norovirus). Viral RNA isolation and recovery was confirmed by RT-qPCR, and optimized for library generation and sequencing without amplification using the Illumina MiSeq platform. Extracts containing either a single virus or a two-virus mixture were analyzed using two different analytic approaches to achieve virus detection and identification. First an overall assessment of viral genome coverage for samples varying in copy numbers (1.1×10³ to 1.7×10⁷) and genomic content (single or multiple strains in various ratios) was completed by reference-guided mapping. Not unexpectedly, this targeted approach to identification was successful in correctly mapping reads, thus identifying each virus contained in the inoculums even at low copy (estimated at 12 copies). For the second (metagenomic) approach, samples were treated as "unknowns" for data analyses using (i) a sequence-based alignment with a local database, (ii) an "in-house" k-mer tool, (iii) a commercially available metagenomics bioinformatic analysis platform cosmosID, and (iv) an open-source program Kraken. 
Of the four metagenomics tools applied in this study, only the local database alignment and in-house k-mer tool were successful in detecting norovirus (as well as HAV) at low copy (down to <10³ copies) and within a mixture of virus strains or species. The results of this investigation provide support for continued investigation into the development and integration of these analytical tools for identification and detection of foodborne viruses. Published by Elsevier B.V.
The red-sequence of 72 WINGS local galaxy clusters

NASA Astrophysics Data System (ADS)

Valentinuzzi, T.; Poggianti, B. M.; Fasano, G.; D'Onofrio, M.; Moretti, A.; Ramella, M.; Biviano, A.; Fritz, J.; Varela, J.; Bettoni, D.; Vulcani, B.; Moles, M.; Couch, W. J.; Dressler, A.; Kjærgaard, P.; Omizzolo, A.; Cava, A.

2011-12-01

We study the color-magnitude red sequence and blue fraction of 72 X-ray selected galaxy clusters at z = 0.04-0.07 from the WINGS survey, searching for correlations between the characteristics of the red sequence (RS) and the environment. We consider the slope and scatter of the red sequence, the number ratio of red luminous-to-faint galaxies, the blue fraction, and the fractions of ellipticals, S0s, and spirals that compose the RS. None of these quantities correlate with the cluster velocity dispersion, X-ray luminosity, number of cluster substructures, BCG prevalence over next brightest galaxies, and the spatial concentration of ellipticals. The properties of the RS, instead, depend strongly on local galaxy density. Higher density regions have a smaller RS scatter, a higher luminous-to-faint ratio, a lower blue fraction, and a lower spiral fraction on the RS. Our results clearly illustrate the prominent effect of the local density in setting the epoch when galaxies become passive and join the red sequence, as opposed to the mass of the galaxy host structure.
Sequence-Specific Targeting of Dosage Compensation in Drosophila Favors an Active Chromatin Context

PubMed Central

Gelbart, Marnie; Tolstorukov, Michael Y.; Plachetka, Annette; Kharchenko, Peter V.; Jung, Youngsook L.; Gorchakov, Andrey A.; Larschan, Erica; Gu, Tingting; Minoda, Aki; Riddle, Nicole C.; Schwartz, Yuri B.; Elgin, Sarah C. R.; Karpen, Gary H.; Pirrotta, Vincenzo; Kuroda, Mitzi I.; Park, Peter J.

2012-01-01

The Drosophila MSL complex mediates dosage compensation by increasing transcription of the single X chromosome in males approximately two-fold. This is accomplished through recognition of the X chromosome and subsequent acetylation of histone H4K16 on X-linked genes. Initial binding to the X is thought to occur at “entry sites” that contain a consensus sequence motif (“MSL recognition element” or MRE). However, this motif is only ∼2 fold enriched on X, and only a fraction of the motifs on X are initially targeted. Here we ask whether chromatin context could distinguish between utilized and non-utilized copies of the motif, by comparing their relative enrichment for histone modifications and chromosomal proteins mapped in the modENCODE project. Through a comparative analysis of the chromatin features in male S2 cells (which contain MSL complex) and female Kc cells (which lack the complex), we find that the presence of active chromatin modifications, together with an elevated local GC content in the surrounding sequences, has strong predictive value for functional MSL entry sites, independent of MSL binding. We tested these sites for function in Kc cells by RNAi knockdown of Sxl, resulting in induction of MSL complex. We show that ectopic MSL expression in Kc cells leads to H4K16 acetylation around these sites and a relative increase in X chromosome transcription. Collectively, our results support a model in which a pre-existing active chromatin environment, coincident with H3K36me3, contributes to MSL entry site selection. The consequences of MSL targeting of the male X chromosome include increase in nucleosome lability, enrichment for H4K16 acetylation and JIL-1 kinase, and depletion of linker histone H1 on active X-linked genes. Our analysis can serve as a model for identifying chromatin and local sequence features that may contribute to selection of functional protein binding sites in the genome. PMID:22570616
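The "elevated local GC content" feature described in this abstract can be illustrated with a minimal sketch. The function name, the window size, and the way the window is clipped at sequence ends are assumptions made for illustration; the abstract does not state how the authors computed local GC content.

```python
def local_gc_content(sequence, center, flank=5):
    """Return the GC fraction in a window of +/- `flank` bases around `center`.

    `center` is a 0-based position (for example, the midpoint of a motif hit).
    The window is clipped at the ends of the sequence.
    """
    start = max(0, center - flank)
    end = min(len(sequence), center + flank + 1)
    window = sequence[start:end].upper()
    if not window:
        raise ValueError("window is empty; check `center` against the sequence length")
    # Count G and C bases and normalize by the actual (possibly clipped) window size.
    gc_count = sum(1 for base in window if base in "GC")
    return gc_count / len(window)
```

In practice one would compare this value between utilized and non-utilized motif copies, with a window size chosen to match the scale of interest.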
Phylogenetic analysis of canine distemper virus in South America clade 1 reveals unique molecular signatures of the local epidemic.

PubMed

Fischer, Cristine D B; Gräf, Tiago; Ikuta, Nilo; Lehmann, Fernanda K M; Passos, Daniel T; Makiejczuk, Aline; Silveira, Marcos A T; Fonseca, André S K; Canal, Cláudio W; Lunge, Vagner R

2016-07-01

Canine distemper virus (CDV) is a highly contagious pathogen for domestic dogs and several wild carnivore species. In Brazil, natural infection of CDV in dogs is very high due to the large non-vaccinated dog population, a scenario that calls for new studies on the molecular epidemiology. This study investigates the phylodynamics and amino-acid signatures of the CDV epidemic in South America by analyzing a large dataset compiled from publicly available sequences and also by collecting new samples from Brazil. A population of 175 dogs with canine distemper (CD) signs was sampled, from which 89 were positive for CDV, generating 42 new CDV sequences. Phylogenetic analysis of the new and publicly available sequences revealed that Brazilian sequences mainly clustered in the South America 1 (SA1) clade, which has its origin estimated to the late 1980s. The reconstruction of the demographic history in the SA1 clade showed an epidemic expanding until recent years, doubling in size every nine years. The SA1 clade epidemic is distinguished from the world CDV epidemic by the emergence of the R580Q strain, a very rare and potentially detrimental substitution in the viral genome. The R580Q substitution was estimated to have happened in one single evolutionary step in the epidemic history of the SA1 clade, emerging shortly after introduction to the continent. Moreover, a high prevalence (11.9%) of the Y549H mutation was observed among the domestic dogs sampled here. This finding was associated (p<0.05) with outcome-death and higher frequency in mixed-breed dogs, the latter being an indicator of a continuous exchange of CDV strains circulating among wild carnivores and domestic dogs. The results reported here highlight the diversity of the worldwide CDV epidemic and reveal local features that can be valuable for combating the disease. Copyright © 2016 Elsevier B.V. All rights reserved.
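The reported doubling time of nine years can be converted into an exponential growth rate with a short calculation. This sketch assumes simple exponential growth, which is only an approximation of the demographic model the authors may have fitted; the function names are illustrative.

```python
import math


def growth_rate_from_doubling_time(doubling_time_years):
    """Exponential growth rate r (per year) such that exp(r * T_double) == 2."""
    return math.log(2) / doubling_time_years


def fold_increase(doubling_time_years, elapsed_years):
    """Fold change in population size after `elapsed_years` of steady doubling."""
    return 2 ** (elapsed_years / doubling_time_years)
```

For a nine-year doubling time the growth rate is ln(2)/9, roughly 0.077 per year, and two doubling periods (18 years) correspond to a four-fold increase.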
Characterization of the synthesis and expression of the GTA-kinase from transformed and normal rodent cells.

PubMed

Kerr, M; Fischer, J E; Purushotham, K R; Gao, D; Nakagawa, Y; Maeda, N; Ghanta, V; Hiramoto, R; Chegini, N; Humphreys-Beher, M G

1994-08-02

The murine transformed cell line YC-8 and beta-adrenergic receptor agonist (isoproterenol) treated rat and mouse parotid gland acinar cells ectopically express cell surface beta 1-4 galactosyltransferase during active proliferation. This activity is dependent upon the expression of the GTA-kinase (p58) in these cells. Using total RNA, cDNA clones for the protein coding region of the kinase were isolated by reverse transcriptase-PCR cloning. DNA sequence analysis failed to show sequence differences with the normal homolog from mouse cells although Southern blot analysis of YC-8, and a second cell line KI81, indicated changes in the restriction enzyme digestion profile relative to murine cell lines which do not express cell surface galactosyltransferase. The rat cDNA clone from isoproterenol-treated salivary glands showed a high degree of protein and nucleic acid sequence homology to the GTA-kinase from both murine and human sources. Northern blot analysis of YC-8 and a control cell line LSTRA revealed the synthesis of a major 3.0 kb mRNA from both cell lines plus the unique expression of a 4.5 kb mRNA in the YC-8 cells. Reverse transcriptase-PCR of LSTRA and YC-8 confirmed the increased steady state levels of the GTA-kinase mRNA in YC-8. In the mouse, induction of cell proliferation by isoproterenol resulted in a 50-fold increase in steady state mRNA levels for the kinase over the low level of expression in quiescent cells. Expression of the rat 3' untranslated region in rat parotid cells in vitro led to an increased rate of DNA synthesis, cell number, and ectopic expression of cell surface galactosyltransferase in the sense orientation. Antisense expression or vector alone did not alter growth characteristics of acinar cells. A polyclonal antibody monospecific to a murine amino terminal peptide sequence revealed a uniform distribution of GTA-kinase over the cytoplasm of acinar and duct cells of control mouse parotid glands. However, upon growth stimulation, kinase was detected primarily in a perinuclear and nuclear immunostaining pattern. Western blot analysis confirmed a translocation from a cytoplasmic localization in both LSTRA and quiescent salivary cells to a membrane-associated localization in YC-8 and proliferating salivary cells.
PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes

PubMed Central

Fong, Christine; Rohmer, Laurence; Radey, Matthew; Wasnick, Michael; Brittnacher, Mitchell J

2008-01-01

Background The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. 
A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any web browser with no client side software setup or installation required. Source code is freely available to researchers interested in setting up a local version of PSAT for analysis of genomes not available through the public server. Access to the public web server and instructions for obtaining source code can be found at . PMID:18366802
Comparative molecular cytogenetic analyses of a major tandemly repeated DNA family and retrotransposon sequences in cultivated jute Corchorus species (Malvaceae).

PubMed

Begum, Rabeya; Zakrzewski, Falk; Menzel, Gerhard; Weber, Beatrice; Alam, Sheikh Shamimul; Schmidt, Thomas

2013-07-01

The cultivated jute species Corchorus olitorius and Corchorus capsularis are important fibre crops. The analysis of repetitive DNA sequences, comprising a major part of plant genomes, has not been carried out in jute but is useful to investigate the long-range organization of chromosomes. The aim of this study was the identification of repetitive DNA sequences to facilitate comparative molecular and cytogenetic studies of two jute cultivars and to develop a fluorescent in situ hybridization (FISH) karyotype for chromosome identification. A plasmid library was generated from C. olitorius and C. capsularis with genomic restriction fragments of 100-500 bp, which was complemented by targeted cloning of satellite DNA by PCR. The diversity of the repetitive DNA families was analysed comparatively. The genomic abundance and chromosomal localization of different repeat classes were investigated by Southern analysis and FISH, respectively. The cytosine methylation of satellite arrays was studied by immunolabelling. Major satellite repeats and retrotransposons have been identified from C. olitorius and C. capsularis. The satellite family CoSat I forms two undermethylated species-specific subfamilies, while the long terminal repeat (LTR) retrotransposons CoRetro I and CoRetro II show similarity to the Metaviridae of plant retroelements. FISH karyotypes were developed by multicolour FISH using these repetitive DNA sequences in combination with 5S and 18S-5·8S-25S rRNA genes which enable the unequivocal chromosome discrimination in both jute species. The analysis of the structure and diversity of the repeated DNA is crucial for genome sequence annotation. The reference karyotypes will be useful for breeding of jute and provide the basis for karyotyping homeologous chromosomes of wild jute species to reveal the genetic and evolutionary relationship between cultivated and wild Corchorus species.
High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.

PubMed

Inagaki, Soichi; Henry, Isabelle M; Lieberman, Meric C; Comai, Luca

2015-01-01

Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious, and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the need for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.
Isolation and molecular characterization of dTnp1, a mobile and defective transposable element of Nicotiana plumbaginifolia.

PubMed

Meyer, C; Pouteau, S; Rouzé, P; Caboche, M

1994-01-01

By Northern blot analysis of nitrate reductase-deficient mutants of Nicotiana plumbaginifolia, we identified a mutant (mutant D65), obtained after gamma-ray irradiation of protoplasts, which contained an insertion sequence in the nitrate reductase (NR) mRNA. This insertion sequence was localized by polymerase chain reaction (PCR) in the first exon of NR and was also shown to be present in the NR gene. The mutant gene contained a 565 bp insertion sequence that exhibits the sequence characteristics of a transposable element, which was thus named dTnp1. The dTnp1 element has 14 bp terminal inverted repeats and is flanked by an 8-bp target site duplication generated upon transposition. These inverted repeats have significant sequence homology with those of other transposable elements. Judging by its size and the absence of a long open reading frame, dTnp1 appears to represent a defective, although mobile, transposable element. The octamer motif TTTAGGCC was found several times in direct orientation near the 5' and 3' ends of dTnp1 together with a perfect palindrome located after the 5' inverted repeat. Southern blot analysis using an internal probe of dTnp1 suggested that this element occurs as a single copy in the genome of N. plumbaginifolia. It is also present in N. tabacum, but absent in tomato or petunia. The dTnp1 element is therefore of potential use for gene tagging in Nicotiana species.
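Terminal inverted repeats of the kind described for dTnp1 can be checked computationally: the first bases of the element should match the reverse complement of its last bases. The sketch below is a generic illustration, not the authors' procedure. It tests only for perfect repeats, whereas real elements often carry imperfect ones, and the function names and default length are assumptions (the default of 14 simply mirrors the 14 bp repeats reported above).

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_terminal_inverted_repeats(element, repeat_length=14):
    """True if the first `repeat_length` bases equal the reverse complement
    of the last `repeat_length` bases (perfect inverted repeats only)."""
    if len(element) < 2 * repeat_length:
        return False
    left = element[:repeat_length].upper()
    right = element[-repeat_length:].upper()
    return left == reverse_complement(right)
```

A tolerant version would allow a small number of mismatches between the two ends instead of requiring exact equality.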
Noise-gating to Clean Astrophysical Image Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

DeForest, C. E.

I present a family of algorithms to reduce noise in astrophysical images and image sequences, preserving more information from the original data than is retained by conventional techniques. The family uses locally adaptive filters (“noise gates”) in the Fourier domain to separate coherent image structure from background noise based on the statistics of local neighborhoods in the image. Processing of solar data limited by simple shot noise or by additive noise reveals image structure not easily visible in the originals, preserves photometry of observable features, and reduces shot noise by a factor of 10 or more with little to no apparent loss of resolution. This reveals faint features that were either not directly discernible or not sufficiently strongly detected for quantitative analysis. The method works best on image sequences containing related subjects, for example movies of solar evolution, but is also applicable to single images provided that there are enough pixels. The adaptive filter uses the statistical properties of noise and of local neighborhoods in the data to discriminate between coherent features and incoherent noise without reference to the specific shape or evolution of those features. The technique can potentially be modified in a straightforward way to exploit additional a priori knowledge about the functional form of the noise.
Noise-gating to Clean Astrophysical Image Data

NASA Astrophysics Data System (ADS)

DeForest, C. E.

2017-04-01

I present a family of algorithms to reduce noise in astrophysical images and image sequences, preserving more information from the original data than is retained by conventional techniques. The family uses locally adaptive filters (“noise gates”) in the Fourier domain to separate coherent image structure from background noise based on the statistics of local neighborhoods in the image. Processing of solar data limited by simple shot noise or by additive noise reveals image structure not easily visible in the originals, preserves photometry of observable features, and reduces shot noise by a factor of 10 or more with little to no apparent loss of resolution. This reveals faint features that were either not directly discernible or not sufficiently strongly detected for quantitative analysis. The method works best on image sequences containing related subjects, for example movies of solar evolution, but is also applicable to single images provided that there are enough pixels. The adaptive filter uses the statistical properties of noise and of local neighborhoods in the data to discriminate between coherent features and incoherent noise without reference to the specific shape or evolution of those features. The technique can potentially be modified in a straightforward way to exploit additional a priori knowledge about the functional form of the noise.
GAMUT: GPU accelerated microRNA analysis to uncover target genes through CUDA-miRanda

PubMed Central

2014-01-01

Background Non-coding sequences such as microRNAs have important roles in disease processes. Computational microRNA target identification (CMTI) is becoming increasingly important since traditional experimental methods for target identification pose many difficulties. These methods are time-consuming, costly, and often need guidance from computational methods to narrow down candidate genes anyway. However, most CMTI methods are computationally demanding, since they need to handle not only several million query microRNA and reference RNA pairs, but also several million nucleotide comparisons within each given pair. Thus, the need to perform microRNA identification at such large scale has increased the demand for parallel computing. Methods Although most CMTI programs (e.g., the miRanda algorithm) are based on a modified Smith-Waterman (SW) algorithm, the existing parallel SW implementations (e.g., CUDASW++ 2.0/3.0, SWIPE) are unable to meet this demand in CMTI tasks. We present CUDA-miRanda, a fast microRNA target identification algorithm that takes advantage of massively parallel computing on Graphics Processing Units (GPU) using NVIDIA's Compute Unified Device Architecture (CUDA). CUDA-miRanda specifically focuses on the local alignment of short (i.e., ≤ 32 nucleotides) sequences against longer reference sequences (e.g., 20K nucleotides). Moreover, the proposed algorithm is able to report multiple alignments (up to 191 top scores) and the corresponding traceback sequences for any given (query sequence, reference sequence) pair. Results Speeds over 5.36 Giga Cell Updates Per Second (GCUPs) are achieved on a server with 4 NVIDIA Tesla M2090 GPUs. Compared to the original miRanda algorithm, which is evaluated on an Intel Xeon E5620@2.4 GHz CPU, the experimental results show up to 166 times performance gains in terms of execution time. 
In addition, we have verified that the exact same targets were predicted in both CUDA-miRanda and the original miRanda implementations through multiple test datasets. Conclusions We offer a GPU-based alternative to high performance compute (HPC) that can be developed locally at a relatively small cost. The community of GPU developers in the biomedical research community, particularly for genome analysis, is still growing. With increasing shared resources, this community will be able to advance CMTI in a very significant manner. Our source code is available at https://sourceforge.net/projects/cudamiranda/. PMID:25077821
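For readers unfamiliar with the Smith-Waterman (SW) local alignment that underlies this work, a minimal CPU sketch follows. It is the textbook algorithm with a linear gap penalty and arbitrary illustrative scores. It is not the modified SW used by miRanda (which scores complementarity rather than identity) and not the CUDA-miRanda implementation; the function name and default scores are assumptions.

```python
def smith_waterman(query, reference, match=2, mismatch=-1, gap=-2):
    """Textbook Smith-Waterman local alignment with a linear gap penalty.

    Returns (best_score, aligned_query, aligned_reference).
    """
    rows, cols = len(query) + 1, len(reference) + 1
    # Scoring matrix; row 0 and column 0 stay at zero.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == reference[j - 1] else mismatch
            # Each cell update is one of the "cell updates" counted by GCUPS.
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    aligned_q, aligned_r = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if query[i - 1] == reference[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            aligned_q.append(query[i - 1])
            aligned_r.append(reference[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_r.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_r.append(reference[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_r))
```

The double loop performs one cell update per (query, reference) position pair, which is why aligning millions of short queries against references of about 20K nucleotides motivates the GPU parallelization described in the abstract.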
Proteomic analysis of Toxocara canis excretory and secretory (TES) proteins.

PubMed

Sperotto, Rita Leal; Kremer, Frederico Schmitt; Aires Berne, Maria Elisabeth; Costa de Avila, Luciana F; da Silva Pinto, Luciano; Monteiro, Karina Mariante; Caumo, Karin Silva; Ferreira, Henrique Bunselmeyer; Berne, Natália; Borsuk, Sibele

2017-01-01

Toxocariasis is a neglected disease, and its main etiological agent is the nematode Toxocara canis. Serological diagnosis is performed by an enzyme-linked immunosorbent assay using T. canis excretory and secretory (TES) antigens produced by in vitro cultivation of larvae. Identification of TES proteins can be useful for the development of new diagnostic strategies since few TES components have been described so far. Herein, we report the results obtained by proteomic analysis of TES proteins using a liquid chromatography-tandem mass spectrometry (LC-MS/MS) approach. TES fractions were separated by one-dimensional SDS-PAGE and analyzed by LC-MS/MS. The MS/MS spectra were compared with a database of protein sequences deduced from the genome sequence of T. canis, and a total of 19 proteins were identified. Classification according to the signal peptide prediction using the SignalP server showed that seven of the identified proteins were extracellular, 10 had cytoplasmic or nuclear localization, while the subcellular localization of two proteins was unknown. Analysis of molecular functions by BLAST2GO showed that the majority of the gene ontology (GO) terms associated with the proteins present in the TES sample were associated with binding functions, including but not limited to protein binding (GO:0005515), inorganic ion binding (GO:0043167), and organic cyclic compound binding (GO:0097159). This study provides additional information about the exoproteome of T. canis, which can lead to the development of new strategies for diagnostics or vaccination. Copyright © 2016 Elsevier B.V. All rights reserved.
Characterization of a nuclear localization signal in the foot-and-mouth disease virus polymerase

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sanchez-Aparicio, Maria Teresa; Rosas, Maria Flora; Sobrino, Francisco, E-mail: fsobrino@cbm.uam.es

2013-09-15

We have experimentally tested whether the MRKTKLAPT sequence in FMDV 3D protein (residues 16 to 24) can act as a nuclear localization signal (NLS). Mutants with substitutions in two basic residues within this sequence, K18E and K20E, were generated. A decreased nuclear localization was observed in transiently expressed 3D and its precursor 3CD, suggesting a role of K18 and K20 in nuclear targeting. Fusion of MRKTKLAPT to the green fluorescent protein (GFP) increased the nuclear localization of GFP, which was not observed when GFP was fused to the 3D mutated sequences. These results indicate that the sequence MRKTKLAPT can be functionally considered an NLS. When introduced into an FMDV full-length RNA, replacements K18E and K20E led to the production of revertant viruses that replaced the introduced acidic residues (E) with K, suggesting that the presence of lysines at positions 18 and 20 of 3D is essential for virus multiplication. - Highlights: • The FMDV 3D polymerase contains a nuclear localization signal. • Replacements K18E and K20E decrease nuclear localization of 3D and its precursor 3CD. • Fusion of the MRKTKLAPT 3D motif to GFP increases the nuclear localization of GFP. • Replacements K18E and K20E abolish the ability of MRKTKLAPT to relocate GFP. • RNAs harboring replacements K18E and K20E lead to recovery of revertant FMDVs.
Musculoskeletal MRI findings of juvenile localized scleroderma.

PubMed

Eutsler, Eric P; Horton, Daniel B; Epelman, Monica; Finkel, Terri; Averill, Lauren W

2017-04-01

Juvenile localized scleroderma comprises a group of autoimmune conditions often characterized clinically by an area of skin hardening. In addition to superficial changes in the skin and subcutaneous tissues, juvenile localized scleroderma may involve the deep soft tissues, bones and joints, possibly resulting in functional impairment and pain in addition to cosmetic changes. There is literature documenting the spectrum of findings for deep involvement of localized scleroderma (fascia, muscles, tendons, bones and joints) in adults, but there is limited literature for the condition in children. We aimed to document the spectrum of musculoskeletal magnetic resonance imaging (MRI) findings of both superficial and deep juvenile localized scleroderma involvement in children and to evaluate the utility of various MRI sequences for detecting those findings. Two radiologists retrospectively evaluated 20 MRI studies of the extremities in 14 children with juvenile localized scleroderma. Each imaging sequence was also given a subjective score of 0 (not useful), 1 (somewhat useful) or 2 (most useful for detecting the findings). Deep tissue involvement was detected in 65% of the imaged extremities. Fascial thickening and enhancement were seen in 50% of imaged extremities. Axial T1, axial T1 fat-suppressed (FS) contrast-enhanced and axial fluid-sensitive sequences were rated most useful. Fascial thickening and enhancement were the most commonly encountered deep tissue findings in extremity MRIs of children with juvenile localized scleroderma. Because abnormalities of the skin, subcutaneous tissues and fascia tend to run longitudinally in an affected limb, axial T1, axial fluid-sensitive and axial T1-FS contrast-enhanced sequences should be included in the imaging protocol.
A computational genomics pipeline for prokaryotic sequencing projects.

PubMed

Kislyuk, Andrey O; Katz, Lee S; Agrawal, Sonia; Hagen, Matthew S; Conley, Andrew B; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C; Sammons, Scott A; Govil, Dhwani; Mair, Raydel D; Tatti, Kathleen M; Tondella, Maria L; Harcourt, Brian H; Mayer, Leonard W; Jordan, I King

2010-08-01

New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems.

Nucleotide sequences of Herpes Simplex Virus type 1 (HSV-1) affecting virus entry, cell fusion, and production of glycoprotein gB (VP7)

DOE Office of Scientific and Technical Information (OSTI.GOV)

DeLuca, N.; Bzik, D.J.; Bond, V.C.

1982-10-30

The tsB5 strain of Herpes Simplex Virus type 1 (HSV-1) contains at least two mutations; one mutation specifies the syncytial phenotype and the other confers temperature sensitivity for virus growth. These functions are known to be located between the prototypic map coordinates 0.30 and 0.42. In this study it was demonstrated that tsB5 enters human embryonic lung (HEL) cells more rapidly than KOS, another strain of HSV-1. The EcoRI restriction fragment F from the KOS strain (map coordinates 0.315 to 0.421) was mapped with eight restriction endonucleases, and 16 recombinant plasmids were constructed which contained varying portions of the KOS genome. Recombinant viruses were generated by marker-rescue and marker-transfer cotransfection procedures, using intact DNA from one strain and a recombinant plasmid containing DNA from the other strain. The region of the crossover between the two nonisogenic strains was inferred by the identification of restriction sites in the recombinants that were characteristic of the parental strains. The recombinants were subjected to phenotypic analysis. Syncytium formation, rate of virus entry, and the production of gB were all separable by the crossovers that produced the recombinants. The KOS sequences which rescue the syncytial phenotype of tsB5 were localized to 1.5 kb (map coordinates 0.345 to 0.355), and the temperature-sensitive mutation was localized to 1.2 kb (0.360 to 0.368), giving an average separation between the mutations of 2.5 kb on the 150-kb genome. DNA sequences that specify a functional domain for virus entry were localized to the nucleotide sequences between the two mutations. All three functions could be encoded by the virus gene specifying the gB glycoprotein.
Quantitative trait nucleotide analysis using Bayesian model selection.

PubMed

Blangero, John; Goring, Harald H H; Kent, Jack W; Williams, Jeff T; Peterson, Charles P; Almasy, Laura; Dyer, Thomas D

2005-10-01

Although much attention has been given to statistical genetic methods for the initial localization and fine mapping of quantitative trait loci (QTLs), little methodological work has been done to date on the problem of statistically identifying the most likely functional polymorphisms using sequence data. In this paper we provide a general statistical genetic framework, called Bayesian quantitative trait nucleotide (BQTN) analysis, for assessing the likely functional status of genetic variants. The approach requires the initial enumeration of all genetic variants in a set of resequenced individuals. These polymorphisms are then typed in a large number of individuals (potentially in families), and marker variation is related to quantitative phenotypic variation using Bayesian model selection and averaging. For each sequence variant a posterior probability of effect is obtained and can be used to prioritize additional molecular functional experiments. An example of this quantitative nucleotide analysis is provided using the GAW12 simulated data. The results show that the BQTN method may be useful for choosing the most likely functional variants within a gene (or set of genes). We also include instructions on how to use our computer program, SOLAR, for association analysis and BQTN analysis.
On the Sequence-Directed Nature of Human Gene Mutation: The Role of Genomic Architecture and the Local DNA Sequence Environment in Mediating Gene Mutations Underlying Human Inherited Disease

PubMed Central

Cooper, David N.; Bacolla, Albino; Férec, Claude; Vasquez, Karen M.; Kehrer-Sawatzki, Hildegard; Chen, Jian-Min

2011-01-01

Different types of human gene mutation may vary in size, from structural variants (SVs) to single base-pair substitutions, but what they all have in common is that their nature, size and location are often determined either by specific characteristics of the local DNA sequence environment or by higher-order features of the genomic architecture. The human genome is now recognized to contain ‘pervasive architectural flaws’ in that certain DNA sequences are inherently mutation-prone by virtue of their base composition, sequence repetitivity and/or epigenetic modification. Here we explore how the nature, location and frequency of different types of mutation causing inherited disease are shaped in large part, and often in remarkably predictable ways, by the local DNA sequence environment. The mutability of a given gene or genomic region may also be influenced indirectly by a variety of non-canonical (non-B) secondary structures whose formation is facilitated by the underlying DNA sequence. Since these non-B DNA structures can interfere with subsequent DNA replication and repair, and may serve to increase mutation frequencies in generalized fashion (i.e. both in the context of subtle mutations and SVs), they have the potential to serve as a unifying concept in studies of mutational mechanisms underlying human inherited disease. PMID:21853507
Score distributions of gapped multiple sequence alignments down to the low-probability tail

NASA Astrophysics Data System (ADS)

Fieth, Pascal; Hartmann, Alexander K.

2016-08-01

Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via the knowledge of the score distribution of random sequences. But this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences this distribution is known analytically to follow a Gumbel distribution. Distributions for gapped local alignments and global alignments of finite lengths can only be obtained numerically. To obtain results for the small-probability region, specific statistical mechanics-based rare-event algorithms can be applied. In previous studies, this was achieved for pairwise alignments. They showed that, contrary to results from previous simple sampling studies, strong deviations from the Gumbel distribution occur in the case of finite sequence lengths. Here we extend the studies to multiple sequence alignments with gaps, which are much more relevant for practical applications in molecular biology. We study the distributions of scores over a large range of the support, reaching probabilities as small as 10^-160, for global and local (sum-of-pair scores) multiple alignments. We find that even after suitable rescaling, eliminating the sequence-length dependence, the distributions for multiple alignment differ from the pairwise alignment case. Furthermore, we also show that the previously discussed Gaussian correction to the Gumbel distribution needs to be refined, also for the case of pairwise alignments.
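For the gapless local case, the Gumbel tail is commonly written in the Karlin-Altschul form P(S >= s) = 1 - exp(-K m n exp(-lambda s)) for sequences of lengths m and n. The sketch below evaluates that tail; the function name `gumbel_pvalue` is hypothetical, and the parameters `lam` and `K` must come from the scoring system in use (the abstract gives no values). It does not model the finite-length deviations or the corrections studied in the paper.

```python
import math

def gumbel_pvalue(score, m, n, lam, K):
    """P(S >= score) under the Gumbel form for gapless local alignment scores."""
    expected = K * m * n * math.exp(-lam * score)  # expected number of chance hits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is extremely small,
    # which is exactly the low-probability tail of interest.
    return -math.expm1(-expected)
```

Using `expm1` rather than `1 - exp(-x)` matters here: for tail probabilities far below machine epsilon the naive form would round to zero.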
[Identification and polymorphism of pectinase genes PGU in the Saccharomyces bayanus complex].

PubMed

Shalamitskiy, M Yu; Naumov, G I

2016-05-01

Pectinase (endo-polygalacturonase) is the key enzyme splitting plant pectin. The corresponding single gene PGU1 is documented for the yeast S. cerevisiae. On the basis of phylogenetic analysis of the PGU nucleotide sequences available in GenBank, a family of divergent PGU genes is found in the species complex S. bayanus: S. bayanus var. uvarum, S. eubayanus, and hybrid taxon S. pastorianus. The PGU genes have different chromosome localization.
shRNA target prediction informed by comprehensive enquiry (SPICE): a supporting system for high-throughput screening of shRNA library.

PubMed

Kamatuka, Kenta; Hattori, Masahiro; Sugiyama, Tomoyasu

2016-12-01

RNA interference (RNAi) screening is extensively used in the field of reverse genetics. RNAi libraries constructed using random oligonucleotides have made this technology affordable. However, the new methodology requires exploration of the RNAi target gene information after screening because the RNAi library includes non-natural sequences that are not found in genes. Here, we developed a web-based tool to support RNAi screening. The system performs short hairpin RNA (shRNA) target prediction that is informed by comprehensive enquiry (SPICE). SPICE automates several tasks that are laborious but indispensable to evaluate the shRNAs obtained by RNAi screening. SPICE has four main functions: (i) sequence identification of shRNA in the input sequence (the sequence might be obtained by sequencing clones in the RNAi library), (ii) searching the target genes in the database, (iii) demonstrating biological information obtained from the database, and (iv) preparation of search result files that can be utilized in a local personal computer (PC). Using this system, we demonstrated that genes targeted by random oligonucleotide-derived shRNAs were not different from those targeted by organism-specific shRNA. The system facilitates RNAi screening, which requires sequence analysis after screening. The SPICE web application is available at http://www.spice.sugysun.org/.
Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies

PubMed Central

Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim

2007-01-01

While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434
Task planning with uncertainty for robotic systems. Thesis

NASA Technical Reports Server (NTRS)

Cao, Tiehua

1993-01-01

In a practical robotic system, it is important to represent and plan sequences of operations and to be able to choose an efficient sequence from them for a specific task. During the generation and execution of task plans, different kinds of uncertainty may occur and erroneous states need to be handled to ensure the efficiency and reliability of the system. An approach to task representation, planning, and error recovery for robotic systems is demonstrated. Our approach to task planning is based on an AND/OR net representation, which is then mapped to a Petri net representation of all feasible geometric states and associated feasibility criteria for net transitions. Task decomposition of robotic assembly plans based on this representation is performed on the Petri net for robotic assembly tasks, and the inheritance of properties of liveness, safeness, and reversibility at all levels of decomposition is explored. This approach provides a framework for robust execution of tasks through the properties of traceability and viability. Uncertainty in robotic systems is modeled by local fuzzy variables, fuzzy marking variables, and global fuzzy variables which are incorporated in fuzzy Petri nets. Analysis of properties and reasoning about uncertainty are investigated using fuzzy reasoning structures built into the net. Two applications of fuzzy Petri nets, robot task sequence planning and sensor-based error recovery, are explored. In the first application, the search space for feasible and complete task sequences with correct precedence relationships is reduced via the use of global fuzzy variables in reasoning about subgoals. In the second application, sensory verification operations are modeled by mutually exclusive transitions to reason about local and global fuzzy variables on-line and automatically select a retry or an alternative error recovery sequence when errors occur. Task sequencing and task execution with error recovery capability for one and multiple soft components in robotic systems are investigated.
Subcellular localization of full-length human myeloid leukemia factor 1 (MLF1) is independent of 14-3-3 proteins.

PubMed

Molzan, Manuela; Ottmann, Christian

2013-03-01

Myeloid leukemia factor 1 (MLF1) is associated with the development of leukemic diseases such as acute myeloid leukemia (AML) and myelodysplastic syndrome (MDS). However, information on the physiological function of MLF1 is limited and mostly derived from studies identifying MLF1 interaction partners like CSN3, MLF1IP, MADM, Manp and the 14-3-3 proteins. The 14-3-3-binding site surrounding S34 is one of the few known functional features of the MLF1 sequence, along with one nuclear export sequence (NES) and two nuclear localization sequences (NLS). It was recently shown that the subcellular localization of mouse MLF1 is dependent on 14-3-3 proteins. Based on these findings, we investigated whether the subcellular localization of human MLF1 was also directly 14-3-3-dependent. Live cell imaging with GFP-fused human MLF1 was used to study the effects of mutations and deletions on its subcellular localization. Surprisingly, we found that the subcellular localization of full-length human MLF1 is 14-3-3-independent, and is probably regulated by other as-yet-unknown proteins.
Exact Identification of a Quantum Change Point

NASA Astrophysics Data System (ADS)

Sentís, Gael; Calsamiglia, John; Muñoz-Tapia, Ramon

2017-10-01

The detection of change points is a pivotal task in statistical analysis. In the quantum realm, it is a new primitive where one aims at identifying the point where a source that supposedly prepares a sequence of particles in identical quantum states starts preparing a mutated one. We obtain the optimal procedure to identify the change point with certainty—naturally at the price of having a certain probability of getting an inconclusive answer. We obtain the analytical form of the optimal probability of successful identification for any length of the particle sequence. We show that the conditional success probabilities of identifying each possible change point show an unexpected oscillatory behavior. We also discuss local (online) protocols and compare them with the optimal procedure.
Exact Identification of a Quantum Change Point.

PubMed

Sentís, Gael; Calsamiglia, John; Muñoz-Tapia, Ramon

2017-10-06

The detection of change points is a pivotal task in statistical analysis. In the quantum realm, it is a new primitive where one aims at identifying the point where a source that supposedly prepares a sequence of particles in identical quantum states starts preparing a mutated one. We obtain the optimal procedure to identify the change point with certainty-naturally at the price of having a certain probability of getting an inconclusive answer. We obtain the analytical form of the optimal probability of successful identification for any length of the particle sequence. We show that the conditional success probabilities of identifying each possible change point show an unexpected oscillatory behavior. We also discuss local (online) protocols and compare them with the optimal procedure.
Ecology and evolution of rabies virus in Europe.

PubMed

Bourhy, H; Kissi, B; Audry, L; Smreczak, M; Sadkowska-Todys, M; Kulonen, K; Tordo, N; Zmudzinski, J F; Holmes, E C

1999-10-01

The evolution of rabies viruses of predominantly European origin was studied by comparing nucleotide sequences of the nucleoprotein and glycoprotein genes, and by typing isolates using RFLP. Phylogenetic analysis of the gene sequence data revealed a number of distinct groups, each associated with a particular geographical area. Such a pattern suggests that rabies virus has spread westwards and southwards across Europe during this century, but that physical barriers such as the Vistula river in Poland have enabled localized evolution. During this dispersal process, two species jumps took place - one into red foxes and another into raccoon dogs, although it is unclear whether virus strains are preferentially adapted to particular animal species or whether ecological forces explain the occurrence of the phylogenetic groups.
Symbiotic Bacteria Associated with Stomach Discs of Human Lice

PubMed Central

Sasaki-Fukatsu, Kayoko; Koga, Ryuichi; Nikoh, Naruo; Yoshizawa, Kazunori; Kasai, Shinji; Mihara, Minoru; Kobayashi, Mutsuo; Tomita, Takashi; Fukatsu, Takema

2006-01-01

The symbiotic bacteria associated with the stomach disc, a large aggregate of bacteriocytes on the ventral side of the midgut, of human body and head lice were characterized. Molecular phylogenetic analysis of 16S rRNA gene sequences showed that the symbionts formed a distinct and well-defined clade in the Gammaproteobacteria. The sequences exhibited AT-biased nucleotide composition and accelerated molecular evolution. In situ hybridization revealed that in nymphs and adult males, the symbiont was localized in the stomach disc, while in adult females, the symbiont was not in the stomach disc but in the lateral oviducts and the posterior pole of the oocytes due to female-specific symbiont migration. We propose the designation “Candidatus Riesia pediculicola” for the louse symbionts. PMID:16950915
A rice chloroplast transit peptide sequence does not alter the cytoplasmic localization of sheep serotonin N-acetyltransferase expressed in transgenic rice plants.

PubMed

Byeon, Yeong; Lee, Hyoung Yool; Lee, Kyungjin; Back, Kyoungwhan

2014-09-01

Ectopic overexpression of melatonin biosynthetic genes of animal origin has been used to generate melatonin-rich transgenic plants to examine the functional roles of melatonin in plants. However, the subcellular localization of these proteins expressed in the transgenic plants remains unknown. We studied the localization of sheep (Ovis aries) serotonin N-acetyltransferase (OaSNAT) and a translational fusion of a rice SNAT transit peptide to OaSNAT (TS:OaSNAT) in plants. Laser confocal microscopy analysis revealed that both OaSNAT and TS:OaSNAT proteins were localized to the cytoplasm even with the addition of the transit sequence to OaSNAT. Transgenic rice plants overexpressing the TS:OaSNAT fusion transgene exhibited high SNAT enzyme activity relative to untransformed wild-type plants, but lower activity than transgenic rice plants expressing the wild-type OaSNAT gene. Melatonin levels in both types of transgenic rice plant corresponded well with SNAT enzyme activity levels. The TS:OaSNAT transgenic lines exhibited increased seminal root growth relative to wild-type plants, but less than in the OaSNAT transgenic lines, confirming that melatonin promotes root growth. Seed-specific OaSNAT expression under the control of a rice prolamin promoter did not confer high levels of melatonin production in transgenic rice seeds compared with seeds from transgenic plants expressing OaSNAT under the control of the constitutive maize ubiquitin promoter. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Interstellar medium conditions in z ~ 0.2 Lyman-break analogs

NASA Astrophysics Data System (ADS)

Contursi, A.; Baker, A. J.; Berta, S.; Magnelli, B.; Lutz, D.; Fischer, J.; Verma, A.; Nielbock, M.; Grácia Carpio, J.; Veilleux, S.; Sturm, E.; Davies, R.; Genzel, R.; Hailey-Dunsheath, S.; Herrera-Camus, R.; Janssen, A.; Poglitsch, A.; Sternberg, A.; Tacconi, L. J.

2017-10-01

We present an analysis of far-infrared (FIR) [CII] and [OI] fine structure line and continuum observations obtained with Herschel/PACS, and 12CO(1-0) observations obtained with the IRAM Plateau de Bure Interferometer, of Lyman-break analogs (LBAs) at z ~ 0.2. The principal aim of this work is to determine the typical interstellar medium (ISM) properties of z ~ 1-2 main sequence (MS) galaxies, with stellar masses between 10^9.5 and 10^11 M⊙, which are currently not easily detectable in all these lines even with ALMA and NOEMA. We perform PDR modeling and apply different infrared diagnostics to derive the main physical parameters of the FIR-emitting gas and dust and we compare the derived ISM properties to those of galaxies on and above the MS at different redshifts. We find that the ISM properties of LBAs are quite extreme (low gas temperature and high density and thermal pressure) with respect to those found in local normal spirals and more active local galaxies. LBAs have no [CII] deficit despite having the high specific star formation rates (sSFRs) typical of starbursts. Although LBAs lie above the local MS, we show that their ISM properties are more similar to those of high-redshift MS galaxies than of local galaxies above the main sequence. This data set represents an important reference for planning future ALMA [CII] observations of relatively low-mass MS galaxies at the epoch of the peak of the cosmic star formation.
RNA sequencing reveals pronounced changes in the noncoding transcriptome of aging synaptosomes.

PubMed

Chen, Bei Jun; Ueberham, Uwe; Mills, James D; Kirazov, Ludmil; Kirazov, Evgeni; Knobloch, Mara; Bochmann, Jana; Jendrek, Renate; Takenaka, Konii; Bliim, Nicola; Arendt, Thomas; Janitz, Michael

2017-08-01

Normal aging is associated with impairments in cognitive functions. These alterations are caused by diminutive changes in the biology of synapses, and ineffective neurotransmission, rather than loss of neurons. Hitherto, only a few studies exploring molecular mechanisms of healthy brain aging in higher vertebrates have utilized synaptosomal fractions to survey local changes in aging-related transcriptome dynamics. Here we present, for the first time, a comparative analysis of the synaptosome transcriptome in the aging mouse brain using RNA sequencing. Our results show changes in the expression of genes contributing to biological pathways related to neurite guidance, synaptosomal physiology, and RNA splicing. More intriguingly, we also discovered alterations in the expression of thousands of novel, unannotated lincRNAs during aging. Further, detailed characterization of the cleavage and polyadenylation factor I subunit 1 (Clp1) mRNA and protein expression indicates its increased expression in neuronal processes of hippocampal stratum radiatum in aging mice. Together, our study uncovers a new layer of transcriptional regulation which is targeted by aging within the local environment of interconnecting neuronal cells. Copyright © 2017 Elsevier Inc. All rights reserved.
Analysis of Sir2E in the cellular slime mold Dictyostelium discoideum: cellular localization, spatial expression and overexpression.

PubMed

Katayama, Takahiro; Yasukawa, Hiro

2008-10-01

It has been reported that Dictyostelium discoideum encodes four silent information regulator 2 (Sir2) proteins (Sir2A-D) showing sequence similarity to human homologues of Sir2 (SIRT1-3). Further screening in a database revealed that D. discoideum encodes an additional Sir2 homologue (Sir2E). The amino acid sequence of Sir2E is not similar to those of SIRTs but is similar to those of proteins encoded by Giardia lamblia, Cryptosporidium hominis and Cryptosporidium parvum. Fluorescence of Sir2E-green fluorescent protein fusion protein was detected in the D. discoideum nucleus, indicating that Sir2E is a nuclear localizing protein. Reverse transcription-polymerase chain reaction and whole-mount in situ hybridization analyses showed that D. discoideum expressed sir2E in amoebae in the growth phase and in prestalk cells in the developmental phase. D. discoideum overexpressing sir2E grew faster than the wild type. These results indicate that Sir2E plays important roles both in the growth phase and developmental phase of D. discoideum.
Sequence conservation and antibody cross-recognition of clade B human immunodeficiency virus (HIV) type 1 Tat protein in HIV-1-infected Italians, Ugandans, and South Africans.

PubMed

Buttò, Stefano; Fiorelli, Valeria; Tripiciano, Antonella; Ruiz-Alvarez, Maria J; Scoglio, Arianna; Ensoli, Fabrizio; Ciccozzi, Massimo; Collacchi, Barbara; Sabbatucci, Michela; Cafaro, Aurelio; Guzmán, Carlos A; Borsetti, Alessandra; Caputo, Antonella; Vardas, Eftyhia; Colvin, Mark; Lukwiya, Matthew; Rezza, Giovanni; Ensoli, Barbara

2003-10-15

We determined immune cross-recognition and the degree of Tat conservation in patients infected by local human immunodeficiency virus (HIV) type 1 strains. The data indicated a similar prevalence of total and epitope-specific anti-Tat IgG in 578 serum samples from HIV-infected Italian (n=302), Ugandan (n=139), and South African (n=137) subjects, using the same B clade Tat protein that is being used in vaccine trials. In particular, anti-Tat antibodies were detected in 13.2%, 10.8%, and 13.9% of HIV-1-infected individuals from Italy, Uganda, and South Africa, respectively. Sequence analysis results indicated a high similarity of Tat from the different circulating viruses with BH-10 Tat, particularly in the 1-58 amino acid region, which contains most of the immunogenic epitopes. These data indicate an effective cross-recognition of a B-clade laboratory strain-derived Tat protein vaccine by individuals infected with different local viruses, owing to the high similarity of Tat epitopes.
Galactic helium-to-metals enrichment ratio (Delta Y/ Delta Z) from the analysis of local main sequence stars observed by HIPPARCOS

NASA Astrophysics Data System (ADS)

Gennaro, M.; Prada Moroni, P. G.; Degl'Innocenti, S.

We discuss the reliability of one of the most widely used methods to determine the helium-to-metals enrichment ratio, Delta Y / Delta Z, i.e. the photometric comparison of a selected data set of local disk low Main Sequence (MS) stars observed by HIPPARCOS and a new grid of stellar models with up-to-date input physics. Most of the attention has been devoted to evaluating the effects on the final results of different sources of uncertainty (observational errors, evolutionary effects, selection criteria, systematic uncertainties of the models, numerical errors). As a check of the result, the procedure has been repeated using another, independent data set: the low MS of the Hyades cluster. The value of Delta Y / Delta Z obtained for the Hyades, together with spectroscopic determinations of the [Fe/H] ratio, has been used to obtain the Y and Z values for the cluster. Isochrones have been calculated with the estimated chemical composition, obtaining a very good agreement between the predicted position of the Hyades MS and the observational data in the Color-Magnitude Diagram (CMD).
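The step from Delta Y / Delta Z and [Fe/H] to the cluster's Y and Z can be made explicit. The abstract does not give the formulas; the relations below are the standard ones and rest on two assumptions: a linear enrichment law starting from a primordial helium abundance Y_P, and a scaled-solar metal mixture so that [Fe/H] tracks Z/X. The symbols Y_P and (Z/X)_⊙ are inputs that the abstract does not specify.

```latex
\begin{align}
Y &= Y_P + \frac{\Delta Y}{\Delta Z}\, Z, \\
\frac{Z}{X} &= \left(\frac{Z}{X}\right)_{\odot} 10^{[\mathrm{Fe/H}]}, \qquad X + Y + Z = 1, \\
Z &= \frac{1 - Y_P}{1 + \dfrac{\Delta Y}{\Delta Z}
      + \left[\left(\dfrac{Z}{X}\right)_{\odot} 10^{[\mathrm{Fe/H}]}\right]^{-1}}.
\end{align}
```

The last line follows by substituting the first two relations into X = 1 - Y - Z and solving for Z; Y then follows from the first line.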
Automatic segmentation of 4D cardiac MR images for extraction of ventricular chambers using a spatio-temporal approach

NASA Astrophysics Data System (ADS)

Atehortúa, Angélica; Zuluaga, Maria A.; Ourselin, Sébastien; Giraldo, Diana; Romero, Eduardo

2016-03-01

Accurate quantification of ventricular function is important to support the evaluation, diagnosis and prognosis of several cardiac pathologies. However, expert heart delineation, specifically for the right ventricle, is a time-consuming task with high inter- and intra-observer variability. A fully automatic 3D+time heart segmentation framework is herein proposed for short-axis cardiac MRI sequences. This approach estimates the heart using exclusively information from the sequence itself without tuning any parameters. The proposed framework uses a coarse-to-fine approach, which starts by localizing the heart via spatio-temporal analysis, followed by a segmentation of the basal heart that is then propagated to the apex by using a non-rigid registration strategy. The obtained volume is then refined by estimating the ventricular muscle by locally searching a prior endocardium-pericardium intensity pattern. The proposed framework was applied to 48 patient datasets supplied by the organizers of the MICCAI 2012 Right Ventricle segmentation challenge. Results show the robustness, efficiency and competitiveness of the proposed method both in terms of accuracy and computational load.

Comparative Genome Analysis Between Aspergillus oryzae Strains Reveals Close Relationship Between Sites of Mutation Localization and Regions of Highly Divergent Genes among Aspergillus Species

PubMed Central

Umemura, Myco; Koike, Hideaki; Yamane, Noriko; Koyama, Yoshinori; Satou, Yuki; Kikuzato, Ikuya; Teruya, Morimi; Tsukahara, Masatoshi; Imada, Yumi; Wachi, Youji; Miwa, Yukino; Yano, Shuichi; Tamano, Koichi; Kawarabayasi, Yutaka; Fujimori, Kazuhiro E.; Machida, Masayuki; Hirano, Takashi

2012-01-01

Aspergillus oryzae has been utilized for over 1000 years in Japan for the production of various traditional foods, and a large number of A. oryzae strains have been isolated and/or selected for the effective fermentation of food ingredients. Characteristics of genetic alterations among the strains used are of particular interest in studies of A. oryzae. Here, we have sequenced the whole genome of an industrial fungal isolate, A. oryzae RIB326, by using a next-generation sequencing system and compared the data with those of A. oryzae RIB40, a wild-type strain sequenced in 2005. The aim of this study was to evaluate the mutation pressure on the non-syntenic blocks (NSBs) of the genome, which were previously identified through comparative genomic analysis of A. oryzae, Aspergillus fumigatus, and Aspergillus nidulans. We found that genes within the NSBs of RIB326 accumulate mutations more frequently than those within the SBs, regardless of their distance from the telomeres or of their expression level. Our findings suggest that the high mutation frequency of NSBs might contribute to maintaining the diversity of the A. oryzae genome. PMID:22912434
Comparative genome analysis between Aspergillus oryzae strains reveals close relationship between sites of mutation localization and regions of highly divergent genes among Aspergillus species.

PubMed

Umemura, Myco; Koike, Hideaki; Yamane, Noriko; Koyama, Yoshinori; Satou, Yuki; Kikuzato, Ikuya; Teruya, Morimi; Tsukahara, Masatoshi; Imada, Yumi; Wachi, Youji; Miwa, Yukino; Yano, Shuichi; Tamano, Koichi; Kawarabayasi, Yutaka; Fujimori, Kazuhiro E; Machida, Masayuki; Hirano, Takashi

2012-10-01

Aspergillus oryzae has been utilized for over 1000 years in Japan for the production of various traditional foods, and a large number of A. oryzae strains have been isolated and/or selected for the effective fermentation of food ingredients. Characteristics of genetic alterations among the strains used are of particular interest in studies of A. oryzae. Here, we have sequenced the whole genome of an industrial fungal isolate, A. oryzae RIB326, by using a next-generation sequencing system and compared the data with those of A. oryzae RIB40, a wild-type strain sequenced in 2005. The aim of this study was to evaluate the mutation pressure on the non-syntenic blocks (NSBs) of the genome, which were previously identified through comparative genomic analysis of A. oryzae, Aspergillus fumigatus, and Aspergillus nidulans. We found that genes within the NSBs of RIB326 accumulate mutations more frequently than those within the SBs, regardless of their distance from the telomeres or of their expression level. Our findings suggest that the high mutation frequency of NSBs might contribute to maintaining the diversity of the A. oryzae genome.
Detection and phylogenetic analysis of a new adenoviral polymerase gene in reptiles in Korea.

PubMed

Bak, Eun-Jung; Jho, Yeonsook; Woo, Gye-Hyeong

2018-06-01

Over a period of 7 years (2004-2011), samples from 34 diseased reptiles provided by local governments, zoos, and pet shops were tested for viral infection. Animals were diagnosed based on clinical signs, including loss of appetite, diarrhea, rhinorrhea, and unexpected sudden death. Most of the exotic animals had gastrointestinal problems, such as mucosal redness and ulcers, while the native animals had no clinical symptoms. Viral sequences were found in seven animals. Retroviral genes were amplified from samples from five Burmese pythons (Python molurus bivittatus), an adenovirus was detected in a panther chameleon (Furcifer pardalis), and an adenovirus and a paramyxovirus were detected in a tropical girdled lizard (Cordylus tropidosternum). Phylogenetic analysis of retroviruses and paramyxoviruses showed the highest sequence identity to both a Python molurus endogenous retrovirus and a Python curtus endogenous retrovirus and to a lizard isolate, respectively. Partial sequencing of an adenoviral DNA polymerase gene from the lizard isolate suggested that the corresponding virus was a novel isolate different from the reference strain (accession no. AY576677.1). The virus was not isolated but was detected, using molecular genetic techniques, in a lizard raised in a pet shop. This animal was also coinfected with a paramyxovirus.
Mutation analysis of the chromosome 14q24.3 dihydrolipoyl succinyltransferase (DLST) gene in patients with early-onset Alzheimer disease.

PubMed

Cruts, M; Backhovens, H; Van Gassen, G; Theuns, J; Wang, S Y; Wehnert, A; van Duijn, C M; Karlsson, T; Hofman, A; Adolfsson, R

1995-10-13

Linkage analysis studies have indicated that the chromosome band 14q24.3 harbours a major gene for familial early-onset Alzheimer's disease (AD). Recently we localized the chromosome 14 AD gene (AD3) in the 6.4 cM interval between the markers D14S289 and D14S61. We mapped the gene encoding dihydrolipoyl succinyltransferase (DLST), the E2k component of human alpha-ketoglutarate dehydrogenase complex (KGDHC), in the AD3 candidate region using yeast artificial chromosomes (YACs). The DLST gene is a candidate for the AD3 gene since deficiencies in KGDHC activity have been observed in brain tissue and fibroblasts of AD patients. The 15 exons and the promoter region of the DLST gene were analysed for mutations in chromosome 14 linked AD cases and in two series of unrelated early-onset AD cases (onset age < 55 years). Sequence variations in intronic sequences (introns 3, 5 and 10) or silent mutations in exonic sequences (exons 8 and 14) were identified. However, no AD related mutations were observed, suggesting that the DLST gene is not the chromosome 14 AD3 gene.
SONAR: A High-Throughput Pipeline for Inferring Antibody Ontogenies from Longitudinal Sequencing of B Cell Transcripts

PubMed Central

Schramm, Chaim A.; Sheng, Zizhang; Zhang, Zhenhai; Mascola, John R.; Kwong, Peter D.; Shapiro, Lawrence

2016-01-01

The rapid advance of massively parallel or next-generation sequencing technologies has made possible the characterization of B cell receptor repertoires in ever greater detail, and these developments have triggered a proliferation of software tools for processing and annotating these data. Of especial interest, however, is the capability to track the development of specific antibody lineages across time, which remains beyond the scope of most current programs. We have previously reported on the use of techniques such as inter- and intradonor analysis and CDR3 tracing to identify transcripts related to an antibody of interest. Here, we present Software for the Ontogenic aNalysis of Antibody Repertoires (SONAR), capable of automating both general repertoire analysis and specialized techniques for investigating specific lineages. SONAR annotates next-generation sequencing data, identifies transcripts in a lineage of interest, and tracks lineage development across multiple time points. SONAR also generates figures, such as identity–divergence plots and longitudinal phylogenetic “birthday” trees, and provides interfaces to other programs such as DNAML and BEAST. SONAR can be downloaded as a ready-to-run Docker image or manually installed on a local machine. In the latter case, it can also be configured to take advantage of a high-performance computing cluster for the most computationally intensive steps, if available. In summary, this software provides a useful new tool for the processing of large next-generation sequencing datasets and the ontogenic analysis of neutralizing antibody lineages. SONAR can be found at https://github.com/scharch/SONAR, and the Docker image can be obtained from https://hub.docker.com/r/scharch/sonar/. PMID:27708645
Impact of Sampling Density on the Extent of HIV Clustering

PubMed Central

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

2014-01-01

Identifying and monitoring HIV clusters could be useful in tracking the leading edge of HIV transmission in epidemics. Currently, greater specificity in the definition of HIV clusters is needed to reduce confusion in the interpretation of HIV clustering results. We address sampling density as one of the key aspects of HIV cluster analysis. The proportion of viral sequences in clusters was estimated at sampling densities from 1.0% to 70%. A set of 1,248 HIV-1C env gp120 V1C5 sequences from a single community in Botswana was utilized in simulation studies. Matching numbers of HIV-1C V1C5 sequences from the LANL HIV Database were used as comparators. HIV clusters were identified by phylogenetic inference under bootstrapped maximum likelihood and pairwise distance cut-offs. Sampling density below 10% was associated with stochastic HIV clustering with broad confidence intervals. HIV clustering increased linearly at sampling density >10%, and was accompanied by narrowing confidence intervals. Patterns of HIV clustering were similar at bootstrap thresholds 0.7 to 1.0, but the extent of HIV clustering decreased with higher bootstrap thresholds. The origin of sampling (local concentrated vs. scattered global) had a substantial impact on HIV clustering at sampling densities ≥10%. Pairwise distances at 10% were estimated as a threshold for cluster analysis of HIV-1 V1C5 sequences. The node bootstrap support distribution provided additional evidence for 10% sampling density as the threshold for HIV cluster analysis. The detectability of HIV clusters is substantially affected by sampling density. A minimal genotyping density of 10% and sampling density of 50–70% are suggested for HIV-1 V1C5 cluster analysis. PMID:25275430
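The dependence of clustering on sampling density described in the abstract above can be illustrated with a small sketch. This is not the authors' pipeline, which combined bootstrapped maximum-likelihood phylogenies with pairwise distance cut-offs; it is a simplified stand-in that uses only a pairwise p-distance cut-off and single-linkage grouping. The function names, the toy inputs and the `seed` parameter are assumptions made for illustration.

```python
import random
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def fraction_in_clusters(seqs, cutoff):
    """Fraction of sequences that share a cluster with at least one other sequence.

    Two sequences are linked when their p-distance is <= cutoff; clusters are
    the connected components of those links (single linkage, via union-find).
    """
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) <= cutoff:
            parent[find(i)] = find(j)

    sizes = {}
    for i in range(len(seqs)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    clustered = sum(n for n in sizes.values() if n >= 2)
    return clustered / len(seqs) if seqs else 0.0


def clustering_at_density(seqs, density, cutoff, seed=0):
    """Subsample a fraction `density` of the sequences, then measure clustering."""
    rng = random.Random(seed)
    k = max(1, round(density * len(seqs)))
    return fraction_in_clusters(rng.sample(seqs, k), cutoff)
```

Calling `clustering_at_density` over a range of densities and several seeds would give a rough picture of how the clustered fraction and its spread change with sampling, under the simplifying assumptions stated above.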
Global stratigraphy of Venus: Analysis of a random sample of thirty-six test areas

NASA Technical Reports Server (NTRS)

Basilevsky, Alexander T.; Head, James W., III

1995-01-01

The age relations between 36 impact craters with dark paraboloids and other geologic units and structures at these localities have been studied through photogeologic analysis of Magellan SAR images of the surface of Venus. Geologic settings in all 36 sites, about 1000 x 1000 km each, could be characterized using only 10 different terrain units and six types of structures. Mapping of such units and structures in 36 randomly distributed large regions shows evidence for a distinctive regional and global stratigraphic and geologic sequence. On the basis of this sequence we have developed a model that illustrates several major themes in the history of Venus. Most of the history of Venus (that of its first 80% or so) is not preserved in the surface geomorphological record. The major deformation associated with tessera formation in the period sometime between 0.5-1.0 b.y. ago (Ivanov and Basilevsky, 1993) is the earliest event detected. Our stratigraphic analyses suggest that following tessera formation, extensive volcanic flooding resurfaced at least 85% of the planet in the form of the presently-ridged and fractured plains. Several lines of evidence favor a high flux in the post-tessera period but we have no independent evidence for the absolute duration of ridged plains emplacement. During this time, the net state of stress in the lithosphere apparently changed from extensional to compressional, first in the form of extensive ridge belt development, followed by the formation of extensive wrinkle ridges on the flow units. Subsequently, there occurred local emplacement of smooth and lobate plains units which are presently essentially undeformed. The major events in the latest 10% of the presently preserved history of Venus are continued rifting and some associated volcanism, and the redistribution of eolian material largely derived from impact crater deposits. 
Detailed geologic mapping and stratigraphic synthesis are necessary to test this sequence and to address many of the outstanding problems raised by this analysis.
cDNA cloning of the human monocarboxylate transporter 1 and chromosomal localization of the SLC16A1 locus to 1p13.2-p12

DOE Office of Scientific and Technical Information (OSTI.GOV)

Garcia, C.K.; Li, X.; Luna, J.

1994-09-15

Lactate and pyruvate are transported across cell membranes by monocarboxylate transporters (MCTs). Here, the authors use the recently cloned cDNA for hamster MCT1 to isolate cDNA and genomic clones for human MCT1. Comparison of the human and hamster amino acid sequences revealed that the proteins are 86% identical. The gene for human MCT1 (gene symbol, SLC16A1) was localized to human chromosome bands 1p13.2-p12 by PCR analysis of panels of human X rodent cell hybrid lines and by fluorescence chromosomal in situ hybridization. 9 refs., 2 figs.
Automated sequence-specific protein NMR assignment using the memetic algorithm MATCH.

PubMed

Volk, Jochen; Herrmann, Torsten; Wüthrich, Kurt

2008-07-01

MATCH (Memetic Algorithm and Combinatorial Optimization Heuristics) is a new memetic algorithm for automated sequence-specific polypeptide backbone NMR assignment of proteins. MATCH employs local optimization for tracing partial sequence-specific assignments within a global, population-based search environment, where the simultaneous application of local and global optimization heuristics guarantees high efficiency and robustness. MATCH thus makes combined use of the two predominant concepts in use for automated NMR assignment of proteins. Dynamic transition and inherent mutation are new techniques that enable automatic adaptation to variable quality of the experimental input data. The concept of dynamic transition is incorporated in all major building blocks of the algorithm, where it enables switching between local and global optimization heuristics at any time during the assignment process. Inherent mutation restricts the intrinsically required randomness of the evolutionary algorithm to those regions of the conformation space that are compatible with the experimental input data. Using intact and artificially deteriorated APSY-NMR input data of proteins, MATCH performed sequence-specific resonance assignment with high efficiency and robustness.
The membrane skeleton in Paramecium: Molecular characterization of a novel epiplasmin family and preliminary GFP expression results.

PubMed

Pomel, Sébastien; Diogon, Marie; Bouchard, Philippe; Pradel, Lydie; Ravet, Viviane; Coffe, Gérard; Viguès, Bernard

2006-02-01

Previous attempts to identify the membrane skeleton of Paramecium cells have revealed a protein pattern that is both complex and specific. The most prominent structural elements, epiplasmic scales, are centered around ciliary units and are closely apposed to the cytoplasmic side of the inner alveolar membrane. We sought to characterize epiplasmic scale proteins (epiplasmins) at the molecular level. PCR approaches enabled the cloning and sequencing of two closely related genes by amplifications of sequences from a macronuclear genomic library. Using these two genes (EPI-1 and EPI-2), we have contributed to the annotation of the Paramecium tetraurelia macronuclear genome and identified 39 additional (paralogous) sequences. Two orthologous sequences were found in the Tetrahymena thermophila genome. Structural analysis of the 43 sequences indicates that the hallmark of this new multigenic family is a 79 aa domain flanked by two Q-, P- and V-rich stretches of sequence that are much more variable in amino-acid composition. Such features clearly distinguish members of the multigenic family from epiplasmic proteins previously sequenced in other ciliates. The expression of Green Fluorescent Protein (GFP)-tagged epiplasmin showed significant labeling of epiplasmic scales as well as oral structures. We expect that the GFP construct described herein will prove to be a useful tool for comparative subcellular localization of different putative epiplasmins in Paramecium.
Purification, characterization and sequence analysis of Omp50, a new porin isolated from Campylobacter jejuni.

PubMed Central

Bolla, J M; Dé, E; Dorez, A; Pagès, J M

2000-01-01

A novel pore-forming protein identified in Campylobacter was purified by ion-exchange chromatography and named Omp50 according to both its molecular mass and its outer membrane localization. We observed a pore-forming ability of Omp50 after re-incorporation into artificial membranes. The protein induced cation-selective channels with major conductance values of 50-60 pS in 1 M NaCl. N-terminal sequencing allowed us to identify the predicted coding sequence Cj1170c from the Campylobacter jejuni genome database as the corresponding gene in the NCTC 11168 genome sequence. The gene, designated omp50, consists of a 1425 bp open reading frame encoding a deduced 453-amino acid protein with a calculated pI of 5.81 and a molecular mass of 51169.2 Da. The protein possessed a 20-amino acid leader sequence. No significant similarity was found between Omp50 and porin protein sequences already determined. Moreover, the protein showed only weak sequence identity with the major outer-membrane protein (MOMP) of Campylobacter, correlating with the absence of antigenic cross-reactivity between these two proteins. Omp50 is expressed in C. jejuni and Campylobacter lari but not in Campylobacter coli. The gene, however, was detected in all three species by PCR. According to its conformation and functional properties, the protein would belong to the family of outer-membrane monomeric porins. PMID:11104668
OVAS: an open-source variant analysis suite with inheritance modelling.

PubMed

Mozere, Monika; Tekman, Mehmet; Kari, Jameela; Bockenhauer, Detlef; Kleta, Robert; Stanescu, Horia

2018-02-08

The advent of modern high-throughput genetics continually broadens the gap between the rising volume of sequencing data and the tools required to process them. The need to pinpoint a small subset of functionally important variants has now shifted towards identifying the critical differences between normal variants and disease-causing ones. The ever-increasing reliance on cloud-based services for sequence analysis and the non-transparent methods they utilize has prompted the need for more in-situ services that can provide a safer and more accessible environment to process patient data, especially in circumstances where continuous internet usage is limited. To address these issues, we herein propose our standalone Open-source Variant Analysis Sequencing (OVAS) pipeline, consisting of three key stages of processing that pertain to the separate modes of annotation, filtering, and interpretation. Core annotation performs variant-mapping to gene-isoforms at the exon/intron level, appends functional data pertaining to the type of variant mutation, and determines hetero/homozygosity. An extensive inheritance-modelling module in conjunction with 11 other filtering components can be used in sequence ranging from single quality control to multi-file penetrance model specifics such as X-linked recessive or mosaicism. Depending on the type of interpretation required, additional annotation is performed to identify organ specificity through gene expression and protein domains. In the course of this paper we analysed an autosomal recessive case study. OVAS made effective use of the filtering modules to recapitulate the results of the study by identifying the prescribed compound-heterozygous disease pattern from exome-capture sequence input samples.
OVAS is an offline open-source modular-driven analysis environment designed to annotate and extract useful variants from Variant Call Format (VCF) files, and process them under an inheritance context through a top-down filtering schema of swappable modules, run entirely off a live bootable medium and accessed locally through a web-browser.
Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions.

PubMed

Nishizawa, M; Nishizawa, K

2000-10-01

The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the 'between gene' GC content heterogeneity, which is linked to 'isochores', is a principal factor associated with the bias in substitution patterns in human, 'within gene' heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed.
Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions

PubMed Central

Nishizawa, Manami; Nishizawa, Kazuhisa

2000-01-01

The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the ‘between gene’ GC content heterogeneity, which is linked to ‘isochores’, is a principal factor associated with the bias in substitution patterns in human, ‘within gene’ heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed. PMID:11000273
Genetic divergence between freshwater and marine morphs of alewife (Alosa pseudoharengus): a 'next-generation' sequencing analysis.

PubMed

Czesny, Sergiusz; Epifanio, John; Michalak, Pawel

2012-01-01

Alewife Alosa pseudoharengus, a small clupeid fish native to the Atlantic Ocean, has recently (∼150 years ago) invaded the North American Great Lakes and, despite the challenges of a freshwater environment, its populations exploded and disrupted local food web structures. This range expansion has been accompanied by dramatic changes at all levels of organization. Growth rates, size at maturation, or fecundity are only a few of the most distinct morphological and life history traits that contrast the two alewife morphs. A question arises as to what extent these rapidly evolving differences between marine and freshwater varieties result from regulatory (including phenotypic plasticity) or structural mutations. To gain insights into expression changes and sequence divergence between marine and freshwater alewives, we sequenced transcriptomes of individuals from Lake Michigan and the Atlantic Ocean. Population-specific single nucleotide polymorphisms were rare but, interestingly, occurred in sequences of genes that also tended to show large differences in expression. Our results show that the striking phenotypic divergence between anadromous and lake alewives can be attributed to massive regulatory modifications rather than coding changes.
Genetic Divergence between Freshwater and Marine Morphs of Alewife (Alosa pseudoharengus): A ‘Next-Generation’ Sequencing Analysis

PubMed Central

Czesny, Sergiusz; Epifanio, John; Michalak, Pawel

2012-01-01

Alewife Alosa pseudoharengus, a small clupeid fish native to the Atlantic Ocean, has recently (∼150 years ago) invaded the North American Great Lakes and, despite the challenges of a freshwater environment, its populations exploded and disrupted local food web structures. This range expansion has been accompanied by dramatic changes at all levels of organization. Growth rates, size at maturation, or fecundity are only a few of the most distinct morphological and life history traits that contrast the two alewife morphs. A question arises as to what extent these rapidly evolving differences between marine and freshwater varieties result from regulatory (including phenotypic plasticity) or structural mutations. To gain insights into expression changes and sequence divergence between marine and freshwater alewives, we sequenced transcriptomes of individuals from Lake Michigan and the Atlantic Ocean. Population-specific single nucleotide polymorphisms were rare but, interestingly, occurred in sequences of genes that also tended to show large differences in expression. Our results show that the striking phenotypic divergence between anadromous and lake alewives can be attributed to massive regulatory modifications rather than coding changes. PMID:22438868
PlantTFDB: a comprehensive plant transcription factor database

PubMed Central

Guo, An-Yuan; Chen, Xin; Gao, Ge; Zhang, He; Zhu, Qi-Hui; Liu, Xiao-Chuan; Zhong, Ying-Fu; Gu, Xiaocheng; He, Kun; Luo, Jingchu

2008-01-01

Transcription factors (TFs) play key roles in controlling gene expression. Systematic identification and annotation of TFs, followed by construction of TF databases may serve as useful resources for studying the function and evolution of transcription factors. We developed a comprehensive plant transcription factor database PlantTFDB (http://planttfdb.cbi.pku.edu.cn), which contains 26 402 TFs predicted from 22 species, including five model organisms with available whole genome sequence and 17 plants with available EST sequences. To provide comprehensive information for those putative TFs, we made extensive annotation at both family and gene levels. A brief introduction and key references were presented for each family. Functional domain information and cross-references to various well-known public databases were available for each identified TF. In addition, we predicted putative orthologs of those TFs among the 22 species. PlantTFDB has a simple interface to allow users to search the database by IDs or free texts, to make sequence similarity search against TFs of all or individual species, and to download TF sequences for local analysis. PMID:17933783
Genotyping of Echinococcus granulosus from domestic animals and humans from Ardabil Province, northwest Iran.

PubMed

Pezeshki, A; Akhlaghi, L; Sharbatkhori, M; Razmjou, E; Oormazdi, H; Mohebali, M; Meamar, A R

2013-12-01

Cystic echinococcosis is endemic in Iran, particularly in Ardabil Province, where it causes health and economic problems. The genetic pattern of Echinococcus granulosus has been determined in most parts of Iran, except in this area. In the present investigation, 55 larval isolates were collected from humans (11), sheep (19), goats (4) and cattle (21). For analysis of the genetic characteristics of E. granulosus isolates, DNA sequencing of mitochondrial cytochrome c oxidase subunit 1 (cox1) and NADH dehydrogenase subunit 1 (nad1) genes was applied. Fifty isolates were successfully analysed, with 92% (46) and 8% (4) identified as G1 and G3 genotypes, respectively. The sequence analyses of the isolates displayed nine characteristic profiles in cox1 sequences and eight characteristic profiles in nad1 sequences. Based on these results, the sheep strain (G1 genotype) was the most prevalent in humans, sheep, goats and cattle. The buffalo strain (G3 genotype) was not only demonstrated in sheep (1 isolate) and cattle (1 isolate), but also for the first time in two human isolates. These findings will provide information for local control of echinococcosis.
A microarray-based genotyping and genetic mapping approach for highly heterozygous outcrossing species enables localization of a large fraction of the unassembled Populus trichocarpa genome sequence.

PubMed

Drost, Derek R; Novaes, Evandro; Boaventura-Novaes, Carolina; Benedict, Catherine I; Brown, Ryan S; Yin, Tongming; Tuskan, Gerald A; Kirst, Matias

2009-06-01

Microarrays have demonstrated significant power for genome-wide analyses of gene expression, and recently have also revolutionized the genetic analysis of segregating populations by genotyping thousands of loci in a single assay. Although microarray-based genotyping approaches have been successfully applied in yeast and several inbred plant species, their power has not been proven in an outcrossing species with extensive genetic diversity. Here we have developed methods for high-throughput microarray-based genotyping in such species using a pseudo-backcross progeny of 154 individuals of Populus trichocarpa and P. deltoides analyzed with long-oligonucleotide in situ-synthesized microarray probes. Our analysis resulted in high-confidence genotypes for 719 single-feature polymorphism (SFP) and 1014 gene expression marker (GEM) candidates. Using these genotypes and an established microsatellite (SSR) framework map, we produced a high-density genetic map comprising over 600 SFPs, GEMs and SSRs. The abundance of gene-based markers allowed us to localize over 35 million base pairs of previously unplaced whole-genome shotgun (WGS) scaffold sequence to putative locations in the genome of P. trichocarpa. A high proportion of sampled scaffolds could be verified for their placement with independently mapped SSRs, demonstrating the previously un-utilized power that high-density genotyping can provide in the context of map-based WGS sequence reassembly. Our results provide a substantial contribution to the continued improvement of the Populus genome assembly, while demonstrating the feasibility of microarray-based genotyping in a highly heterozygous population. The strategies presented are applicable to genetic mapping efforts in all plant species with similarly high levels of genetic diversity.
A comprehensive analysis of 3′ end sequencing data sets reveals novel polyadenylation signals and the repressive role of heterogeneous ribonucleoprotein C on cleavage and polyadenylation

PubMed Central

Gruber, Andreas J.; Schmidt, Ralf; Gruber, Andreas R.; Martin, Georges; Ghosh, Souvik; Belmadani, Manuel; Keller, Walter

2016-01-01

Alternative polyadenylation (APA) is a general mechanism of transcript diversification in mammals, which has been recently linked to proliferative states and cancer. Different 3′ untranslated region (3′ UTR) isoforms interact with different RNA-binding proteins (RBPs), which modify the stability, translation, and subcellular localization of the corresponding transcripts. Although the heterogeneity of pre-mRNA 3′ end processing has been established with high-throughput approaches, the mechanisms that underlie systematic changes in 3′ UTR lengths remain to be characterized. Through a uniform analysis of a large number of 3′ end sequencing data sets, we have uncovered 18 signals, six of which are novel, whose positioning with respect to pre-mRNA cleavage sites indicates a role in pre-mRNA 3′ end processing in both mouse and human. With 3′ end sequencing we have demonstrated that the heterogeneous ribonucleoprotein C (HNRNPC), which binds the poly(U) motif whose frequency also peaks in the vicinity of polyadenylation (poly(A)) sites, has a genome-wide effect on poly(A) site usage. HNRNPC-regulated 3′ UTRs are enriched in ELAV-like RBP 1 (ELAVL1) binding sites and include those of the CD47 gene, which participate in the recently discovered mechanism of 3′ UTR–dependent protein localization (UDPL). Our study thus establishes an up-to-date, high-confidence catalog of 3′ end processing sites and poly(A) signals, and it uncovers an important role of HNRNPC in regulating 3′ end processing. It further suggests that U-rich elements mediate interactions with multiple RBPs that regulate different stages in a transcript's life cycle. PMID:27382025

Global mapping of DNA conformational flexibility on Saccharomyces cerevisiae.

PubMed

Menconi, Giulia; Bedini, Andrea; Barale, Roberto; Sbrana, Isabella

2015-04-01

In this study we provide the first comprehensive map of DNA conformational flexibility in Saccharomyces cerevisiae complete genome. Flexibility plays a key role in DNA supercoiling and DNA/protein binding, regulating DNA transcription, replication or repair. Specific interest in flexibility analysis concerns its relationship with human genome instability. Enrichment in flexible sequences has been detected in unstable regions of human genome defined fragile sites, where genes map and carry frequent deletions and rearrangements in cancer. Flexible sequences have been suggested to be the determinants of fragile gene proneness to breakage; however, their actual role and properties remain elusive. Our in silico analysis, carried out genome-wide via the StabFlex algorithm, shows the conserved presence of highly flexible regions in budding yeast genome as well as in genomes of other Saccharomyces sensu stricto species. Flexible peaks in S. cerevisiae identify 175 ORFs mapping on their 3'UTR, a region affecting mRNA translation, localization and stability. (TA)n repeats of different extension shape the central structure of peaks and co-localize with polyadenylation efficiency element (EE) signals. ORFs with flexible peaks share common features. Transcripts are characterized by decreased half-life: this is considered peculiar of genes involved in regulatory systems with high turnover; consistently, their function affects biological processes such as cell cycle regulation or stress response. Our findings support the functional importance of flexibility peaks, suggesting that the flexible sequence may be derived by an expansion of canonical TAYRTA polyadenylation efficiency element. The flexible (TA)n repeat amplification could be the outcome of an evolutionary neofunctionalization leading to a differential 3'-end processing and expression regulation in genes with peculiar function.
Our study provides a new support to the functional role of flexibility in genomes and a strategy for its characterization inside human fragile sites.
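As an illustration of the sequence features named in the abstract above, the following sketch locates TAYRTA elements and (TA)n runs in a DNA string. It is unrelated to the StabFlex algorithm, whose internals the abstract does not describe. It assumes the standard IUPAC ambiguity codes (Y = C or T, R = A or G); the function names and the `min_units` parameter are hypothetical.

```python
import re

# IUPAC degenerate codes: Y = C or T (pyrimidine), R = A or G (purine).
# The lookahead lets overlapping occurrences be reported.
TAYRTA = re.compile(r"(?=(TA[CT][AG]TA))")


def find_efficiency_elements(seq):
    """Return (start, match) for every TAYRTA occurrence, overlaps included."""
    return [(m.start(), m.group(1)) for m in TAYRTA.finditer(seq.upper())]


def find_ta_repeats(seq, min_units=3):
    """Return (start, unit_count) for each run of at least min_units TA units."""
    pattern = re.compile(r"(?:TA){%d,}" % min_units)
    return [(m.start(), len(m.group(0)) // 2)
            for m in pattern.finditer(seq.upper())]
```

Note that TATATA is itself one instance of TAYRTA (Y = T, R = A), so a (TA)n run of three or more units always contains at least one match of the degenerate motif.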
Global Mapping of DNA Conformational Flexibility on Saccharomyces cerevisiae

PubMed Central

Menconi, Giulia; Bedini, Andrea; Barale, Roberto; Sbrana, Isabella

2015-01-01

In this study we provide the first comprehensive map of DNA conformational flexibility in Saccharomyces cerevisiae complete genome. Flexibility plays a key role in DNA supercoiling and DNA/protein binding, regulating DNA transcription, replication or repair. Specific interest in flexibility analysis concerns its relationship with human genome instability. Enrichment in flexible sequences has been detected in unstable regions of human genome defined fragile sites, where genes map and carry frequent deletions and rearrangements in cancer. Flexible sequences have been suggested to be the determinants of fragile gene proneness to breakage; however, their actual role and properties remain elusive. Our in silico analysis, carried out genome-wide via the StabFlex algorithm, shows the conserved presence of highly flexible regions in budding yeast genome as well as in genomes of other Saccharomyces sensu stricto species. Flexible peaks in S. cerevisiae identify 175 ORFs mapping on their 3’UTR, a region affecting mRNA translation, localization and stability. (TA)n repeats of different extension shape the central structure of peaks and co-localize with polyadenylation efficiency element (EE) signals. ORFs with flexible peaks share common features. Transcripts are characterized by decreased half-life: this is considered peculiar of genes involved in regulatory systems with high turnover; consistently, their function affects biological processes such as cell cycle regulation or stress response. Our findings support the functional importance of flexibility peaks, suggesting that the flexible sequence may be derived by an expansion of canonical TAYRTA polyadenylation efficiency element. The flexible (TA)n repeat amplification could be the outcome of an evolutionary neofunctionalization leading to a differential 3’-end processing and expression regulation in genes with peculiar function.
Our study provides a new support to the functional role of flexibility in genomes and a strategy for its characterization inside human fragile sites. PMID:25860149
The Nerium oleander aphid Aphis nerii is tolerant to a local isolate of Aphid lethal paralysis virus (ALPV).

PubMed

Dombrovsky, Aviv; Luria, Neta

2013-04-01

In a survey that was conducted during the year 2011, a local strain of Aphid lethal paralysis virus (ALPV) was identified and isolated from a wild population of Aphis nerii aphids living on Nerium oleander plants located in northern Israel. The new strain was tentatively named ALPV-An. RNA extracted from the viral particles allowed the amplification and determination of the complete genome sequence. The virus genome comprises 9835 nucleotides. In a BLAST search analysis, the ALPV-An sequence showed 89 % nucleotide sequence identity with the whole genome of a South African ALPV and 96 and 94 % amino acid sequence identity with the ORF1 and ORF2 of that strain, respectively. In preliminary experiments, spray-applied, purified ALPV virions were highly pathogenic to the green peach aphid Myzus persicae; 95 % mortality was recorded 4 days post-infection. These preliminary results demonstrate the potential of ALPV for use as a biological agent for the control of some aphids. Surprisingly, no visible ALPV pathogenic effects, such as morphological changes or paralysis, were observed in the A. nerii aphids infected with ALPV-An. The absence of clear ALPV symptoms in A. nerii led to the formulation of two hypotheses, which were partially examined in this study. The first hypothesis suggests that A. nerii is resistant to or tolerant of ALPV, while the second hypothesis proposes that ALPV-An may be a mild strain of ALPV. Currently, our results favor the first hypothesis, since ALPV-An is cryptic in A. nerii aphids and can be lethal for M. persicae aphids.
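The identity figures quoted in the abstract above are the kind of value reported for a pairwise alignment. A minimal sketch of the usual calculation follows, assuming two already-aligned strings of equal length with "-" as the gap character. The function name is hypothetical, and the exact denominator used by any particular BLAST report should be checked against that tool's documentation.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity: identical residues divided by alignment columns.

    Columns where both sequences carry a gap are skipped; a residue aligned
    to a gap counts as a column but never as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    matches = sum(x == y and x != gap for x, y in columns)
    return 100.0 * matches / len(columns)
```

For example, two aligned 10-residue sequences that differ at a single position give 90% identity under this definition.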
Compositional profile of α/β-hydrolase fold proteins in mangrove soil metagenomes: prevalence of epoxide hydrolases and haloalkane dehalogenases in oil-contaminated sites

PubMed Central

Jiménez, Diego Javier; Dini-Andreote, Francisco; Ottoni, Júlia Ronzella; de Oliveira, Valéria Maia; van Elsas, Jan Dirk; Andreote, Fernando Dini

2015-01-01

The occurrence of genes encoding biotechnologically relevant α/β-hydrolases in mangrove soil microbial communities was assessed using data obtained by whole-metagenome sequencing of four mangrove areas, denoted BrMgv01 to BrMgv04, in São Paulo, Brazil. The sequences (215 Mb in total) were filtered based on local amino acid alignments against the Lipase Engineering Database. In total, 5923 unassembled sequences were affiliated with 30 different α/β-hydrolase fold superfamilies. The most abundant predicted proteins encompassed cytosolic hydrolases (abH08; ∼ 23%), microsomal hydrolases (abH09; ∼ 12%) and Moraxella lipase-like proteins (abH04 and abH01; < 5%). Detailed analysis of the genes predicted to encode proteins of the abH08 superfamily revealed a high proportion related to epoxide hydrolases and haloalkane dehalogenases in polluted mangroves BrMgv01-02-03. This suggested selection and putative involvement in local degradation/detoxification of the pollutants. Seven sequences that were annotated as genes for putative epoxide hydrolases and five for putative haloalkane dehalogenases were found in a fosmid library generated from BrMgv02 DNA. The latter enzymes were predicted to belong to Actinobacteria, Deinococcus-Thermus, Planctomycetes and Proteobacteria. Our integrated approach thus identified 12 genes (complete and/or partial) that may encode hitherto undescribed enzymes. The low amino acid identity (< 60%) with already-described genes opens perspectives for both production in an expression host and genetic screening of metagenomes. PMID:25171437
Linking microarray reporters with protein functions.

PubMed

Gaj, Stan; van Erk, Arie; van Haaften, Rachel I M; Evelo, Chris T A

2007-09-26

The analysis of microarray experiments requires accurate and up-to-date functional annotation of the microarray reporters to optimize the interpretation of the biological processes involved. Pathway visualization tools are used to connect gene expression data with existing biological pathways by using specific database identifiers that link reporters with elements in the pathways. This paper proposes a novel method that aims to improve microarray reporter annotation by BLASTing the original reporter sequences against a species-specific EMBL subset that was derived from and crosslinked back to the highly curated UniProt database. The resulting alignments were filtered using high quality alignment criteria and further compared with the outcome of a more traditional approach, where reporter sequences were BLASTed against EnsEMBL followed by locating the corresponding protein (UniProt) entry for the high quality hits. Combining the results of both methods resulted in successful annotation of > 58% of all reporter sequences with UniProt IDs on two commercial array platforms, increasing the number of Incyte reporters that could be coupled to Gene Ontology terms from 32.7% to 58.3% and to a local GenMAPP pathway from 9.6% to 16.7%. For Agilent, 35.3% of the total reporters are now linked to GO nodes and 7.1% to local pathways. Our methods increased the annotation quality of microarray reporter sequences and allowed us to visualize more reporters using pathway visualization tools. Even in cases where the original reporter annotation showed the correct description, the new identifiers often allowed improved pathway and Gene Ontology linking. These methods are freely available at http://www.bigcat.unimaas.nl/public/publications/Gaj_Annotation/.
CDinFusion – Submission-Ready, On-Line Integration of Sequence and Contextual Data

PubMed Central

Hankeln, Wolfgang; Wendel, Norma Johanna; Gerken, Jan; Waldmann, Jost; Buttigieg, Pier Luigi; Kostadinov, Ivaylo; Kottmann, Renzo; Yilmaz, Pelin; Glöckner, Frank Oliver

2011-01-01

State of the art (DNA) sequencing methods applied in “Omics” studies grant insight into the ‘blueprints’ of organisms from all domains of life. Sequencing is carried out around the globe and the data is submitted to the public repositories of the International Nucleotide Sequence Database Collaboration. However, the context in which these studies are conducted often gets lost, because experimental data, as well as information about the environment are rarely submitted along with the sequence data. If these contextual or metadata are missing, key opportunities of comparison and analysis across studies and habitats are hampered or even impossible. To address this problem, the Genomic Standards Consortium (GSC) promotes checklists and standards to better describe our sequence data collection and to promote the capturing, exchange and integration of sequence data with contextual data. In a recent community effort the GSC has developed a series of recommendations for contextual data that should be submitted along with sequence data. To support the scientific community to significantly enhance the quality and quantity of contextual data in the public sequence data repositories, specialized software tools are needed. In this work we present CDinFusion, a web-based tool to integrate contextual and sequence data in (Multi)FASTA format prior to submission. The tool is open source and available under the Lesser GNU Public License 3. A public installation is hosted and maintained at the Max Planck Institute for Marine Microbiology at http://www.megx.net/cdinfusion. The tool may also be installed locally using the open source code available at http://code.google.com/p/cdinfusion. PMID:21935468
Biclustering as a method for RNA local multiple sequence alignment.

PubMed

Wang, Shu; Gutell, Robin R; Miranker, Daniel P

2007-12-15

Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in multiple sequence alignment (MSA) is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. Simultaneously, the grouping of the sequences can impact the alignment; precisely the kind of dual situation biclustering is intended to address. We define a representation of the MSA problem enabling the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was tested on the subsets of the BRAliBase 2.1 benchmark suite that display high variability and on an extension to that suite to larger problem sizes. Alignments of two large datasets of current biological interest, T box sequences and Group IC1 introns, were also evaluated. The results were compared with alignments computed by the ClustalW, MAFFT, MUSCLE and PROBCONS alignment programs using Sum of Pairs (SPS) and Consensus Count. Results for the benchmark suite are sensitive to problem size. On problems of 15 or greater sequences, BlockMSA is consistently the best. On none of the problems in the test suite are there appreciable differences in scores among BlockMSA, MAFFT and PROBCONS. On the T box sequences, BlockMSA does the most faithful job of reproducing known annotations. MAFFT and PROBCONS do not. On the Intron sequences, BlockMSA, MAFFT and MUSCLE are comparable at identifying conserved regions. BlockMSA is implemented in Java. Source code and supplementary datasets are available at http://aug.csres.utexas.edu/msa/
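The abstract names the Sum of Pairs score (SPS) as an evaluation measure but does not define it. The sketch below assumes one commonly used definition: the fraction of residue pairs aligned in a reference alignment that are also aligned in the test alignment. The function names and the toy alignments are hypothetical, and this is not taken from the BlockMSA source.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share an alignment column.

    Each residue is identified as (sequence index, ungapped residue index).
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    counters = [0] * len(alignment)  # residues seen so far in each sequence
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        # every pair of non-gap residues in this column counts as aligned
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference-aligned residue pairs recovered by the test alignment."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(test)) / len(ref)


# Toy alignments of the same three made-up sequences (ACGU, ACAGU, AGU).
reference = ["AC-GU", "ACAGU", "A--GU"]
test = ["ACG-U", "ACAGU", "A-G-U"]
print(sum_of_pairs_score(test, reference))  # 8 of 10 reference pairs recovered
```

A score of 1.0 means the test alignment reproduces every aligned pair in the reference; residues aligned only in the test alignment do not affect the score under this definition.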
Subcellular localization of transiently expressed fluorescent fusion proteins.

PubMed

Collings, David A

2013-01-01

The recent and massive expansion in plant genomics data has generated a large number of gene sequences for which two seemingly simple questions need to be answered: where do the proteins encoded by these genes localize in cells, and what do they do? One widespread approach to answering the localization question has been to use particle bombardment to transiently express unknown proteins tagged with green fluorescent protein (GFP) or its numerous derivatives. Confocal fluorescence microscopy is then used to monitor the localization of the fluorescent protein as it hitches a ride through the cell. The subcellular localization of the fusion protein, if not immediately apparent, can then be determined by comparison to localizations generated by fluorescent protein fusions to known signalling sequences and proteins, or by direct comparison with fluorescent dyes. This review aims to be a tour guide for researchers wanting to travel this hitch-hiker's path, and for reviewers and readers who wish to understand their travel reports. It will describe some of the technology available for visualizing protein localizations, and some of the experimental approaches for optimizing and confirming localizations generated by particle bombardment in onion epidermal cells, the most commonly used experimental system. As the non-conservation of signal sequences in heterologous expression systems such as onion, and consequent mis-targeting of fusion proteins, is always a potential problem, the epidermal cells of the Argenteum mutant of pea are proposed as a model system.
Feature co-localization landscape of the human genome

PubMed Central

Ng, Siu-Kin; Hu, Taobo; Long, Xi; Chan, Cheuk-Hin; Tsang, Shui-Ying; Xue, Hong

2016-01-01

Although feature co-localizations could serve as useful guide-posts to genome architecture, a comprehensive and quantitative feature co-localization map of the human genome has been lacking. Herein we show that, in contrast to the conventional bipartite division of genomic sequences into genic and inter-genic regions, pairwise co-localizations of forty-two genomic features in the twenty-two autosomes based on 50-kb to 2,000-kb sequence windows indicate a tripartite zonal architecture comprising Genic zones enriched with gene-related features and Alu-elements; Proximal zones enriched with MIR- and L2-elements, transcription-factor-binding-sites (TFBSs), and conserved-indels (CIDs); and Distal zones enriched with L1-elements. Co-localizations between single-nucleotide-polymorphisms (SNPs) and copy-number-variations (CNVs) reveal a fraction of sequence windows displaying steeply enhanced levels of SNPs, CNVs and recombination rates that point to active adaptive evolution in such pathways as immune response, sensory perceptions, and cognition. The strongest positive co-localization observed between TFBSs and CIDs suggests a regulatory role of CIDs in cooperation with TFBSs. The positive co-localizations of cancer somatic CNVs (CNVT) with all Proximal zone and most Genic zone features, in contrast to the distinctly more restricted co-localizations exhibited by germline CNVs (CNVG), reveal disparate distributions of CNVTs and CNVGs indicative of dissimilarity in their underlying mechanisms. PMID:26854351
Assessment of clonality and serotypes of Streptococcus mutans among children by multilocus sequence typing.

PubMed

Momeni, Stephanie S; Whiddon, Jennifer; Cheon, Kyounga; Moser, Stephen A; Childers, Noel K

2015-12-01

Studies using multilocus sequence typing (MLST) have demonstrated that Streptococcus mutans isolates are genetically diverse. Our laboratory previously demonstrated clonality of S. mutans using MLST but could not discount the possibility of sampling bias. In this study, the clonality of randomly selected S. mutans plaque isolates from African-American children was examined using MLST. Serotype and the presence of collagen-binding proteins (CBPs) encoded by cnm/cbm were also assessed. One hundred S. mutans isolates were randomly selected for MLST analysis. Sequence analysis was performed and phylogenetic trees were generated using start2 and mega. Thirty-four sequence types were identified, of which 27 were unique to this population. Seventy-five per cent of the isolates clustered into 16 clonal groups. The serotypes observed were c (n = 84), e (n = 3), and k (n = 11). The prevalence of S. mutans isolates of serotype k was notably high, at 17.5%. All isolates were cnm/cbm negative. The clonality of S. mutans demonstrated in this study illustrates the importance of localized population studies and is consistent with transmission. The prevalence of serotype k, a recently proposed systemic pathogen, observed in this study is higher than reported in most populations, and this is the first report of S. mutans serotype k in a United States population. © 2015 Eur J Oral Sci.
[Analysis of the molecular characteristics and cloning of full-length coding sequence of interleukin-2 in tree shrews].

PubMed

Huang, Xiao-Yan; Li, Ming-Li; Xu, Juan; Gao, Yue-Dong; Wang, Wen-Guang; Yin, An-Guo; Li, Xiao-Fei; Sun, Xiao-Mei; Xia, Xue-Shan; Dai, Jie-Jie

2013-04-01

The tree shrew (Tupaia belangeri chinensis) is an excellent animal model for studying the mechanisms of human diseases, but few studies have examined interleukin-2 (IL-2), an important immune factor in disease model evaluation. In this study, the 465 bp full-length IL-2 cDNA coding sequence was cloned from the RNA of tree shrew spleen lymphocytes, which were cultivated and stimulated with ConA (concanavalin A). Clustal W 2.0 was used to compare and analyze the sequence and molecular characteristics, and to establish the similarity of the overall structure of IL-2 between tree shrews and other mammals. The homology of the IL-2 nucleotide sequence between tree shrews and humans was 93%, and the amino acid homology was 80%. The phylogenetic tree results, derived through the Neighbour-Joining method using MEGA5.0, indicated a close genetic relationship between tree shrews, Homo sapiens, and Macaca mulatta. The three-dimensional structure analysis showed that the surface charges in most regions of tree shrew IL-2 were similar to those of human IL-2; however, the N-glycosylation sites and local structures were different, which may affect antibody binding. These results provide a fundamental basis for the future study of IL-2 monoclonal antibody in tree shrews, thereby improving their utility as a model.
Sequence Evolution and Expression Regulation of Stress-Responsive Genes in Natural Populations of Wild Tomato

PubMed Central

Fischer, Iris; Steige, Kim A.; Stephan, Wolfgang; Mboup, Mamadou

2013-01-01

The wild tomato species Solanum chilense and S. peruvianum are a valuable non-model system for studying plant adaptation since they grow in diverse environments facing many abiotic constraints. Here we investigate the sequence evolution of regulatory regions of drought and cold responsive genes and their expression regulation. The coding regions of these genes were previously shown to exhibit signatures of positive selection. Expression profiles and sequence evolution of regulatory regions of members of the Asr (ABA/water stress/ripening induced) gene family and the dehydrin gene pLC30-15 were analyzed in wild tomato populations from contrasting environments. For S. chilense, we found that Asr4 and pLC30-15 appear to respond much faster to drought conditions in accessions from very dry environments than accessions from more mesic locations. Sequence analysis suggests that the promoter of Asr2 and the downstream region of pLC30-15 are under positive selection in some local populations of S. chilense. By investigating gene expression differences at the population level we provide further support of our previous conclusions that Asr2, Asr4, and pLC30-15 are promising candidates for functional studies of adaptation. Our analysis also demonstrates the power of the candidate gene approach in evolutionary biology research and highlights the importance of wild Solanum species as a genetic resource for their cultivated relatives. PMID:24205149
Molecular cloning, characterization, and immunolocalization of two lactate dehydrogenase homologous genes from Taenia solium.

PubMed

Du, Wuying; Hu, Fengyu; Yang, Yabo; Hu, Dong; Hu, Xuchu; Yu, Xinbing; Xu, Jin; Dai, Jialin; Liao, Xinjiang; Huang, Jiang

2011-09-01

Two novel genes encoding lactate dehydrogenase A (LDHA) and B (LDHB) homologues, respectively, were identified from the cDNA libraries of adult Taenia solium (T. solium). The two deduced amino acid sequences both show more than 50% identity to the homologues for Danio rerio, Xenopus laevis, Schistosoma japonicum, Sus scrofa, Homo sapiens, et al. The identity of the amino acid sequence between TsLDHA and TsLDHB is 57.4%, and that of the nucleotide sequence is 61.5%. Recombinant TsLDHA homologue (rTsLDHA) and TsLDHB homologue (rTsLDHB) were expressed in Escherichia coli BL21/DE3 and purified. Though there were some differences in the sequence, the two LDH isozyme homologues show similarity in the conserved LDH domain, topological structure, primary immunological traits, localization on the tegument of T. solium adult, and partial physicochemical properties. The linear B-cell epitope analysis of TsLDHA and TsLDHB discovered a TsLDHA specific epitope. The purified rTsLDHA and rTsLDHB could be recognized by rat immuno-sera, serum from swine, or a patient infected with T. solium, respectively, but Western blot analysis showed cross-reactions, not only between these two LDH members but also with other common human tapeworms or helminths. The results suggested that the two LDH homologues are similar in the characteristics of LDH family, and they are not specific antigens for immunodiagnosis.
Facies analysis and sequence stratigraphic framework of upper Campanian strata (Neslen and Mount Garfield formations, Bluecastle Tongue of the Castlegate sandstone, and Mancos shale), Eastern Book cliffs, Colorado and Utah

USGS Publications Warehouse

Kirschbaum, Mark A.; Hettinger, Robert D.

2004-01-01

Facies and sequence-stratigraphic analysis identifies six high-resolution sequences within upper Campanian strata across about 120 miles of the Book Cliffs in western Colorado and eastern Utah. The six sequences are named after prominent sandstone units and include, in ascending order, upper Sego sequence, Neslen sequence, Corcoran sequence, Buck Canyon/lower Cozzette sequence, upper Cozzette sequence, and Cozzette/Rollins sequence. A seventh sequence, the Bluecastle sequence, is present in the extreme western part of the study area. Facies analysis documents deepening- and shallowing-upward successions, parasequence stacking patterns, downlap in subsurface cross sections, facies dislocations, basinward shifts in facies, and truncation of strata. All six sequences display major incision into shoreface deposits of the Sego Sandstone and sandstones of the Corcoran and Cozzette Members of the Mount Garfield Formation. The incised surfaces represent sequence-boundary unconformities that allowed bypass of sediment to lowstand shorelines that are either attached to the older highstand shorelines or are detached from the older highstand shorelines and located southeast of the main study area. The sequence boundary unconformities represent valley incisions that were cut during successive lowstands of relative sea level. The overlying valley-fill deposits generally consist of tidally influenced strata deposited during an overall base level rise. Transgressive surfaces can be traced or projected over, or locally into, estuarine deposits above and landward of their associated shoreface deposits. Maximum flooding surfaces can be traced or projected landward from offshore strata into, or above, coastal-plain deposits. With the exception of the Cozzette/Rollins sequence, the majority of coal-bearing coastal-plain strata was deposited before maximum flooding and is therefore within the transgressive systems tracts.
Maximum flooding was followed by strong progradation of parasequences and low preservation potential of coastal-plain strata within the highstand systems tract. The large incised valleys, lack of transgressive retrogradational parasequences, strong progradational nature of highstand parasequences, and low preservation of coastal-plain strata in the highstand systems tracts argue for relatively low accommodation space during deposition of the Sego, Corcoran, and Cozzette sequences. The Buck Canyon/Cozzette and Cozzette/Rollins sequences contrast with other sequences in that the preservation of retrogradational parasequences and the development of large estuaries coincident with maximum flooding indicate a relative increase in accommodation space during deposition of these strata. Following maximum flooding, the Buck Canyon/Cozzette sequence follows the pattern of the other sequences, but the Cozzette/Rollins sequence exhibits a contrasting offlapping pattern with development of offshore clinoforms that downlap and eventually parallel its maximum flooding surface. This highstand systems tract preserves a thick coal-bearing section where the Rollins Sandstone Member of the Mount Garfield Formation parasequences prograde out of the study area, stepping up as much as 800 ft stratigraphically over a distance of about 90 miles. This progradational stacking pattern indicates a higher accommodation space and increased sedimentation rate compared to the previous sequences.
A Hidden Markov Model Approach for Simultaneously Estimating Local Ancestry and Admixture Time Using Next Generation Sequence Data in Samples of Arbitrary Ploidy

PubMed Central

Nielsen, Rasmus

2017-01-01

Admixture, the mixing of genomes from divergent populations, is increasingly appreciated as a central process in evolution. To characterize and quantify patterns of admixture across the genome, a number of methods have been developed for local ancestry inference. However, existing approaches have a number of shortcomings. First, all local ancestry inference methods require some prior assumption about the expected ancestry tract lengths. Second, existing methods generally require genotypes, which are not feasible to obtain for many next-generation sequencing projects. Third, many methods assume samples are diploid; however, a wide variety of sequencing applications will fail to meet this assumption. To address these issues, we introduce a novel hidden Markov model for estimating local ancestry that models the read pileup data, rather than genotypes, is generalized to arbitrary ploidy, and can estimate the time since admixture during local ancestry inference. We demonstrate that our method can simultaneously estimate the time since admixture and local ancestry with good accuracy, and that it performs well on samples of high ploidy, i.e. 100 or more chromosomes. As this method is very general, we expect it will be useful for local ancestry inference in a wider variety of populations than what previously has been possible. We then applied our method to pooled sequencing data derived from populations of Drosophila melanogaster on an ancestry cline on the east coast of North America. We find that regions of local recombination rates are negatively correlated with the proportion of African ancestry, suggesting that selection against foreign ancestry is the least efficient in low recombination regions. Finally we show that clinal outlier loci are enriched for genes associated with gene regulatory functions, consistent with a role of regulatory evolution in ecological adaptation of admixed D. melanogaster populations.
Our results illustrate the potential of local ancestry inference for elucidating fundamental evolutionary processes. PMID:28045893
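The abstract describes a hidden Markov model over read pileup data but gives no model details. As a generic illustration of how an HMM assigns an ancestry state to each site, the sketch below runs Viterbi decoding on a toy two-state chain. Everything in it is an assumption for illustration only: the two states, the fixed per-step `switch_prob`, the directly supplied per-site likelihoods, and the function name `viterbi`. It does not model pileups or ploidy, does not estimate admixture time, and is not the authors' method.

```python
import math


def viterbi(emissions, switch_prob, prior=(0.5, 0.5)):
    """Most likely state path for a two-state HMM (states 0 and 1).

    emissions[t][k] is the likelihood of the data at site t given state k.
    switch_prob is the (assumed constant) probability of changing state
    between adjacent sites.
    """
    if not emissions:
        raise ValueError("emissions must not be empty")
    stay = math.log(1.0 - switch_prob)
    switch = math.log(switch_prob)
    # log-probability of the best path ending in each state at site 0
    score = [math.log(prior[k]) + math.log(emissions[0][k]) for k in (0, 1)]
    back = []  # back[t-1][k] = best predecessor of state k at site t
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in (0, 1):
            from_same = score[k] + stay
            from_other = score[1 - k] + switch
            if from_same >= from_other:
                best, ptr = from_same, k
            else:
                best, ptr = from_other, 1 - k
            new_score.append(best + math.log(emissions[t][k]))
            pointers.append(ptr)
        score = new_score
        back.append(pointers)
    # trace back from the best final state
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Made-up per-site likelihoods: early sites favor state 0, later sites state 1.
toy = [(0.9, 0.1), (0.9, 0.1), (0.6, 0.4), (0.1, 0.9), (0.1, 0.9), (0.1, 0.9)]
print(viterbi(toy, switch_prob=0.05))  # [0, 0, 0, 1, 1, 1]
```

A low `switch_prob` favors long runs of one state, which loosely mirrors the role that an assumed ancestry tract length plays in local ancestry inference.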
In silico cloning, expression of Rieske-like apoprotein gene and protein subcellular localization in the Pacific oyster, Crassostrea gigas.

PubMed

He, Xiaocui; Zhang, Yang; Yu, Ziniu

2010-10-01

The Rieske protein gene of the Pacific oyster Crassostrea gigas was obtained by in silico cloning for the first time, and its expression profile and subcellular localization were determined. The full-length cDNA of Cgisp is 985 bp in length and contains 5'- and 3'-untranslated regions of 35 and 161 bp, respectively, with an open reading frame of 786 bp encoding a protein of 262 amino acids. The predicted molecular weight of 30 kDa of the Cgisp protein was verified by prokaryotic expression. Conserved Rieske [2Fe-2S] cluster binding sites and a highly matched-pair tertiary structure with 3CWB_E (Gallus gallus) were revealed by homologous analysis and molecular modeling. Eleven putative SNP sites and two conserved hexapeptide sequences, box I (THLGC) and II (PCHGS), were detected by multiple alignments. Real-time PCR analysis showed that Cgisp is expressed in a wide range of tissues, with adductor muscle exhibiting the highest expression level, suggesting a biological function in energy transduction. GFP tagging of Cgisp indicated a mitochondrial localization, further confirming its physiological function.
The alpha-fetoprotein third domain receptor binding fragment: in search of scavenger and associated receptor targets.

PubMed

Mizejewski, G J

2015-01-01

Recent studies have demonstrated that the carboxyterminal third domain of alpha-fetoprotein (AFP-CD) binds with various ligands and receptors. Reports within the last decade have established that AFP-CD contains a large fragment of amino acids that interact with several different receptor types. Using computer software specifically designed to identify protein-to-protein interaction at amino acid sequence docking sites, the computer searches identified several types of scavenger-associated receptors and their amino acid sequence locations on the AFP-CD polypeptide chain. The scavenger receptors (SRs) identified were CD36, CD163, Stabilin, SSC5D, SRB1 and SREC; the SR-associated receptors included the mannose and low-density lipoprotein receptors, the asialoglycoprotein receptor, and the receptor for advanced glycation endproducts (RAGE). Interestingly, some SR interaction sites were localized on the AFP-derived Growth Inhibitory Peptide (GIP) segment at amino acids #480-500. Following the detection studies, a structural subdomain analysis of both the receptor and the AFP-CD revealed the presence of epidermal growth factor (EGF) repeats, extracellular matrix-like protein regions, amino acid-rich motifs and dimerization subdomains. For the first time, it was reported that EGF-like sequence repeats were identified on each of the three domains of AFP. Thereafter, the localization of receptors on specific cell types was reviewed and their functions were discussed.
Genome-wide uniformity of human ‘open’ pre-initiation complexes

PubMed Central

Lai, William K.M.; Pugh, B. Franklin

2017-01-01

Transcription of protein-coding and noncoding DNA occurs pervasively throughout the mammalian genome. Their sites of initiation are generally inferred from transcript 5′ ends and are thought to be either locally dispersed or focused. How these two modes of initiation relate is unclear. Here, we apply permanganate treatment and chromatin immunoprecipitation (PIP-seq) of initiation factors to identify the precise location of melted DNA separately associated with the preinitiation complex (PIC) and the adjacent paused complex (PC). This approach revealed the two known modes of transcription initiation. However, in contrast to prevailing views, they co-occurred within the same promoter region: initiation originating from a focused PIC, and broad nucleosome-linked initiation. PIP-seq allowed transcriptional orientation of Pol II to be determined, which may be useful near promoters where sufficient sense/anti-sense transcript mapping information is lacking. PIP-seq detected divergently oriented Pol II at both coding and noncoding promoters, as well as at enhancers. Their occupancy levels were not necessarily coupled in the two orientations. DNA sequence and shape analysis of initiation complex sites suggest that both sequence and shape contribute to specificity, but in a context-restricted manner. That is, initiation sites have the locally “best” initiator (INR) sequence and/or shape. These findings reveal a common core to pervasive Pol II initiation throughout the human genome. PMID:27927716
Identification of a precursor genomic segment that provided a sequence unique to glycophorin B and E genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Onda, M.; Kudo, S.; Fukuda, M.

Human glycophorin A, B, and E (GPA, GPB, and GPE) genes belong to a gene family located at the long arm of chromosome 4. These three genes are homologous from the 5'-flanking sequence to the Alu sequence, which is 1 kb downstream from the exon encoding the transmembrane domain. Analysis of the Alu sequence and flanking direct repeat sequences suggested that the GPA gene most closely resembles the ancestral gene, whereas the GPB and GPE genes arose by homologous recombination within the Alu sequence, acquiring 3' sequences from an unrelated precursor genomic segment. Here the authors describe the identification of this putative precursor genomic segment. A human genomic library was screened by using the sequence of the 3' region of the GPB gene as a probe. The genomic clones isolated were found to contain an Alu sequence that appeared to be involved in the recombination. Downstream from the Alu sequence, the nucleotide sequence of the precursor genomic segment is almost identical to that of the GPB or GPE gene. In contrast, the upstream sequence of the genomic segment differs entirely from that of the GPA, GPB, and GPE genes. Conservation of the direct repeats flanking the Alu sequence of the genomic segment strongly suggests that the sequence of this genomic segment has been maintained during evolution. This identified genomic segment was found to reside downstream from the GPA gene by both gene mapping and in situ chromosomal localization. The precursor genomic segment was also identified in the orangutan genome, which is known to lack GPB and GPE genes. These results indicate that one of the duplicated ancestral glycophorin genes acquired a unique 3' sequence by unequal crossing-over through its Alu sequence and the further downstream Alu sequence present in the duplicated gene. Further duplication and divergence of this gene yielded the GPB and GPE genes. 37 refs., 5 figs.
Core-SINE blocks comprise a large fraction of monotreme genomes; implications for vertebrate chromosome evolution.

PubMed

Kirby, Patrick J; Greaves, Ian K; Koina, Edda; Waters, Paul D; Marshall Graves, Jennifer A

2007-01-01

The genomes of the egg-laying platypus and echidna are of particular interest because monotremes are the most basal mammal group. The chromosomal distribution of an ancient family of short interspersed repeats (SINEs), the core-SINEs, was investigated to better understand monotreme genome organization and evolution. Previous studies have identified the core-SINE as the predominant SINE in the platypus genome, and in this study we quantified, characterized and localized subfamilies. Dot blot analysis suggested that a very large fraction (32% of the platypus and 16% of the echidna genome) is composed of Mon core-SINEs. Core-SINE-specific primers were used to amplify PCR products from platypus and echidna genomic DNA. Sequence analysis suggests a common consensus sequence Mon 1-B, shared by platypus and echidna, as well as platypus-specific Mon 1-C and echidna specific Mon 1-D consensus sequences. FISH mapping of the Mon core-SINE products to platypus metaphase spreads demonstrates that the Mon-1C subfamily is responsible for the striking Mon core-SINE accumulation in the distal regions of the six large autosomal pairs and the largest X chromosome. This unusual distribution highlights the dichotomy between the seven large chromosome pairs and the 19 smaller pairs in the monotreme karyotype, which has some similarity to the macro- and micro-chromosomes of birds and reptiles, and suggests that accumulation of repetitive sequences may have enlarged small chromosomes in an ancestral vertebrate. In the forthcoming sequence of the platypus genome there are still large gaps, and the extensive Mon core-SINE accumulation on the distal regions of the six large autosomal pairs may provide one explanation for this missing sequence.

The use of additive and subtractive approaches to examine the nuclear localization sequence of the polyomavirus major capsid protein VP1

NASA Technical Reports Server (NTRS)

Chang, D.; Haynes, J. I. 2nd; Brady, J. N.; Consigli, R. A.; Spooner, B. S. (Principal Investigator)

1992-01-01

A nuclear localization signal (NLS) has been identified in the N-terminal (Ala1-Pro-Lys-Arg-Lys-Ser-Gly-Val-Ser-Lys-Cys11) amino acid sequence of the polyomavirus major capsid protein VP1. The importance of this amino acid sequence for nuclear transport of VP1 protein was demonstrated by a genetic "subtractive" study using the constructs pSG5VP1 (full-length VP1) and pSG5 delta 5'VP1 (truncated VP1, lacking amino acids Ala1-Cys11). These constructs were used to transfect COS-7 cells, and expression and intracellular localization of the VP1 protein was visualized by indirect immunofluorescence. These studies revealed that the full-length VP1 was expressed and localized in the nucleus, while the truncated VP1 protein was localized in the cytoplasm and not transported to the nucleus. These findings were substantiated by an "additive" approach using FITC-labeled conjugates of synthetic peptides homologous to the NLS of VP1 cross-linked to bovine serum albumin or immunoglobulin G. Both conjugates localized in the nucleus after microinjection into the cytoplasm of 3T6 cells. The importance of individual amino acids found in the basic sequence (Lys3-Arg-Lys5) of the NLS was also investigated. This was accomplished by synthesizing three additional peptides in which lysine-3 was substituted with threonine, arginine-4 was substituted with threonine, or lysine-5 was substituted with threonine. It was found that lysine-3 was crucial for nuclear transport, since substitution of this amino acid with threonine prevented nuclear localization of the microinjected, FITC-labeled conjugate.
Laboratory diagnosis and genetic analysis of a family clustering outbreak of aseptic meningitis due to echovirus 30

PubMed Central

Ye, Hongyan; Yan, Juying; Xie, Guoliang; Cui, Dawei; Yu, Fei; Wang, Yiyin; Yang, Xianzhi; Zhou, Fangman; Zhang, Yanjun; Tian, Xueli; Chen, Yu

2016-01-01

Echovirus 30 (E30) is a major pathogen associated with aseptic meningitis. In the summer of 2014, a family clustering aseptic meningitis outbreak occurred in urban–rural fringe of Ningbo city in Zhejiang Province in China. To identify the etiologic agent, specimens were tested by cell culture and reverse transcriptase–polymerase chain reaction. Pathogenic examination confirmed that the outbreak is caused by E30. The first case is a 6-year-old child, who studied in kindergarten in local, suffered from headache and fever. Same symptoms appeared in his parents, aunts, and other six relatives continuously. Meanwhile, vomiting occurred in majority of the patients and diarrhea in parts of them. White blood cells in cerebrospinal fluid (CSF) exceeded normal range in all patients. Protein levels in CSF were above normal range in half of the patients. Glucose levels in CSF were within normal range in all patients. We isolated six strains E30 in the stool specimens of patients, and carried out sequencing analysis to VP1 region. Sequencing results showed that 100% sequence identity was seen in both nucleotide and amino acid levels. Phylogenetic analysis discovered that isolate in this study was grouped into sublineage D2 together with sequences isolated from other areas of China in the 2000s and 2010s. Our study is the first family clustering outbreak of aseptic meningitis caused by E30 in Zhejiang Province in China. It is essential to establish an enterovirus molecular surveillance system in China to prevent mass outbreaks in Zhejiang. PMID:27646838
Laboratory diagnosis and genetic analysis of a family clustering outbreak of aseptic meningitis due to echovirus 30.

PubMed

Zheng, Shufa; Ye, Hongyan; Yan, Juying; Xie, Guoliang; Cui, Dawei; Yu, Fei; Wang, Yiyin; Yang, Xianzhi; Zhou, Fangman; Zhang, Yanjun; Tian, Xueli; Chen, Yu

2016-09-01

Echovirus 30 (E30) is a major pathogen associated with aseptic meningitis. In the summer of 2014, a family clustering aseptic meningitis outbreak occurred in urban-rural fringe of Ningbo city in Zhejiang Province in China. To identify the etiologic agent, specimens were tested by cell culture and reverse transcriptase-polymerase chain reaction. Pathogenic examination confirmed that the outbreak is caused by E30. The first case is a 6-year-old child, who studied in kindergarten in local, suffered from headache and fever. Same symptoms appeared in his parents, aunts, and other six relatives continuously. Meanwhile, vomiting occurred in majority of the patients and diarrhea in parts of them. White blood cells in cerebrospinal fluid (CSF) exceeded normal range in all patients. Protein levels in CSF were above normal range in half of the patients. Glucose levels in CSF were within normal range in all patients. We isolated six strains E30 in the stool specimens of patients, and carried out sequencing analysis to VP1 region. Sequencing results showed that 100% sequence identity was seen in both nucleotide and amino acid levels. Phylogenetic analysis discovered that isolate in this study was grouped into sublineage D2 together with sequences isolated from other areas of China in the 2000s and 2010s. Our study is the first family clustering outbreak of aseptic meningitis caused by E30 in Zhejiang Province in China. It is essential to establish an enterovirus molecular surveillance system in China to prevent mass outbreaks in Zhejiang.
Highly multiplexed subcellular RNA sequencing in situ

PubMed Central

Lee, Je Hyuk; Daugharthy, Evan R.; Scheiman, Jonathan; Kalhor, Reza; Ferrante, Thomas C.; Yang, Joyce L.; Terry, Richard; Jeanty, Sauveur S. F.; Li, Chao; Amamoto, Ryoji; Peters, Derek T.; Turczyk, Brian M.; Marblestone, Adam H.; Inverso, Samuel A.; Bernard, Amy; Mali, Prashant; Rios, Xavier; Aach, John; Church, George M.

2014-01-01

Understanding the spatial organization of gene expression with single nucleotide resolution requires localizing the sequences of expressed RNA transcripts within a cell in situ. Here we describe fluorescent in situ RNA sequencing (FISSEQ), in which stably cross-linked cDNA amplicons are sequenced within a biological sample. Using 30-base reads from 8,742 genes in situ, we examined RNA expression and localization in human primary fibroblasts using a simulated wound healing assay. FISSEQ is compatible with tissue sections and whole mount embryos, and reduces the limitations of optical resolution and noisy signals on single molecule detection. Our platform enables massively parallel detection of genetic elements, including gene transcripts and molecular barcodes, and can be used to investigate cellular phenotype, gene regulation, and environment in situ. PMID:24578530
Clinical applicability and cost of a 46-gene panel for genomic analysis of solid tumours: Retrospective validation and prospective audit in the UK National Health Service

PubMed Central

Kaur, Kulvinder; Camps, Carme; Kaisaki, Pamela; Gupta, Avinash; Talbot, Denis; Middleton, Mark; Henderson, Shirley; Cutts, Anthony; Vavoulis, Dimitrios V.; Housby, Nick; Taylor, Jenny C.; Schuh, Anna

2017-01-01

Background Single gene tests to predict whether cancers respond to specific targeted therapies are performed increasingly often. Advances in sequencing technology, collectively referred to as next generation sequencing (NGS), mean the entire cancer genome or parts of it can now be sequenced at speed with increased depth and sensitivity. However, translation of NGS into routine cancer care has been slow. Healthcare stakeholders are unclear about the clinical utility of NGS and are concerned it could be an expensive addition to cancer diagnostics, rather than an affordable alternative to single gene testing. Methods and findings We validated a 46-gene hotspot cancer panel assay allowing multiple gene testing from small diagnostic biopsies. From 1 January 2013 to 31 December 2013, solid tumour samples (including non-small-cell lung carcinoma [NSCLC], colorectal carcinoma, and melanoma) were sequenced in the context of the UK National Health Service from 351 consecutively submitted prospective cases for which treating clinicians thought the patient had potential to benefit from more extensive genetic analysis. Following histological assessment, tumour-rich regions of formalin-fixed paraffin-embedded (FFPE) sections underwent macrodissection, DNA extraction, NGS, and analysis using a pipeline centred on Torrent Suite software. With a median turnaround time of seven working days, an integrated clinical report was produced indicating the variants detected, including those with potential diagnostic, prognostic, therapeutic, or clinical trial entry implications. Accompanying phenotypic data were collected, and a detailed cost analysis of the panel compared with single gene testing was undertaken to assess affordability for routine patient care. Panel sequencing was successful for 97% (342/351) of tumour samples in the prospective cohort and showed 100% concordance with known mutations (detected using cobas assays). 
At least one mutation was identified in 87% (296/342) of tumours. A locally actionable mutation (i.e., available targeted treatment or clinical trial) was identified in 122/351 patients (35%). Forty patients received targeted treatment, in 22/40 (55%) cases solely due to use of the panel. Examination of published data on the potential efficacy of targeted therapies showed theoretically actionable mutations (i.e., mutations for which targeted treatment was potentially appropriate) in 66% (71/107) and 39% (41/105) of melanoma and NSCLC patients, respectively. At a cost of £339 (US$449) per patient, the panel was less expensive locally than performing more than two or three single gene tests. Study limitations include the use of FFPE samples, which do not always provide high-quality DNA, and the use of “real world” data: submission of cases for sequencing did not always follow clinical guidelines, meaning that when mutations were detected, patients were not always eligible for targeted treatments on clinical grounds. Conclusions This study demonstrates that more extensive tumour sequencing can identify mutations that could improve clinical decision-making in routine cancer care, potentially improving patient outcomes, at an affordable level for healthcare providers. PMID:28196074
Analysis of a developmentally regulated nuclear localization signal in Xenopus

PubMed Central

1992-01-01

The 289 residue nuclear oncoprotein encoded by the adenovirus 5 E1a gene contains two peptide sequences that behave as nuclear localization signals (NLS). One signal, located at the carboxy terminus, is like many other known NLSs in that it consists of a short stretch of basic residues (KRPRP) and is constitutively active in cells. The second signal resides within an internal 45 residue region of E1a that contains few basic residues or sequences that resemble other known NLSs. Moreover, this internal signal functions in injected Xenopus oocytes, but not in transfected Xenopus A6 cells, suggesting that it could be regulated developmentally (Slavicek et al. 1989. J. Virol. 63:4047). In this study, we show that the activity of this signal is sensitive to ATP depletion in vivo, efficiently directs the import of a 50 kD fusion protein and can compete with the E1a carboxy-terminal NLS for nuclear import. In addition, we have delineated the precise amino acid residues that comprise the second E1a NLS, and have assessed its utilization during Xenopus embryogenesis. Using amino acid deletion and substitution analyses, we show that the signal consists of the sequence FV(X)7-20MXSLXYM(X)4MF. By expressing in Xenopus embryos a truncated E1a protein that contains only the second NLS and by monitoring its cytoplasmic/nuclear distribution during development with indirect immunofluorescence, we find that the second NLS is utilized up to the early neurula stage. In addition, there appears to be a hierarchy among the embryonic germ layers as to when the second NLS becomes nonfunctional. For this reason, we refer to this NLS as the developmentally regulated nuclear localization signal (drNLS). The implications of these findings for early development are discussed. PMID:1387407
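The consensus FV(X)7-20MXSLXYM(X)4MF quoted in this abstract can be read as a pattern in which X stands for any residue. As a rough illustration only, the sketch below translates it into a Python regular expression and scans a one-letter protein string. The function name `find_drnls` and the toy sequence are hypothetical and are not taken from the paper; the toy string is not the real E1a sequence.

```python
import re

# Motif as written in the abstract: FV(X)7-20MXSLXYM(X)4MF.
# Assumption: X means "any single residue", encoded here as [A-Z].
DRNLS_PATTERN = re.compile(r"FV[A-Z]{7,20}M[A-Z]SL[A-Z]YM[A-Z]{4}MF")


def find_drnls(seq):
    """Return (start, end, matched_text) for each non-overlapping hit in seq."""
    return [(m.start(), m.end(), m.group())
            for m in DRNLS_PATTERN.finditer(seq.upper())]


# Hypothetical toy sequence built to contain exactly one match.
toy = "AAA" + "FV" + "G" * 7 + "MASLAYM" + "GGGG" + "MF" + "KRPRP"
hits = find_drnls(toy)
```

A spacer shorter than seven residues between FV and the MXSLXYM block should produce no hit, which is one quick way to check that the 7-20 range is encoded as intended.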
Use of diffusion-weighted MRI to modify radiosurgery planning in brain metastases may reduce local recurrence.

PubMed

Zakaria, Rasheed; Pomschar, Andreas; Jenkinson, Michael D; Tonn, Jörg-Christian; Belka, Claus; Ertl-Wagner, Birgit; Niyazi, Maximilian

2017-02-01

Stereotactic radiosurgery (SRS) is an effective and well tolerated treatment for selected brain metastases; however, local recurrence still occurs. We investigated the use of diffusion weighted MRI (DWI) as an adjunct for SRS treatment planning in brain metastases. Seventeen consecutive patients undergoing complete surgical resection of a solitary brain metastasis underwent image analysis retrospectively. SRS treatment plans were generated based on standard 3D post-contrast T1-weighted sequences at 1.5T and then separately using apparent diffusion coefficient (ADC) maps in a blinded fashion. Control scans immediately post operation confirmed complete tumour resection. Treatment plans were compared to one another and with volume of local recurrence at progression quantitatively and qualitatively by calculating the conformity index (CI), the overlapping volume as a proportion of the total combined volume, where 1 = identical plans and 0 = no conformation whatsoever. Gross tumour volumes (GTVs) using ADC and post-contrast T1-weighted sequences were quantitatively the same (related samples Wilcoxon signed rank test = -0.45, p = 0.653) but showed differing conformations (CI 0.53, p < 0.001). The diffusion treatment volume (DTV) obtained by combining the two target volumes was significantly greater than the treatment volume based on post contrast T1-weighted MRI alone, both quantitatively (median 13.65 vs. 9.52 cm³, related samples Wilcoxon signed rank test p < 0.001) and qualitatively (CI 0.74, p = 0.001). This DTV covered a greater volume of subsequent tumour recurrence than the standard plan (median 3.53 cm³ vs. 3.84 cm³, p = 0.002). ADC maps may be a useful tool in addition to the standard post-contrast T1-weighted sequence used for SRS planning.
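The conformity index described in this abstract, the overlapping volume as a proportion of the total combined volume, has the same form as an intersection-over-union ratio. The sketch below is a minimal illustration under the assumption that each treatment volume is represented as a collection of voxel coordinates of equal size; the function name `conformity_index` and the example voxels are hypothetical and do not come from the study.

```python
def conformity_index(volume_a, volume_b):
    """Overlap divided by combined volume for two voxel collections.

    Returns 1.0 for identical volumes and 0.0 for volumes that do not overlap.
    Assumes all voxels have the same physical size.
    """
    a, b = set(volume_a), set(volume_b)
    combined = a | b
    if not combined:
        raise ValueError("both volumes are empty")
    return len(a & b) / len(combined)


# Hypothetical voxel coordinates (x, y, z) for two plans.
t1_plan = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
adc_plan = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}
ci = conformity_index(t1_plan, adc_plan)  # 2 shared voxels out of 4 in total
```

Two plans can have the same total volume and still score well below 1 on this index, which matches the abstract's observation that the ADC and T1-weighted volumes were the same size but differed in conformation.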
Niche specialization of terrestrial archaeal ammonia oxidizers

PubMed Central

Gubry-Rangin, Cécile; Hai, Brigitte; Quince, Christopher; Engel, Marion; Thomson, Bruce C.; James, Phillip; Schloter, Michael; Griffiths, Robert I.; Prosser, James I.; Nicol, Graeme W.

2011-01-01

Soil pH is a major determinant of microbial ecosystem processes and potentially a major driver of evolution, adaptation, and diversity of ammonia oxidizers, which control soil nitrification. Archaea are major components of soil microbial communities and contribute significantly to ammonia oxidation in some soils. To determine whether pH drives evolutionary adaptation and community structure of soil archaeal ammonia oxidizers, sequences of amoA, a key functional gene of ammonia oxidation, were examined in soils at global, regional, and local scales. Globally distributed database sequences clustered into 18 well-supported phylogenetic lineages that dominated specific soil pH ranges classified as acidic (pH <5), acido-neutral (5≤ pH <7), or alkalinophilic (pH ≥7). To determine whether patterns were reproduced at regional and local scales, amoA gene fragments were amplified from DNA extracted from 47 soils in the United Kingdom (pH 3.5–8.7), including a pH-gradient formed by seven soils at a single site (pH 4.5–7.5). High-throughput sequencing and analysis of amoA gene fragments identified an additional, previously undiscovered phylogenetic lineage and revealed similar pH-associated distribution patterns at global, regional, and local scales, which were most evident for the five most abundant clusters. Archaeal amoA abundance and diversity increased with soil pH, which was the only physicochemical characteristic measured that significantly influenced community structure. These results suggest evolution based on specific adaptations to soil pH and niche specialization, resulting in a global distribution of archaeal lineages that have important consequences for soil ecosystem function and nitrogen cycling. PMID:22158986
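The three soil pH classes named in this abstract have explicit boundaries (pH <5, 5≤ pH <7, pH ≥7). As a small illustration of how those cut-offs partition a set of measurements, the sketch below bins pH values into the three labels; the function name `classify_soil_ph` and the sample values are hypothetical.

```python
def classify_soil_ph(ph):
    """Map a soil pH value to the class labels used in the abstract."""
    if ph < 5:
        return "acidic"
    if ph < 7:
        return "acido-neutral"
    return "alkalinophilic"


# Hypothetical measurements spanning the 3.5-8.7 range reported for the UK soils.
samples = [3.5, 4.9, 5.0, 6.9, 7.0, 8.7]
labels = [classify_soil_ph(ph) for ph in samples]
```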
In the Absence of Writhe, DNA Relieves Torsional Stress with Localized, Sequence-Dependent Structural Failure to Preserve B-form

DOE Office of Scientific and Technical Information (OSTI.GOV)

Randall, Graham L.; Zechiedrich, E. L.; Pettitt, Bernard M.

2009-09-01

To understand how underwinding and overwinding the DNA helix affects its structure, we simulated 19 independent DNA systems with fixed degrees of twist using molecular dynamics in a system that does not allow writhe. Underwinding DNA induced spontaneous, sequence-dependent base flipping and local denaturation, while overwinding DNA induced the formation of Pauling-like DNA (P-DNA). The winding resulted in a bimodal state simultaneously including local structural failure and B-form DNA for both underwinding and extreme overwinding. Our simulations suggest that base flipping and local denaturation may provide a landscape influencing protein recognition of DNA sequence to affect, for example, replication, transcription and recombination. Additionally, our findings help explain results from single-molecule experiments and demonstrate that elastic rod models are strictly valid on average only for unstressed or overwound DNA up to P-DNA formation. Finally, our data support a model in which base flipping can result from torsional stress.
Intratumor heterogeneity in localized lung adenocarcinomas delineated by multiregion sequencing.

PubMed

Zhang, Jianjun; Fujimoto, Junya; Zhang, Jianhua; Wedge, David C; Song, Xingzhi; Zhang, Jiexin; Seth, Sahil; Chow, Chi-Wan; Cao, Yu; Gumbs, Curtis; Gold, Kathryn A; Kalhor, Neda; Little, Latasha; Mahadeshwar, Harshad; Moran, Cesar; Protopopov, Alexei; Sun, Huandong; Tang, Jiabin; Wu, Xifeng; Ye, Yuanqing; William, William N; Lee, J Jack; Heymach, John V; Hong, Waun Ki; Swisher, Stephen; Wistuba, Ignacio I; Futreal, P Andrew

2014-10-10

Cancers are composed of populations of cells with distinct molecular and phenotypic features, a phenomenon termed intratumor heterogeneity (ITH). ITH in lung cancers has not been well studied. We applied multiregion whole-exome sequencing (WES) on 11 localized lung adenocarcinomas. All tumors showed clear evidence of ITH. On average, 76% of all mutations and 20 out of 21 known cancer gene mutations were identified in all regions of individual tumors, which suggested that single-region sequencing may be adequate to identify the majority of known cancer gene mutations in localized lung adenocarcinomas. With a median follow-up of 21 months after surgery, three patients have relapsed, and all three patients had significantly larger fractions of subclonal mutations in their primary tumors than patients without relapse. These data indicate that a larger subclonal mutation fraction may be associated with increased likelihood of postsurgical relapse in patients with localized lung adenocarcinomas. Copyright © 2014, American Association for the Advancement of Science.
Whole Genome Sequence Typing to Investigate the Apophysomyces Outbreak following a Tornado in Joplin, Missouri, 2011

PubMed Central

Etienne, Kizee A.; Gillece, John; Hilsabeck, Remy; Schupp, Jim M.; Colman, Rebecca; Lockhart, Shawn R.; Gade, Lalitha; Thompson, Elizabeth H.; Sutton, Deanna A.; Neblett-Fanfair, Robyn; Park, Benjamin J.; Turabelidze, George; Keim, Paul; Brandt, Mary E.; Deak, Eszter; Engelthaler, David M.

2012-01-01

Case reports of Apophysomyces spp. in immunocompetent hosts have been a result of traumatic deep implantation of Apophysomyces spp. spore-contaminated soil or debris. On May 22, 2011 a tornado occurred in Joplin, MO, leaving 13 tornado victims with Apophysomyces trapeziformis infections as a result of lacerations from airborne material. We used whole genome sequence typing (WGST) for high-resolution phylogenetic SNP analysis of 17 outbreak Apophysomyces isolates and five additional temporally and spatially diverse Apophysomyces control isolates (three A. trapeziformis and two A. variabilis isolates). Whole genome SNP phylogenetic analysis revealed three clusters of genotypically related or identical A. trapeziformis isolates and multiple distinct isolates among the Joplin group; this indicated multiple genotypes from a single or multiple sources. Though no linkage between genotype and location of exposure was observed, WGST analysis determined that the Joplin isolates were more closely related to each other than to the control isolates, suggesting local population structure. Additionally, species delineation based on WGST demonstrated the need to reassess currently accepted taxonomic classifications of phylogenetic species within the genus Apophysomyces. PMID:23209631
Whole genome sequence typing to investigate the Apophysomyces outbreak following a tornado in Joplin, Missouri, 2011.

PubMed

Etienne, Kizee A; Gillece, John; Hilsabeck, Remy; Schupp, Jim M; Colman, Rebecca; Lockhart, Shawn R; Gade, Lalitha; Thompson, Elizabeth H; Sutton, Deanna A; Neblett-Fanfair, Robyn; Park, Benjamin J; Turabelidze, George; Keim, Paul; Brandt, Mary E; Deak, Eszter; Engelthaler, David M

2012-01-01

Case reports of Apophysomyces spp. in immunocompetent hosts have been a result of traumatic deep implantation of Apophysomyces spp. spore-contaminated soil or debris. On May 22, 2011 a tornado occurred in Joplin, MO, leaving 13 tornado victims with Apophysomyces trapeziformis infections as a result of lacerations from airborne material. We used whole genome sequence typing (WGST) for high-resolution phylogenetic SNP analysis of 17 outbreak Apophysomyces isolates and five additional temporally and spatially diverse Apophysomyces control isolates (three A. trapeziformis and two A. variabilis isolates). Whole genome SNP phylogenetic analysis revealed three clusters of genotypically related or identical A. trapeziformis isolates and multiple distinct isolates among the Joplin group; this indicated multiple genotypes from a single or multiple sources. Though no linkage between genotype and location of exposure was observed, WGST analysis determined that the Joplin isolates were more closely related to each other than to the control isolates, suggesting local population structure. Additionally, species delineation based on WGST demonstrated the need to reassess currently accepted taxonomic classifications of phylogenetic species within the genus Apophysomyces.
Multilocus sequence analysis reveals extensive genetic variety within Tenacibaculum spp. associated with ulcers in sea-farmed fish in Norway.

PubMed

Olsen, Anne Berit; Gulla, Snorre; Steinum, Terje; Colquhoun, Duncan J; Nilsen, Hanne K; Duchaud, Eric

2017-06-01

Skin ulcer development in sea-reared salmonids, commonly associated with Tenacibaculum spp., is a significant fish welfare- and economical problem in Norwegian aquaculture. A collection of 89 Tenacibaculum isolates was subjected to multilocus sequence analysis (MLSA). The isolates were retrieved from outbreaks of clinical disease in farms spread along the Norwegian coast line from seven different fish species over a period of 19 years. MLSA analysis reveals considerable genetic diversity, but allows identification of four main clades. One clade encompasses isolates belonging to the species T. dicentrarchi, whereas three clades encompass bacteria that likely represent novel, as yet undescribed species. The study identified T. maritimum in lumpsucker, T. ovolyticum in halibut, and has extended the host and geographic range for T. soleae, isolated from wrasse. The overall lack of clonality and host specificity, with some indication of geographical range restriction argue for local epidemics involving multiple strains. The diversity of Tenacibaculum isolates from fish displaying ulcerative disease may complicate vaccine development. Copyright © 2017 Elsevier B.V. All rights reserved.
Identification and Characterization of Sites Where Persistent Atrial Fibrillation Is Terminated by Localized Ablation.

PubMed

Zaman, Junaid A B; Sauer, William H; Alhusseini, Mahmood I; Baykaner, Tina; Borne, Ryan T; Kowalewski, Christopher A B; Busch, Sonia; Zei, Paul C; Park, Shirley; Viswanathan, Mohan N; Wang, Paul J; Brachmann, Johannes; Krummen, David E; Miller, John M; Rappel, Wouter Jan; Narayan, Sanjiv M; Peters, Nicholas S

2018-01-01

The mechanisms by which persistent atrial fibrillation (AF) terminates via localized ablation are not well understood. To address the hypothesis that sites where localized ablation terminates persistent AF have characteristics identifiable with activation mapping during AF, we systematically examined activation patterns acquired only in cases of unequivocal termination by ablation. We recruited 57 patients with persistent AF undergoing ablation, in whom localized ablation terminated AF to sinus rhythm or organized tachycardia. For each site, we performed an offline analysis of unprocessed unipolar electrograms collected during AF from multipolar basket catheters using the maximum -dV/dt assignment to construct isochronal activation maps for multiple cycles. Additional computational modeling and phase analysis were used to study mechanisms of map variability. At all sites of AF termination, localized repetitive activation patterns were observed. Partial rotational circuits were observed in 26 of 57 (46%) cases, focal patterns in 19 of 57 (33%), and complete rotational activity in 12 of 57 (21%) cases. In computer simulations, incomplete segments of partial rotations coincided with areas of slow conduction characterized by complex, multicomponent electrograms, and variations in assigning activation times at such sites substantially altered mapped mechanisms. Local activation mapping at sites of termination of persistent AF showed repetitive patterns of rotational or focal activity. In computer simulations, complete rotational activation sequence was observed but was sensitive to assignment of activation timing particularly in segments of slow conduction. The observed phenomena of repetitive localized activation and the mechanism by which local ablation terminates putative AF drivers require further investigation. © 2018 American Heart Association, Inc.
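The abstract assigns local activation times from unipolar electrograms using the maximum -dV/dt, that is, the moment of steepest negative slope. The sketch below shows one simple way such an assignment could be made on a single sampled window using first differences. It is an assumption-laden illustration, not the authors' pipeline: the function name `activation_time_ms`, the one-activation-per-window assumption, and the sample values are all hypothetical.

```python
def activation_time_ms(samples, sampling_rate_hz):
    """Time (ms from window start) of the steepest downstroke in one window.

    Assumes the window contains a single activation and that samples are
    evenly spaced; the slope is approximated by first differences.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    slopes = [samples[i + 1] - samples[i] for i in range(len(samples) - 1)]
    # Maximum -dV/dt corresponds to the most negative first difference.
    steepest = min(range(len(slopes)), key=slopes.__getitem__)
    return steepest * 1000.0 / sampling_rate_hz


# Hypothetical electrogram window sampled at 1 kHz (arbitrary voltage units).
window = [0.0, 1.0, 2.0, -5.0, -4.0, -3.0]
t_act = activation_time_ms(window, 1000)
```

With multicomponent electrograms, several downstrokes of similar steepness can occur in one window, so a small change in the signal can move the selected index. This is in line with the abstract's remark that mapped mechanisms were sensitive to how activation times were assigned in regions of slow conduction.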
Structure of the human gene encoding the protein repair L-isoaspartyl (D-aspartyl) O-methyltransferase.

PubMed

DeVry, C G; Tsai, W; Clarke, S

1996-11-15

The protein L-isoaspartyl/D-aspartyl O-methyltransferase (EC 2.1.1.77) catalyzes the first step in the repair of proteins damaged in the aging process by isomerization or racemization reactions at aspartyl and asparaginyl residues. A single gene has been localized to human chromosome 6 and multiple transcripts arising through alternative splicing have been identified. Restriction enzyme mapping, subcloning, and DNA sequence analysis of three overlapping clones from a human genomic library in bacteriophage P1 indicate that the gene spans approximately 60 kb and is composed of 8 exons interrupted by 7 introns. Analysis of intron/exon splice junctions reveals that all of the donor and acceptor splice sites are in agreement with the mammalian consensus splicing sequence. Determination of transcription initiation sites by primer extension analysis of poly(A)+ mRNA from human brain identifies multiple start sites, with a major site 159 nucleotides upstream from the ATG start codon. Sequence analysis of the 5'-untranslated region demonstrates several potential cis-acting DNA elements including SP1, ETF, AP1, AP2, ARE, XRE, CREB, MED-1, and half-palindromic ERE motifs. The promoter of this methyltransferase gene lacks an identifiable TATA box but is characterized by a CpG island which begins approximately 723 nucleotides upstream of the major transcriptional start site and extends through exon 1 and into the first intron. These features are characteristic of housekeeping genes and are consistent with the wide tissue distribution observed for this methyltransferase activity.
Streptococcus pneumoniae PstS production is phosphate responsive and enhanced during growth in the murine peritoneal cavity

NASA Technical Reports Server (NTRS)

Orihuela, C. J.; Mills, J.; Robb, C. W.; Wilson, C. J.; Watson, D. A.; Niesel, D. W.

2001-01-01

Differential display-PCR (DDPCR) was used to identify a Streptococcus pneumoniae gene with enhanced transcription during growth in the murine peritoneal cavity. Northern dot blot analysis and comparative densitometry confirmed a 1.8-fold increase in expression of the encoded sequence following murine peritoneal culture (MPC) versus laboratory culture or control culture (CC). Sequencing and basic local alignment search tool analysis identified the DDPCR fragment as pstS, the phosphate-binding protein of a high-affinity phosphate uptake system. PCR amplification of the complete pstS gene followed by restriction analysis and sequencing suggests a high level of conservation between strains and serotypes. Quantitative immunodot blotting using antiserum to recombinant PstS (rPstS) demonstrated an approximately twofold increase in PstS production during MPC from that during CCs, a finding consistent with the low levels of phosphate observed in the peritoneum. Moreover, immunodot blot and Northern analysis demonstrated phosphate-dependent production of PstS in six of seven strains examined. These results identify pstS expression as responsive to the MPC environment and extracellular phosphate concentrations. Presently, it remains unclear if phosphate concentrations in vivo contribute to the regulation of pstS. Finally, polyclonal antiserum to rPstS did not inhibit growth of the pneumococcus in vitro, suggesting that antibodies do not block phosphate uptake; moreover, vaccination of mice with rPstS did not protect against intraperitoneal challenge as assessed by the 50% lethal dose.
Memetic algorithms for de novo motif-finding in biomedical sequences.

PubMed

Bi, Chengpeng

2012-09-01

The objectives of this study are to design and implement a new memetic algorithm for de novo motif discovery, which is then applied to detect important signals hidden in various biomedical molecular sequences. In this paper, memetic algorithms are developed and tested in de novo motif-finding problems. Several strategies in the algorithm design are employed that are to not only efficiently explore the multiple sequence local alignment space, but also effectively uncover the molecular signals. As a result, there are a number of key features in the implementation of the memetic motif-finding algorithm (MaMotif), including a chromosome replacement operator, a chromosome alteration-aware local search operator, a truncated local search strategy, and a stochastic operation of local search imposed on individual learning. To test the new algorithm, we compare MaMotif with a few of other similar algorithms using simulated and experimental data including genomic DNA, primary microRNA sequences (let-7 family), and transmembrane protein sequences. The new memetic motif-finding algorithm is successfully implemented in C++, and exhaustively tested with various simulated and real biological sequences. In the simulation, it shows that MaMotif is the most time-efficient algorithm compared with others, that is, it runs 2 times faster than the expectation maximization (EM) method and 16 times faster than the genetic algorithm-based EM hybrid. In both simulated and experimental testing, results show that the new algorithm is compared favorably or superior to other algorithms. 
Notably, MaMotif is able to successfully discover the transcription factors' binding sites in the chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) data, correctly uncover the RNA splicing signals in gene expression, and precisely find the highly conserved helix motif in the transmembrane protein sequences, as well as rightly detect the palindromic segments in the primary microRNA sequences. The memetic motif-finding algorithm is effectively designed and implemented, and its applications demonstrate it is not only time-efficient, but also exhibits excellent performance while compared with other popular algorithms. Copyright © 2012 Elsevier B.V. All rights reserved.
The complete mitochondrial genome of an 11,450-year-old aurochsen (Bos primigenius) from Central Italy.

PubMed

Lari, Martina; Rizzi, Ermanno; Mona, Stefano; Corti, Giorgio; Catalano, Giulio; Chen, Kefei; Vernesi, Cristiano; Larson, Greger; Boscato, Paolo; De Bellis, Gianluca; Cooper, Alan; Caramelli, David; Bertorelle, Giorgio

2011-01-31

Bos primigenius, the aurochs, is the wild ancestor of modern cattle breeds and was formerly widespread across Eurasia and northern Africa. After a progressive decline, the species became extinct in 1627. The origin of modern taurine breeds in Europe is debated. Archaeological and early genetic evidence point to a single Near Eastern origin and a subsequent spread during the diffusion of herding and farming. More recent genetic data are instead compatible with local domestication events or at least some level of local introgression from the aurochs. Here we present the analysis of the complete mitochondrial genome of a pre-Neolithic Italian aurochs. In this study, we applied a combined strategy employing both multiplex PCR amplifications and 454 pyrosequencing technology to sequence the complete mitochondrial genome of an 11,450-year-old aurochs specimen from Central Italy. Phylogenetic analysis of the aurochs mtDNA genome supports the conclusions from previous studies of short mtDNA fragments--namely that Italian aurochsen were genetically very similar to modern cattle breeds, but highly divergent from the North-Central European aurochsen. Complete mitochondrial genome sequences are now available for several modern cattle and two pre-Neolithic mtDNA genomes from very different geographic areas. These data suggest that previously identified sub-groups within the widespread modern cattle mitochondrial T clade are polyphyletic, and they support the hypothesis that modern European breeds have multiple geographic origins.
Mapping of the serotonin 5-HT1Dα autoreceptor gene (HTR1D) on chromosome 1 using a silent polymorphism in the coding region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ozaki, N.; Lappalainen, J.; Linnoila, M.

Serotonin 5-HT1D receptors are 5-HT release-regulating autoreceptors in the human brain. Abnormalities in brain 5-HT function have been hypothesized in the pathophysiology of various psychiatric disorders, including obsessive-compulsive disorder, autism, mood disorders, eating disorders, impulsive violent behavior, and alcoholism. Thus, mutations occurring in 5-HT autoreceptors may cause or increase the vulnerability to any of these conditions. 5-HT1Dα and 5-HT1Dβ subtypes have been previously localized to chromosomes 1p36.3-p34.3 and 6q13, respectively, using rodent-human hybrids and in situ localization. In this communication, we report the detection of a 5-HT1Dα receptor gene polymorphism by single strand conformation polymorphism (SSCP) analysis of the coding sequence. The polymorphism was used for fine scale linkage mapping of 5-HT1Dα on chromosome 1. This polymorphism should also be useful for linkage studies in populations and in families. Our analysis also demonstrates that functionally significant coding sequence variants of the 5-HT1Dα are probably not abundant either among alcoholics or in the general population. 14 refs., 1 fig., 1 tab.
SCALCE: boosting sequence compression algorithms using locally consistent encoding

PubMed Central

Hach, Faraz; Numanagić, Ibrahim; Sahinalp, S Cenk

2012-01-01

Motivation: The high throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for the computational infrastructure. Data management, storage and analysis have become major logistical obstacles for those adopting the new platforms. The requirement for large investment for this purpose almost signalled the end of the Sequence Read Archive hosted at the National Center for Biotechnology Information (NCBI), which holds most of the sequence data generated worldwide. Currently, most HTS data are compressed through general purpose algorithms such as gzip. These algorithms are not designed for compressing data generated by the HTS platforms; for example, they do not take advantage of the specific nature of genomic sequence data, that is, limited alphabet size and high similarity among reads. Fast and efficient compression algorithms designed specifically for HTS data should be able to address some of the issues in data management, storage and communication. Such algorithms would also help with analysis provided they offer additional capabilities such as random access to any read and indexing for efficient sequence similarity search. Here we present SCALCE, a ‘boosting’ scheme based on the Locally Consistent Parsing technique, which reorganizes the reads in a way that results in a higher compression speed and compression rate, independent of the compression algorithm in use and without using a reference genome. Results: Our tests indicate that SCALCE can improve the compression rate achieved through gzip by a factor of 4.19 when the goal is to compress the reads alone. In fact, on SCALCE reordered reads, gzip running time can improve by a factor of 15.06 on a standard PC with a single core and 6 GB memory. Interestingly, even the running time of SCALCE + gzip improves on that of gzip alone by a factor of 2.09. 
When compared with the recently published BEETL, which aims to sort the (inverted) reads in lexicographic order for improving bzip2, SCALCE + gzip provides up to 2.01 times better compression while improving the running time by a factor of 5.17. SCALCE also provides the option to compress the quality scores as well as the read names, in addition to the reads themselves. This is achieved by compressing the quality scores through order-3 Arithmetic Coding (AC) and the read names through gzip, using the reordering SCALCE provides on the reads. This way, in comparison with gzip compression of the unordered FASTQ files (including reads, read names and quality scores), SCALCE (together with gzip and arithmetic encoding) can provide an improvement of up to a factor of 3.34 in the compression rate and 1.26 in running time. Availability: Our algorithm, SCALCE (Sequence Compression Algorithm using Locally Consistent Encoding), is implemented in C++ with both gzip and bzip2 compression options. It also supports multithreading when the gzip option is selected and the pigz binary is available. It is available at http://scalce.sourceforge.net. Contact: fhach@cs.sfu.ca or cenk@cs.sfu.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23047557
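The idea behind the SCALCE record above, that reordering reads so similar ones sit next to each other helps a general purpose compressor, can be tried with a small Python sketch. This is not SCALCE's algorithm. As a loudly labeled assumption, the "core" of each read is taken to be its lexicographically smallest k-mer, a much simpler stand-in for the Locally Consistent Parsing cores the abstract mentions. The function names, the simulated genome, and all parameter values are likewise hypothetical.

```python
import gzip
import random


def core_of(read, k=12):
    """Lexicographically smallest k-mer of the read (a simple stand-in for a 'core')."""
    return min(read[i:i + k] for i in range(len(read) - k + 1))


def reorder_reads(reads, k=12):
    """Place reads that share a core next to each other by sorting on (core, read)."""
    return sorted(reads, key=lambda r: (core_of(r, k), r))


def gzip_size(reads):
    """Size in bytes of the gzip-compressed, newline-joined reads."""
    return len(gzip.compress("\n".join(reads).encode("ascii")))


def simulate_reads(genome_len=100_000, read_len=100, n_reads=10_000, seed=0):
    """Hypothetical test data: error-free reads sampled from a random genome."""
    rng = random.Random(seed)
    genome = "".join(rng.choice("ACGT") for _ in range(genome_len))
    return [genome[s:s + read_len]
            for s in (rng.randrange(genome_len - read_len + 1) for _ in range(n_reads))]


reads = simulate_reads()
reordered = reorder_reads(reads)
original_size = gzip_size(reads)
reordered_size = gzip_size(reordered)
print(f"gzip, original order:  {original_size} bytes")
print(f"gzip, reordered reads: {reordered_size} bytes")
```

Overlapping reads tend to share the same smallest k-mer, so sorting on it brings them within gzip's 32 KB match window, where they can be encoded as back-references. The reordering loses the original read order, so a real tool must reorder quality scores and read names consistently, as the abstract describes.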
