Sample records for sequence similarity values

  1. Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm.

    PubMed

    Seo, Joo-Hyun; Park, Jihyang; Kim, Eun-Mi; Kim, Juhan; Joo, Keehyoung; Lee, Jooyoung; Kim, Byung-Gee

    2014-02-01

    Sequence subgrouping for a given sequence set can enable various informative tasks such as the functional discrimination of sequence subsets and the functional inference of unknown sequences. Because an identity threshold for sequence subgrouping may vary according to the given sequence set, it is highly desirable to construct a robust subgrouping algorithm which automatically identifies an optimal identity threshold and generates subgroups for a given sequence set. To this end, an automatic sequence subgrouping method, named 'Subgrouping Automata' (SA), was constructed. First, the tree analysis module analyzes the structure of the tree and calculates all possible subgroups at each node. The sequence similarity analysis module calculates the average sequence similarity for all subgroups at each node. The representative sequence generation module finds a representative sequence for each subgroup using profile analysis and self-scoring. For all nodes, average sequence similarities are calculated, and 'Subgrouping Automata' searches for the node showing the statistically maximum increase in sequence similarity using Student's t-value. The node showing the maximum t-value, which gives the most significant difference in average sequence similarity between two adjacent nodes, is taken as the optimum subgrouping node in the phylogenetic tree. Further analysis showed that the optimum subgrouping node from SA prevents under-subgrouping and over-subgrouping. Copyright © 2013. Published by Elsevier Ltd.
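
    The node-selection step described in this abstract can be illustrated with a minimal sketch. It assumes that each tree node is already summarized by a list of per-subgroup average similarities, and it uses the textbook pooled-variance Student's t statistic; the names `student_t` and `best_node` and the input layout are hypothetical and are not taken from the Subgrouping Automata software.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance Student's t statistic for the increase from sample a to sample b.

    Each sample needs at least two values; a pooled variance of zero
    raises ZeroDivisionError, which this sketch does not handle.
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled * (1 / na + 1 / nb))


def best_node(similarities_by_node):
    """Return the index i >= 1 whose similarities rise most over node i - 1."""
    scores = [
        student_t(similarities_by_node[i - 1], similarities_by_node[i])
        for i in range(1, len(similarities_by_node))
    ]
    return 1 + max(range(len(scores)), key=scores.__getitem__)


# Hypothetical average similarities of the subgroups at four successive nodes.
nodes = [
    [0.30, 0.32, 0.31],
    [0.33, 0.34, 0.32],
    [0.70, 0.72, 0.71],
    [0.73, 0.74, 0.72],
]
chosen = best_node(nodes)
```

    In this made-up input the largest jump in similarity occurs between the second and third nodes, so the third node (index 2) is selected.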

  2. Method and apparatus for biological sequence comparison

    DOEpatents

    Marr, T.G.; Chang, W.I.

    1997-12-23

    A method and apparatus are disclosed for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence. 5 figs.

  3. Method and apparatus for biological sequence comparison

    DOEpatents

    Marr, Thomas G.; Chang, William I-Wei

    1997-01-01

    A method and apparatus for comparing biological sequences from a known source of sequences, with a subject (query) sequence. The apparatus takes as input a set of target similarity levels (such as evolutionary distances in units of PAM), and finds all fragments of known sequences that are similar to the subject sequence at each target similarity level, and are long enough to be statistically significant. The invention device filters out fragments from the known sequences that are too short, or have a lower average similarity to the subject sequence than is required by each target similarity level. The subject sequence is then compared only to the remaining known sequences to find the best matches. The filtering member divides the subject sequence into overlapping blocks, each block being sufficiently large to contain a minimum-length alignment from a known sequence. For each block, the filter member compares the block with every possible short fragment in the known sequences and determines a best match for each comparison. The determined set of short fragment best matches for the block provide an upper threshold on alignment values. Regions of a certain length from the known sequences that have a mean alignment value upper threshold greater than a target unit score are concatenated to form a union. The current block is compared to the union and provides an indication of best local alignment with the subject sequence.

  4. Wideband Arrhythmia-Insensitive-Rapid (AIR) Pulse Sequence for Cardiac T1 mapping without Image Artifacts induced by ICD

    PubMed Central

    Hong, KyungPyo; Jeong, Eun-Kee; Wall, T. Scott; Drakos, Stavros G.; Kim, Daniel

    2015-01-01

    Purpose To develop and evaluate a wideband arrhythmia-insensitive-rapid (AIR) pulse sequence for cardiac T1 mapping without image artifacts induced by implantable-cardioverter-defibrillator (ICD). Methods We developed a wideband AIR pulse sequence by incorporating a saturation pulse with wide frequency bandwidth (8.9 kHz), in order to achieve uniform T1 weighting in the heart with ICD. We tested the performance of original and “wideband” AIR cardiac T1 mapping pulse sequences in phantom and human experiments at 1.5T. Results In 5 phantoms representing native myocardium and blood and post-contrast blood/tissue T1 values, compared with the control T1 values measured with an inversion-recovery pulse sequence without ICD, T1 values measured with original AIR with ICD were considerably lower (absolute percent error >29%), whereas T1 values measured with wideband AIR with ICD were similar (absolute percent error <5%). Similarly, in 11 human subjects, compared with the control T1 values measured with original AIR without ICD, T1 measured with original AIR with ICD was significantly lower (absolute percent error >10.1%), whereas T1 measured with wideband AIR with ICD was similar (absolute percent error <2.0%). Conclusion This study demonstrates the feasibility of a wideband pulse sequence for cardiac T1 mapping without significant image artifacts induced by ICD. PMID:25975192

  5. Pattern similarity study of functional sites in protein sequences: lysozymes and cystatins

    PubMed Central

    Nakai, Shuryo; Li-Chan, Eunice CY; Dou, Jinglie

    2005-01-01

    Background Although it is generally agreed that topography is more conserved than sequences, proteins sharing the same fold can have different functions, while there are protein families with low sequence similarity. An alternative method for profile analysis of characteristic conserved positions of the motifs within the 3D structures may be needed for functional annotation of protein sequences. Using the approach of quantitative structure-activity relationships (QSAR), we have proposed a new algorithm for postulating functional mechanisms on the basis of pattern similarity and average of property values of side-chains in segments within sequences. This approach was used to search for functional sites of proteins belonging to the lysozyme and cystatin families. Results Hydrophobicity and β-turn propensity of reference segments with 3–7 residues were used for the homology similarity search (HSS) for active sites. Hydrogen bonding was used as the side-chain property for searching the binding sites of lysozymes. The profiles of similarity constants and average values of these parameters as functions of their positions in the sequences could identify both active and substrate binding sites of the lysozyme of Streptomyces coelicolor, which has been reported as a new fold enzyme (Cellosyl). The same approach was successfully applied to cystatins, especially for postulating the mechanisms of amyloidosis of human cystatin C as well as human lysozyme. Conclusion Pattern similarity and average index values of structure-related properties of side chains in short segments of three residues or longer were, for the first time, successfully applied for predicting functional sites in sequences. This new approach may be applicable to studying functional sites in un-annotated proteins, for which complete 3D structures are not yet available. PMID:15904486

  6. Community detection in sequence similarity networks based on attribute clustering

    DOE PAGES

    Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.

    2017-07-24

    Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping together similar nodes into one community and dissimilar nodes into separate communities. Here in this paper, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot thus be differentiated. To overcome this deficiency, the method constructs, for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selection of an optimal subset of links and community structure refinement based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments

  7. Community detection in sequence similarity networks based on attribute clustering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.

    Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping together similar nodes into one community and dissimilar nodes into separate communities. Here in this paper, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot thus be differentiated. To overcome this deficiency, the method constructs, for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selection of an optimal subset of links and community structure refinement based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments

  8. Sequence information signal processor

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1999-01-01

    An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.

  9. Investigating Correlation between Protein Sequence Similarity and Semantic Similarity Using Gene Ontology Annotations.

    PubMed

    Ikram, Najmul; Qadir, Muhammad Abdul; Afzal, Muhammad Tanvir

    2018-01-01

    Sequence similarity is a commonly used measure to compare proteins. With the increasing use of ontologies, semantic (function) similarity is gaining importance. The correlation between these measures has been applied in the evaluation of new semantic similarity methods, and in protein function prediction. In this research, we investigate the relationship between the two similarity methods. The results suggest the absence of a strong correlation between sequence and semantic similarities. There is a large number of proteins with low sequence similarity and high semantic similarity. We observe that Pearson's correlation coefficient is not sufficient to explain the nature of this relationship. Interestingly, the term semantic similarity values above 0 and below 1 do not seem to play a role in improving the correlation. That is, the correlation coefficient depends only on the number of common GO terms in proteins under comparison, and the semantic similarity measurement method does not influence it. Semantic similarity and sequence similarity behave distinctly. These findings have significant implications for future work on protein comparison, and will help to better understand the semantic similarity between proteins.

  10. Genome sequence analysis of predicted polyprenol reductase gene from mangrove plant Kandelia obovata

    NASA Astrophysics Data System (ADS)

    Basyuni, M.; Sagami, H.; Baba, S.; Oku, H.

    2018-03-01

    It has been previously reported that dolichols, but not polyprenols, predominate in mangrove leaves and roots. The occurrence of larger amounts of dolichol in the leaves of mangrove plants therefore implies that polyprenol reductase, which is responsible for the conversion of polyprenol to dolichol, may be active in mangrove leaves. Here we report an early assessment of probable polyprenol reductase genes from the genome sequence of the mangrove plant Kandelia obovata. The functional assignment of the genes was based on a homology search of the sequences against the non-redundant (nr) peptide database of NCBI using Blastx. The degree of sequence identity between each DNA sequence and known polyprenol reductases was confirmed using the Blastx probability E-value, total score, and identity. The genome sequence data resulted in three partial sequences, termed c23157 (700 bp), c23901 (960 bp), and c24171 (531 bp). The c23157 gene showed the highest similarity (61%) to predicted polyprenol reductase 2-like from Gossypium raimondii with E-value 2e-100. The second gene, c23901, exhibited high similarity (78%) to the steroid 5-alpha-reductase Det2 from J. curcas with E-value 2e-140. Furthermore, the c24171 gene showed the highest similarity (79%) to the polyprenol reductase 2 isoform X1 from Jatropha curcas with E-value 7e-21. The present study suggests that the c23157, c23901, and c24171 genes may encode predicted polyprenol reductases. The c23157, c23901, and c24171 genes are therefore new types of predicted polyprenol reductase from K. obovata.

  11. Several Families of Sequences with Low Correlation and Large Linear Span

    NASA Astrophysics Data System (ADS)

    Zeng, Fanxin; Zhang, Zhenyu

    In DS-CDMA systems and DS-UWB radios, low correlation of spreading sequences can greatly help to minimize multiple access interference (MAI) and large linear span of spreading sequences can reduce their predictability. In this letter, new sequence sets with low correlation and large linear span are proposed. Based on the construction $\mathrm{Tr}_1^m[\mathrm{Tr}_m^n(\alpha^{bt}+\gamma_i\alpha^{dt})]^r$ for generating $p$-ary sequences of period $p^n-1$, where $n=2m$, $d=up^m\pm v$, $b=u\pm v$, $\gamma_i\in GF(p^n)$, and $p$ is an arbitrary prime number, several methods to choose the parameter $d$ are provided. The obtained sequences with family size $p^n$ are of four-valued, five-valued, six-valued or seven-valued correlation and the maximum nontrivial correlation value is $(u+v-1)p^m-1$. The simulation by a computer shows that the linear span of the new sequences is larger than that of the sequences with Niho-type and Welch-type decimations, and similar to that of [10].

  12. SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform.

    PubMed

    Lin, Jie; Wei, Jing; Adjeroh, Donald; Jiang, Bing-Hua; Jiang, Yue

    2018-05-02

    Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. Then, the series of complex numbers formed are transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Using two different types of applications, namely, clustering and classification, we compared SSAW against the state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.

  13. The Development of Mental Models for Auditory Events: Relational Complexity and Discrimination of Pitch and Duration

    ERIC Educational Resources Information Center

    Stevens, Catherine; Gallagher, Melinda

    2004-01-01

    This experiment investigated relational complexity and relational shift in judgments of auditory patterns. Pitch and duration values were used to construct two-note perceptually similar sequences (unary relations) and four-note relationally similar sequences (binary relations). It was hypothesized that 5-, 8- and 11-year-old children would perform…

  14. Sequence analysis of Leukemia DNA

    NASA Astrophysics Data System (ADS)

    Nacong, Nasria; Lusiyanti, Desy; Irawan, Muhammad Isa

    2018-03-01

    Cancer is a very deadly disease; one form is leukemia, better known as blood cancer. Cancer cells can be detected by analyzing DNA in a laboratory test. This study focused on the local alignment of leukemia and non-leukemia data, obtained from NCBI in the form of DNA sequences, using the Smith-Waterman algorithm. The Smith-Waterman algorithm was invented by T. F. Smith and M. S. Waterman in 1981. The algorithm tries to find the greatest possible similarity between a pair of sequences by giving a negative value to unequal base pairs (mismatches) and a positive value to equal base pairs (matches). The maximum positive value is thus obtained as the end of the alignment, and the minimum value as the start of the alignment. This study will use sequences of leukemia and 3 sequences of non-leukemia.
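
    The scoring scheme described in this abstract can be sketched as follows. This is a minimal illustration of the standard Smith-Waterman recurrence with a linear gap penalty; the scoring values (match 2, mismatch -1, gap -2) and the example sequences are assumptions chosen for illustration and are not the parameters or data used in the study.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, end_i, end_j) for the best local alignment of a and b.

    end_i and end_j are 1-based positions in a and b where the best
    local alignment ends; (0, 0, 0) means no positive-scoring alignment.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = (0, 0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new alignment here
                H[i - 1][j - 1] + s,  # extend with a match or mismatch
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best


result = smith_waterman("TTACGTT", "GGACGGG")
```

    Here the shared segment "ACG" gives the best local score, ending at position 5 in both sequences. Recovering the aligned segment itself would require a traceback from that cell back to the first cell with score zero, which this sketch omits.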

  15. The property distance index PD predicts peptides that cross-react with IgE antibodies

    PubMed Central

    Ivanciuc, Ovidiu; Midoro-Horiuti, Terumi; Schein, Catherine H.; Xie, Liping; Hillman, Gilbert R.; Goldblum, Randall M.; Braun, Werner

    2009-01-01

    Similarities in the sequence and structure of allergens can explain clinically observed cross-reactivities. Distinguishing sequences that bind IgE in patient sera can be used to identify potentially allergenic protein sequences and aid in the design of hypo-allergenic proteins. The property distance index PD, incorporated in our Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/), may identify potentially cross-reactive segments of proteins, based on their similarity to known IgE epitopes. We sought to obtain experimental validation of the PD index as a quantitative predictor of IgE cross-reactivity, by designing peptide variants with predetermined PD scores relative to three linear IgE epitopes of Jun a 1, the dominant allergen from mountain cedar pollen. For each of the three epitopes, 60 peptides were designed with increasing PD values (decreasing physicochemical similarity) to the starting sequence. The peptides synthesized on a derivatized cellulose membrane were probed with sera from patients who were allergic to Jun a 1, and the experimental data were interpreted with a PD classification method. Peptides with low PD values relative to a given epitope were more likely to bind IgE from the sera than were those with PD values larger than 6. Control sequences, with PD values between 18 and 20 to all the three epitopes, did not bind patient IgE, thus validating our procedure for identifying negative control peptides. The PD index is a statistically validated method to detect discrete regions of proteins that have a high probability of cross-reacting with IgE from allergic patients. PMID:18950868

  16. BLAST and FASTA similarity searching for multiple sequence alignment.

    PubMed

    Pearson, William R

    2014-01-01

    BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry (homology). The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted for distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
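
    The dependence of expectation values on database size, mentioned in this abstract, can be illustrated with the standard relation between a normalized bit score S' and the expected number of chance hits, E = m * n * 2^(-S'), where m is the query length and n the database length. The sketch below is a simplification: it ignores the effective-length corrections that real programs apply, and the function name and input numbers are hypothetical.

```python
def evalue_from_bits(bit_score, query_len, db_len):
    """Expected number of chance alignments with at least this bit score.

    Uses E = m * n * 2**(-S'), without effective-length corrections.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# The same 40-bit alignment is less significant in a database 1024 times larger.
small_db = evalue_from_bits(40, query_len=2**10, db_len=2**20)
large_db = evalue_from_bits(40, query_len=2**10, db_len=2**30)
```

    Under these assumptions the E-value grows in direct proportion to the database length, which is consistent with the abstract's point that searching a smaller comprehensive database can improve sensitivity.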

  17. K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

    PubMed

    Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue

    2018-05-15

    Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared with the other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length, or increasing dataset sizes. The K2 and K2* approaches are implemented in the R language as a package, which is freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). Supplementary data are available at Bioinformatics online.
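
    As a rough illustration of how a Kendall statistic can compare sequences without alignment, the sketch below computes Kendall's tau-a between the k-mer count vectors of two DNA sequences. This is only an assumption about the general idea: it is not the K2 or K2* algorithm, it does not reproduce the R package, and the function names are hypothetical.

```python
from itertools import combinations, product


def kmer_counts(seq, k):
    """Count every DNA k-mer over the alphabet ACGT, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing other symbols, e.g. N
            counts[word] += 1
    return [counts[w] for w in kmers]


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Similarity of two short example sequences based on their 2-mer counts.
tau = kendall_tau_a(kmer_counts("ACGTACGTAC", 2), kmer_counts("ACGTTCGTAC", 2))
```

    Tau-a ranges from -1 to 1; because most k-mer counts are tied at zero for short sequences, a tie-corrected variant would likely be preferable in practice.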

  18. Comparative sequence analyses of sixteen reptilian paramyxoviruses

    USGS Publications Warehouse

    Ahne, W.; Batts, W.N.; Kurath, G.; Winton, J.R.

    1999-01-01

    Viral genomic RNA of Fer-de-Lance virus (FDLV), a paramyxovirus highly pathogenic for reptiles, was reverse transcribed and cloned. Plasmids with significant sequence similarities to the hemagglutinin-neuraminidase (HN) and polymerase (L) genes of mammalian paramyxoviruses were identified by BLAST search. Partial sequences of the FDLV genes were used to design primers for amplification by nested polymerase chain reaction (PCR) and sequencing of 518-bp L gene and 352-bp HN gene fragments from a collection of 15 previously uncharacterized reptilian paramyxoviruses. Phylogenetic analyses of the partial L and HN sequences produced similar trees in which there were two distinct subgroups of isolates that were supported with maximum bootstrap values, and several intermediate isolates. Within each subgroup the nucleotide divergence values were less than 2.5%, while the divergence between the two subgroups was 20-22%. This indicated that the two subgroups represent distinct virus species containing multiple virus strains. The five intermediate isolates had nucleotide divergence values of 11-20% and may represent additional distinct species. In addition to establishing diversity among reptilian paramyxoviruses, the phylogenetic groupings showed some correlation with geographic location, and clearly demonstrated a low level of host species-specificity within these viruses. Copyright (C) 1999 Elsevier Science B.V.

  19. Model-free aftershock forecasts constructed from similar sequences in the past

    NASA Astrophysics Data System (ADS)

    van der Elst, N.; Page, M. T.

    2017-12-01

    The basic premise behind aftershock forecasting is that sequences in the future will be similar to those in the past. Forecast models typically use empirically tuned parametric distributions to approximate past sequences, and project those distributions into the future to make a forecast. While parametric models do a good job of describing average outcomes, they are not explicitly designed to capture the full range of variability between sequences, and can suffer from over-tuning of the parameters. In particular, parametric forecasts may produce a high rate of "surprises" - sequences that land outside the forecast range. Here we present a non-parametric forecast method that cuts out the parametric "middleman" between training data and forecast. The method is based on finding past sequences that are similar to the target sequence, and evaluating their outcomes. We quantify similarity as the Poisson probability that the observed event count in a past sequence reflects the same underlying intensity as the observed event count in the target sequence. Event counts are defined in terms of differential magnitude relative to the mainshock. The forecast is then constructed from the distribution of past sequence outcomes, weighted by their similarity. We compare the similarity forecast with the Reasenberg and Jones (RJ95) method, for a set of 2807 global aftershock sequences of M≥6 mainshocks. We implement a sequence-specific RJ95 forecast using a global average prior and Bayesian updating, but do not propagate epistemic uncertainty. The RJ95 forecast is somewhat more precise than the similarity forecast: 90% of observed sequences fall within a factor of two of the median RJ95 forecast value, whereas the fraction is 85% for the similarity forecast. However, the surprise rate is much higher for the RJ95 forecast; 10% of observed sequences fall in the upper 2.5% of the (Poissonian) forecast range. The surprise rate is less than 3% for the similarity forecast. The similarity forecast may be useful to emergency managers and non-specialists when confidence or expertise in parametric forecasting may be lacking. The method makes over-tuning impossible, and minimizes the rate of surprises. At the least, this forecast constitutes a useful benchmark for more precisely tuned parametric forecasts.

  20. Sequence Similarity Presenter: a tool for the graphic display of similarities of long sequences for use in presentations.

    PubMed

    Fröhlich, K U

    1994-04-01

    A new method for the presentation of alignments of long sequences is described. The degree of identity for the aligned sequences is averaged for sections of a fixed number of residues. The resulting values are converted to shades of gray, with white corresponding to lack of identity and black corresponding to perfect identity. A sequence alignment is represented as a bar filled with varying shades of gray. The display is compact and allows for a fast and intuitive recognition of the distribution of regions with a high similarity. It is well suited for the presentation of alignments of long sequences, e.g. of protein superfamilies, in plenary lectures. The method is implemented as a HyperCard stack for Apple Macintosh computers. Several options for the modification of the output are available (e.g. background reduction, size of the summation window, consideration of amino acid similarity, inclusion of graphic markers to indicate specific domains). The output is a PostScript file which can be printed, imported as EPS or processed further with Adobe Illustrator.
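
    The display method described in this abstract (identity averaged over fixed-size sections and converted to shades of gray) can be sketched in a few lines. The original tool is a HyperCard stack; the Python function below is a hypothetical re-creation of the idea only, and it assumes two pre-aligned sequences of equal length, '-' as the gap character, and 8-bit gray levels.

```python
def identity_grays(seq_a, seq_b, window=10):
    """Map window-averaged identity of two aligned sequences to gray levels.

    255 (white) means no identity in the window; 0 (black) means perfect
    identity. Gap characters ('-') never count as identical.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    grays = []
    for start in range(0, len(seq_a), window):
        pairs = list(zip(seq_a[start:start + window], seq_b[start:start + window]))
        identity = sum(x == y and x != "-" for x, y in pairs) / len(pairs)
        grays.append(round(255 * (1 - identity)))
    return grays


# First window is identical (black), second window shares nothing (white).
bar = identity_grays("AAAAAAAA", "AAAACCCC", window=4)
```

    Each value in the returned list corresponds to one section of the bar; rendering the bar and the tool's other options (background reduction, amino acid similarity, domain markers) are outside this sketch.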

  1. Taxonomic evaluation of Streptomyces hirsutus and related species using multi-locus sequence analysis

    USDA-ARS's Scientific Manuscript database

    Phylogenetic analyses of species of Streptomyces based on 16S rRNA gene sequences resulted in a statistically well-supported clade (100% bootstrap value) containing 8 species having very similar gross morphology. These species, including Streptomyces bambergiensis, Streptomyces chlorus, Streptomyces...

  2. Entropic fluctuations in DNA sequences

    NASA Astrophysics Data System (ADS)

    Thanos, Dimitrios; Li, Wentian; Provata, Astero

    2018-03-01

    The Local Shannon Entropy (LSE) in blocks is used as a complexity measure to study the information fluctuations along DNA sequences. The LSE of a DNA block maps the local base arrangement information to a single numerical value. It is shown that, despite this reduction of information, LSE allows the extraction of meaningful information related to the detection of repetitive sequences in whole chromosomes and is useful in finding evolutionary differences between organisms. More specifically, large regions of tandem repeats, such as centromeres, can be detected based on their low LSE fluctuations along the chromosome. Furthermore, an empirical investigation of the appropriate block sizes is provided and the relationship of LSE properties with the structure of the underlying repetitive units is revealed by using both computational and mathematical methods. Sequence similarity between the genomic DNA of closely related species also leads to similar LSE values at the orthologous regions. As an application, the LSE covariance function is used to measure the evolutionary distance between several primate genomes.
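
    A block-wise Shannon entropy of this kind can be sketched as below. The abstract does not give the exact LSE definition, so this example assumes the entropy of single-base frequencies within non-overlapping blocks, measured in bits; the function names and the block size are illustrative only.

```python
import math
from collections import Counter


def block_entropy(block):
    """Shannon entropy (bits) of the symbol frequencies in one block."""
    counts = Counter(block)
    n = len(block)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def local_shannon_entropy(seq, block_size):
    """Entropy of each non-overlapping block; a trailing partial block is dropped."""
    return [
        block_entropy(seq[i:i + block_size])
        for i in range(0, len(seq) - block_size + 1, block_size)
    ]


if __name__ == "__main__":
    # A repeat-like stretch followed by a mixed stretch.
    print(local_shannon_entropy("AAAAAAAAACGTACGT", 8))
```

    Under these assumptions a homopolymer block scores 0 bits and a block with all four bases equally represented scores 2 bits, so long low-entropy runs would flag candidate repetitive regions.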

  3. OrthoANI: An improved algorithm and software for calculating average nucleotide identity.

    PubMed

    Lee, Imchang; Ouk Kim, Yeong; Park, Sang-Cheol; Chun, Jongsik

    2016-02-01

    Species demarcation in Bacteria and Archaea is mainly based on overall genome relatedness, which serves as a framework for modern microbiology. Current practice for obtaining these measures between two strains is shifting from experimentally determined similarity obtained by DNA-DNA hybridization (DDH) to genome-sequence-based similarity. Average nucleotide identity (ANI) is a simple algorithm that mimics DDH. Like DDH, ANI values between two genome sequences may be different from each other when reciprocal calculations are compared. We compared 63 690 pairs of genome sequences and found that the differences in reciprocal ANI values are significantly high, exceeding 1 % in some cases. To resolve this problem of not being symmetrical, a new algorithm, named OrthoANI, was developed to accommodate the concept of orthology for which both genome sequences were fragmented and only orthologous fragment pairs taken into consideration for calculating nucleotide identities. OrthoANI is highly correlated with ANI (using BLASTn) and the former showed approximately 0.1 % higher values than the latter. In conclusion, OrthoANI provides a more robust and faster means of calculating average nucleotide identity for taxonomic purposes. The standalone software tools are freely available at http://www.ezbiocloud.net/sw/oat.

  4. A Deep-Coverage Tomato BAC Library and Prospects Toward Development of an STC Framework for Genome Sequencing

    PubMed Central

    Budiman, Muhammad A.; Mao, Long; Wood, Todd C.; Wing, Rod A.

    2000-01-01

    Recently a new strategy using BAC end sequences as sequence-tagged connectors (STCs) was proposed for whole-genome sequencing projects. In this study, we present the construction and detailed characterization of a 15.0 haploid genome equivalent BAC library for the cultivated tomato, Lycopersicon esculentum cv. Heinz 1706. The library contains 129,024 clones with an average insert size of 117.5 kb and a chloroplast content of 1.11%. BAC end sequences from 1490 ends were generated and analyzed as a preliminary evaluation for using this library to develop an STC framework to sequence the tomato genome. A total of 1205 BAC end sequences (80.9%) were obtained, with an average length of 360 high-quality bases, and were searched against the GenBank database. Using a cutoff expectation value of <10^-6, and combining the results from BLASTN, BLASTX, and TBLASTX searches, 24.3% of the BAC end sequences were similar to known sequences, of which almost half (48.7%) share sequence similarities to retrotransposons and 7% to known genes. Some of the transposable element sequences were the first reported in tomato, such as sequences similar to maize transposon Activator (Ac) ORF and tobacco pararetrovirus-like sequences. Interestingly, there were no BAC end sequences similar to the highly repeated TGRI and TGRII elements. However, the majority (70.3%) of STCs did not share significant sequence similarities to any sequences in GenBank at either the DNA or predicted protein levels, indicating that a large portion of the tomato genome is still unknown. Our data demonstrate that this BAC library is suitable for developing an STC database to sequence the tomato genome. The advantages of developing an STC framework for whole-genome sequencing of tomato are discussed. [The BAC end sequences described in this paper have been deposited in the GenBank data library under accession nos. AQ367111–AQ368361.] PMID:10645957

  5. Aftershock occurrence rate decay for individual sequences and catalogs

    NASA Astrophysics Data System (ADS)

    Nyffenegger, Paul A.

    One of the earliest observations of the Earth's seismicity is that the rate of aftershock occurrence decays with time according to a power law commonly known as modified Omori-law (MOL) decay. However, the physical reasons for aftershock occurrence and the empirical decay in rate remain unclear despite numerous models that yield similar rate decay behavior. Key problems in relating the observed empirical relationship to the physical conditions of the mainshock and fault are the lack of studies including small magnitude mainshocks and the lack of uniformity between studies. We use simulated aftershock sequences to investigate the factors which influence the maximum likelihood (ML) estimate of the Omori-law p value, the parameter describing aftershock occurrence rate decay, for both individual aftershock sequences and "stacked" or superposed sequences. Generally the ML estimate of p is accurate, but since the ML estimated uncertainty is unaffected by whether the sequence resembles an MOL model, a goodness-of-fit test such as the Anderson-Darling statistic is necessary. While stacking aftershock sequences permits the study of entire catalogs and sequences with small aftershock populations, stacking introduces artifacts. The p value for stacked sequences is approximately equal to the mean of the individual sequence p values. We apply single-link cluster analysis to identify all aftershock sequences from eleven regional seismicity catalogs. We observe two new mathematically predictable empirical relationships for the distribution of aftershock sequence populations. The average properties of aftershock sequences are not correlated with tectonic environment, but aftershock populations and p values do show a depth dependence. The p values show great variability with time, and large values or changes in p sometimes precede major earthquakes.
    Studies of teleseismic earthquake catalogs over the last twenty years have led seismologists to question seismicity models and aftershock sequence decay for deep sequences. For seven exceptional deep sequences, we conclude that MOL decay adequately describes these sequences, and little difference exists compared to shallow sequences. However, they do include larger aftershock populations compared to most deep sequences. These results imply that p values for deep sequences are larger than those for intermediate depth sequences.
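
    For orientation, the modified Omori law is conventionally written n(t) = K / (t + c)^p, where t is time since the mainshock. The sketch below evaluates that rate, its integral over an observation window, and the point-process log-likelihood that an ML estimate of p would maximize. The function names and parameter values are assumptions for illustration; the abstract does not describe the author's fitting code.

```python
import math


def omori_rate(t, K, c, p):
    """Aftershock rate n(t) = K / (t + c)**p at time t after the mainshock."""
    return K / (t + c) ** p


def expected_count(t_start, t_end, K, c, p):
    """Integral of the rate over [t_start, t_end]."""
    if abs(p - 1.0) < 1e-12:
        return K * math.log((t_end + c) / (t_start + c))
    return K * ((t_end + c) ** (1 - p) - (t_start + c) ** (1 - p)) / (1 - p)


def log_likelihood(times, t_start, t_end, K, c, p):
    """Log-likelihood of event times under a non-stationary Poisson process."""
    log_rates = sum(math.log(omori_rate(t, K, c, p)) for t in times)
    return log_rates - expected_count(t_start, t_end, K, c, p)


if __name__ == "__main__":
    times = [0.1, 0.3, 0.8, 2.0, 5.5]  # hypothetical event times in days
    for p in (0.8, 1.0, 1.2):
        print(p, log_likelihood(times, 0.0, 10.0, K=2.0, c=0.05, p=p))
```

    Maximizing log_likelihood over K, c and p with any numerical optimizer would give ML estimates; as the abstract notes, a separate goodness-of-fit test is still needed to judge whether the sequence resembles the model at all.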

  6. 123s and ABCs: developmental shifts in logarithmic-to-linear responding reflect fluency with sequence values.

    PubMed

    Hurst, Michelle; Monahan, K Leigh; Heller, Elizabeth; Cordes, Sara

    2014-11-01

    When placing numbers along a number line with endpoints 0 and 1000, children generally space numbers logarithmically until around the age of 7, when they shift to a predominantly linear pattern of responding. This developmental shift of responding on the number placement task has been argued to be indicative of a shift in the format of the underlying representation of number (Siegler & Opfer, ). In the current study, we provide evidence from both child and adult participants to suggest that performance on the number placement task may not reflect the structure of the mental number line, but instead is a function of the fluency (i.e. ease) with which the individual can work with the values in the sequence. In Experiment 1, adult participants respond logarithmically when placing numbers on a line with less familiar anchors (1639 to 2897), despite linear responding on control tasks with standard anchors involving a similar range (0 to 1287) and a similar numerical magnitude (2000 to 3000). In Experiment 2, we show a similar developmental shift in childhood from logarithmic to linear responding for a non-numerical sequence with no inherent magnitude (the alphabet). In conclusion, we argue that the developmental trend towards linear behavior on the number line task is a product of successful strategy use and mental fluency with the values of the sequence, resulting from familiarity with endpoints and increased knowledge about general ordering principles of the sequence. A video abstract of this article can be viewed at: http://www.youtube.com/watch?v=zg5Q2LIFk3M. © 2014 John Wiley & Sons Ltd.

  7. Computationally predicted IgE epitopes of walnut allergens contribute to cross-reactivity with peanuts

    PubMed Central

    Maleki, Soheila J.; Teuber, Suzanne S.; Cheng, Hsiaopo; Chen, Deliang; Comstock, Sarah S.; Ruan, Sanbao; Schein, Catherine H.

    2011-01-01

    Background Cross reactivity between peanuts and tree nuts implies that similar IgE epitopes are present in their proteins. Objective To determine whether walnut sequences similar to known peanut IgE binding sequences, according to the property distance (PD) scale implemented in the Structural Database of Allergenic Proteins (SDAP), react with IgE from sera of patients with allergy to walnut and/or peanut. Methods Patient sera were characterized by Western blotting for IgE-binding to nut protein extracts, and to peptides from walnut and peanut allergens, similar to known peanut epitopes as defined by low PD values, synthesized on membranes. Competitive ELISA was used to show that peanut and predicted walnut epitope sequences compete with purified Ara h 2 for binding to IgE in serum from a cross-reactive patient. Results Sequences from the vicilin walnut allergen Jug r 2 which had low PD values to epitopes of the peanut allergen Ara h 2, a 2s-albumin, bound IgE in sera from five patients who reacted to either walnut, peanut or both. A walnut epitope recognized by 6 patients mapped to a surface-exposed region on a model of the N-terminal pro-region of Jug r 2. A predicted walnut epitope competed for IgE binding to Ara h 2 in serum as well as the known IgE epitope from Ara h 2. Conclusions Sequences with low PD value (<8.5) to known IgE epitopes could contribute to cross-reactivity between allergens. This further validates the PD scoring method for predicting cross-reactive epitopes in allergens. PMID:21883278

  8. A statistical physics perspective on alignment-independent protein sequence comparison.

    PubMed

    Chattopadhyay, Amit K; Nasiev, Diar; Flower, Darren R

    2015-08-01

    Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-based similarity has performed poorly. Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from 'first passage probability distribution' to summarize statistics of ensemble averaged amino acid propensity values. In this article, we introduce and elaborate this approach. © The Author 2015. Published by Oxford University Press.

  9. The Gap Procedure: for the identification of phylogenetic clusters in HIV-1 sequence data.

    PubMed

    Vrbik, Irene; Stephens, David A; Roger, Michel; Brenner, Bluma G

    2015-11-04

    In the context of infectious disease, sequence clustering can be used to provide important insights into the dynamics of transmission. Cluster analysis is usually performed using a phylogenetic approach whereby clusters are assigned on the basis of sufficiently small genetic distances and high bootstrap support (or posterior probabilities). The computational burden involved in this phylogenetic threshold approach is a major drawback, especially when a large number of sequences are being considered. In addition, this method requires a skilled user to specify the appropriate threshold values which may vary widely depending on the application. This paper presents the Gap Procedure, a distance-based clustering algorithm for the classification of DNA sequences sampled from individuals infected with the human immunodeficiency virus type 1 (HIV-1). Our heuristic algorithm bypasses the need for phylogenetic reconstruction, thereby supporting the quick analysis of large genetic data sets. Moreover, this fully automated procedure relies on data-driven gaps in sorted pairwise distances to infer clusters, thus no user-specified threshold values are required. The clustering results obtained by the Gap Procedure on both real and simulated data, closely agree with those found using the threshold approach, while only requiring a fraction of the time to complete the analysis. Apart from the dramatic gains in computational time, the Gap Procedure is highly effective in finding distinct groups of genetically similar sequences and obviates the need for subjective user-specified values. The clusters of genetically similar sequences returned by this procedure can be used to detect patterns in HIV-1 transmission and thereby aid in the prevention, treatment and containment of the disease.
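
    The idea of reading a threshold off the gaps in sorted pairwise distances can be illustrated with the toy sketch below. It is not the published Gap Procedure: the abstract gives no algorithmic detail, so the use of Hamming distance, the choice of the single largest gap, the midpoint threshold and the single-linkage grouping are all assumptions made for this example.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def gap_clusters(seqs):
    """Group sequences using a threshold placed in the widest distance gap."""
    pairs = [(hamming(seqs[i], seqs[j]), i, j)
             for i, j in combinations(range(len(seqs)), 2)]
    dists = sorted(d for d, _, _ in pairs)
    gaps = [(dists[k + 1] - dists[k], k) for k in range(len(dists) - 1)]
    if not gaps or max(gaps)[0] == 0:
        return [list(range(len(seqs)))]  # no gap to exploit: one cluster
    _, k = max(gaps)
    threshold = (dists[k] + dists[k + 1]) / 2

    parent = list(range(len(seqs)))  # union-find forest for single linkage

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for d, i, j in pairs:
        if d < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    print(gap_clusters(["AAAAAA", "AAAAAT", "CCCCCC", "CCCCCG"]))
```

    No phylogenetic tree is built at any point, which is the property the abstract credits for the reduction in running time.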

  10. Grammatical complexity for two-dimensional maps

    NASA Astrophysics Data System (ADS)

    Hagiwara, Ryouichi; Shudo, Akira

    2004-11-01

    We calculate the grammatical complexity of the symbol sequences generated from the Hénon map and the Lozi map using the recently developed methods to construct the pruning front. When the map is hyperbolic, the language of symbol sequences is regular in the sense of the Chomsky hierarchy and the corresponding grammatical complexity takes finite values. It is found that the complexity exhibits a self-similar structure as a function of the system parameter, and the similarity of the pruning fronts is discussed as an origin of such self-similarity. For non-hyperbolic cases, it is observed that the complexity monotonically increases as we increase the resolution of the pruning front.

  11. Sequentially distant but structurally similar proteins exhibit fold specific patterns based on their biophysical properties.

    PubMed

    Rajendran, Senthilnathan; Jothi, Arunachalam

    2018-05-16

    The three-dimensional structure of a protein depends on the interaction between its amino acid residues. These interactions are in turn influenced by various biophysical properties of the amino acids. There are several examples of proteins that share the same fold but are very dissimilar at the sequence level. For proteins to share a common fold some crucial interactions should be maintained despite insignificant sequence similarity. Since the interactions are because of the biophysical properties of the amino acids, we should be able to detect descriptive patterns for folds at such a property level. In this line, the main focus of our research is to analyze such proteins and to characterize them in terms of their biophysical properties. Protein structures with sequence similarity less than 40% were selected for ten different subfolds from three different mainfolds (according to CATH classification) and were used for this analysis. We used the normalized values of the 49 physicochemical, energetic and conformational properties of amino acids. We characterize the folds based on the average biophysical property values. We also observed a fold specific correlational behavior of biophysical properties despite a very low sequence similarity in our data. We further trained three different binary classification models (Naive Bayes-NB, Support Vector Machines-SVM and Bayesian Generalized Linear Model-BGLM) which could discriminate mainfold based on the biophysical properties. We also show that among the three generated models, the BGLM classifier model was able to discriminate protein sequences coming under all beta category with 81.43% accuracy and all alpha, alpha-beta proteins with 83.37% accuracy. Copyright © 2018 Elsevier Ltd. All rights reserved.

  12. Taxonomic evaluation of species in the Streptomyces hirsutus clade using multi-locus sequence analysis and proposals to reclassify several species in this clade

    USDA-ARS's Scientific Manuscript database

    Previous phylogenetic analyses of species of Streptomyces based on 16S rRNA gene sequences resulted in a statistically well-supported clade (100% bootstrap value) containing 8 species that exhibited very similar gross morphology in producing open looped (Retinaculum-Apertum) to spiral (Spira) chains...

  13. Lactobacillus cypricasei Lawson et al. 2001 is a later heterotypic synonym of Lactobacillus acidipiscis Tanasupawat et al. 2000.

    PubMed

    Naser, Sabri M; Vancanneyt, Marc; Hoste, Bart; Snauwaert, Cindy; Swings, Jean

    2006-07-01

    The applicability of a multilocus sequence analysis (MLSA)-based identification system for lactobacilli was evaluated. Two housekeeping genes that code for the phenylalanyl-tRNA synthase alpha-subunit (pheS) and RNA polymerase alpha-subunit (rpoA) were sequenced and analysed for members of the Lactobacillus salivarius species group. The type strains of Lactobacillus acidipiscis and Lactobacillus cypricasei were investigated further using a third gene that encodes the alpha-subunit of ATP synthase (atpA). The MLSA data revealed close relatedness between L. acidipiscis and L. cypricasei, with 99.8-100 % pheS, rpoA and atpA gene sequence similarities. Comparison of the 16S rRNA gene sequences of the type strains of the two species confirmed the close relatedness (99.8 % gene sequence similarity) between the two taxa. Similar phenotypes and high DNA-DNA binding values in the range of 84 to 97.5 % confirmed that L. acidipiscis and L. cypricasei are synonymous species. On the basis of the present study, it is proposed that Lactobacillus cypricasei is a later heterotypic synonym of Lactobacillus acidipiscis.

  14. rpoB Gene Sequencing for Identification of Corynebacterium Species

    PubMed Central

    Khamis, Atieh; Raoult, Didier; La Scola, Bernard

    2004-01-01

    The genus Corynebacterium is a heterogeneous group of species comprising human and animal pathogens and environmental bacteria. It is defined on the basis of several phenotypic characters and the results of DNA-DNA relatedness and, more recently, 16S rRNA gene sequencing. However, the 16S rRNA gene is not polymorphic enough to ensure reliable phylogenetic studies and needs to be completely sequenced for accurate identification. The almost complete rpoB sequences of 56 Corynebacterium species were determined by both PCR and genome walking methods. In all cases the percent similarities between different species were lower than those observed by 16S rRNA gene sequencing, even for those species with degrees of high similarity. Several clusters supported by high bootstrap values were identified. In order to propose a method for strain identification which does not require sequencing of the complete rpoB sequence (approximately 3,500 bp), we identified an area with a high degree of polymorphism, bordered by conserved sequences that can be used as universal primers for PCR amplification and sequencing. The sequence of this fragment (434 to 452 bp) allows accurate species identification and may be used in the future for routine sequence-based identification of Corynebacterium species. PMID:15364970

  15. DNA-DNA hybridization values and their relationship to whole-genome sequence similarities.

    PubMed

    Goris, Johan; Konstantinidis, Konstantinos T; Klappenbach, Joel A; Coenye, Tom; Vandamme, Peter; Tiedje, James M

    2007-01-01

    DNA-DNA hybridization (DDH) values have been used by bacterial taxonomists since the 1960s to determine relatedness between strains and are still the most important criterion in the delineation of bacterial species. Since the extent of hybridization between a pair of strains is ultimately governed by their respective genomic sequences, we examined the quantitative relationship between DDH values and genome sequence-derived parameters, such as the average nucleotide identity (ANI) of common genes and the percentage of conserved DNA. A total of 124 DDH values were determined for 28 strains for which genome sequences were available. The strains belong to six important and diverse groups of bacteria for which the intra-group 16S rRNA gene sequence identity was greater than 94 %. The results revealed a close relationship between DDH values and ANI and between DNA-DNA hybridization and the percentage of conserved DNA for each pair of strains. The recommended cut-off point of 70 % DDH for species delineation corresponded to 95 % ANI and 69 % conserved DNA. When the analysis was restricted to the protein-coding portion of the genome, 70 % DDH corresponded to 85 % conserved genes for a pair of strains. These results reveal extensive gene diversity within the current concept of "species". Examination of reciprocal values indicated that the level of experimental error associated with the DDH method is too high to reveal the subtle differences in genome size among the strains sampled. It is concluded that ANI can accurately replace DDH values for strains for which genome sequences are available.

  16. Sequence information signal processor for local and global string comparisons

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1997-01-01

    A sequence information signal processing integrated circuit chip is designed to perform high speed calculation of a dynamic programming algorithm based upon the algorithm defined by Waterman and Smith. The signal processing chip of the present invention is designed to be a building block of a linear systolic array, the performance of which can be increased by connecting additional sequence information signal processing chips to the array. The chip provides a high speed, low cost linear array processor that can locate highly similar global sequences or segments thereof such as contiguous subsequences from two different DNA or protein sequences. The chip is implemented in a preferred embodiment using CMOS VLSI technology to provide the equivalent of about 400,000 transistors or 100,000 gates. Each chip provides 16 processing elements, and is designed to provide 16 bit, two's complement operation for maximum score precision of between -32,768 and +32,767. It is designed to provide a comparison between sequences as long as 4,194,304 elements without external software and between sequences of unlimited numbers of elements with the aid of external software. Each sequence can be assigned different deletion and insertion weight functions. Each processor is provided with a similarity measure device which is independently variable. Thus, each processor can contribute to maximum value score calculation using a different similarity measure.
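
    The dynamic programming recurrence such a chip accelerates can be sketched in software as a Smith-Waterman local alignment score. This is a plain sequential version with a linear gap penalty and illustrative scoring values chosen for the example; it does not model the systolic array, the 16-bit arithmetic, or the per-sequence weight functions mentioned in the patent.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty).

    Only two rows of the score matrix are kept, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: a local alignment may restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

    Cells on the same anti-diagonal depend only on earlier anti-diagonals, which is why the computation maps naturally onto a linear array of processing elements.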

  17. Exploring the sequence-structure protein landscape in the glycosyltransferase family

    PubMed Central

    Zhang, Ziding; Kochhar, Sunil; Grigorov, Martin

    2003-01-01

    To understand the molecular basis of glycosyltransferases’ (GTFs) catalytic mechanism, extensive structural information is required. Here, fold recognition methods were employed to assign 3D protein shapes (folds) to the currently known GTF sequences, available in public databases such as GenBank and Swissprot. First, GTF sequences were retrieved and classified into clusters, based on sequence similarity only. Intracluster sequence similarity was chosen sufficiently high to ensure that the same fold is found within a given cluster. Then, a representative sequence from each cluster was selected to compose a subset of GTF sequences. The members of this reduced set were processed by three different fold recognition methods: 3D-PSSM, FUGUE, and GeneFold. Finally, the results from different fold recognition methods were analyzed and compared to sequence-similarity search methods (i.e., BLAST and PSI-BLAST). It was established that the folds of about 70% of all currently known GTF sequences can be confidently assigned by fold recognition methods, a value which is higher than the fold identification rate based on sequence comparison alone (48% for BLAST and 64% for PSI-BLAST). The identified folds were submitted to 3D clustering, and we found that most of the GTF sequences adopt the typical GTF A or GTF B folds. Our results indicate a lack of evidence that new GTF folds (i.e., folds other than GTF A and B) exist. Based on cases where fold identification was not possible, we suggest several sequences as the most promising targets for a structural genomics initiative focused on the GTF protein family. PMID:14500887

  18. Genetic diversity of merozoite surface antigens in Babesia bovis detected from Sri Lankan cattle.

    PubMed

    Sivakumar, Thillaiampalam; Okubo, Kazuhiro; Igarashi, Ikuo; de Silva, Weligodage Kumarawansa; Kothalawala, Hemal; Silva, Seekkuge Susil Priyantha; Vimalakumar, Singarayar Caniciyas; Meewewa, Asela Sanjeewa; Yokoyama, Naoaki

    2013-10-01

    Babesia bovis, the causative agent of severe bovine babesiosis, is endemic in Sri Lanka. The live attenuated vaccine (K-strain), which was introduced in the early 1990s, has been used to immunize cattle populations in endemic areas of the country. The present study was undertaken to determine the genetic diversity of merozoite surface antigens (MSAs) in B. bovis isolates from Sri Lankan cattle, and to compare the gene sequences obtained from such isolates against those of the K-strain. Forty-four bovine blood samples isolated from different geographical regions of Sri Lanka and judged to be B. bovis-positive by PCR screening were used to amplify MSAs (MSA-1, MSA-2c, MSA-2a1, MSA-2a2, and MSA-2b), AMA-1, and 12D3 genes from parasite DNA. Although the AMA-1 and 12D3 gene sequences were highly conserved among the Sri Lankan isolates, the MSA gene sequences from the same isolates were highly diverse. Sri Lankan MSA-1, MSA-2c, MSA-2a1, MSA-2a2, and MSA-2b sequences clustered within 5, 2, 4, 1, and 9 different clades in the gene phylograms, respectively, while the minimum similarity values among the deduced amino acid sequences of these genes were 36.8%, 68.7%, 80.3%, 100%, and 68.3%, respectively. In the phylograms, none of the Sri Lankan sequences fell within clades containing the respective K-strain sequences. Additionally, the similarity values for MSA-1 and MSA-2c were 40-61.8% and 90.9-93.2% between the Sri Lankan isolates and the K-strain, respectively, while the K-strain MSA-2a/b sequence shared 64.5-69.8%, 69.3%, and 70.5-80.3% similarities with the Sri Lankan MSA-2a1, MSA-2a2, and MSA-2b sequences, respectively. The present study has shown that genetic diversity among MSAs of Sri Lankan B. bovis isolates is very high, and that the sequences of field isolates diverged genetically from the K-strain. Copyright © 2013 Elsevier B.V. All rights reserved.

  19. Prediction of multi-drug resistance transporters using a novel sequence analysis method [version 2; referees: 2 approved]

    DOE PAGES

    McDermott, Jason E.; Bruillard, Paul; Overall, Christopher C.; ...

    2015-03-09

    There are many examples of groups of proteins that have similar function, but the determinants of functional specificity may be hidden by lack of sequence similarity, or by large groups of similar sequences with different functions. Transporters are one such protein group in that the general function, transport, can be easily inferred from the sequence, but the substrate specificity can be impossible to predict from sequence with current methods. In this paper we describe a linguistic-based approach to identify functional patterns from groups of unaligned protein sequences and its application to predict multi-drug resistance transporters (MDRs) from bacteria. We first show that our method can recreate known patterns from PROSITE for several motifs from unaligned sequences. We then show that the method, MDRpred, can predict MDRs with greater accuracy and positive predictive value than a collection of currently available family-based models from the Pfam database. Finally, we apply MDRpred to a large collection of protein sequences from an environmental microbiome study to make novel predictions about drug resistance in a potential environmental reservoir.

  20. Late Quaternary climate and environmental reconstruction based on leaf wax analyses in the loess sequence of Möhlin, Switzerland

    NASA Astrophysics Data System (ADS)

    Wüthrich, Lorenz; Bliedtner, Marcel; Kathrin Schäfer, Imke; Zech, Jana; Shajari, Fatemeh; Gaar, Dorian; Preusser, Frank; Salazar, Gary; Szidat, Sönke; Zech, Roland

    2017-12-01

    We present the results of leaf wax analyses (long-chain n-alkanes) from the 6.8 m deep loess sequence of Möhlin, Switzerland, spanning the last ~70 kyr. Leaf waxes are well preserved and occur in sufficient amounts only down to 0.4 m and below 1.8 m depth, so no paleoenvironmental reconstructions can be done for marine isotope stage (MIS) 2. Compound-specific δ2Hwax analyses yielded similar values for late MIS 3 compared to the uppermost samples, indicating that various effects (e.g., more negative values due to lower temperatures, more positive values due to an enriched moisture source) cancel each other out. A pronounced ~30 ‰ shift towards more negative values probably reflects more humid conditions before ~32 ka. Radiocarbon dating of the n-alkanes corroborates the stratigraphic integrity of leaf waxes and their potential for dating loess-paleosol sequences (LPS) back to ~30 ka.

  1. GCPred: a web tool for guanylyl cyclase functional centre prediction from amino acid sequence.

    PubMed

    Xu, Nuo; Fu, Dongfang; Li, Shiang; Wang, Yuxuan; Wong, Aloysius

    2018-06-15

    GCPred is a webserver for the prediction of guanylyl cyclase (GC) functional centres from amino acid sequence. GCs are enzymes that generate the signalling molecule cyclic guanosine 3', 5'-monophosphate from guanosine-5'-triphosphate. A novel class of GC centres (GCCs) has been identified in complex plant proteins. Using currently available experimental data, GCPred was created to automate and facilitate the identification of similar GCCs. The server reports GCC values whose calculation considers the physicochemical properties of the amino acids constituting the GCC and the conserved amino acids within the centre. From a user-supplied amino acid sequence, the server returns a table of GCC values and graphs depicting deviations from mean values. The utility of this server is demonstrated using plant proteins and the human interleukin-1 receptor-associated kinase family of proteins as examples. The GCPred server is available at http://gcpred.com. Supplementary data are available at Bioinformatics online.

  2. A mapping of an ensemble of mitochondrial sequences for various organisms into 3D space based on the word composition.

    PubMed

    Aita, Takuyo; Nishigaki, Koichi

    2012-11-01

    To visualize a bird's-eye view of an ensemble of mitochondrial genome sequences for various species, we recently developed a novel method of mapping a biological sequence ensemble into Three-Dimensional (3D) vector space. First, we represented a biological sequence of a species s by a word-composition vector x(s), where its length |x(s)| represents the sequence length, and its unit vector x(s)/|x(s)| represents the relative composition of the K-tuple words through the sequence and the size of the dimension, N=4^K, is the number of all possible words with the length of K. Second, we mapped the vector x(s) to the 3D position vector y(s), based on the two following simple principles: (1) |y(s)|=|x(s)| and (2) the angle between y(s) and y(t) maximally correlates with the angle between x(s) and x(t). The mitochondrial genome sequences for 311 species, including 177 Animalia, 85 Fungi and 49 Green plants, were mapped into 3D space by using K=7. The mapping was successful because the angles between vectors before and after the mapping highly correlated with each other (correlation coefficients were 0.92-0.97). Interestingly, the Animalia kingdom is distributed along a single arc belt (just like the Milky Way on a Celestial Globe), and the Fungi and Green plant kingdoms are distributed in a similar arc belt. These two arc belts intersect at their respective middle regions and form a cross structure just like a jet aircraft fuselage and its wings. This new mapping method will allow researchers to intuitively interpret the visual information presented in the maps in a highly effective manner. Copyright © 2012 Elsevier Inc. All rights reserved.
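
    The word-composition vector described in this abstract can be illustrated with a minimal Python sketch. It only covers the first step (counting K-tuple words and measuring the angle between two vectors); the 3D mapping itself is not reproduced. The function names, the toy sequences and the choice of raw overlapping counts are assumptions made for illustration, since the abstract does not specify the exact counting scheme or norm.

```python
import math
from itertools import product

def word_composition(seq, k):
    """Count every overlapping k-letter word over the alphabet ACGT."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]  # 4**k possible words
    index = {w: i for i, w in enumerate(words)}
    vec = [0] * len(words)
    for i in range(len(seq) - k + 1):
        w = seq[i:i + k]
        if w in index:  # words containing ambiguous bases (e.g. N) are skipped
            vec[index[w]] += 1
    return vec

def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

# Hypothetical toy sequences; real input would be whole mitochondrial genomes.
x_s = word_composition("ACGTACGTACGGTTAACG", 2)
x_t = word_composition("ACGTTCGTACGATTAACG", 2)
print(len(x_s))          # number of dimensions N = 4**K
print(angle(x_s, x_t))   # angle between the two composition vectors
```

    With raw counts, the sum of the vector entries equals the number of words read from the sequence (sequence length minus K plus 1), which is one way the vector can carry the sequence length.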

  3. Identification and characterization of dinucleotide repeat (CA)n markers for genetic mapping in dog

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ostrander, E.A.; Sprague, G.F. Jr.; Rine, J.

    1993-04-01

    A large block of simple sequence repeat (SSR) polymorphisms for the dog genome has been isolated and characterized. Screening of primary libraries by conventional hybridization methods as well as by screening of enriched marker-selected libraries led to the isolation of a large number of genomic clones that contained (CA)n repeats. The sequences of 101 clones showed that the size and complexity of (CA)n repeats in the dog genome were similar to those reported for these markers in the human genome. Detailed analysis of a representative subset of these markers revealed that most markers were moderately to highly polymorphic, with PIC values exceeding 0.70 for 33% of the markers tested. An association between higher PIC values and markers containing longer (CA)n repeats was observed in these studies, as previously noted for similar markers in the human genome. A list of primer sequences that tag each characterized marker is provided, and a comprehensive system of nomenclature for the dog genome is suggested. 28 refs., 4 figs., 2 tabs.
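
    For readers unfamiliar with PIC (polymorphism information content), the short sketch below computes it from allele frequencies using the widely used definition PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2. The abstract does not state which formula the authors applied, so this is an assumption, and the allele frequencies shown are made up for illustration.

```python
def pic(freqs):
    """Polymorphism information content for allele frequencies that sum to 1."""
    squares = [p * p for p in freqs]
    homozygosity = sum(squares)
    # Correction term for matings that are uninformative despite heterozygosity.
    cross = 0.0
    for i in range(len(squares)):
        for j in range(i + 1, len(squares)):
            cross += 2.0 * squares[i] * squares[j]
    return 1.0 - homozygosity - cross

# Hypothetical markers: two equally frequent alleles vs. four equally frequent alleles.
print(pic([0.5, 0.5]))
print(pic([0.25, 0.25, 0.25, 0.25]))
```

    Under this definition a marker needs several alleles at reasonably balanced frequencies to exceed the 0.70 level mentioned in the abstract; a two-allele marker cannot reach it.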

  4. HYBRIDIZATION PROPERTIES OF DNA SEQUENCES DIRECTING THE SYNTHESIS OF MESSENGER RNA AND HETEROGENEOUS NUCLEAR RNA

    PubMed Central

    Greenberg, Jay R.; Perry, Robert P.

    1971-01-01

    The relationship of the DNA sequences from which polyribosomal messenger RNA (mRNA) and heterogeneous nuclear RNA (NRNA) of mouse L cells are transcribed was investigated by means of hybridization kinetics and thermal denaturation of the hybrids. Hybridization was performed in formamide solutions at DNA excess. Under these conditions most of the hybridizing mRNA and NRNA react at values of D0t (DNA concentration multiplied by time) expected for RNA transcribed from the nonrepeated or rarely repeated fraction of the genome. However, a fraction of both mRNA and NRNA hybridize at values of D0t about 10,000 times lower, and therefore must be transcribed from highly redundant DNA sequences. The fraction of NRNA hybridizing to highly repeated sequences is about 1.7 times greater than the corresponding fraction of mRNA. The hybrids formed by the rapidly reacting fractions of both NRNA and mRNA melt over a narrow temperature range with a midpoint about 11°C below that of native L cell DNA. This indicates that these hybrids consist of partially complementary sequences with approximately 11% mismatching of bases. Hybrids formed by the slowly reacting fraction of NRNA melt within 4°–6°C of native DNA, indicating very little, if any, mismatching of bases. Hybrids of the slowly reacting components of mRNA, formed under conditions of sufficiently low RNA input, have a high thermal stability, similar to that observed for hybrids of the slowly reacting NRNA component. However, when higher inputs of mRNA are used, hybrids are formed which have a strikingly lower thermal stability. This observation can be explained by assuming that there is sufficient similarity among the relatively rare DNA sequences coding for mRNA so that under hybridization conditions, in which these DNA sequences are not truly in excess, reversible hybrids exhibiting a considerable amount of mispairing are formed.
    The fact that a comparable phenomenon has not been observed for NRNA may mean that there is less similarity among the relatively rare DNA sequences coding for NRNA than there is among the rare sequences coding for mRNA. PMID:4999767

  5. Conservative secondary structure motifs already present in early-stage folding (in silico) as found in serpines family.

    PubMed

    Brylinski, Michal; Konieczny, Leszek; Kononowicz, Andrzej; Roterman, Irena

    2008-03-21

    The well-known sequence-comparison procedure implemented in ClustalW was applied to structure comparison. A consensus sequence as well as a consensus structure was defined for proteins belonging to the serpin family. The structure of the early-stage intermediate was the object of the similarity search. High values of W(sequence) were found to accord with high values of W(structure), making structure comparison possible using common criteria for sequence and structure comparison. Since the early-stage structural form was created according to a limited conformational sub-space that does not include the beta-structure (this structure is mediated by the C7eq structural form), it is particularly important to see that the C7eq structural form may be treated as the seed for the beta-structure present in the final native structure of the protein. The applicability of the ClustalW procedure to structure comparison unifies these two comparisons.

  6. Compositional segmentation and complexity measurement in stock indices

    NASA Astrophysics Data System (ADS)

    Wang, Haifeng; Shang, Pengjian; Xia, Jianan

    2016-01-01

    In this paper, we introduce a complexity measure based on entropic segmentation, called sequence compositional complexity (SCC), into the analysis of financial time series. SCC was first used to deal directly with the complex heterogeneity in nonstationary DNA sequences. SCC is known to be higher in sequences with strong long-range correlation than in those with weak long-range correlation, especially in DNA sequences. Applying this method to financial index data, we find that the SCC values of some mature stock indices, such as the S&P 500 (abbreviated S&P in the following) and HSI, are likely to be lower than the SCC value of Chinese index data (such as SSE). Moreover, if we classify the indices by SCC, the financial market of Hong Kong has more similarities with mature foreign markets than with Chinese ones. We therefore believe that there is a good correspondence between the SCC of an index sequence and the complexity of the market involved.
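
    Entropic segmentation is usually built on the Jensen-Shannon divergence between the symbol compositions of adjacent segments. The sketch below shows only the basic building block, a single cut that maximises this divergence; it is not the full SCC measure and not the authors' implementation. Encoding a financial series as discrete symbols (for example "U" for an up day and "D" for a down day) is an assumption made for illustration.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy (bits) of the symbol composition of a sequence."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def best_split(symbols):
    """Return (position, divergence) of the cut that maximises the
    Jensen-Shannon divergence between the left and right segments."""
    n = len(symbols)
    h_all = entropy(symbols)
    best_pos, best_div = None, -1.0
    for i in range(1, n):
        left, right = symbols[:i], symbols[i:]
        # Length-weighted Jensen-Shannon divergence of the two segments.
        div = h_all - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if div > best_div:
            best_pos, best_div = i, div
    return best_pos, best_div

# Hypothetical symbol series: a run of down days followed by a run of up days.
series = "DDDDDDDDUUUUUUUU"
print(best_split(series))
```

    A full segmentation would apply such cuts recursively while they remain statistically significant and then combine the divergences of the resulting segments into one complexity value.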

  7. Streptococcus azizii sp. nov., isolated from naïve weanling mice.

    PubMed

    Shewmaker, Patricia Lynn; Whitney, Anne M; Gulvik, Christopher A; Lipman, Neil S

    2017-12-01

    Three isolates of a previously reported novel catalase-negative, Gram-stain-positive, coccoid, alpha-haemolytic, Streptococcus species that were associated with meningoencephalitis in naïve weanling mice were further evaluated to confirm their taxonomic status and to determine additional phenotypic and molecular characteristics. Comparative 16S rRNA gene sequence analysis showed nearly identical intra-species sequence similarity (≥99.9 %), and revealed the closest phylogenetically related species, Streptococcus acidominimus and Streptococcus cuniculi, with 97.0 and 97.5 % sequence similarity, respectively. The rpoB, sodA and recN genes were identical for the three isolates and were 87.6, 85.7 and 82.5 % similar to S. acidominimus and 89.7, 86.2 and 80.7 % similar to S. cuniculi, respectively. In silico DNA-DNA hybridization analyses of mouse isolate 12-5202 T against S. acidominimus CCUG 27296 T and S. cuniculi CCUG 65085 T produced estimated values of 26.4 and 25.7 % relatedness, and the calculated average nucleotide identity values were 81.9 and 81.7, respectively. These data confirm the taxonomic status of 12-5202 T as a distinct Streptococcus species, and we formally propose the type strain, Streptococcus azizii 12-5202 T (=CCUG 69378 T =DSM 103678 T ). The genome of Streptococcus azizii sp. nov. 12-5202 T contains 2062 total genes with a size of 2.34 Mbp, and an average G+C content of 42.76 mol%.

  8. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts.

    PubMed

    Lun, Aaron T L; Bach, Karsten; Marioni, John C

    2016-04-27

    Normalization of single-cell RNA sequencing data is necessary to eliminate cell-specific biases prior to downstream analyses. However, this is not straightforward for noisy single-cell data where many counts are zero. We present a novel approach where expression values are summed across pools of cells, and the summed values are used for normalization. Pool-based size factors are then deconvolved to yield cell-based factors. Our deconvolution approach outperforms existing methods for accurate normalization of cell-specific biases in simulated data. Similar behavior is observed in real data, where deconvolution improves the relevance of results of downstream analyses.

  9. Analysis of composition-based metagenomic classification.

    PubMed

    Higashi, Susan; Barreto, André da Motta Salles; Cantão, Maurício Egidio; de Vasconcelos, Ana Tereza Ribeiro

    2012-01-01

    An essential step of a metagenomic study is the taxonomic classification, that is, the identification of the taxonomic lineage of the organisms in a given sample. The taxonomic classification process involves a series of decisions. Currently, in the context of metagenomics, such decisions are usually based on empirical studies that consider one specific type of classifier. In this study we propose a general framework for analyzing the impact that several decisions can have on the classification problem. Instead of focusing on any specific classifier, we define a generic score function that provides a measure of the difficulty of the classification task. Using this framework, we analyze the impact of the following parameters on the taxonomic classification problem: (i) the length of n-mers used to encode the metagenomic sequences, (ii) the similarity measure used to compare sequences, and (iii) the type of taxonomic classification, which can be conventional or hierarchical, depending on whether the classification process occurs in a single shot or in several steps according to the taxonomic tree. We defined a score function that measures the degree of separability of the taxonomic classes under a given configuration induced by the parameters above. We conducted an extensive computational experiment and found out that reasonable values for the parameters of interest could be (i) intermediate values of n, the length of the n-mers; (ii) any similarity measure, because all of them resulted in similar scores; and (iii) the hierarchical strategy, which performed better in all of the cases. As expected, short n-mers generate lower configuration scores because they give rise to frequency vectors that represent distinct sequences in a similar way. On the other hand, large values for n result in sparse frequency vectors that represent differently metagenomic fragments that are in fact similar, also leading to low configuration scores. 
    Regarding the similarity measure, in contrast to our expectations, the variation of the measures did not change the configuration scores significantly. Finally, the hierarchical strategy was more effective than the conventional strategy, which suggests that, instead of using a single classifier, one should adopt multiple classifiers organized as a hierarchy.

  10. Characteristic motifs for families of allergenic proteins

    PubMed Central

    Ivanciuc, Ovidiu; Garcia, Tzintzuni; Torres, Miguel; Schein, Catherine H.; Braun, Werner

    2008-01-01

    The identification of potential allergenic proteins is usually done by scanning a database of allergenic proteins and locating known allergens with a high sequence similarity. However, there is no universally accepted cut-off value for sequence similarity to indicate potential IgE cross-reactivity. Further, overall sequence similarity may be less important than discrete areas of similarity in proteins with homologous structure. To identify such areas, we first classified all allergens and their subdomains in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/) to their closest protein families as defined in Pfam, and identified conserved physicochemical property motifs characteristic of each group of sequences. Allergens populate only a small subset of all known Pfam families, as all allergenic proteins in SDAP could be grouped to only 130 (of 9318 total) Pfams, and 31 families contain more than four allergens. Conserved physicochemical property motifs for the aligned sequences of the most populated Pfam families were identified with the PCPMer program suite and catalogued in the webserver Motif-Mate (http://born.utmb.edu/motifmate/summary.php). We also determined specific motifs for allergenic members of a family that could distinguish them from non-allergenic ones. These allergen specific motifs should be most useful in database searches for potential allergens. We found that sequence motifs unique to the allergens in three families (seed storage proteins, Bet v 1, and tropomyosin) overlap with known IgE epitopes, thus providing evidence that our motif based approach can be used to assess the potential allergenicity of novel proteins. PMID:18951633

  11. Reappraisal of the taxonomy of Streptococcus suis serotypes 20, 22 and 26: Streptococcus parasuis sp. nov.

    PubMed

    Nomoto, R; Maruyama, F; Ishida, S; Tohya, M; Sekizaki, T; Osawa, Ro

    2015-02-01

    In order to clarify the taxonomic position of serotypes 20, 22 and 26 of Streptococcus suis, biochemical and molecular genetic studies were performed on isolates (SUT-7, SUT-286(T), SUT-319, SUT-328 and SUT-380) from the saliva of healthy pigs that reacted with specific antisera of serotypes 20, 22 or 26, as well as on reference strains of serotypes 20, 22 and 26. Comparative recN gene sequencing showed high genetic relatedness among our isolates, but marked differences from the type strain S. suis NCTC 10234(T), i.e. 74.8-75.7 % sequence similarity. The genomic relatedness between the isolates and other strains of species of the genus Streptococcus, including S. suis, was calculated using the average nucleotide identity values of whole genome sequences, which indicated that serotypes 20, 22 and 26 should be removed taxonomically from S. suis and treated as a novel genomic species. Comparative sequence analysis revealed 99.0-100 % sequence similarities for the 16S rRNA genes between the reference strains of serotypes 20, 22 and 26, and our isolates. Isolate SUT-286(T) had relatively high 16S rRNA gene sequence similarity with S. suis NCTC 10234(T) (98.8 %). SUT-286(T) could be distinguished from S. suis and other closely related species of the genus Streptococcus using biochemical tests. Due to its phylogenetic and phenotypic similarities to S. suis we propose naming the novel species Streptococcus parasuis sp. nov., with SUT-286(T) ( = JCM 30273(T) = DSM 29126(T)) as the type strain. © 2015 IUMS.

  12. Genetic diversity in Trypanosoma theileri from Sri Lankan cattle and water buffaloes.

    PubMed

    Yokoyama, Naoaki; Sivakumar, Thillaiampalam; Fukushi, Shintaro; Tattiyapong, Muncharee; Tuvshintulga, Bumduuren; Kothalawala, Hemal; Silva, Seekkuge Susil Priyantha; Igarashi, Ikuo; Inoue, Noboru

    2015-01-30

    Trypanosoma theileri is a hemoprotozoan parasite that infects various ruminant species. We investigated the epidemiology of this parasite among cattle and water buffalo populations bred in Sri Lanka, using a diagnostic PCR assay based on the cathepsin L-like protein (CATL) gene. Blood DNA samples sourced from cattle (n=316) and water buffaloes (n=320) bred in different geographical areas of Sri Lanka were PCR screened for T. theileri. Parasite DNA was detected in cattle and water buffaloes alike in all the sampling locations. The overall T. theileri-positive rate was higher in water buffaloes (15.9%) than in cattle (7.6%). Subsequently, PCR amplicons were sequenced and the partial CATL sequences were phylogenetically analyzed. The identity values for the CATL gene were 89.6-99.7% among the cattle-derived sequences, compared with values of 90.7-100% for the buffalo-derived sequences. However, the cattle-derived sequences shared 88.2-100% identity values with those from buffaloes. In the phylogenetic tree, the Sri Lankan CATL gene sequences fell into two major clades (TthI and TthII), both of which contain CATL sequences from several other countries. Although most of the CATL sequences from Sri Lankan cattle and buffaloes clustered independently, two buffalo-derived sequences were observed to be closely related to those of the Sri Lankan cattle. Furthermore, a Sri Lankan buffalo sequence clustered with CATL gene sequences from Brazilian buffalo and Thai cattle. In addition to reporting the first PCR-based survey of T. theileri among Sri Lankan-bred cattle and water buffaloes, the present study found that some of the CATL gene fragments sourced from water buffaloes shared similarity with those determined from cattle in this country. Copyright © 2014 Elsevier B.V. All rights reserved.
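
    Identity ranges such as those reported in this abstract come from all pairwise comparisons within a set of aligned sequences. The following sketch shows one common way to compute them; it assumes the sequences are already aligned to equal length and ignores columns with a gap in either sequence, which is only one of several conventions. The abstract does not say which convention or software was used, and the sequences below are invented.

```python
from itertools import combinations

def percent_identity(a, b):
    """Percent of compared columns with identical residues.
    Columns with a gap ('-') in either sequence are ignored."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)

def identity_range(seqs):
    """Minimum and maximum pairwise identity within a set of aligned sequences."""
    values = [percent_identity(a, b) for a, b in combinations(seqs, 2)]
    return min(values), max(values)

# Hypothetical aligned fragments standing in for partial CATL sequences.
aligned = ["ACGTACGT", "ACGTACGA", "ACG-ACGT"]
print(identity_range(aligned))
```

    Comparing one group against another (for example cattle-derived against buffalo-derived sequences) would use the same function over cross-group pairs instead of within-group pairs.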

  13. Molecular tools to track bacteria responsible for fuel deterioration and microbiologically influenced corrosion.

    PubMed

    Suflita, Joseph M; Aktas, Deniz F; Oldham, Athenia L; Perez-Ibarra, Beatriz Monica; Duncan, Kathleen

    2012-01-01

    Investigating the susceptibility of various fuels to anaerobic biodegradation has become complicated with the recognition that the fuels themselves are not sterile. Bacterial DNA could be obtained when various fuels were filtered through a hydrophobic teflon (0.22 μm) membrane filter. Bacterial 16S rRNA genes from these preparations were PCR amplified, cloned, and the resulting libraries sequenced to identify the fuel-borne bacterial communities. The most common sequence, found in algal- and camelina-based biofuels as well as in ultra-low sulfur diesel (ULSD) and F76 diesel, was similar to that of a Tumebacillus. The next most common sequence was similar to Methylobacterium and was found in the biofuels and ULSD. Higher level phylogenetic groups included representatives of the Firmicutes (Bacillus, Lactobacillus and Streptococcus), several Actinobacteria, Deinococcus-Thermus, Chloroflexi, Cyanobacteria, Bacteroidetes, Alphaproteobacteria (Methylobacterium and Sphingomonadales), Betaproteobacteria (Oxalobacteraceae and Burkholderiales) and Deltaproteobacteria. All of the fuel-associated bacterial sequences, except those obtained from a few facultative microorganisms, were from aerobes and only remotely affiliated with sequences that resulted from anaerobic successional events evident when ULSD was incubated with a coastal seawater and sediment inoculum. Thus, both traditional and alternate fuel formulations harbor a characteristic microflora, but these microorganisms contributed little to the successional patterns that ultimately resulted in fuel decomposition, sulfide formation and metal biocorrosion. The findings illustrate the value of molecular approaches to track the fate of bacteria that might come in contact with fuels and potentially contribute to corrosion problems throughout the energy value chain.

  14. Gaetbulicola byunsanensis gen. nov., sp. nov., isolated from tidal flat sediment.

    PubMed

    Yoon, Jung-Hoon; Kang, So-Jung; Jung, Yong-Taek; Oh, Tae-Kwang

    2010-01-01

    A Gram-negative, non-motile and pleomorphic bacterial strain, SMK-114(T), which belongs to the class Alphaproteobacteria, was isolated from a tidal flat sample collected in Byunsan, Korea. Strain SMK-114(T) grew optimally at pH 7.0-8.0 and 25-30 °C and in the presence of 2 % (w/v) NaCl. A neighbour-joining phylogenetic tree based on 16S rRNA gene sequences showed that strain SMK-114(T) formed a cluster with Octadecabacter species, with which it exhibited 16S rRNA gene sequence similarity values of 95.2-95.4 %. This cluster was part of the clade comprising Thalassobius species with a bootstrap resampling value of 76.3 %. Strain SMK-114(T) exhibited 16S rRNA gene sequence similarity values of 95.1-96.3 % to members of the genus Thalassobius. It contained Q-10 as the predominant ubiquinone and C18:1 ω7c as the major fatty acid. The DNA G+C content was 60.0 mol%. On the basis of phenotypic, chemotaxonomic and phylogenetic data, strain SMK-114(T) is considered to represent a novel species in a new genus for which the name Gaetbulicola byunsanensis gen. nov., sp. nov. is proposed. The type strain of Gaetbulicola byunsanensis is SMK-114(T) (=KCTC 22632(T) =CCUG 57612(T)).

  15. Comparison of Metabolic Pathways in Escherichia coli by Using Genetic Algorithms.

    PubMed

    Ortegon, Patricia; Poot-Hernández, Augusto C; Perez-Rueda, Ernesto; Rodriguez-Vazquez, Katya

    2015-01-01

    In order to understand how cellular metabolism has taken its modern form, the conservation and variations between metabolic pathways were evaluated by using a genetic algorithm (GA). The GA approach considered information on the complete metabolism of the bacterium Escherichia coli K-12, as deposited in the KEGG database, and the enzymes belonging to a particular pathway were transformed into enzymatic step sequences by using the breadth-first search algorithm. These sequences represent contiguous enzymes linked to each other, based on their catalytic activities as they are encoded in the Enzyme Commission numbers. In a posterior step, these sequences were compared using a GA in an all-against-all (pairwise comparisons) approach. Individual reactions were chosen based on their measure of fitness to act as parents of offspring, which constitute the new generation. The sequences compared were used to construct a similarity matrix (of fitness values) that was then considered to be clustered by using a k-medoids algorithm. A total of 34 clusters of conserved reactions were obtained, and their sequences were finally aligned with a multiple-sequence alignment GA optimized to align all the reaction sequences included in each group or cluster. From these comparisons, maps associated with the metabolism of similar compounds also contained similar enzymatic step sequences, reinforcing the Patchwork Model for the evolution of metabolism in E. coli K-12, an observation that can be expanded to other organisms, for which there is metabolism information. Finally, our mapping of these reactions is discussed, with illustrations from a particular case.
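
    The clustering step mentioned in this abstract (a similarity matrix grouped with k-medoids) can be sketched as follows. This is a generic alternating k-medoids routine, not the authors' implementation; converting similarity to distance as 1 minus similarity and starting from the first k items are assumptions chosen to keep the example deterministic, and the matrix values are invented.

```python
def k_medoids(dist, k, max_iter=100):
    """Cluster items given a symmetric distance matrix.
    Returns (medoid indices, cluster label for every item)."""
    n = len(dist)
    medoids = list(range(k))  # arbitrary deterministic start: the first k items

    def assign(current):
        # Each item joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][current[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the medoid of an empty cluster
                continue
            # The new medoid minimises the total distance to its cluster members.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)

# Hypothetical similarity matrix for four enzymatic step sequences.
similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
distance = [[1.0 - s for s in row] for row in similarity]
print(k_medoids(distance, 2))
```

    Because medoids are always actual items, the method works directly on a precomputed matrix and never needs coordinates for the sequences, which is why it suits pairwise fitness or similarity scores.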

  16. Comparison of Metabolic Pathways in Escherichia coli by Using Genetic Algorithms

    PubMed Central

    Ortegon, Patricia; Poot-Hernández, Augusto C.; Perez-Rueda, Ernesto; Rodriguez-Vazquez, Katya

    2015-01-01

    In order to understand how cellular metabolism has taken its modern form, the conservation and variations between metabolic pathways were evaluated by using a genetic algorithm (GA). The GA approach considered information on the complete metabolism of the bacterium Escherichia coli K-12, as deposited in the KEGG database, and the enzymes belonging to a particular pathway were transformed into enzymatic step sequences by using the breadth-first search algorithm. These sequences represent contiguous enzymes linked to each other, based on their catalytic activities as they are encoded in the Enzyme Commission numbers. In a posterior step, these sequences were compared using a GA in an all-against-all (pairwise comparisons) approach. Individual reactions were chosen based on their measure of fitness to act as parents of offspring, which constitute the new generation. The sequences compared were used to construct a similarity matrix (of fitness values) that was then considered to be clustered by using a k-medoids algorithm. A total of 34 clusters of conserved reactions were obtained, and their sequences were finally aligned with a multiple-sequence alignment GA optimized to align all the reaction sequences included in each group or cluster. From these comparisons, maps associated with the metabolism of similar compounds also contained similar enzymatic step sequences, reinforcing the Patchwork Model for the evolution of metabolism in E. coli K-12, an observation that can be expanded to other organisms, for which there is metabolism information. Finally, our mapping of these reactions is discussed, with illustrations from a particular case. PMID:25973143

  17. Polynucleobacter bacteria in the brackish-water species Euplotes harpa (Ciliata Hypotrichia).

    PubMed

    Vannini, Claudia; Petroni, Giulio; Verni, Franco; Rosati, Giovanna

    2005-01-01

    We have found a Polynucleobacter bacterium in the cytoplasm of Euplotes harpa, a species living in a brackish-water habitat, with a cirral pattern not corresponding to that of the freshwater Euplotes species known to harbor this type of bacteria. The symbiont has been found in three strains of the species, obtained by clonal cultures from ciliates collected in different geographic regions. The 16S rRNA gene sequence of this bacterium identifies it as a member of the beta-proteobacterial genus Polynucleobacter. This sequence shares a high similarity value (98.4-98.5%) with P. necessarius, the type species of the genus, and is associated with 16S rRNA gene sequences of environmental clones and bacterial strains included in the Polynucleobacter cluster (>95%). An oligonucleotide probe was designed to corroborate the assignment of the retrieved sequence to the symbiont and to detect similar bacteria rapidly. Antibiotic experiments showed that the elimination of the bacteria stops the reproductive cycle in E. harpa, as has been shown for the freshwater Euplotes species.

  18. REPPER—repeats and their periodicities in fibrous proteins

    PubMed Central

    Gruber, Markus; Söding, Johannes; Lupas, Andrei N.

    2005-01-01

    REPPER (REPeats and their PERiodicities) is an integrated server that detects and analyzes regions with short gapless repeats in protein sequences or alignments. It finds periodicities by Fourier Transform (FTwin) and internal similarity analysis (REPwin). FTwin assigns numerical values to amino acids that reflect certain properties, for instance hydrophobicity, and gives information on corresponding periodicities. REPwin uses self-alignments and displays repeats that reveal significant internal similarities. Both programs use a sliding window to ensure that different periodic regions within the same protein are detected independently. FTwin and REPwin are complemented by secondary structure prediction (PSIPRED) and coiled coil prediction (COILS), making the server a versatile analysis tool for sequences of fibrous proteins. REPPER is available at . PMID:15980460
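
    The idea behind the Fourier-based periodicity search can be shown with a small sketch: residues are turned into numbers and the spectral power at a chosen period is computed within a sliding window. This is not the FTwin code; the binary hydrophobicity set, the function names and the test sequence are assumptions made for illustration.

```python
import cmath

# Assumed binary hydrophobicity scale (1 for these residues, 0 otherwise).
HYDROPHOBIC = set("AILMFVWC")

def periodicity_power(seq, period):
    """Spectral power of the hydrophobicity profile at the given period (in residues)."""
    values = [1.0 if aa in HYDROPHOBIC else 0.0 for aa in seq]
    mean = sum(values) / len(values)
    total = sum((v - mean) * cmath.exp(-2j * cmath.pi * pos / period)
                for pos, v in enumerate(values))
    return abs(total) ** 2 / len(values)

def sliding_power(seq, period, window):
    """Power at one period for every window position along the sequence."""
    return [periodicity_power(seq[i:i + window], period)
            for i in range(len(seq) - window + 1)]

# Hypothetical heptad repeat with hydrophobic residues at positions 1 and 4 of each heptad.
coiled = "LKELEKK" * 10
print(periodicity_power(coiled, 3.5))   # strong signal at the coiled-coil period
print(periodicity_power(coiled, 5.0))   # essentially no signal at an unrelated period
```

    The sliding window keeps regions with different periodicities from averaging each other out, which matches the motivation given in the abstract.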

  19. Biochemical characterization and molecular evidence of a laccase from the bird's nest fungus Cyathus bulleri.

    PubMed

    Vasdev, Kavita; Dhawan, Shikha; Kapoor, Rajeev Kumar; Kuhad, Ramesh Chander

    2005-08-01

    Cyathus bulleri, a bird's nest fungus, known to decolorize polymeric dye Poly R-478, was found to produce 8 U ml⁻¹ of laccase in malt extract broth. Laccase activity appeared as a single band on non-denaturing gel. Laccase was purified to homogeneity by anion exchange chromatography and gel filtration. The enzyme was a monomer with an apparent molecular mass of 60 kD, pI of 3.7 and was stable in the pH range of 2-6 with an optimum pH of 5.2. The optimal reaction temperature was 45 °C and the enzyme lost its activity above 70 °C. The enzyme could oxidize a broad range of phenolic substrates. Km values for ABTS, 2,6-dimethoxyphenol, guaiacol, and ferulic acid were found to be 48.6, 56, 22, and 14 mM while kcat values were 204, 180, 95.6, and 5.2, respectively. It was completely inhibited by KCN, NaN3, beta-mercaptoethanol, HgCl2, and SDS, while EDTA had no effect on enzyme activity. The N-terminal amino acid sequence of C. bulleri laccase showed close homology to N-terminal sequences of laccase from other white-rot fungi. A 150 bp gene sequence encoding copper-binding domains I and II was most similar to the sequence encoding a laccase from Pycnoporus cinnabarinus with 74.8% level of similarity.

  20. Impact of cultivation on characterisation of species composition of soil bacterial communities.

    PubMed

    McCaig, A E.; Grayston, S J.; Prosser, J I.; Glover, L A.

    2001-03-01

    The species composition of culturable bacteria in Scottish grassland soils was investigated using a combination of Biolog and 16S rDNA analysis for characterisation of isolates. The inclusion of a molecular approach allowed direct comparison of sequences from culturable bacteria with sequences obtained during analysis of DNA extracted directly from the same soil samples. Bacterial strains were isolated on Pseudomonas isolation agar (PIA), a selective medium, and on tryptone soya agar (TSA), a general laboratory medium. In total, 12 and 21 morphologically different bacterial cultures were isolated on PIA and TSA, respectively. Biolog and sequencing placed PIA isolates in the same taxonomic groups, the majority of cultures belonging to the Pseudomonas (sensu stricto) group. However, analysis of 16S rDNA sequences proved more efficient than Biolog for characterising TSA isolates due to limitations of the Microlog database for identifying environmental bacteria. In general, 16S rDNA sequences from TSA isolates showed high similarities to cultured species represented in sequence databases, although TSA-8 showed only 92.5% similarity to the nearest relative, Bacillus insolitus. In general, there was very little overlap between the culturable and uncultured bacterial communities, although two sequences, PIA-2 and TSA-13, showed >99% similarity to soil clones. A cloning step was included prior to sequence analysis of two isolates, TSA-5 and TSA-14, and analysis of several clones confirmed that these cultures comprised at least four and three sequence types, respectively. All isolate clones were most closely related to uncultured bacteria, with clone TSA-5.1 showing 99.8% similarity to a sequence amplified directly from the same soil sample. Interestingly, one clone, TSA-5.4, clustered within a novel group comprising only uncultured sequences. 
    This group, which is associated with the novel, deep-branching Acidobacterium capsulatum lineage, also included clones isolated during direct analysis of the same soil and from a wide range of other sample types studied elsewhere. The study demonstrates the value of fine-scale molecular analysis for identification of laboratory isolates and indicates the culturability of approximately 1% of the total population but under a restricted range of media and cultivation conditions.

  1. Sma3s: a three-step modular annotator for large sequence datasets.

    PubMed

    Muñoz-Mérida, Antonio; Viguera, Enrique; Claros, M Gonzalo; Trelles, Oswaldo; Pérez-Pulido, Antonio J

    2014-08-01

    Automatic sequence annotation is an essential component of modern 'omics' studies, which aim to extract information from large collections of sequence data. Most existing tools use sequence homology to establish evolutionary relationships and assign putative functions to sequences. However, it can be difficult to define a similarity threshold that achieves sufficient coverage without sacrificing annotation quality. Defining the correct configuration is critical and can be challenging for non-specialist users. Thus, the development of robust automatic annotation techniques that generate high-quality annotations without needing expert knowledge would be very valuable for the research community. We present Sma3s, a tool for automatically annotating very large collections of biological sequences from any kind of gene library or genome. Sma3s is composed of three modules that progressively annotate query sequences using either: (i) very similar homologues, (ii) orthologous sequences or (iii) terms enriched in groups of homologous sequences. We trained the system using several random sets of known sequences, demonstrating average sensitivity and specificity values of ~85%. In conclusion, Sma3s is a versatile tool for high-throughput annotation of a wide variety of sequence datasets that outperforms the accuracy of other well-established annotation algorithms, and it can enrich existing database annotations and uncover previously hidden features. Importantly, Sma3s has already been used in the functional annotation of two published transcriptomes. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  2. A Modified LS+AR Model to Improve the Accuracy of the Short-term Polar Motion Prediction

    NASA Astrophysics Data System (ADS)

    Wang, Z. W.; Wang, Q. X.; Ding, Y. Q.; Zhang, J. J.; Liu, S. S.

    2017-03-01

    The LS (Least Squares)+AR (AutoRegressive) model has two problems in polar motion forecasting: the residuals of the LS fit are reasonable inside the fitting interval, but the residuals of the LS extrapolation are poor; and the LS fitting residual sequence is non-linear. It is therefore unsuitable to build the AR model for the residual sequence to be forecast from the residual sequence before the forecast epoch. In this paper, we address these two problems in two steps. First, constraints are added to the two endpoints of the LS fitting data to fix them on the LS fitting curve, so that the fitted values next to the two endpoints are very close to the observed values. Second, we select the interpolation residual sequence of an inward LS fitting curve, which has a variation trend similar to that of the LS extrapolation residual sequence, as the modeling object of the AR model for the residual forecast. Calculation examples show that this solution can effectively improve the short-term polar motion prediction accuracy of the LS+AR model. In addition, comparisons with the RLS (Robustified Least Squares)+AR, RLS+ARIMA (AutoRegressive Integrated Moving Average), and LS+ANN (Artificial Neural Network) forecast models confirm the feasibility and effectiveness of the solution for polar motion forecasting. The results, especially for polar motion forecasts over 1-10 days, show that the forecast accuracy of the proposed model can reach the world level.

  3. Sphingomonas azotifigens sp. nov., a nitrogen-fixing bacterium isolated from the roots of Oryza sativa.

    PubMed

    Xie, Cheng-Hui; Yokota, Akira

    2006-04-01

    Three yellow-pigmented strains associated with rice plants were characterized by using a polyphasic approach. The nitrogen-fixing abilities of these strains were confirmed by acetylene reduction assay and nifH gene detection. The three strains were found to be very closely related, with 99.9 % 16S rRNA gene sequence similarity and greater than 70 % DNA-DNA hybridization values, suggesting that the three strains represent a single species. 16S rRNA gene sequence analysis indicated that the strains were closely related to Sphingomonas trueperi, with 99.5 % similarity. The chemotaxonomic characteristics (G+C content of the DNA of 68.0 mol%, ubiquinone Q-10 system, 2-OH as the only hydroxy fatty acid and homospermidine as the sole polyamine) were similar to those of members of the genus Sphingomonas. Based on DNA-DNA hybridization values and physiological characteristics, the three novel strains could be differentiated from other recognized species of the genus Sphingomonas. The name Sphingomonas azotifigens sp. nov. is proposed to accommodate these bacterial strains; the type strain is Y39T (=NBRC 15497T = IAM 15283T = CCTCC AB205007T).

  4. GBA manager: an online tool for querying low-complexity regions in proteins.

    PubMed

    Bandyopadhyay, Nirmalya; Kahveci, Tamer

    2010-01-01

    We developed GBA Manager, an online tool that facilitates use of the Graph-Based Algorithm (GBA) we proposed in our earlier work. GBA identifies the low-complexity regions (LCR) of protein sequences. GBA exploits a similarity matrix, such as BLOSUM62, to compute the complexity of the subsequences of the input protein sequence. It uses a graph-based algorithm to accurately compute the regions that have low complexities. GBA Manager is a user-friendly web service that enables online querying of protein sequences using GBA. In addition to the querying capabilities of the existing GBA algorithm, GBA Manager computes the p-values of the LCR identified. The p-value gives an estimate of the probability that the region appears by chance. GBA Manager presents the output in three different understandable formats. GBA Manager is freely accessible at http://bioinformatics.cise.ufl.edu/GBA/GBA.htm .

  5. Localization properties of transmission lines with generalized Thue-Morse distribution of inductances

    NASA Astrophysics Data System (ADS)

    Lazo, Edmundo; Saavedra, Eduardo; Humire, Fernando; Castro, Cristobal; Cortés-Cortés, Francisco

    2015-09-01

    We study the localization properties of direct transmission lines when two values of inductance, L_A and L_B, are distributed according to a generalized Thue-Morse aperiodic sequence generated by the inflation rule A → AB^(m-1), B → BA^(m-1), with integer m ≥ 2. We regain the usual Thue-Morse sequence for m = 2. We numerically study the changes produced in the localization properties of the electric current function I(ω) with increasing m. We demonstrate that the m = 2 case does not belong to the family m ≥ 3, because when m changes from m = 2 to m = 3, the number of extended states decreases significantly. However, for m ≫ 3, the localization properties become similar to the m = 2 case. Also, the frequency-averaged transmission coefficient shows a strong dependence on the system size N and on the value of m that characterizes each m-tupling sequence. In addition, for all values of m studied, using the scaling behavior of the normalized participation number ξ(ω), the Rényi entropies R_q(ω) and the moments μ_q(ω), we have demonstrated the existence of extended states for certain specific frequencies.
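
    The inflation rule in this abstract (A becomes A followed by m-1 copies of B, and B becomes B followed by m-1 copies of A) can be sketched in a few lines of Python. This is only an illustration of the rule as stated; the function names, the seed letter "A", and the inductance values passed to `inductances` are assumptions made for the example and do not come from the paper.

```python
def generalized_thue_morse(m: int, generations: int, seed: str = "A") -> str:
    """Apply the rule A -> A B^(m-1), B -> B A^(m-1) `generations` times."""
    if m < 2:
        raise ValueError("m must be an integer >= 2")
    rule = {"A": "A" + "B" * (m - 1), "B": "B" + "A" * (m - 1)}
    word = seed
    for _ in range(generations):
        # Every letter is replaced by a block of m letters, so the
        # word length is multiplied by m at each generation.
        word = "".join(rule[letter] for letter in word)
    return word


def inductances(word: str, l_a: float, l_b: float) -> list:
    """Map each letter of the word to an inductance value (hypothetical units)."""
    return [l_a if letter == "A" else l_b for letter in word]


if __name__ == "__main__":
    print(generalized_thue_morse(2, 3))  # usual Thue-Morse word
    print(generalized_thue_morse(3, 2))
    print(inductances(generalized_thue_morse(2, 2), 1.0, 2.0))
```

    For m = 2 the function reproduces the usual Thue-Morse word, which matches the statement in the abstract.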

  6. Aestuariicola saemankumensis gen. nov., sp. nov., a member of the family Flavobacteriaceae, isolated from tidal flat sediment.

    PubMed

    Yoon, Jung-Hoon; Kang, So-Jung; Jung, Yong-Taek; Oh, Tae-Kwang

    2008-09-01

    A Gram-negative, non-motile, pleomorphic bacterial strain, designated SMK-142(T), was isolated from a tidal flat of the Yellow Sea, Korea, and was subjected to a polyphasic taxonomic study. Strain SMK-142(T) grew optimally at pH 7.0-8.0, 25 °C and in the presence of 2% (w/v) NaCl. Phylogenetic analyses based on 16S rRNA gene sequences showed that strain SMK-142(T) clustered with Lutibacter litoralis with which it exhibited a 16S rRNA gene sequence similarity value of 91.2%. This cluster joined the clade comprising the genera Tenacibaculum and Polaribacter at a high bootstrap resampling value. Strain SMK-142(T) contained MK-6 as the predominant menaquinone and iso-C(15:0), iso-C(15:1) and iso-C(17:0) 3-OH as the major fatty acids. The DNA G+C content was 37.2 mol%. Strain SMK-142(T) was differentiated from three phylogenetically related genera, Lutibacter, Tenacibaculum and Polaribacter, on the basis of low 16S rRNA gene sequence similarity values and differences in fatty acid profiles and in some phenotypic properties. On the basis of phenotypic, chemotaxonomic and phylogenetic data, strain SMK-142(T) represents a novel genus and species for which the name Aestuariicola saemankumensis gen. nov., sp. nov. is proposed (phylum Bacteroidetes, family Flavobacteriaceae). The type strain of the type species, Aestuariicola saemankumensis sp. nov., is SMK-142(T) (=KCTC 22171(T)=CCUG 55329(T)).

  7. Bacillus nealsonii sp. nov., isolated from a spacecraft-assembly facility, whose spores are gamma-radiation resistant

    NASA Technical Reports Server (NTRS)

    Venkateswaran, Kasthuri; Kempf, Michael; Chen, Fei; Satomi, Masataka; Nicholson, Wayne; Kern, Roger

    2003-01-01

    One of the spore-formers isolated from a spacecraft-assembly facility, belonging to the genus Bacillus, is described on the basis of phenotypic characterization, 16S rDNA sequence analysis and DNA-DNA hybridization studies. It is a Gram-positive, facultatively anaerobic, rod-shaped eubacterium that produces endospores. The spores of this novel bacterial species exhibited resistance to UV, gamma-radiation, H2O2 and desiccation. The 16S rDNA sequence analysis revealed a clear affiliation between this strain and members of the low G+C Firmicutes. High 16S rDNA sequence similarity values were found with members of the genus Bacillus and this was supported by fatty acid profiles. The 16S rDNA sequence similarity between strain FO-92T and Bacillus benzoevorans DSM 5391T was very high. However, molecular characterizations employing small-subunit 16S rDNA sequences were at the limits of resolution for the differentiation of species in this genus, but DNA-DNA hybridization data support the proposal of FO-92T as Bacillus nealsonii sp. nov. (type strain is FO-92T =ATCC BAAM-519T =DSM 15077T).

  8. Chryseobacterium chaponense sp. nov., isolated from farmed Atlantic salmon (Salmo salar).

    PubMed

    Kämpfer, Peter; Fallschissel, Kerstin; Avendaño-Herrera, Ruben

    2011-03-01

    Two bacterial strains, designated Sa 1147-06(T) and Sa 1143-06, were isolated from Atlantic salmon (Salmo salar) farmed in Lake Chapo, Chile, and were studied using a polyphasic approach. Both isolates were very similar; cells were rod-shaped, formed yellow-pigmented colonies and were Gram-reaction-negative. Based on 16S rRNA gene sequence analysis, strains Sa 1147-06(T) and Sa 1143-06 shared 100  % sequence similarity and showed 98.9 and 97.5 % sequence similarity to Chryseobacterium jeonii AT1047(T) and Chryseobacterium antarcticum AT1013(T), respectively. Sequence similarities to all other members of the genus Chryseobacterium were below 97.3  %. The major fatty acids of strain Sa 1147-06(T) were iso-C₁₃:₀, iso-C₁₅:₀, anteiso-C₁₅:₀ and iso-C₁₇:₁ω9c, with iso-C₁₅:₀ 3-OH, iso-C₁₆:₀ 3-OH and iso-C₁₇:₀ 3-OH constituting the major hydroxylated fatty acids. DNA-DNA hybridizations with C. jeonii JMSNU 14049(T) and C. antarcticum JMNSU 14040(T) gave relatedness values of 20.7  % (reciprocal 15.1  %) and 15.7 % (reciprocal 25.7  %), respectively. Together, the DNA-DNA hybridization results and differentiating biochemical properties showed that strains Sa 1147-06(T) and Sa 1143-06 represent a novel species, for which the name Chryseobacterium chaponense sp. nov. is proposed. The type strain is Sa 1147-06(T) (=DSM 23145(T) =CCM 7737(T)).

  9. Heuristic reusable dynamic programming: efficient updates of local sequence alignment.

    PubMed

    Hong, Changjin; Tewfik, Ahmed H

    2009-01-01

    Recomputation of the previously evaluated similarity results between biological sequences becomes inevitable when researchers realize errors in their sequenced data or when the researchers have to compare nearly similar sequences, e.g., in a family of proteins. We present an efficient scheme for updating local sequence alignments with an affine gap model. In principle, using the previous matching result between two amino acid sequences, we perform a forward-backward alignment to generate heuristic searching bands which are bounded by a set of suboptimal paths. Given a correctly updated sequence, we initially predict a new score of the alignment path for each contour to select the best candidates among them. Then, we run the Smith-Waterman algorithm in this confined space. Furthermore, our heuristic alignment for an updated sequence shows that it can be further accelerated by using reusable dynamic programming (rDP), our prior work. In this study, we successfully validate "relative node tolerance bound" (RNTB) in the pruned searching space. Furthermore, we improve the computational performance by quantifying the successful RNTB tolerance probability and switch to rDP on perturbation-resilient columns only. In our searching space derived by a threshold value of 90 percent of the optimal alignment score, we find that 98.3 percent of contours contain correctly updated paths. We also find that our method consumes only 25.36 percent of the runtime cost of the sparse dynamic programming (sDP) method, and only 2.55 percent of that of a normal dynamic programming with the Smith-Waterman algorithm.
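
    For orientation, the baseline that this abstract compares against, full-matrix Smith-Waterman local alignment with an affine gap model, can be sketched as follows. This is the plain dynamic programme only; it does not implement the banded heuristic, RNTB or rDP described in the abstract. The scoring values are arbitrary assumptions, and a gap of length k is charged gap_open + (k - 1) * gap_extend.

```python
def smith_waterman_affine(a: str, b: str, match: int = 2, mismatch: int = -1,
                          gap_open: int = -3, gap_extend: int = -1) -> int:
    """Return the best local alignment score under an affine gap model."""
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # h: best score of an alignment ending at (i, j)
    # e: best score ending at (i, j) with a gap that consumes b[j-1]
    # f: best score ending at (i, j) with a gap that consumes a[i-1]
    h = [[0] * (m + 1) for _ in range(n + 1)]
    e = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    f = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            e[i][j] = max(e[i][j - 1] + gap_extend, h[i][j - 1] + gap_open)
            f[i][j] = max(f[i - 1][j] + gap_extend, h[i - 1][j] + gap_open)
            score = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are floored at zero.
            h[i][j] = max(0, h[i - 1][j - 1] + score, e[i][j], f[i][j])
            best = max(best, h[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_affine("ACGTTTACGT", "ACGTACGT"))
```

    The three-matrix layout is what makes the gap cost affine; restricting the loops to a band of cells would be the natural place to add a confined search space.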

  10. Burkholderia symbiotica sp. nov., isolated from root nodules of Mimosa spp. native to north-east Brazil.

    PubMed

    Sheu, Shih-Yi; Chou, Jui-Hsing; Bontemps, Cyril; Elliott, Geoffrey N; Gross, Eduardo; James, Euan K; Sprent, Janet I; Young, J Peter W; Chen, Wen-Ming

    2012-09-01

    Four strains, designated JPY-345(T), JPY-347, JPY-366 and JPY-581, were isolated from nitrogen-fixing nodules on the roots of two species of Mimosa, Mimosa cordistipula and Mimosa misera, that are native to North East Brazil, and their taxonomic positions were investigated by using a polyphasic approach. All four strains grew at 15-43 °C (optimum 35 °C), at pH 4-7 (optimum pH 5) and with 0-2 % (w/v) NaCl (optimum 0 % NaCl). On the basis of 16S rRNA gene sequence analysis, strain JPY-345(T) showed 97.3 % sequence similarity to the closest related species Burkholderia soli GP25-8(T), 97.3 % sequence similarity to Burkholderia caryophylli ATCC25418(T) and 97.1 % sequence similarity to Burkholderia kururiensis KP23(T). The predominant fatty acids of the strains were C(18 : 1)ω7c (36.1 %), C(16 : 0) (19.8 %) and summed feature 3, comprising C(16 : 1)ω7c and/or C(16 : 1)ω6c (11.5 %). The major isoprenoid quinone was Q-8 and the DNA G+C content of the strains was 64.2-65.7 mol%. The polar lipid profile consisted of a mixture of phosphatidylethanolamine, phosphatidylglycerol, diphosphatidylglycerol and several uncharacterized aminophospholipids and phospholipids. DNA-DNA hybridizations between the novel strain and recognized species of the genus Burkholderia yielded relatedness values of <51.8 %. On the basis of 16S rRNA and recA gene sequence similarities and chemotaxonomic and phenotypic data, the four strains represent a novel species in the genus Burkholderia, for which the name Burkholderia symbiotica sp. nov. is proposed. The type strain is JPY-345(T) (= LMG 26032(T) = BCRC 80258(T) = KCTC 23309(T)).

  11. 'Candidatus Phytoplasma palmicola', associated with a lethal yellowing-type disease of coconut (Cocos nucifera L.) in Mozambique.

    PubMed

    Harrison, Nigel A; Davis, Robert E; Oropeza, Carlos; Helmick, Ericka E; Narváez, María; Eden-Green, Simon; Dollet, Michel; Dickinson, Matthew

    2014-06-01

    In this study, the taxonomic position and group classification of the phytoplasma associated with a lethal yellowing-type disease (LYD) of coconut (Cocos nucifera L.) in Mozambique were addressed. Pairwise similarity values based on alignment of nearly full-length 16S rRNA gene sequences (1530 bp) revealed that the Mozambique coconut phytoplasma (LYDM) shared 100% identity with a comparable sequence derived from a phytoplasma strain (LDN) responsible for Awka wilt disease of coconut in Nigeria, and shared 99.0-99.6% identity with 16S rRNA gene sequences from strains associated with Cape St Paul wilt (CSPW) disease of coconut in Ghana and Côte d'Ivoire. Similarity scores further determined that the 16S rRNA gene of the LYDM phytoplasma shared <97.5% sequence identity with all previously described members of 'Candidatus Phytoplasma'. The presence of unique regions in the 16S rRNA gene sequence distinguished the LYDM phytoplasma from all currently described members of 'Candidatus Phytoplasma', justifying its recognition as the reference strain of a novel taxon, 'Candidatus Phytoplasma palmicola'. Virtual RFLP profiles of the F2n/R2 portion (1251 bp) of the 16S rRNA gene and pattern similarity coefficients delineated coconut LYDM phytoplasma strains from Mozambique as novel members of established group 16SrXXII, subgroup A (16SrXXII-A). Similarity coefficients of 0.97 were obtained for comparisons between subgroup 16SrXXII-A strains and CSPW phytoplasmas from Ghana and Côte d'Ivoire. On this basis, the CSPW phytoplasma strains were designated members of a novel subgroup, 16SrXXII-B.
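
    The abstract does not state how its pattern similarity coefficients were computed. A coefficient widely used for comparing RFLP patterns is F = 2Nxy / (Nx + Ny), where Nx and Ny are the numbers of fragments in the two patterns and Nxy is the number they share. The sketch below assumes that formula and represents each pattern as a set of (enzyme, fragment size) pairs; the enzyme names and fragment sizes are invented for illustration and are not data from the study.

```python
def similarity_coefficient(pattern_x, pattern_y) -> float:
    """Compute F = 2 * Nxy / (Nx + Ny) for two RFLP patterns.

    Each pattern is an iterable of (enzyme, fragment_size) pairs.
    """
    x, y = set(pattern_x), set(pattern_y)
    if not x and not y:
        raise ValueError("both patterns are empty")
    shared = len(x & y)  # fragments present in both patterns
    return 2 * shared / (len(x) + len(y))


if __name__ == "__main__":
    # Hypothetical fragment patterns for two strains.
    strain_1 = {("AluI", 100), ("AluI", 200), ("RsaI", 300)}
    strain_2 = {("AluI", 100), ("AluI", 200), ("RsaI", 350)}
    print(round(similarity_coefficient(strain_1, strain_2), 3))
```

    Identical patterns give F = 1, and patterns with no shared fragments give F = 0.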

  12. Protein structural similarity search by Ramachandran codes

    PubMed Central

    Lo, Wei-Cheng; Huang, Po-Jung; Chang, Chih-Hung; Lyu, Ping-Chiang

    2007-01-01

    Background Protein structural data has increased exponentially, such that fast and accurate tools are needed for structure similarity searches. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to Combinatorial Extension (CE) and works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented into a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools should be applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era. PMID:17716377

  13. Molecular phylogeny, population genetics, and evolution of heterocystous cyanobacteria using nifH gene sequences.

    PubMed

    Singh, Prashant; Singh, Satya Shila; Elster, Josef; Mishra, Arun Kumar

    2013-06-01

    In order to assess phylogeny, population genetics, and approximation of future course of cyanobacterial evolution based on nifH gene sequences, 41 heterocystous cyanobacterial strains collected from all over India have been used in the present study. NifH gene sequence analysis data confirm that the heterocystous cyanobacteria are monophyletic while the stigonematales show polyphyletic origin with grave intermixing. Further, analysis of nifH gene sequence data using intricate mathematical extrapolations revealed that the nucleotide diversity and recombination frequency is much greater in Nostocales than the Stigonematales. Similarly, DNA divergence studies showed significant values of divergence with greater gene conversion tracts in the unbranched (Nostocales) than the branched (Stigonematales) strains. Our data strongly support the origin of true branching cyanobacterial strains from the unbranched strains.

  14. High-resolution Fourier transform spectroscopy of the Meinel system of OH

    NASA Technical Reports Server (NTRS)

    Abrams, Mark C.; Davis, Sumner P.; Rao, M. L. P.; Engleman, Rolf, Jr.; Brault, James W.

    1994-01-01

    The infrared spectrum of the hydroxyl radical OH, between 1850 and 9000 cm⁻¹, has been measured with a Fourier transform spectrometer. The source, a hydrogen-ozone diffusion flame, was designed to study the excitation of rotation-vibration levels of the OH Meinel bands under conditions similar to those in the upper atmosphere which produce the nighttime OH airglow emission. Twenty-three bands were observed: nine bands in the Δv = 1 sequence, nine bands in the Δv = 2 sequence, and five bands in the Δv = 3 sequence. A global nonlinear least-squares fit of 1696 lines yielded molecular parameters with a standard deviation of 0.003 cm⁻¹. Term values are computed, and transition frequencies in the Δv = 3, 4, 5, 6 sequences in the near-infrared are predicted.

  15. CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment.

    PubMed

    Chen, Xi; Wang, Chen; Tang, Shanjiang; Yu, Ce; Zou, Quan

    2017-06-24

    The multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running time at an acceptable level. Although there is a lot of work on MSA problems, existing approaches are either insufficient or contain some implicit assumptions that limit the generality of usage. First, the information about users' sequences, including the sizes of datasets and the lengths of sequences, can take arbitrary values and is generally unknown before submission, which is unfortunately ignored by previous work. Second, the center star strategy is suited for aligning similar sequences, but its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, given the heterogeneous CPU/GPU platform, prior studies consider the MSA parallelization on GPU devices only, leaving the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of the computing resources by enabling the workload computation on both CPU and GPU simultaneously. This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users' submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from O(mn²) to O(mn). The experimental results show that CMSA achieves an up to 11× speedup and outperforms the state-of-the-art software. CMSA focuses on the multiple similar RNA/DNA sequence alignment and proposes a novel bitmap based algorithm to improve the center star strategy. We can conclude that harvesting the high performance of modern GPU is a promising approach to accelerate multiple sequence alignment. Besides, adopting the co-run computation model can maximize the entire system utilization significantly. The source code is available at https://github.com/wangvsa/CMSA .
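
    The center sequence selection step mentioned in this abstract can be illustrated with the textbook version of the center star strategy: compare every pair of sequences and pick the one with the smallest total distance to all others. The sketch below is that naive all-pairs baseline using edit distance. It is not CMSA's bitmap-based algorithm, and the function names and example sequences are assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two rolling rows."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def select_center(sequences) -> int:
    """Return the index of the sequence with the smallest summed distance."""
    if not sequences:
        raise ValueError("no sequences given")
    totals = [sum(edit_distance(s, t) for t in sequences) for s in sequences]
    # min() returns the first index in case of ties.
    return min(range(len(sequences)), key=totals.__getitem__)


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "ACGT", "TTTT"]
    print(select_center(reads))
```

    Because every pair is compared, the cost of this step grows quadratically with the number of sequences, which is the bottleneck the abstract says CMSA targets.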

  16. Methylobacterium variabile sp. nov., a methylotrophic bacterium isolated from an aquatic environment.

    PubMed

    Gallego, Virginia; García, Maria Teresa; Ventosa, Antonio

    2005-07-01

    Strain GR3(T) was isolated from drinking water during a screening programme to monitor the bacterial population present in the distribution system of Seville (Spain), and it was studied phenotypically, genotypically and phylogenetically. This pink-pigmented bacterium was identified as a Methylobacterium sp. Members of this genus are distributed in a wide variety of natural habitats, including soil, dust, air, freshwater and aquatic sediments. Phylogenetic analysis of the 16S rRNA gene sequence showed that strain GR3(T) was closely related to Methylobacterium aquaticum (97.4% sequence similarity), whereas sequence similarity values with respect to the rest of the species belonging to this genus were lower than 96%. Furthermore, the DNA-DNA hybridization data and its phenotypic characteristics clearly indicate that the isolate represents a novel Methylobacterium species, for which the name Methylobacterium variabile sp. nov. is proposed. GR3(T) (=DSM 16961(T)=CCM 7281(T)=CECT 7045(T)) is the type strain; the DNA G+C content of this strain is 69.2 mol%.

  17. Phylogenetic Analysis of Prevalent Tuberculosis and Non-Tuberculosis Mycobacteria in Isfahan, Iran, Based on a 360 bp Sequence of the rpoB Gene

    PubMed Central

    Nasr Esfahani, Bahram; Moghim, Sharareh; Ghasemian Safaei, Hajieh; Moghoofei, Mohsen; Sedighi, Mansour; Hadifar, Shima

    2016-01-01

    Background Taxonomic and phylogenetic studies of Mycobacterium species have been based around the 16S rRNA gene for many years. However, due to the high strain similarity between species in the Mycobacterium genus (94.3% - 100%), defining a valid phylogenetic tree is difficult; consequently, its use in estimating the boundaries between species is limited. The sequence of the rpoB gene makes it an appropriate gene for phylogenetic analysis, especially in bacteria with limited variation. Objectives In the present study, a 360bp sequence of rpoB was used for precise classification of Mycobacterium strains isolated in Isfahan, Iran. Materials and Methods From February to October 2013, 57 clinical and environmental isolates were collected, subcultured, and identified by phenotypic methods. After DNA extraction, a 360bp fragment was PCR-amplified and sequenced. The phylogenetic tree was constructed based on consensus sequence data, using MEGA5 software. Results Slow and fast-growing groups of the Mycobacterium strains were clearly differentiated based on the constructed tree of 56 common Mycobacterium isolates. Each species with a unique title in the tree was identified; in total, 13 nodes with a bootstrap value of over 50% were supported. Among the slow-growing group was Mycobacterium kansasii, with M. tuberculosis in a cluster with a bootstrap value of 98% and M. gordonae in another cluster with a bootstrap value of 90%. In the fast-growing group, one cluster with a bootstrap value of 89% was defined, including all fast-growing members present in this study. Conclusions The results suggest that only the application of the rpoB gene sequence is sufficient for taxonomic categorization and definition of a new Mycobacterium species, due to its high resolution power and proper variation in its sequence (85% - 100%); the resulting tree has high validity. PMID:27284397

  18. Partial Shotgun Sequencing of the Boechera stricta Genome Reveals Extensive Microsynteny and Promoter Conservation with Arabidopsis

    PubMed Central

    Windsor, Aaron J.; Schranz, M. Eric; Formanová, Nataša; Gebauer-Jung, Steffi; Bishop, John G.; Schnabelrauch, Domenica; Kroymann, Juergen; Mitchell-Olds, Thomas

    2006-01-01

    Comparative genomics provides insight into the evolutionary dynamics that shape discrete sequences as well as whole genomes. To advance comparative genomics within the Brassicaceae, we have end sequenced 23,136 medium-sized insert clones from Boechera stricta, a wild relative of Arabidopsis (Arabidopsis thaliana). A significant proportion of these sequences, 18,797, are nonredundant and display highly significant similarity (BLASTn e-value ≤ 10⁻³⁰) to low copy number Arabidopsis genomic regions, including more than 9,000 annotated coding sequences. We have used this dataset to identify orthologous gene pairs in the two species and to perform a global comparison of DNA regions 5′ to annotated coding regions. On average, the 500 nucleotides upstream to coding sequences display 71.4% identity between the two species. In a similar analysis, 61.4% identity was observed between 5′ noncoding sequences of Brassica oleracea and Arabidopsis, indicating that regulatory regions are not as diverged among these lineages as previously anticipated. By mapping the B. stricta end sequences onto the Arabidopsis genome, we have identified nearly 2,000 conserved blocks of microsynteny (bracketing 26% of the Arabidopsis genome). A comparison of fully sequenced B. stricta inserts to their homologous Arabidopsis genomic regions indicates that indel polymorphisms >5 kb contribute substantially to the genome size difference observed between the two species. Further, we demonstrate that microsynteny inferred from end-sequence data can be applied to the rapid identification and cloning of genomic regions of interest from nonmodel species. These results suggest that among diploid relatives of Arabidopsis, small- to medium-scale shotgun sequencing approaches can provide rapid and cost-effective benefits to evolutionary and/or functional comparative genomic frameworks. PMID:16607030

  19. Application of representational difference analysis to identify genomic differences between Bradyrhizobium elkanii and B. japonicum species.

    PubMed

    Soares, René Arderius; Passaglia, Luciane Maria Pereira

    2010-10-01

    Bradyrhizobium elkanii is successfully used in the formulation of commercial inoculants and, together with B. japonicum, it fully supplies the plant nitrogen demands. Despite the similarity between B. japonicum and B. elkanii species, several works demonstrated genetic and physiological differences between them. In this work Representational Difference Analysis (RDA) was used for genomic comparison between B. elkanii SEMIA 587, a crop inoculant strain, and B. japonicum USDA 110, a reference strain. Two hundred sequences were obtained. Of these, 46 sequences belonged exclusively to the genome of the B. elkanii strain, and 154 showed similarity to sequences from the B. japonicum genome. Of the 46 sequences with no similarity to sequences from B. japonicum, 39 showed no similarity to sequences in public databases and seven showed similarity to sequences of genes coding for known proteins. These seven sequences were divided into three groups: similar to sequences from other Bradyrhizobium strains, similar to sequences from other nitrogen-fixing bacteria, and similar to sequences from non-nitrogen-fixing bacteria. These new sequences could be used as DNA markers in order to investigate the rates of genetic material gain and loss in natural Bradyrhizobium strains.

  20. footprintDB: a database of transcription factors with annotated cis elements and binding interfaces.

    PubMed

    Sebastian, Alvaro; Contreras-Moreira, Bruno

    2014-01-15

    Traditional and high-throughput techniques for determining transcription factor (TF) binding specificities are generating large volumes of data of uneven quality, which are scattered across individual databases. FootprintDB integrates some of the most comprehensive freely available libraries of curated DNA binding sites and systematically annotates the binding interfaces of the corresponding TFs. The first release contains 2422 unique TF sequences, 10 112 DNA binding sites and 3662 DNA motifs. A survey of the included data sources, organisms and TF families was performed together with the proprietary database TRANSFAC, finding that footprintDB has a similar coverage of multicellular organisms, while also containing bacterial regulatory data. A search engine has been designed that drives the prediction of DNA motifs for input TFs, or conversely of TF sequences that might recognize input regulatory sequences, by comparison with database entries. Such predictions can also be extended to a single proteome chosen by the user, and results are ranked in terms of interface similarity. Benchmark experiments with bacterial, plant and human data were performed to measure the predictive power of footprintDB searches, which were able to correctly recover 10, 55 and 90% of the tested sequences, respectively. Correctly predicted TFs had a higher interface similarity than the average, confirming its diagnostic value. The web site is implemented in PHP, Perl, MySQL and Apache. Freely available from http://floresta.eead.csic.es/footprintdb.

  1. Genetic composition and connectivity of the Antillean manatee (Trichechus manatus manatus) in Panama

    USGS Publications Warehouse

    Díaz-Ferguson, Edgardo; Hunter, Margaret; Guzmán, Héctor M.

    2017-01-01

    Genetic diversity and haplotype composition of the West Indian manatee (Trichechus manatus) population from the San San Pond Sak wetland in Bocas del Toro, Panama was studied using a segment of mitochondrial DNA (D-loop). No genetic information has been published to date for Panamanian populations. Due to the secretive behavior and small population size of the species in the area, DNA extraction was conducted from opportunistically collected fecal (N=20), carcass tissue (N=4) and bone (N=4) samples. However, after DNA processing only 10 samples provided good quality DNA for sequencing (3 fecal, 4 tissue and 3 bone samples). We found three haplotypes in total; two of these haplotypes are reported for the first time, J02 (N=3) and J03 (N=4), and one J01 was previously published (N=3). Genetic diversity showed similar values to previous studies conducted in other Caribbean regions with moderate values of nucleotide diversity (π= 0.00152) and haplotypic diversity (Hd= 0.57). Connectivity assessment was based on sequence similarity, genetic distance and genetic differentiation between San San population and other manatee populations previously studied. The J01 haplotype found in the Panamanian population is shared with populations in the Caribbean mainland and the Gulf of Mexico showing a reduced differentiation corroborated with Fst value between HSSPS and this region of 0.0094. In contrast, comparisons between our sequences and populations in the Eastern Caribbean (South American populations) and North Western Caribbean showed fewer similarities (Fst =0.049 and 0.058, respectively). These results corroborate previous phylogeographic patterns already established for manatee populations and situate Panamanian populations into the Belize and Mexico cluster. In addition, these findings will be a baseline for future studies and comparisons with manatees in other areas of Panama and Central America. These results should be considered to inform management decisions regarding conservation of genetic diversity, future controlled introductions, connectivity and effective population size of the West Indian manatee along the Central American corridor.
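
    The nucleotide diversity (π) quoted above is conventionally the average number of pairwise differences per site across the sampled sequences. The following is a minimal sketch of that calculation, assuming the sequences are already aligned to equal length and contain no gaps; the function name and the toy sequences are illustrative and are not taken from the study.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Average pairwise differences per site for equal-length aligned sequences."""
    n = len(aligned_seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    # Count mismatching positions over every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s, t))
        for s, t in combinations(aligned_seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


# Toy example (hypothetical data): three 4-bp sequences, one variable site.
pi = nucleotide_diversity(["ACGT", "ACGT", "ACGA"])
```

    Dedicated population-genetics software typically handles gaps, missing data and sample-size corrections, which this sketch deliberately leaves out.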

  2. Mass Dependent Fractionation of Hg Isotopes in Source Rocks, Mineral Deposits and Spring Waters of the California Coast Ranges, USA

    NASA Astrophysics Data System (ADS)

    Smith, C. N.; Kesler, S. E.; Blum, J. D.; Rytuba, J. J.

    2007-12-01

    We present here the first study of the isotopic composition of Hg in rocks, ore deposits, and active hydrothermal systems from the California Coast Ranges, one of Earth's largest Hg-depositing systems. The Franciscan Complex and Great Valley Sequence, which form the bedrock in the California Coast Ranges, are intruded and overlain by Tertiary volcanic rocks including the Clear Lake Volcanic Sequence. These rocks contain two types of Hg deposits, hot-spring deposits that form at shallow depths (<300 m) and silica-carbonate deposits that extend to greater depths (200 to 1000 m), as well as active springs and geothermal systems that release Hg to the present surface. The Franciscan Complex and Great Valley Sequence contain clastic sedimentary rocks with higher concentrations of Hg than volcanic rocks of the Clear Lake Volcanic Field. Mean Hg isotope compositions for all three rock units are similar, although the range of values in Franciscan Complex rocks is greater than in either Great Valley or Clear Lake rocks. Hot spring and silica-carbonate Hg deposits have similar average isotopic compositions that are indistinguishable from averages for the three rock units, although δ202Hg values for the Hg deposits have a greater variance than the country rocks. Precipitates from dilute spring and saline thermal waters in the area have similarly large variance and a mean δ202Hg value that is significantly lower than the ore deposits and rocks. These observations indicate there is little or no isotopic fractionation during release of Hg from its source rocks into hydrothermal solutions. Isotopic fractionation does appear to take place during transport and concentration of Hg in deposits, especially in their uppermost parts. Boiling of hydrothermal fluids is likely the most important process causing the observed Hg isotope fractionation. This should result in the release of Hg with low δ202Hg values into the atmosphere from the top of these hydrothermal systems and a consequent enrichment in heavy Hg isotopes in the upper crust through time.

  3. Analysis of 10,000 ESTs from lymphocytes of the cynomolgus monkey to improve our understanding of its immune system

    PubMed Central

    Chen, Wei-Hua; Wang, Xue-Xia; Lin, Wei; He, Xiao-Wei; Wu, Zhen-Qiang; Lin, Ying; Hu, Song-Nian; Wang, Xiao-Ning

    2006-01-01

    Background The cynomolgus monkey (Macaca fascicularis) is one of the most widely used surrogate animal models for an increasing number of human diseases and vaccines, especially immune-system-related ones. Towards a better understanding of the gene expression background upon its immunogenetics, we constructed a cDNA library from Epstein-Barr virus (EBV)-transformed B lymphocytes of a cynomolgus monkey and sequenced 10,000 randomly picked clones. Results After processing, 8,312 high-quality expressed sequence tags (ESTs) were generated and assembled into 3,728 unigenes. Annotations of these uniquely expressed transcripts demonstrated that out of the 2,524 open reading frame (ORF) positive unigenes (mitochondrial and ribosomal sequences were not included), 98.8% shared significant similarities (E-value less than 1e-10) with the NCBI nucleotide (nt) database, while only 67.7% (E-value less than 1e-5) did so with the NCBI non-redundant protein (nr) database. Further analysis revealed that 90.0% of the unigenes that shared no similarities to the nr database could be assigned to human chromosomes, in which 75 did not match significantly to any cynomolgus monkey and human ESTs. The mapping regions to known human genes on the human genome were described in detail. The protein family and domain analysis revealed that the first, second and fourth of the most abundantly expressed protein families were all assigned to immunoglobulin and major histocompatibility complex (MHC)-related proteins. The expression profiles of these genes were compared with that of homologous genes in human blood, lymph nodes and a RAMOS cell line, which demonstrated expression changes after transformation with EBV. The degree of sequence similarity of the MHC class I and II genes to the human reference sequences was evaluated. The results indicated that class I molecules showed weak amino acid identities (<90%), while class II showed slightly higher ones. 
Conclusion These results indicated that the genes expressed in the cynomolgus monkey could be used to identify novel protein-coding genes and revise those incomplete or incorrect annotations in the human genome by comparative methods, since the old world monkeys and humans share high similarities at the molecular level, especially within coding regions. The identification of multiple genes involved in the immune response, their sequence variations to the human homologues, and their responses to EBV infection could provide useful information to improve our understanding of the cynomolgus monkey immune system. PMID:16618371

  4. De novo assembly and characterization of the garlic (Allium sativum) bud transcriptome by Illumina sequencing.

    PubMed

    Sun, Xiudong; Zhou, Shumei; Meng, Fanlu; Liu, Shiqi

    2012-10-01

    Garlic is widely used as a spice throughout the world for the culinary value of its flavor and aroma, which are created by the chemical transformation of a series of organic sulfur compounds. To analyze the transcriptome of Allium sativum and discover the genes involved in sulfur metabolism, cDNAs derived from the total RNA of Allium sativum buds were analyzed by Illumina sequencing. Approximately 26.67 million 90 bp paired-end clean reads were achieved in two libraries. A total of 127,933 unigenes were generated by de novo assembly and were compared with the sequences in public databases. Of these, 45,286 unigenes had significant hits to the sequences in the Nr database, 29,514 showed significant similarity to known proteins in the Swiss-Prot database and, 20,706 and 21,952 unigenes had significant similarity to existing sequences in the KEGG and COG databases, respectively. Moreover, genes involved in organic sulfur biosynthesis were identified. These unigenes data will provide the foundation for research on gene expression, genomics and functional genomics in Allium sativum. Key message The obtained unigenes will provide the foundation for research on functional genomics in Allium sativum and its closely related species, and fill the gap of the existing plant EST database.

  5. Azospirillum zeae sp. nov., a diazotrophic bacterium isolated from rhizosphere soil of Zea mays.

    PubMed

    Mehnaz, Samina; Weselowski, Brian; Lazarovits, George

    2007-12-01

    Two free-living nitrogen-fixing bacterial strains, N6 and N7(T), were isolated from corn rhizosphere. A polyphasic taxonomic approach, including morphological characterization, Biolog analysis, DNA-DNA hybridization, and 16S rRNA, cpn60 and nifH gene sequence analysis, was taken to analyse the two strains. 16S rRNA gene sequence analysis indicated that strains N6 and N7(T) both belonged to the genus Azospirillum and were closely related to Azospirillum oryzae (98.7 and 98.8 % similarity, respectively) and Azospirillum lipoferum (97.5 and 97.6 % similarity, respectively). DNA-DNA hybridization of strains N6 and N7(T) showed reassociation values of 48 and 37 %, respectively, with A. oryzae and 43 % with A. lipoferum. Sequences of the nifH and cpn60 genes of both strains showed 99 and approximately 95 % similarity, respectively, with those of A. oryzae. Chemotaxonomic characteristics (Q-10 as quinone system, 18:1ω7c as major fatty acid) and G+C content of the DNA (67.6 mol%) were also similar to those of members of the genus Azospirillum. Gene sequences and Biolog and fatty acid analysis showed that strains N6 and N7(T) differed from the closely related species A. lipoferum and A. oryzae. On the basis of these results, it is proposed that these nitrogen-fixing strains represent a novel species. The name Azospirillum zeae sp. nov. is suggested, with N7(T) (=NCCB 100147(T)=LMG 23989(T)) as the type strain.

  6. Substrate-Driven Mapping of the Degradome by Comparison of Sequence Logos

    PubMed Central

    Fuchs, Julian E.; von Grafenstein, Susanne; Huber, Roland G.; Kramer, Christian; Liedl, Klaus R.

    2013-01-01

    Sequence logos are frequently used to illustrate substrate preferences and specificity of proteases. Here, we employed the compiled substrates of the MEROPS database to introduce a novel metric for comparison of protease substrate preferences. The constructed similarity matrix of 62 proteases can be used to intuitively visualize similarities in protease substrate readout via principal component analysis and construction of protease specificity trees. Since our new metric is solely based on substrate data, we can engraft the protease tree including proteolytic enzymes of different evolutionary origin. Thereby, our analyses confirm pronounced overlaps in substrate recognition not only between proteases closely related on sequence basis but also between proteolytic enzymes of different evolutionary origin and catalytic type. To illustrate the applicability of our approach we analyze the distribution of targets of small molecules from the ChEMBL database in our substrate-based protease specificity trees. We observe a striking clustering of annotated targets in tree branches even though these grouped targets do not necessarily share similarity on protein sequence level. This highlights the value and applicability of knowledge acquired from peptide substrates in drug design of small molecules, e.g., for the prediction of off-target effects or drug repurposing. Consequently, our similarity metric allows to map the degradome and its associated drug target network via comparison of known substrate peptides. The substrate-driven view of protein-protein interfaces is not limited to the field of proteases but can be applied to any target class where a sufficient amount of known substrate data is available. PMID:24244149

  7. Streptococcus caprae sp. nov., isolated from Iberian ibex (Capra pyrenaica hispanica).

    PubMed

    Vela, A I; Mentaberre, G; Lavín, S; Domínguez, L; Fernández-Garayzábal, J F

    2016-01-01

    Biochemical and molecular genetic studies were performed on a novel Gram-stain-positive, catalase-negative, coccus-shaped organism isolated from tonsil samples of two Iberian ibexes. The micro-organism was identified as a streptococcal species based on its cellular, morphological and biochemical characteristics. 16S rRNA gene sequence comparison studies confirmed its identification as a member of the genus Streptococcus, but the organism did not correspond to any species of this genus. The nearest phylogenetic relative of the unknown coccus from ibex was Streptococcus porci 2923-03T (96.6 % 16S rRNA gene sequence similarity). Analysis based on rpoB and sodA gene sequences revealed sequence similarity values lower than 86.0 and 83.8 %, respectively, from the type strains of recognized Streptococcus species. The novel bacterial isolate was distinguished from Streptococcus porci and other Streptococcus species using biochemical tests. Based on both phenotypic and phylogenetic findings, it is proposed that the unknown bacterium be classified as representing a novel species of the genus Streptococcus, for which the name Streptococcus caprae sp. nov. is proposed. The type strain is DICM07-02790-1CT ( = CECT 8872T = CCUG 67170T).
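
    Percent similarity values such as the 96.6 % quoted above are computed from a pairwise alignment. The sketch below shows one common convention, assuming the two sequences are already aligned and that columns containing a gap in either sequence are skipped; other tools count gaps differently, so the exact figure depends on the convention. The function name and inputs are illustrative only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # assumption: columns with a gap are ignored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy example (hypothetical data): three of four columns match.
identity = percent_identity("ACGT", "ACGA")
```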

  8. Bioremediation potential of a highly mercury resistant bacterial strain Sphingobium SA2 isolated from contaminated soil.

    PubMed

    Mahbub, Khandaker Rayhan; Krishnan, Kannan; Megharaj, Mallavarapu; Naidu, Ravi

    2016-02-01

    A mercury resistant bacterial strain, SA2, was isolated from soil contaminated with mercury. The 16S rRNA gene sequence of this isolate showed 99% sequence similarity to the genera Sphingobium and Sphingomonas of α-proteobacteria group. However, the isolate formed a distinct phyletic line with the genus Sphingobium suggesting the strain belongs to Sphingobium sp. Toxicity studies indicated resistance to high levels of mercury with estimated EC50 values 4.5 mg L⁻¹ and 44.15 mg L⁻¹ and MIC values 5.1 mg L⁻¹ and 48.48 mg L⁻¹ in minimal and rich media, respectively. The strain SA2 was able to volatilize mercury by producing mercuric reductase enzyme which makes it a potential candidate for remediating mercury. ICP-QQQ-MS analysis of Hg supplemented culture solutions confirmed that almost 79% mercury in the culture suspension was volatilized in 6 h. A very small amount of mercury was observed to accumulate in cell pellets which was also evident according to ESEM-EDX analysis. The mercuric reductase gene merA was amplified and sequenced. The deduced amino acid sequence demonstrated sequence homology with α-proteobacteria and Ascomycota group. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. The novel primers for mammal species identification-based mitochondrial cytochrome b sequence: implication for reserved wild animals in Thailand and endangered mammal species in Southeast Asia.

    PubMed

    Muangkram, Yuttamol; Wajjwalku, Worawidh; Amano, Akira; Sukmak, Manakorn

    2018-01-01

    We presented the powerful techniques for species identification using the short amplicon of mitochondrial cytochrome b gene sequence. Two faecal samples and one single hair sample of the Asian tapir were tested using the new cytochrome b primers. The results showed a high sequence similarity with the mainland Asian tapir group. The comparative sequence analysis of the reserved wild mammals in Thailand and the other endangered mammal species from Southeast Asia comprehensibly verified the potential of our novel primers. The forward and reverse primers were 94.2 and 93.2%, respectively, by the average value of the sequence identity among 77 species sequences, and the overall mean distance was 35.9%. This development technique could provide rapid, simple, and reliable tools for species confirmation. Especially, it could recognize the problematic biological specimens contained less DNA material from illegal products and assist with wildlife crime investigation of threatened species and related forensic casework.

  10. DUK - A Fast and Efficient Kmer Based Sequence Matching Tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Mingkun; Copeland, Alex; Han, James

    2011-03-21

    A new tool, DUK, has been developed to perform the matching task. Matching determines whether a query sequence partially or totally matches given reference sequences. Matching is similar to alignment; indeed, many traditional analysis tasks such as contaminant removal use alignment tools. For matching, however, there is no need to know which bases of a query sequence match which positions of a reference sequence; it is only necessary to know whether a match exists. This subtle difference can make the matching task much faster than alignment. DUK is accurate, versatile, fast, and memory-efficient. It uses a Kmer hashing method to index reference sequences and a Poisson model to calculate the p-value. DUK is carefully implemented in C++ with an object-oriented design. The resulting classes can also be used to develop other tools quickly. DUK has been widely used at JGI for a wide range of applications such as contaminant removal, organelle genome separation, and assembly refinement. Many real applications and simulated datasets demonstrate its power.
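
    The general idea of k-mer matching with a Poisson significance test can be sketched as follows. This is not DUK's code or interface: the function names, the use of a plain Python set as the hash index, and the way the expected hit count is supplied by the caller are all assumptions made for illustration.

```python
from math import exp


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Hash all reference k-mers into a set for constant-time lookups."""
    index = set()
    for ref in references:
        index.update(kmers(ref, k))
    return index


def count_hits(query, index, k):
    """Count query k-mers that occur anywhere in the reference index."""
    return sum(1 for km in kmers(query, k) if km in index)


def poisson_tail(hits, expected):
    """P(X >= hits) for X ~ Poisson(expected); expected is chosen by the caller."""
    term = exp(-expected)  # P(X = 0)
    cdf = 0.0
    for i in range(hits):
        cdf += term
        term *= expected / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


# Toy example (hypothetical data): does the query share 4-mers with the reference?
index = build_index(["ACGTACGTGG"], k=4)
hits = count_hits("TACGTG", index, k=4)
p_value = poisson_tail(hits, expected=0.1)
```

    Because only membership is tested, no alignment coordinates are computed, which is the source of the speed advantage the abstract describes.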

  11. Bacillus Strains Most Closely Related to Bacillus nealsonii Are Not Effectively Circumscribed within the Taxonomic Species Definition

    PubMed Central

    Peak, K. Kealy; Duncan, Kathleen E.; Luna, Vicki A.; King, Debra S.; McCarthy, Peter J.; Cannons, Andrew C.

    2011-01-01

    Bacillus strains with >99.7% 16S rRNA gene sequence similarity were characterized with DNA:DNA hybridization, cellular fatty acid (CFA) analysis, and testing of 100 phenotypic traits. When paired with the most closely related type strain, percent DNA:DNA similarities (% S) for six Bacillus strains were all far below the recommended 70% threshold value for species circumscription with Bacillus nealsonii. An apparent genomic group of four Bacillus strain pairings with 94%–70% S was contradicted by the failure of the strains to cluster in CFA- and phenotype-based dendrograms as well as by their differentiation with 9–13 species level discriminators such as nitrate reduction, temperature range, and acid production from carbohydrates. The novel Bacillus strains were monophyletic and very closely related based on 16S rRNA gene sequence. Coherent genomic groups were not however supported by similarly organized phenotypic clusters. Therefore, the strains were not effectively circumscribed within the taxonomic species definition. PMID:22046187

  12. A self-similar hierarchy of the Korean stock market

    NASA Astrophysics Data System (ADS)

    Lim, Gyuchang; Min, Seungsik; Yoo, Kun-Woo

    2013-01-01

    A scaling analysis is performed on market values of stocks listed on Korean stock exchanges such as the KOSPI and the KOSDAQ. Different from previous studies on price fluctuations, market capitalizations are dealt with in this work. First, we show that the sum of the two stock exchanges shows a clear rank-size distribution, i.e., the Zipf's law, just as each separate one does. Second, by abstracting Zipf's law as a γ-sequence, we define a self-similar hierarchy consisting of many levels, with the numbers of firms at each level forming a geometric sequence. We also use two exponential functions to describe the hierarchy and derive a scaling law from them. Lastly, we propose a self-similar hierarchical process and perform an empirical analysis on our data set. Based on our findings, we argue that all money invested in the stock market is distributed in a hierarchical way and that a slight difference exists between the two exchanges.

  13. Similarity of Symbol Frequency Distributions with Heavy Tails

    NASA Astrophysics Data System (ADS)

    Gerlach, Martin; Font-Clos, Francesc; Altmann, Eduardo G.

    2016-04-01

    Quantifying the similarity between symbolic sequences is a traditional problem in information theory which requires comparing the frequencies of symbols in different sequences. In numerous modern applications, ranging from DNA over music to texts, the distribution of symbol frequencies is characterized by heavy-tailed distributions (e.g., Zipf's law). The large number of low-frequency symbols in these distributions poses major difficulties to the estimation of the similarity between sequences; e.g., they hinder an accurate finite-size estimation of entropies. Here, we show analytically how the systematic (bias) and statistical (fluctuations) errors in these estimations depend on the sample size N and on the exponent γ of the heavy-tailed distribution. Our results are valid for the Shannon entropy (α = 1), its corresponding similarity measures (e.g., the Jensen-Shannon divergence), and also for measures based on the generalized entropy of order α. For small α's, including α = 1, the errors decay slower than the 1/N decay observed in short-tailed distributions. For α larger than a critical value α* = 1 + 1/γ ≤ 2, the 1/N decay is recovered. We show the practical significance of our results by quantifying the evolution of the English language over the last two centuries using a complete α spectrum of measures. We find that frequent words change more slowly than less frequent words and that α = 2 provides the most robust measure to quantify language change.
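
    For the α = 1 case, the Jensen-Shannon divergence between two symbol sequences can be estimated directly from observed frequencies. The sketch below is the naive plug-in estimator, assuming non-empty input sequences and base-2 logarithms; it is exactly the kind of finite-size estimate whose bias the abstract analyzes, so it illustrates the quantity rather than the authors' corrections. Function names are illustrative.

```python
from collections import Counter
from math import log2


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability symbols contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)


def jensen_shannon(seq_a, seq_b):
    """Plug-in Jensen-Shannon divergence between two non-empty symbol sequences."""
    counts_a, counts_b = Counter(seq_a), Counter(seq_b)
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    symbols = set(counts_a) | set(counts_b)
    p = [counts_a[s] / n_a for s in symbols]
    q = [counts_b[s] / n_b for s in symbols]
    mix = [(x + y) / 2 for x, y in zip(p, q)]
    # JSD = H(mixture) - mean of the individual entropies.
    return shannon_entropy(mix) - (shannon_entropy(p) + shannon_entropy(q)) / 2


# Toy example (hypothetical data): word sequences from two short texts.
d = jensen_shannon("the cat sat".split(), "the dog sat".split())
```

    With base-2 logarithms the result lies between 0 (identical frequency distributions) and 1 (no shared symbols).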

  14. Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification.

    PubMed

    Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

    2015-05-01

    Alignment-based sequence similarity searches, while accurate for some type of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
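
    The notion of a combined similarity score can be illustrated with a plain weighted average over several measures. The abstract does not give the authors' weighting scheme, so the sketch below simply assumes that per-sequence weights are already available; the measure names, class labels and numbers are hypothetical.

```python
def combined_score(measure_scores, weights):
    """Weighted average of per-measure similarity scores for one candidate class."""
    total_weight = sum(weights[m] for m in measure_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[m] * score for m, score in measure_scores.items())
    return weighted / total_weight


def classify(per_class_scores, weights):
    """Return the class label whose combined score is highest."""
    return max(
        per_class_scores,
        key=lambda label: combined_score(per_class_scores[label], weights),
    )


# Toy example (hypothetical data): two measures, two candidate lineages.
scores = {
    "lineage_A": {"alignment": 0.9, "kmer": 0.2},
    "lineage_B": {"alignment": 0.5, "kmer": 0.9},
}
# Assumed weights for this particular query sequence.
label = classify(scores, {"alignment": 3.0, "kmer": 1.0})
```

    Changing the weights per query is what lets a different measure dominate for different sequences, which is the behavior the abstract describes.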

  15. Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification

    PubMed Central

    Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

    2015-01-01

    Motivation: Alignment-based sequence similarity searches, while accurate for some type of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913

  16. Oscillating and pulsed gradient diffusion magnetic resonance microscopy over an extended b-value range: implications for the characterization of tissue microstructure.

    PubMed

    Portnoy, S; Flint, J J; Blackband, S J; Stanisz, G J

    2013-04-01

    Oscillating gradient spin-echo (OGSE) pulse sequences have been proposed for acquiring diffusion data with very short diffusion times, which probe tissue structure at the subcellular scale. OGSE sequences are an alternative to pulsed gradient spin echo measurements, which typically probe longer diffusion times due to gradient limitations. In this investigation, a high-strength (6600 G/cm) gradient designed for small-sample microscopy was used to acquire OGSE and pulsed gradient spin echo data in a rat hippocampal specimen at microscopic resolution. Measurements covered a broad range of diffusion times (TDeff = 1.2-15.0 ms), frequencies (ω = 67-1000 Hz), and b-values (b = 0-3.2 ms/μm²). Variations in apparent diffusion coefficient with frequency and diffusion time provided microstructural information at a scale much smaller than the imaging resolution. For a more direct comparison of the techniques, OGSE and pulsed gradient spin echo data were acquired with similar effective diffusion times. Measurements with similar TDeff were consistent at low b-value (b < 1 ms/μm²), but diverged at higher b-values. Experimental observations suggest that the effective diffusion time can be helpful in the interpretation of low b-value OGSE data. However, caution is required at higher b, where enhanced sensitivity to restriction and exchange render the effective diffusion time an unsuitable representation. Oscillating and pulsed gradient diffusion techniques offer unique, complementary information. In combination, the two methods provide a powerful tool for characterizing complex diffusion within biological tissues. Copyright © 2012 Wiley Periodicals, Inc.

  17. Comparative genomic analysis of Mycobacterium tuberculosis clinical isolates.

    PubMed

    Liu, Fei; Hu, Yongfei; Wang, Qi; Li, Hong Min; Gao, George F; Liu, Cui Hua; Zhu, Baoli

    2014-06-13

    Due to excessive antibiotic use, drug-resistant Mycobacterium tuberculosis has become a serious public health threat and a major obstacle to disease control in many countries. To better understand the evolution of drug-resistant M. tuberculosis strains, we performed whole genome sequencing for 7 M. tuberculosis clinical isolates with different antibiotic resistance profiles and conducted comparative genomic analysis of gene variations among them. We observed that all 7 M. tuberculosis clinical isolates with different levels of drug resistance harbored similar numbers of SNPs, ranging from 1409 to 1464. The numbers of insertion/deletions (Indels) identified in the 7 isolates were also similar, ranging from 56 to 101. A total of 39 types of mutations were identified in drug resistance-associated loci, including 14 previously reported ones and 25 newly identified ones. Sixteen of the identified large Indels spanned PE-PPE-PGRS genes, which represents a major source of antigenic variability. Aside from SNPs and Indels, a CRISPR locus with varied spacers was observed in all 7 clinical isolates, suggesting that they might play an important role in plasticity of the M. tuberculosis genome. The nucleotide diversity (π value) and selection intensity (dN/dS value) of the whole genome sequences of the 7 isolates were similar. The dN/dS values were less than 1 for all 7 isolates (range from 0.608885 to 0.637365), supporting the notion that M. tuberculosis genomes undergo purifying selection. The π values and dN/dS values were comparable between drug-susceptible and drug-resistant strains. In this study, we show that clinical M. tuberculosis isolates exhibit distinct variations in terms of the distribution of SNP, Indels, CRISPR-cas locus, as well as the nucleotide diversity and selection intensity, but there are no generalizable differences between drug-susceptible and drug-resistant isolates on the genomic scale. Our study provides evidence strengthening the notion that the evolution of drug resistance among clinical M. tuberculosis isolates is clearly a complex and diversified process.

  18. Correction of the lack of commutability between plasmid DNA and genomic DNA for quantification of genetically modified organisms using pBSTopas as a model.

    PubMed

    Zhang, Li; Wu, Yuhua; Wu, Gang; Cao, Yinglong; Lu, Changming

    2014-10-01

    Plasmid calibrators are increasingly applied for polymerase chain reaction (PCR) analysis of genetically modified organisms (GMOs). To evaluate the commutability between plasmid DNA (pDNA) and genomic DNA (gDNA) as calibrators, a plasmid molecule, pBSTopas, was constructed, harboring a Topas 19/2 event-specific sequence and a partial sequence of the rapeseed reference gene CruA. Assays of the pDNA showed similar limits of detection (five copies for Topas 19/2 and CruA) and quantification (40 copies for Topas 19/2 and 20 for CruA) as those for the gDNA. Comparisons of plasmid and genomic standard curves indicated that the slopes, intercepts, and PCR efficiency for pBSTopas were significantly different from CRM Topas 19/2 gDNA for quantitative analysis of GMOs. Three correction methods were used to calibrate the quantitative analysis of control samples using pDNA as calibrators: model a, or coefficient value a (Cva); model b, or coefficient value b (Cvb); and the novel model c or coefficient formula (Cf). Cva and Cvb gave similar estimated values for the control samples, and the quantitative bias of the low concentration sample exceeded the acceptable range within ±25% in two of the four repeats. Using Cfs to normalize the Ct values of test samples, the estimated values were very close to the reference values (bias -13.27 to 13.05%). In the validation of control samples, model c was more appropriate than Cva or Cvb. The application of Cf allowed pBSTopas to substitute for Topas 19/2 gDNA as a calibrator to accurately quantify the GMO.
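
    The slope, intercept and PCR efficiency compared above come from a standard curve of Ct against log10 copy number. The sketch below fits such a curve by ordinary least squares and derives efficiency with the widely used relation E = 10^(-1/slope) - 1. It does not reproduce the Cva, Cvb or Cf correction models, which the abstract does not specify; the function name and the dilution series are made up for illustration.

```python
from math import log10


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    Returns (slope, intercept, efficiency), with efficiency = 10**(-1/slope) - 1.
    """
    xs = [log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, efficiency


# Toy example (hypothetical data): an ideal ten-fold dilution series.
copies = [10, 100, 1000, 10000]
cts = [40 - 3.321928 * log10(c) for c in copies]
slope, intercept, efficiency = fit_standard_curve(copies, cts)
```

    A slope near -3.32 corresponds to an efficiency near 1.0, that is, a doubling of product each cycle; differing slopes between plasmid and genomic calibrators therefore imply differing efficiencies.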

  19. The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza.

    PubMed

    Qian, Jun; Song, Jingyuan; Gao, Huanhuan; Zhu, Yingjie; Xu, Jiang; Pang, Xiaohui; Yao, Hui; Sun, Chao; Li, Xian'en; Li, Chuyuan; Liu, Juyan; Xu, Haibin; Chen, Shilin

    2013-01-01

    Salvia miltiorrhiza is an important medicinal plant with great economic and medicinal value. The complete chloroplast (cp) genome sequence of Salvia miltiorrhiza, the first sequenced member of the Lamiaceae family, is reported here. The genome is 151,328 bp in length and exhibits a typical quadripartite structure of the large (LSC, 82,695 bp) and small (SSC, 17,555 bp) single-copy regions, separated by a pair of inverted repeats (IRs, 25,539 bp). It contains 114 unique genes, including 80 protein-coding genes, 30 tRNAs and four rRNAs. The genome structure, gene order, GC content and codon usage are similar to the typical angiosperm cp genomes. Four forward, three inverted and seven tandem repeats were detected in the Salvia miltiorrhiza cp genome. Simple sequence repeat (SSR) analysis among the 30 asterid cp genomes revealed that most SSRs are AT-rich, which contribute to the overall AT richness of these cp genomes. Additionally, fewer SSRs are distributed in the protein-coding sequences compared to the non-coding regions, indicating an uneven distribution of SSRs within the cp genomes. Entire cp genome comparison of Salvia miltiorrhiza and three other Lamiales cp genomes showed a high degree of sequence similarity and a relatively high divergence of intergenic spacers. Sequence divergence analysis discovered the ten most divergent and ten most conserved genes as well as their length variation, which will be helpful for phylogenetic studies in asterids. Our analysis also supports that both regional and functional constraints affect gene sequence evolution. Further, phylogenetic analysis demonstrated a sister relationship between Salvia miltiorrhiza and Sesamum indicum. The complete cp genome sequence of Salvia miltiorrhiza reported in this paper will facilitate population, phylogenetic and cp genetic engineering studies of this medicinal plant.
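
    Illustrative note: in a quadripartite chloroplast genome the total length is the large single-copy region plus the small single-copy region plus two copies of the inverted repeat. The short check below uses the lengths quoted in the record above; the variable names are chosen here for illustration.

```python
# Region lengths in bp, as quoted in the abstract.
lsc_bp = 82_695   # large single-copy region
ssc_bp = 17_555   # small single-copy region
ir_bp = 25_539    # one inverted repeat; the genome carries a pair

# Quadripartite structure: LSC + SSC + 2 x IR.
total_bp = lsc_bp + ssc_bp + 2 * ir_bp
reported_total_bp = 151_328
consistent = total_bp == reported_total_bp
```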

  20. Fungi as Endophytes in Artemisia thuscula: Juxtaposed Elements of Diversity and Phylogeny.

    PubMed

    Cosoveanu, Andreea; Rodriguez Sabina, Samuel; Cabrera, Raimundo

    2018-01-27

    Artemisia is a plant genus highly studied for its medicinal applications. The studies on the associated fungal endophytes are scarce. Ten plant specimens of Artemisia thuscula from Tenerife and La Palma were sampled to isolate the endophytic fungi. Identification of the endophytic fungi was based on morphology, Internal Transcribed Spacer (ITS) and Large Subunit (LSU) regions sequencing and indicates 37 fungal species affiliated to 25 fungal genera. Colonization rate varied among plants (CR = 25% to 92.11%). The most dominant colonizers found were Alternaria alternata (CF = 18.71%), Neofusicoccum sp. (CF = 8.39%) and Preussia sp. (CF = 3.23%). Tendency for host specificity of most endophytic fungal species was observed. Sorensen-Dice index revealed that of 45 cases in the matrix, 27 of them were of zero similarity. Further, only one case was found to have 57% similarity (TF2 and TF7) and one case with 50% similarity (TF1 and TF4). The rest of the cases had values ranging between 11% and 40% similarity. Diversity indices like Brillouin, Margalef species richness, Simpson index of diversity and Fisher's alpha, revealed plants from La Palma with higher values than plants from Tenerife. Three nutrient media (i.e., potato dextrose agar-PDA, lignocellulose agar-LCA, and tomato juice agar-V8) were used in a case study and revealed no differences in terms of colonization rate when data were averaged. Colonization frequency showed several species with preference for nutrient medium (63% of the species were isolated from only one nutrient medium). For the phylogenetic reconstruction using the Bayesian method, 54 endophytic fungal ITS sequences and associated GenBank sequences were analyzed. Ten orders (Diaporthales, Dothideales, Botryosphaeriales, Hypocreales, Trichosphaeriales, Amphisphaeriales, Xylariales, Capnodiales, Pleosporales and Eurotiales) were recognized.
Several arrangements of genera draw attention, like Aureobasidium (Dothideales) and Aplosporella (Botryosphaeriales) which are clustered with a recent ancestor (BS = 0.97).
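
    Illustrative note: the Sorensen-Dice index mentioned in the record above can be computed between the species sets of two host plants as 2|A∩B| / (|A| + |B|), and ten plants give 45 unordered pairs, which matches the 45 cases in the matrix. The sketch below uses hypothetical plant labels and species lists, not data from the study.

```python
from itertools import combinations


def sorensen_dice(a, b):
    """Sorensen-Dice similarity between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical endophyte lists per plant (illustration only).
plants = {
    "P1": {"Alternaria alternata", "Preussia sp."},
    "P2": {"Alternaria alternata", "Neofusicoccum sp.", "Preussia sp."},
    "P3": {"Aureobasidium sp."},
}

# Similarity for every unordered pair of plants.
pairwise = {
    (x, y): sorensen_dice(plants[x], plants[y])
    for x, y in combinations(sorted(plants), 2)
}

# Number of unordered pairs among ten plants.
pairs_for_ten_plants = len(list(combinations(range(10), 2)))
```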

  1. Molecular cloning and characterization of an alpha-amylase from Pichia burtonii 15-1.

    PubMed

    Kato, Saemi; Shimizu-Ibuka, Akiko; Mura, Kiyoshi; Takeuchi, Akiko; Tokue, Chiyoko; Arai, Soichi

    2007-12-01

    An alpha-amylase secreted by Pichia burtonii 15-1 isolated from a traditional starter murcha of Nepal, named Pichia burtonii alpha-amylase (PBA), was studied. The gene was cloned and its nucleotide sequence was determined. PBA was deduced to consist of 494 amino acid residues. It shared certain degrees of amino acid sequence identity with other homologous proteins: 60% with Schwanniomyces occidentalis alpha-amylase, 58% with Saccharomycopsis sp. alpha-amylase, and 47% with Taka-amylase A from Aspergillus oryzae. A three-dimensional structural model of PBA generated using the known three-dimensional structure of Taka-amylase A as a template suggested high structural similarity between them. Kinetic analysis revealed that the K(m) values of PBA were lower than those of Taka-amylase A for the oligosaccharides. Although the k(cat) values of PBA were lower than those of Taka-amylase A for the oligosaccharide substrates, the k(cat)/K(m) values of PBA were higher.

  2. QSRA: a quality-value guided de novo short read assembler.

    PubMed

    Bryant, Douglas W; Wong, Weng-Keen; Mockler, Todd C

    2009-02-24

    New rapid high-throughput sequencing technologies have sparked the creation of a new class of assembler. Since all high-throughput sequencing platforms incorporate errors in their output, short-read assemblers must be designed to account for this error while utilizing all available data. We have designed and implemented an assembler, Quality-value guided Short Read Assembler, created to take advantage of quality-value scores as a further method of dealing with error. Compared to previously published algorithms, our assembler shows significant improvements not only in speed but also in output quality. QSRA generally produced the highest genomic coverage, while being faster than VCAKE. QSRA is extremely competitive in its longest contig and N50/N80 contig lengths, producing results of similar quality to those of EDENA and VELVET. QSRA provides a step closer to the goal of de novo assembly of complex genomes, improving upon the original VCAKE algorithm by not only drastically reducing runtimes but also increasing the viability of the assembly algorithm through further error handling capabilities.
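
    Illustrative note: the N50 and N80 contig lengths cited in the record above are commonly defined as the contig length at which the contigs, taken from longest to shortest, first cover at least 50% or 80% of the total assembly length. The sketch below follows that common definition; it is not taken from QSRA, and the contig lengths are made up.

```python
def nx_length(contig_lengths, percent=50):
    """Contig length at which the cumulative length first reaches `percent` of the assembly.

    Contigs are visited from longest to shortest. Integer arithmetic is used
    so that exact thresholds are not disturbed by floating-point rounding.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if running * 100 >= percent * total:
            return length
    return lengths[-1]  # only reached if percent > 100


# Hypothetical contig lengths in bp (illustration only).
contigs = [100, 80, 60, 40, 20]
n50 = nx_length(contigs, 50)
n80 = nx_length(contigs, 80)
```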

  3. Prediction of Ras-effector interactions using position energy matrices.

    PubMed

    Kiel, Christina; Serrano, Luis

    2007-09-01

    One of the more challenging problems in biology is to determine the cellular protein interaction network. Progress has been made to predict protein-protein interactions based on structural information, assuming that structurally similar proteins interact in a similar way. In a previous publication, we have determined a genome-wide Ras-effector interaction network based on homology models, with a high accuracy of predicting binding and non-binding domains. However, for a prediction on a genome-wide scale, homology modelling is a time-consuming process. Therefore, we here successfully developed a faster method using position energy matrices, where based on different Ras-effector X-ray template structures, all amino acids in the effector binding domain are sequentially mutated to all other amino acid residues and the effect on binding energy is calculated. Those pre-calculated matrices can then be used to score for binding any Ras or effector sequences. Based on position energy matrices, the sequences of putative Ras-binding domains can be scanned quickly to calculate an energy sum value. By calibrating energy sum values using quantitative experimental binding data, thresholds can be defined and thus non-binding domains can be excluded quickly. Sequences which have energy sum values above this threshold are considered to be potential binding domains, and could be further analysed using homology modelling. This prediction method could be applied to other protein families sharing conserved interaction types, in order to rapidly determine large-scale cellular protein interaction networks. Thus, it could have an important impact on future in silico structural genomics approaches, in particular with regard to increasing structural proteomics efforts, aiming to determine all possible domain folds and interaction types. All matrices are deposited in the ADAN database (http://adan-embl.ibmc.umh.es/). Supplementary data are available at Bioinformatics online.
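
    Illustrative note: scanning a sequence against a position energy matrix, as described in the record above, amounts to summing one pre-calculated value per position and comparing the sum with a calibrated threshold. The sketch below assumes a matrix stored as one residue-to-energy mapping per position and keeps sequences whose sum is above the threshold, following the wording of the abstract. The matrix values, threshold, and sequences are invented, and the sign convention of the real matrices is not stated in the abstract.

```python
def energy_sum(sequence, matrix):
    """Sum the per-position contribution of each residue in `sequence`.

    `matrix` is a list with one dict per position, mapping residue -> energy.
    """
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must match the number of matrix positions")
    return sum(position[residue] for position, residue in zip(matrix, sequence))


# Toy three-position matrix with made-up values (illustration only).
toy_matrix = [
    {"A": 0.0, "K": 1.5, "E": -1.0},
    {"A": 0.0, "K": 0.5, "E": -0.5},
    {"A": 0.0, "K": 2.0, "E": -2.0},
]
threshold = 1.0  # hypothetical; the study calibrates it on experimental binding data

candidates = ["KKK", "AEA", "KAE"]
potential_binders = [s for s in candidates if energy_sum(s, toy_matrix) > threshold]
```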

  4. A generalized global alignment algorithm.

    PubMed

    Huang, Xiaoqiu; Chao, Kun-Mao

    2003-01-22

    Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal general alignment in time proportional to the product of sequence lengths and in space proportional to the sum of sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/align.html.
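
    Illustrative note: the record above describes a dynamic programming algorithm running in time proportional to the product of the sequence lengths. The sketch below is the standard Needleman-Wunsch global alignment score with a linear gap penalty, not the generalized model implemented in GAP3. It keeps only two rows of the table, so it returns the optimal score but not the alignment itself; the scoring parameters are arbitrary defaults.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score (linear gap penalty), using two DP rows.

    Time is proportional to len(a) * len(b); memory is proportional to the
    shorter sequence because only the previous and current rows are kept.
    """
    if len(b) > len(a):
        a, b = b, a  # put the shorter sequence along the row
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


identical = global_alignment_score("ACGT", "ACGT")
one_deletion = global_alignment_score("ACGT", "AGT")
```

    Recovering the alignment itself in linear space usually requires a divide-and-conquer scheme in the style of Hirschberg's algorithm, which this sketch leaves out.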

  5. Pollutants degradation performance and microbial community structure of aerobic granular sludge systems using inoculums adapted at mild and low temperature.

    PubMed

    Muñoz-Palazon, Barbara; Pesciaroli, Chiara; Rodriguez-Sanchez, Alejandro; Gonzalez-Lopez, Jesús; Gonzalez-Martinez, Alejandro

    2018-08-01

    Three aerobic granular sequencing batch reactors were inoculated using different inocula from Finland, Spain and a mix of both in order to investigate the effect on the degradation performance and the microbial community structure. The Finnish inoculum achieved a faster granulation and a higher depollution performance within the first two months of operation. However, after 90 days of operation, similar physico-chemical values were observed. On the other hand, the Real-time PCR showed that Archaea diminished from inoculum to granular biomass, while Bacteria and Fungi numbers remained stable. All granular biomass massive parallel sequencing studies were similar regardless of the inocula from which they formed, as confirmed by singular value decomposition principal coordinates analysis, expected effect size of OTUs, and β-diversity analyses. Thermoproteaceae, Meganema and Trichosporonaceae members were the dominant phylotypes for the three domains studied. The analysis of oligotype distribution demonstrated that a fungal oligotype was ubiquitous. The dominant OTUs of Bacteria were correlated with bioreactors performance. The results obtained determined that the microbial community structure of aerobic granular sludge was similar regardless of their inocula, showing that the granulation of biomass is related to several phylotypes. This will be of future importance for the implementation of aerobic granular sludge to full-scale systems. Copyright © 2018 Elsevier Ltd. All rights reserved.

  6. Genetic diversity and antigenicity variation of Babesia bovis merozoite surface antigen-1 (MSA-1) in Thailand.

    PubMed

    Tattiyapong, Muncharee; Sivakumar, Thillaiampalam; Takemae, Hitoshi; Simking, Pacharathon; Jittapalapong, Sathaporn; Igarashi, Ikuo; Yokoyama, Naoaki

    2016-07-01

    Babesia bovis, an intraerythrocytic protozoan parasite, causes severe clinical disease in cattle worldwide. The genetic diversity of parasite antigens often results in different immune profiles in infected animals, hindering efforts to develop immune control methodologies against the B. bovis infection. In this study, we analyzed the genetic diversity of the merozoite surface antigen-1 (msa-1) gene using 162 B. bovis-positive blood DNA samples sourced from cattle populations reared in different geographical regions of Thailand. The identity scores shared among 93 msa-1 gene sequences isolated by PCR amplification were 43.5-100%, and the similarity values among the translated amino acid sequences were 42.8-100%. Of 23 total clades detected in our phylogenetic analysis, Thai msa-1 gene sequences occurred in 18 clades; seven among them were composed of sequences exclusively from Thailand. To investigate differential antigenicity of isolated MSA-1 proteins, we expressed and purified eight recombinant MSA-1 (rMSA-1) proteins, including an rMSA-1 from B. bovis Texas (T2Bo) strain and seven rMSA-1 proteins based on the Thai msa-1 sequences. When these antigens were analyzed in a western blot assay, anti-T2Bo cattle serum strongly reacted with the rMSA-1 from T2Bo, as well as with three other rMSA-1 proteins that shared 54.9-68.4% sequence similarity with T2Bo MSA-1. In contrast, no or weak reactivity was observed for the remaining rMSA-1 proteins, which shared low sequence similarity (35.0-39.7%) with T2Bo MSA-1. While demonstrating the high genetic diversity of the B. bovis msa-1 gene in Thailand, the present findings suggest that the genetic diversity results in antigenicity variations among the MSA-1 antigens of B. bovis in Thailand. Copyright © 2016 Elsevier B.V. All rights reserved.
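
    Illustrative note: identity scores such as those reported in the record above are typically derived from pairwise alignments. The sketch below computes percent identity from two already aligned sequences, counting identical residues over all columns that are not gaps in both sequences. Other denominators are in use (for example the shorter sequence length), and the abstract does not say which one the authors used; the example sequences are invented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same residue in both sequences.

    Columns that are gaps in both sequences are ignored; a residue paired
    with a gap counts as a non-identical column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b)
        if not (x == gap and y == gap)
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments (illustration only).
identity_full = percent_identity("ACGT", "ACGT")
identity_partial = percent_identity("ACGT-A", "ACCTGA")
```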

  7. Aligner optimization increases accuracy and decreases compute times in multi-species sequence data.

    PubMed

    Robinson, Kelly M; Hawkins, Aziah S; Santana-Cruz, Ivette; Adkins, Ricky S; Shetty, Amol C; Nagaraj, Sushma; Sadzewicz, Lisa; Tallon, Luke J; Rasko, David A; Fraser, Claire M; Mahurkar, Anup; Silva, Joana C; Dunning Hotopp, Julie C

    2017-09-01

    As sequencing technologies have evolved, the tools to analyze these sequences have made similar advances. However, for multi-species samples, we observed important and adverse differences in alignment specificity and computation time for bwa-mem (Burrows-Wheeler aligner-maximum exact matches) relative to bwa-aln. Therefore, we sought to optimize bwa-mem for alignment of data from multi-species samples in order to reduce alignment time and increase the specificity of alignments. In the multi-species cases examined, there was one majority member (i.e. Plasmodium falciparum or Brugia malayi) and one minority member (i.e. human or the Wolbachia endosymbiont wBm) of the sequence data. Increasing bwa-mem seed length from the default value reduced the number of read pairs from the majority sequence member that incorrectly aligned to the reference genome of the minority sequence member. Combining both source genomes into a single reference genome increased the specificity of mapping, while also reducing the central processing unit (CPU) time. In Plasmodium, at a seed length of 18 nt, 24.1 % of reads mapped to the human genome using 1.7±0.1 CPU hours, while 83.6 % of reads mapped to the Plasmodium genome using 0.2±0.0 CPU hours (total: 107.7 % reads mapping; in 1.9±0.1 CPU hours). In contrast, 97.1 % of the reads mapped to a combined Plasmodium-human reference in only 0.7±0.0 CPU hours. Overall, the results suggest that combining all references into a single reference database and using a 23 nt seed length reduces the computational time, while maximizing specificity. Similar results were found for simulated sequence reads from a mock metagenomic data set. We found similar improvements to computation time in a publicly available human-only data set.

  8. Antagonistic Pleiotropy and Fitness Trade-Offs Reveal Specialist and Generalist Traits in Strains of Canine Distemper Virus

    PubMed Central

    Nikolin, Veljko M.; Osterrieder, Klaus; von Messling, Veronika; Hofer, Heribert; Anderson, Danielle; Dubovi, Edward; Brunner, Edgar; East, Marion L.

    2012-01-01

    Theoretically, homogeneous environments favor the evolution of specialists whereas heterogeneous environments favor generalists. Canine distemper is a multi-host carnivore disease caused by canine distemper virus (CDV). The described cell receptor of CDV is SLAM (CD150). Attachment of CDV hemagglutinin protein (CDV-H) to this receptor facilitates fusion and virus entry in cooperation with the fusion protein (CDV-F). We investigated whether CDV strains co-evolved in the large, homogeneous domestic dog population exhibited specialist traits, and strains adapted to the heterogeneous environment of smaller populations of different carnivores exhibited generalist traits. Comparison of amino acid sequences of the SLAM binding region revealed higher similarity between sequences from Canidae species than to sequences from other carnivore families. Using an in vitro assay, we quantified syncytia formation mediated by CDV-H proteins from dog and non-dog CDV strains in cells expressing dog, lion or cat SLAM. CDV-H proteins from dog strains produced significantly higher values with cells expressing dog SLAM than with cells expressing lion or cat SLAM. CDV-H proteins from strains of non-dog species produced similar values in all three cell types, but lower values in cells expressing dog SLAM than the values obtained for CDV-H proteins from dog strains. By experimentally changing one amino acid (Y549H) in the CDV-H protein of one dog strain we decreased expression of specialist traits and increased expression of generalist traits, thereby confirming its functional importance. A virus titer assay demonstrated that dog strains produced higher titers in cells expressing dog SLAM than cells expressing SLAM of non-dog hosts, which suggested possible fitness benefits of specialization post-cell entry. 
We provide in vitro evidence for the expression of specialist and generalist traits by CDV strains, and fitness trade-offs across carnivore host environments caused by antagonistic pleiotropy. These findings extend knowledge on CDV molecular epidemiology of particular relevance to wild carnivores. PMID:23239996

  9. Sequence-similar, structure-dissimilar protein pairs in the PDB.

    PubMed

    Kosloff, Mickey; Kolodny, Rachel

    2008-05-01

    It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such proteins pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information. 
We have established a data base of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).

  10. ‘Candidatus Phytoplasma palmicola’, a novel taxon associated with a lethal yellowing-type disease (LYD) of coconut (Cocos nucifera L.) in Mozambique

    USDA-ARS's Scientific Manuscript database

    In this study, the taxonomic position and group classification of the phytoplasma associated with a lethal yellowing-type disease (LYD) of coconut (Cocos nucifera L.) in Mozambique were addressed. Pairwise sequence similarity values based on alignment of near full-length 16SrRNA genes (1530 bp) reve...

  11. Nocardioides albertanoniae sp. nov., isolated from Roman catacombs.

    PubMed

    Alias-Villegas, Cynthia; Jurado, Valme; Laiz, Leonila; Miller, Ana Z; Saiz-Jimenez, Cesareo

    2013-04-01

    A Gram-reaction-positive, aerobic, non-spore-forming, rod- or coccoid-shaped, strain, CD40127(T), was isolated from a green biofilm covering the wall of the Domitilla Catacombs in Rome, Italy. Phylogenetic analysis based on 16S rRNA gene sequences revealed that strain CD40127(T) belongs to the genus Nocardioides, closely related to Nocardioides luteus DSM 43366(T) and Nocardioides albus DSM 43109(T) with 98.86 % and 98.01 % similarity values, respectively. Strain CD40127(T) exhibited 16S rRNA gene sequence similarity values below 96.29 % with the rest of the species of the genus Nocardioides. The G+C content of the genomic DNA was 69.7 mol%. The predominant fatty acid was iso-C16 : 0 and the major menaquinone was MK-8(H4) in accordance with the phenotypes of other species of the genus Nocardioides. A polyphasic approach using physiological tests, fatty acid profiles, DNA base ratios and DNA-DNA hybridization showed that isolate CD40127(T) represents a novel species within the genus Nocardioides, for which the name Nocardioides albertanoniae is proposed. The type strain is CD40127(T) ( = DSM 25218(T) = CECT 8014(T)).

  12. Aeribacillus composti sp. nov., a thermophilic bacillus isolated from olive mill pomace compost.

    PubMed

    Finore, Ilaria; Gioiello, Alessia; Leone, Luigi; Orlando, Pierangelo; Romano, Ida; Nicolaus, Barbara; Poli, Annarita

    2017-11-01

    A Gram-stain-positive, aerobic, endospore-forming, thermophilic bacterium, strain N.8T, was isolated from the curing step of an olive mill pomace compost sample, collected at the Composting Experimental Centre (CESCO, Salerno, Italy). Strain N.8T, based on 16S rRNA gene sequence similarities, was most closely related to Aeribacillus pallidus strain H12T (=DSM 3670T) (99.8 % similarity value) with a 25 % DNA-DNA relatedness value. Cells were rod-shaped, non-motile and grew optimally at 60 °C and pH 9.0, forming cream colonies. Strain N.8 was able to grow on medium containing up to 9.0 % (w/v) NaCl with an optimum at 6.0 % (w/v) NaCl. The cellular membrane contained MK-7, and C16 : 0 (48.4 %), iso-C17 : 0 (19.4 %) and anteiso-C17 : 0 (14.6 %) were the major cellular fatty acids. The DNA G+C content was 40.5 mol%. Based on phenotypic characteristics, 16S rRNA gene sequences, DNA-DNA hybridization values and chemotaxonomic characteristics, strain N.8T represents a novel species of the genus Aeribacillus, for which the name Aeribacillus composti sp. nov. is proposed. The type strain is N.8T (=KCTC 33824T =JCM 31580T).

  13. Phylogenetic analysis of the spirochete Borrelia microti, a potential agent of relapsing fever in Iran.

    PubMed

    Naddaf, Saied Reza; Ghazinezhad, Behnaz; Bahramali, Golnaz; Cutler, Sally Jane

    2012-09-01

    We report a role for Borrelia microti as a cause of relapsing fever in Iran supported by robust epidemiological evidence. The molecular identity of this spirochete and its relation with other relapsing fever borreliae have, until now, been poorly delineated. We analyzed an isolate of B. microti, obtained from Ornithodoros erraticus ticks, by sequencing four loci (16S rRNA, flaB, glpQ, intragenic spacer [IGS]) and comparing these sequences with those of other relapsing fever borreliae. Phylogenetic analysis using concatenated sequences of 16S rRNA, flaB, and glpQ grouped B. microti alongside three members of the African group, B. duttonii, B. recurrentis, and B. crocidurae, which are distinct from B. persica, the most prevalent established cause of tick-borne relapsing fever in Iran. The similarity values for 10 concatenated sequences totaling 2,437 nucleotides ranged from 92.11% to 99.84%, with the highest homologies being between B. duttonii and B. microti and between B. duttonii and B. recurrentis. Furthermore, the more discriminatory IGS sequence analysis corroborated the close similarity (97.76% to 99.56%) between B. microti and B. duttonii. These findings raise the possibility that both species may indeed be the same and further dispel the one-species, one-vector theory that has been the basis for classification of relapsing fever Borrelia for the last 100 years.

  14. Verrucosispora sonchi sp. nov., a novel endophytic actinobacterium isolated from the leaves of common sowthistle (Sonchus oleraceus L.).

    PubMed

    Ma, Zhaoxu; Zhao, Shanshan; Cao, Tingting; Liu, Chongxi; Huang, Ying; Gao, Yuhang; Yan, Kai; Xiang, Wensheng; Wang, Xiangjing

    2016-12-01

    A novel actinobacterium, designated strain NEAU-QY3T, was isolated from the leaves of Sonchus oleraceus L. and examined using a polyphasic taxonomic approach. The organism formed single spores with smooth surface on substrate mycelia. Phylogenetic analysis based on the 16S rRNA gene sequence indicated that the strain had a close association with the genus Verrucosispora and shared the highest sequence similarity with Verrucosispora qiuiae RtIII47T (99.17 %), an association that was supported by a bootstrap value of 94 % in the neighbour-joining tree and also recovered with the maximum-likelihood algorithm. The strain also showed high 16S rRNA gene sequence similarities to Xiangella phaseoli NEAU-J5T (98.78 %), Jishengella endophytica 202201T (98.51 %), Micromonospora eburnea LK2-10T (98.28 %), Verrucosispora lutea YIM 013T (98.23 %) and Salinispora pacifica CNR-114T (98.23 %). Furthermore, phylogenetic analysis based on the gyrB gene sequences supported the conclusion that strain NEAU-QY3T should be assigned to the genus Verrucosispora. However, the DNA-DNA hybridization relatedness values between strain NEAU-QY3T and V. qiuiae RtIII47T and V. lutea YIM 013T were below 70 %. With reference to phenotypic characteristics, phylogenetic data and DNA-DNA hybridization results, strain NEAU-QY3T was readily distinguished from its most closely related strains and classified as a new species, for which the name Verrucosispora sonchi sp. nov. is proposed. The type strain is NEAU-QY3T (=CGMCC 4.7312T=DSM 101530T).

  15. Genetic discovery in Xylella fastidiosa through sequence analysis of selected randomly amplified polymorphic DNAs.

    PubMed

    Chen, Jianchi; Civerolo, Edwin L; Jarret, Robert L; Van Sluys, Marie-Anne; de Oliveira, Mariana C

    2005-02-01

    Xylella fastidiosa causes many important plant diseases including Pierce's disease (PD) in grape and almond leaf scorch disease (ALSD). DNA-based methodologies, such as randomly amplified polymorphic DNA (RAPD) analysis, have been playing key roles in genetic information collection of the bacterium. This study further analyzed the nucleotide sequences of selected RAPDs from X. fastidiosa strains in conjunction with the available genome sequence databases and unveiled several previously unknown novel genetic traits. These include a sequence highly similar to those in the phage family of Podoviridae. Genome comparisons among X. fastidiosa strains suggested that the "phage" is currently active. Two other RAPDs were also related to horizontal gene transfer: one was part of a broadly distributed cryptic plasmid and the other was associated with conjugal transfer. One RAPD inferred a genomic rearrangement event among X. fastidiosa PD strains and another identified a single nucleotide polymorphism of evolutionary value.

  16. Sequencing of the needle transcriptome from Norway spruce (Picea abies Karst L.) reveals lower substitution rates, but similar selective constraints in gymnosperms and angiosperms

    PubMed Central

    2012-01-01

    Background A detailed knowledge about spatial and temporal gene expression is important for understanding both the function of genes and their evolution. For the vast majority of species, transcriptomes are still largely uncharacterized and even in those where substantial information is available it is often in the form of partially sequenced transcriptomes. With the development of next generation sequencing, a single experiment can now simultaneously identify the transcribed part of a species genome and estimate levels of gene expression. Results mRNA from actively growing needles of Norway spruce (Picea abies) was sequenced using next generation sequencing technology. In total, close to 70 million fragments with a length of 76 bp were sequenced resulting in 5 Gbp of raw data. A de novo assembly of these reads, together with publicly available expressed sequence tag (EST) data from Norway spruce, was used to create a reference transcriptome. Of the 38,419 PUTs (putative unique transcripts) longer than 150 bp in this reference assembly, 83.5% show similarity to ESTs from other spruce species and of the remaining PUTs, 3,704 show similarity to protein sequences from other plant species, leaving 4,167 PUTs with limited similarity to currently available plant proteins. By predicting coding frames and comparing not only the Norway spruce PUTs, but also PUTs from the close relatives Picea glauca and Picea sitchensis to both Pinus taeda and Taxus mairei, we obtained estimates of synonymous and non-synonymous divergence among conifer species. In addition, we detected close to 15,000 SNPs of high quality and estimated gene expression differences between samples collected under dark and light conditions. Conclusions Our study yielded a large number of single nucleotide polymorphisms as well as estimates of gene expression on transcriptome scale. 
In agreement with a recent study we find that the synonymous substitution rate per year (0.6 × 10⁻⁹ and 1.1 × 10⁻⁹) is an order of magnitude smaller than values reported for angiosperm herbs. However, if one takes generation time into account, most of this difference disappears. The estimates of the dN/dS ratio (non-synonymous over synonymous divergence) reported here are in general much lower than 1 and only a few genes showed a ratio larger than 1. PMID:23122049

  17. Isoelectronic studies of the $5s^2\,{}^1S_0$-$5s5p\,{}^{1,3}P_J$ intervals in the Cd sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Curtis, L.J.

    1986-02-01

    The $5s^2\,{}^1S_0$-$5s5p\,{}^{1,3}P_J$ energy intervals in the Cd isoelectronic sequence have been investigated through a semiempirical systematization of recent measurements and through the performance of ab initio multiconfiguration Dirac-Fock calculations. Screening-parameter reductions of the spin-orbit and exchange energies both for the observed data and for the theoretically computed values establish the existence of empirical linearities similar to those exploited earlier for the Be, Mg, and Zn sequences. This permits extrapolative isoelectronic predictions of the relative energies of the 5s5p levels, which can be connected to $5s^2$ using intersinglet intervals obtained from empirically corrected ab initio calculations. These linearities have also been examined homologously for the Zn, Cd, and Hg sequences, and common relationships have been found that accurately describe all three of these sequences.

  18. Serratia myotis sp. nov. and Serratia vespertilionis sp. nov., isolated from bats hibernating in caves.

    PubMed

    García-Fraile, P; Chudíčková, M; Benada, O; Pikula, J; Kolařík, M

    2015-01-01

    During the study of bacteria associated with bats affected by white-nose syndrome hibernating in caves in the Czech Republic, we isolated two facultatively anaerobic, Gram-stain-negative bacteria, designated strains 12(T) and 52(T). Strains 12(T) and 52(T) were motile, rod-like bacteria (0.5-0.6 µm in diameter; 1-1.3 µm long), with optimal growth at 20-35 °C and pH 6-8. On the basis of the almost complete sequence of their 16S rRNA genes they should be classified within the genus Serratia; the closest relatives to strains 12(T) and 52(T) were Serratia quinivorans DSM 4597(T) (99.5 % similarity in 16S rRNA gene sequences) and Serratia ficaria DSM 4569(T) (99.5% similarity in 16S rRNA gene sequences), respectively. DNA-DNA relatedness between strain 12(T) and S. quinivorans DSM 4597(T) was only 37.1% and between strain 52(T) and S. ficaria DSM 4569(T) was only 56.2%. Both values are far below the 70% threshold value for species delineation. In view of these data, we propose the inclusion of the two isolates in the genus Serratia as representatives of Serratia myotis sp. nov. (type strain 12(T) =CECT 8594(T) =DSM 28726(T)) and Serratia vespertilionis sp. nov. (type strain 52(T) =CECT 8595(T) =DSM 28727(T)). © 2015 IUMS.
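
    The species-delineation reasoning in this record can be written as a simple threshold check. The sketch below uses the DNA-DNA relatedness values and the 70% threshold quoted in the abstract; the function name and data layout are hypothetical.

```python
# Threshold for species delineation quoted in the abstract (percent).
DDH_SPECIES_THRESHOLD = 70.0


def is_distinct_species(ddh_percent, threshold=DDH_SPECIES_THRESHOLD):
    """True when DNA-DNA relatedness falls below the species threshold."""
    return ddh_percent < threshold


# DNA-DNA relatedness values reported in the abstract.
ddh_values = {
    ("strain 12(T)", "S. quinivorans DSM 4597(T)"): 37.1,
    ("strain 52(T)", "S. ficaria DSM 4569(T)"): 56.2,
}

for (strain, relative), ddh in ddh_values.items():
    verdict = "distinct species" if is_distinct_species(ddh) else "same species"
    print(f"{strain} vs {relative}: {ddh}% -> {verdict}")
```

    Both pairs share 99.5% 16S rRNA gene similarity, so it is the DNA-DNA relatedness, not the 16S value, that separates them here.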

  19. Shewanella amazonensis sp. nov., a novel metal-reducing facultative anaerobe from Amazonian shelf muds

    NASA Technical Reports Server (NTRS)

    Venkateswaran, K.; Dollhopf, M. E.; Aller, R.; Stackebrandt, E.; Nealson, K. H.

    1998-01-01

    A new bacterial species belonging to the genus Shewanella is described on the basis of phenotypic characterization and sequence analysis of its 16S rRNA-encoding and gyrase B (gyrB) genes. This organism, isolated from shallow-water marine sediments derived from the Amazon River delta, is a Gram-negative, motile, polarly flagellated, facultatively anaerobic, rod-shaped eubacterium and has a G+C content of 51.7 mol%. Strain SB2BT is exceptionally active in the anaerobic reduction of iron, manganese and sulfur compounds. SB2BT grows optimally at 35 °C, with 1-3% NaCl and over a pH range of 7-8. Analysis of the 16S rDNA sequence revealed a clear affiliation between strain SB2BT and members of the gamma subclass of the class Proteobacteria. High similarity values were found with certain members of the genus Shewanella, especially with Shewanella putrefaciens, and this was supported by cellular fatty acid profiles and phenotypic characterization. DNA-DNA hybridization between strain SB2BT and its phylogenetically closest relatives revealed low similarity values (24.6-42.7%) which indicated species status for strain SB2BT. That SB2BT represents a distinct bacterial species within the genus Shewanella is also supported by gyrB sequence analysis. Considering the source of the isolate, the name Shewanella amazonensis sp. nov. is proposed and strain SB2BT (= ATCC 700329T) is designated as the type strain.

  20. Large-Scale Concatenation cDNA Sequencing

    PubMed Central

    Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.

    1997-01-01

    A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174
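
    The clone tallies in this abstract can be checked with a few lines of arithmetic. The classification function is a hypothetical sketch: the abstract gives the ≥98% cutoff for exact matches and a 57%-97% range for nonexact ones, so treating every reported match below 98% as nonexact is an assumption.

```python
def classify_match(identity_percent):
    """Classify a database hit; None means no match was found.

    ASSUMPTION: any reported match below 98% identity counts as nonexact.
    """
    if identity_percent is None:
        return "no match"
    return "exact" if identity_percent >= 98.0 else "nonexact"


# Clone counts as reported in the abstract.
counts = {"no match": 37, "exact": 12, "nonexact": 16}

total = sum(counts.values())                     # clones analysed
matched = counts["exact"] + counts["nonexact"]   # clones with a DNA match

print(total, matched)
```

    The totals agree with the 65 analysed clones and the 28 matched clones mentioned in the text.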

  1. A singular value decomposition approach for improved taxonomic classification of biological sequences

    PubMed Central

    2011-01-01

    Background: Singular value decomposition (SVD) is a powerful technique for information retrieval; it helps uncover relationships between elements that are not prima facie related. SVD was initially developed to reduce the time needed for information retrieval and analysis of very large data sets in the complex internet environment. Since information retrieval from large-scale genome and proteome data sets has a similar level of complexity, SVD-based methods could also facilitate data analysis in this research area. Results: We found that SVD applied to amino acid sequences demonstrates relationships and provides a basis for producing clusters and cladograms, demonstrating evolutionary relatedness of species that correlates well with Linnaean taxonomy. The choice of a reasonable number of singular values is crucial for SVD-based studies. We found that fewer singular values are needed to produce biologically significant clusters when SVD is employed. Subsequently, we developed a method to determine the lowest number of singular values and fewest clusters needed to guarantee biological significance; this system was developed and validated by comparison with Linnaean taxonomic classification. Conclusions: By using SVD, we can reduce uncertainty concerning the appropriate rank value necessary to perform accurate information retrieval analyses. In tests, clusters that we developed with SVD perfectly matched what was expected based on Linnaean taxonomy. PMID:22369633
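
    As a rough illustration of projecting sequences onto singular vectors, the sketch below builds amino acid composition vectors and extracts only the leading singular triplet by power iteration. This is a stand-in for a full SVD routine, not the authors' method: the composition features, the toy peptides and all function names are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Relative frequency of each of the 20 amino acids in one sequence."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def multiply(matrix, vector):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def leading_singular_triplet(matrix, iterations=200):
    """Power iteration on A^T A; returns (u, sigma, v) for the largest sigma."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        av = multiply(matrix, v)
        atav = [sum(matrix[i][j] * av[i] for i in range(n_rows))
                for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in atav))
        v = [x / norm for x in atav]
    av = multiply(matrix, v)
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return u, sigma, v


# Hypothetical toy peptides, for illustration only.
toy = {"seq_a": "MKKLLAAGG", "seq_b": "MKKLLAAGA", "seq_c": "WWYYFFHHP"}
matrix = [composition(s) for s in toy.values()]
u, sigma, v = leading_singular_triplet(matrix)

# One coordinate per sequence along the leading singular direction.
coords = {name: sigma * ui for name, ui in zip(toy, u)}
print(coords)
```

    A real analysis would keep several singular values, and choosing how many to keep is the question the abstract addresses.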

  2. Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity.

    PubMed

    King, Brian R; Aburdene, Maurice; Thompson, Alex; Warres, Zach

    2014-01-01

    Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. Recently, these methods are being used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence that is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.
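
    The abstract does not define the ICD transformation in detail, so the following is only a guess at its general shape: encode the DNA numerically, take a discrete Fourier transform, and use differences between adjacent coefficient magnitudes as a signature. The numeric encoding, the fixed number of DFT points and the Euclidean comparison are all assumptions.

```python
import cmath
import math

# ASSUMPTION: an arbitrary numeric encoding of the four bases.
ENCODING = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def dft_magnitudes(signal, n_points):
    """Magnitudes of an n_points DFT (the signal is implicitly zero-padded)."""
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n_points)
                for t, x in enumerate(signal)))
        for k in range(n_points)
    ]


def icd_signature(seq, n_points=32):
    """Differences between adjacent DFT magnitudes (sketch of an ICD signature).

    Sequences are assumed to be no longer than n_points.
    """
    signal = [ENCODING[base] for base in seq.upper()]
    mags = dft_magnitudes(signal, n_points)
    return [mags[k + 1] - mags[k] for k in range(n_points - 1)]


def signature_distance(sig_a, sig_b):
    """Euclidean distance between two signatures of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_a, sig_b)))


# Hypothetical toy sequences.
s1 = icd_signature("ACGTACGTAC")
s2 = icd_signature("AAAAAAAAAA")
print(signature_distance(s1, s2))
```

    Because every signature has the same length regardless of the input length, no alignment is needed before sequences are compared.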

  3. Construction of Pseudomolecule Sequences of the aus Rice Cultivar Kasalath for Comparative Genomics of Asian Cultivated Rice

    PubMed Central

    Sakai, Hiroaki; Kanamori, Hiroyuki; Arai-Kichise, Yuko; Shibata-Hatta, Mari; Ebana, Kaworu; Oono, Youko; Kurita, Kanako; Fujisawa, Hiroko; Katagiri, Satoshi; Mukai, Yoshiyuki; Hamada, Masao; Itoh, Takeshi; Matsumoto, Takashi; Katayose, Yuichi; Wakasa, Kyo; Yano, Masahiro; Wu, Jianzhong

    2014-01-01

    Having a deep genetic structure evolved during its domestication and adaptation, the Asian cultivated rice (Oryza sativa) displays considerable physiological and morphological variations. Here, we describe deep whole-genome sequencing of the aus rice cultivar Kasalath by using the advanced next-generation sequencing (NGS) technologies to gain a better understanding of the sequence and structural changes among highly differentiated cultivars. The de novo assembled Kasalath sequences represented 91.1% (330.55 Mb) of the genome and contained 35 139 expressed loci annotated by RNA-Seq analysis. We detected 2 787 250 single-nucleotide polymorphisms (SNPs) and 7393 large insertion/deletion (indel) sites (>100 bp) between Kasalath and Nipponbare, and 2 216 251 SNPs and 3780 large indels between Kasalath and 93-11. Extensive comparison of the gene contents among these cultivars revealed similar rates of gene gain and loss. We detected at least 7.39 Mb of inserted sequences and 40.75 Mb of unmapped sequences in the Kasalath genome in comparison with the Nipponbare reference genome. Mapping of the publicly available NGS short reads from 50 rice accessions proved the necessity and the value of using the Kasalath whole-genome sequence as an additional reference to capture the sequence polymorphisms that cannot be discovered by using the Nipponbare sequence alone. PMID:24578372

  4. Calculation of vitrinite reflectance from thermal histories: A comparison of some methods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morrow, D.W.; Issler, D.R.

    1993-04-01

    Vitrinite reflectance values (%R_o) calculated from commonly used methods are compared with respect to time invariant temperatures and constant heating rates. Two monofunctional methods, one involving a time-temperature index to vitrinite reflectance correlation (TTI-%R_o) to depth correlation, yield vitrinite reflectance values that are similar to those calculated by recently published Arrhenius-based methods, such as EASY%R_o. The approximate agreement between these methods supports the perception that the EASY%R_o algorithm is the most accurate method for the prediction of vitrinite reflectances throughout the range of organic maturity normally encountered. However, calibration of these methods against vitrinite reflectance data from two basin sequences with well-documented geologic histories indicates that, although the EASY%R_o method has wide applicability, it slightly overestimates vitrinite reflectances in strata of low to medium maturity up to a %R_o value of 0.9%. The two monofunctional methods may be more accurate for prediction of vitrinite reflectances in similar sequences of low maturity. An older, but previously widely accepted TTI-%R_o correlation consistently overestimates vitrinite reflectances with respect to other methods. Underestimation of paleogeothermal gradients in the original calibration of time-temperature history to vitrinite reflectance may have introduced a systematic bias to the TTI-%R_o correlation used in this method. Also, incorporation of TAI (thermal alteration index) data and its conversion to %R_o-equivalent values may have introduced inaccuracies. 36 refs., 7 figs.
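
    For orientation, a time-temperature index can be computed as below, assuming the TTI in this abstract is the Lopatin-style index in which the reaction rate doubles for every 10 °C and the 100-110 °C interval has weight 1. The conversion from TTI to %R_o is omitted because the abstract does not give the correlation. The thermal history is hypothetical.

```python
import math


def time_temperature_index(history):
    """Lopatin-style TTI for a list of (duration_myr, temperature_c) steps.

    Each step contributes duration * 2**n, where n = 0 for 100-110 degrees C
    and n changes by one for every 10 degrees C above or below that interval.
    """
    total = 0.0
    for duration_myr, temperature_c in history:
        n = math.floor((temperature_c - 100.0) / 10.0)
        total += duration_myr * 2.0 ** n
    return total


# Hypothetical burial history: (duration in Myr, temperature in degrees C).
history = [(20.0, 65.0), (15.0, 95.0), (10.0, 115.0)]
print(time_temperature_index(history))
```

    Arrhenius-based methods such as EASY%R_o replace this fixed doubling rule with a distribution of activation energies.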

  5. The flounder organic anion transporter fOat has sequence, function, and substrate specificity similarity to both mammalian Oat1 and Oat3

    PubMed Central

    Aslamkhan, Amy G.; Thompson, Deborah M.; Perry, Jennifer L.; Bleasby, Kelly; Wolff, Natascha A.; Barros, Scott; Miller, David S.; Pritchard, John B.

    2007-01-01

    The flounder renal organic anion transporter (fOat) has substantial sequence homology to mammalian basolateral organic anion transporter orthologs (OAT1/Oat1 and OAT3/Oat3), suggesting that fOat may have functional properties of both mammalian forms. We therefore compared uptake of various substrates by rat Oat1 and Oat3 and human OAT1 and OAT3 with the fOat clone expressed in Xenopus oocytes. These data confirm that estrone sulfate is an excellent substrate for mammalian OAT3/Oat3 transporters but not for OAT1/Oat1 transporters. In contrast, 2,4-dichlorophenoxyacetic acid and adefovir are better transported by mammalian OAT1/Oat1 than by the OAT3/Oat3 clones. All three substrates were well transported by fOat-expressing Xenopus oocytes. fOat Km values were comparable to those obtained for mammalian OAT/Oat1/3 clones. We also characterized the ability of these substrates to inhibit uptake of the fluorescent substrate fluorescein in intact teleost proximal tubules isolated from the winter flounder (Pseudopleuronectes americanus) and killifish (Fundulus heteroclitus). The rank order of the IC50 values for inhibition of cellular fluorescein accumulation was similar to that for the Km values obtained in fOat-expressing oocytes, suggesting that fOat may be the primary teleost renal basolateral Oat. Assessment of the zebrafish (Danio rerio) genome indicated the presence of a single Oat (zfOat) with similarity to both mammalian OAT1/Oat1 and OAT3/Oat3. The puffer fish (Takifugu rubripes) also has an Oat (pfOat) similar to mammalian OAT1/Oat1 and OAT3/Oat3 members. Furthermore, phylogenetic analyses argue that the teleost Oat1/3-like genes diverged from a common ancestral gene in advance of the divergence of the mammalian OAT1/Oat1, OAT3/Oat3, and, possibly, Oat6 genes. PMID:16857889

  6. Zn-metalloprotease sequences in extremophiles

    NASA Astrophysics Data System (ADS)

    Holden, T.; Dehipawala, S.; Golebiewska, U.; Cheung, E.; Tremberger, G., Jr.; Williams, E.; Schneider, P.; Gadura, N.; Lieberman, D.; Cheung, T.

    2010-09-01

    The Zn-metalloprotease family contains conserved amino acid structures such that the nucleotide fluctuation at the DNA level would exhibit correlated randomness as described by fractal dimension. A nucleotide sequence fractal dimension can be calculated from a numerical series consisting of the atomic numbers of each nucleotide. The structure's vibration modes can also be studied using a Gaussian Network Model. The vibration measure and fractal dimension values form a two-dimensional plot with a standard vector metric that can be used for comparison of structures. The preference for amino acid usage in extremophiles may suppress nucleotide fluctuations that could be analyzed in terms of fractal dimension and Shannon entropy. A protein level cold adaptation study of the thermolysin Zn-metalloprotease family using molecular dynamics simulation was reported recently and our results show that the associated nucleotide fluctuation suppression is consistent with a regression pattern generated from the sequences' fractal dimension and entropy values (R-square = 0.98, N = 5). It was observed that cold adaptation selected for high entropy and low fractal dimension values. Extension to the Archaemetzincin M54 family in extremophiles reveals a similar regression pattern (R-square = 0.98, N = 6). It was observed that the metalloprotease sequences of extremely halophilic organisms possess high fractal dimension and low entropy values as compared with non-halophiles. The zinc atom is usually bonded to the histidine residue, which shows limited levels of vibration in the Gaussian Network Model. The variability of the fractal dimension and entropy for a given protein structure suggests that extremophiles would have evolved after mesophiles, consistent with the biased usage of non-prebiotic amino acids by extremophiles. It may be argued that extremophiles have the capacity to offer extinction protection during drastic changes in astrobiological environments.
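
    Two of the quantities named in this abstract are easy to sketch: the numerical series built from atomic numbers and the Shannon entropy of a nucleotide sequence. The values in the mapping are the summed atomic numbers of the free bases, computed from their molecular formulas; whether the paper used exactly these values is an assumption, and the fractal dimension step is not shown.

```python
import math
from collections import Counter

# ASSUMPTION: summed atomic numbers of the free bases, e.g. adenine C5H5N5
# gives 5*6 + 5*1 + 5*7 = 70. The paper's own values are not in the abstract.
ATOMIC_NUMBER_SUM = {"A": 70, "C": 58, "G": 78, "T": 66}


def numeric_series(seq):
    """Map a nucleotide sequence to a numerical series."""
    return [ATOMIC_NUMBER_SUM[base] for base in seq.upper()]


def shannon_entropy(seq):
    """Shannon entropy (bits per symbol) of the nucleotide frequencies."""
    counts = Counter(seq.upper())
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical toy sequence.
print(numeric_series("ACGT"), shannon_entropy("ACGT"))
```

    A sequence with equal base frequencies reaches the maximum of 2 bits per nucleotide, while a homopolymer has zero entropy.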

  7. Strategies for high-altitude adaptation revealed from high-quality draft genome of non-violacein producing Janthinobacterium lividum ERGS5:01.

    PubMed

    Kumar, Rakshak; Acharya, Vishal; Singh, Dharam; Kumar, Sanjay

    2018-01-01

    A light pink coloured bacterial strain ERGS5:01 isolated from glacial stream water of Sikkim Himalaya was affiliated to Janthinobacterium lividum based on 16S rRNA gene sequence identity and phylogenetic clustering. Whole genome sequencing was performed for the strain to confirm its taxonomy, as it lacked the typical violet pigmentation of the genus, and also to decipher its survival strategy in the aquatic ecosystem of high elevation. The PacBio RSII sequencing generated a genome of 5,168,928 bp with 4575 protein-coding genes and 118 RNA genes. Whole genome-based multilocus sequence analysis clustering, an in silico DDH similarity value of 95.1% and an ANI value of 99.25% established the identity of the strain ERGS5:01 (MCC 2953) as a non-violacein producing J. lividum. The genome comparisons across genus Janthinobacterium revealed an open pan-genome with the scope of the addition of new orthologous clusters to complete the genomic inventory. The genomic insight provided the genetic basis of tolerance to freezing and frequent freeze-thaw cycles, and of industrially important enzymes. Extended insight into the genome provided clues of crucial genes associated with adaptation in the harsh aquatic ecosystem of high altitude.

  8. Cloning and Sequence Analysis of Vibrio halioticoli Genes Encoding Three Types of Polyguluronate Lyase.

    PubMed

    Sugimura; Sawabe; Ezura

    2000-01-01

    The alginate lyase-coding genes of Vibrio halioticoli IAM 14596(T), which was isolated from the gut of the abalone Haliotis discus hannai, were cloned using plasmid vector pUC 18, and expressed in Escherichia coli. Three alginate lyase-positive clones, pVHB, pVHC, and pVHE, were obtained, and all clones expressed the enzyme activity specific for polyguluronate. Three genes, alyVG1, alyVG2, and alyVG3, encoding polyguluronate lyase were sequenced: alyVG1 from pVHB was composed of a 1056-bp open reading frame (ORF) encoding 352 amino acid residues; alyVG2 gene from pVHC was composed of a 993-bp ORF encoding 331 amino acid residues; and alyVG3 gene from pVHE was composed of a 705-bp ORF encoding 235 amino acid residues. Comparison of nucleotide and deduced amino acid sequences among AlyVG1, AlyVG2, and AlyVG3 revealed low homologies. The identity value between AlyVG1 and AlyVG2 was 18.7%, and that between AlyVG2 and AlyVG3 was 17.0%. A higher identity value (26.0%) was observed between AlyVG1 and AlyVG3. Sequence comparison among known polyguluronate lyases including AlyVG1, AlyVG2, and AlyVG3 also did not reveal an identical region in these sequences. However, AlyVG1 showed the highest identity value (36.2%) and the highest similarity (73.3%) to AlyA from Klebsiella pneumoniae. A consensus region comprising nine amino acids (YFKAGXYXQ) in the carboxy-terminal region previously reported by Mallisard and colleagues was observed only in AlyVG1 and AlyVG2.
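
    The ORF lengths and identity values quoted here follow from two small calculations, sketched below. The residue counts in the abstract equal the ORF length divided by three. The identity function is one common convention (matches over columns without gaps); the abstract does not say which definition the authors used, and the aligned strings in the usage line are hypothetical.

```python
def encoded_residues(orf_bp):
    """Number of codons in an ORF of the given length (must be a multiple of 3)."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf_bp // 3


def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap.

    ASSUMPTION: other definitions divide by alignment length or by the
    length of the shorter sequence, which gives different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# ORF lengths reported in the abstract.
for gene, bp in {"alyVG1": 1056, "alyVG2": 993, "alyVG3": 705}.items():
    print(gene, encoded_residues(bp))

# Hypothetical aligned fragments, for illustration only.
print(percent_identity("ACDEF", "ACDQF"))
```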

  9. Compressed Sensing SEMAC: 8-fold Accelerated High Resolution Metal Artifact Reduction MRI of Cobalt-Chromium Knee Arthroplasty Implants.

    PubMed

    Fritz, Jan; Ahlawat, Shivani; Demehri, Shadpour; Thawait, Gaurav K; Raithel, Esther; Gilson, Wesley D; Nittka, Mathias

    2016-10-01

    The aim of this study was to prospectively test the hypothesis that a compressed sensing-based slice encoding for metal artifact correction (SEMAC) turbo spin echo (TSE) pulse sequence prototype facilitates high-resolution metal artifact reduction magnetic resonance imaging (MRI) of cobalt-chromium knee arthroplasty implants within acquisition times of less than 5 minutes, thereby yielding better image quality than high-bandwidth (BW) TSE of similar length and image quality similar to that of lengthier SEMAC standard-of-reference pulse sequences. This prospective study was approved by our institutional review board. Twenty asymptomatic subjects (12 men, 8 women; mean age, 56 years; age range, 44-82 years) with total knee arthroplasty implants underwent MRI of the knee using a commercially available, clinical 1.5 T MRI system. Two compressed sensing-accelerated SEMAC prototype pulse sequences with 8-fold undersampling and acquisition times of approximately 5 minutes each were compared with commercially available high-BW and SEMAC pulse sequences with acquisition times of approximately 5 minutes and 11 minutes, respectively. For each pulse sequence type, sagittal intermediate-weighted (TR, 3750-4120 milliseconds; TE, 26-28 milliseconds; voxel size, 0.5 × 0.5 × 3 mm) and short tau inversion recovery (TR, 4010 milliseconds; TE, 5.2-7.5 milliseconds; voxel size, 0.8 × 0.8 × 4 mm) were acquired. Outcome variables included image quality, display of the bone-implant interfaces and pertinent knee structures, artifact size, signal-to-noise ratio (SNR), and contrast-to-noise ratio (CNR). Statistical analysis included Friedman, repeated-measures analysis of variance, and Cohen weighted κ tests. Bonferroni-corrected P values of 0.005 and less were considered statistically significant.
Image quality, bone-implant interfaces, anatomic structures, artifact size, SNR, and CNR parameters were statistically similar between the compressed sensing-accelerated SEMAC prototype and SEMAC commercial pulse sequences. There was mild blur on images of both SEMAC sequences when compared with high-BW images (P < 0.001), which, however, did not impair the assessment of knee structures. Metal artifact reduction and visibility of central knee structures and bone-implant interfaces were good to very good and significantly better on both types of SEMAC than on high-BW images (P < 0.004). All 3 pulse sequences showed peripheral structures similarly well. The implant artifact size was 46% to 51% larger on high-BW images when compared with both types of SEMAC images (P < 0.0001). Signal-to-noise ratios and CNRs of fat tissue, tendon tissue, muscle tissue, and fluid were statistically similar on intermediate-weighted MR images of all 3 pulse sequence types. On short tau inversion recovery images, the SNRs of tendon tissue and the CNRs of fat and fluid, fluid and muscle, as well as fluid and tendon were significantly higher on SEMAC and compressed sensing SEMAC images (P < 0.005, respectively). We accept the hypothesis that prospective compressed sensing acceleration of SEMAC is feasible for high-quality metal artifact reduction MRI of cobalt-chromium knee arthroplasty implants in less than 5 minutes and yields better quality than high-BW TSE and quality similarly high to that of lengthier SEMAC pulse sequences.

  10. Enterobacter muelleri sp. nov., isolated from the rhizosphere of Zea mays.

    PubMed

    Kämpfer, Peter; McInroy, John A; Glaeser, Stefanie P

    2015-11-01

    A beige-pigmented, oxidase-negative bacterial strain (JM-458T), isolated from a rhizosphere sample, was studied using a polyphasic taxonomic approach. Cells of the isolate were rod-shaped and stained Gram-negative. A comparison of the 16S rRNA gene sequence of strain JM-458T with sequences of the type strains of closely related species of the genus Enterobacter showed that it shared highest sequence similarity with Enterobacter mori (98.7 %), Enterobacter hormaechei (98.3 %), Enterobacter cloacae subsp. dissolvens, Enterobacter ludwigii and Enterobacter asburiae (all 98.2 %). 16S rRNA gene sequence similarities to all other Enterobacter species were below 98 %. Multilocus sequence analysis based on concatenated partial rpoB, gyrB, infB and atpD gene sequences showed a clear distinction of strain JM-458T from its closest related type strains. The fatty acid profile of the strain consisted of C16 : 0, C17 : 0 cyclo, iso-C15 : 0 2-OH/C16 : 1ω7c and C18 : 1ω7c as major components. DNA-DNA hybridizations between strain JM-458T and the type strains of E. mori, E. hormaechei and E. ludwigii resulted in relatedness values of 29 % (reciprocal 25 %), 24 % (reciprocal 43 %) and 16 % (reciprocal 17 %), respectively. DNA-DNA hybridization results together with multilocus sequence analysis results and differential biochemical and chemotaxonomic properties showed that strain JM-458T represents a novel species of the genus Enterobacter, for which the name Enterobacter muelleri sp. nov. is proposed. The type strain is JM-458T ( = DSM 29346T = CIP 110826T = LMG 28480T = CCM 8546T).

  11. Inhibition of trypanosomal cysteine proteinases by their propeptides.

    PubMed

    Lalmanach, G; Lecaille, F; Chagas, J R; Authié, E; Scharfstein, J; Juliano, M A; Gauthier, F

    1998-09-25

    The ability of the prodomains of trypanosomal cysteine proteinases to inhibit their active form was studied using a set of 23 overlapping 15-mer peptides covering the whole prosequence of congopain, the major cysteine proteinase of Trypanosoma congolense. Three consecutive peptides with a common 5-mer sequence YHNGA were competitive inhibitors of congopain. A shorter synthetic peptide consisting of this 5-mer sequence flanked by two Ala residues (AYHNGAA) also inhibited purified congopain. No residue critical for inhibition was identified in this sequence, but a significant improvement in Ki value was obtained upon N-terminal elongation. Procongopain-derived peptides did not inhibit lysosomal cathepsins B and L but did inhibit native cruzipain (from Dm28c clone epimastigotes), the major cysteine proteinase of Trypanosoma cruzi, the proregion of which also contains the sequence YHNGA. The positioning of the YHNGA inhibitory sequence within the prosegment of trypanosomal proteinases is similar to that covering the active site in the prosegment of cysteine proteinases, the three-dimensional structure of which has been resolved. This strongly suggests that trypanosomal proteinases, despite their long C-terminal extension, have a prosegment that folds similarly to that in related mammal and plant cysteine proteinases, resulting in reverse binding within the active site. Such reverse binding could also occur for short procongopain-derived inhibitory peptides, based on their resistance to proteolysis and their ability to retain inhibitory activity after prolonged incubation. In contrast, homologous peptides in related cysteine proteinases did not inhibit trypanosomal proteinases and were rapidly cleaved by these enzymes.
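
    A peptide scan of the kind described here can be sketched as a sliding window. The abstract gives the peptide length (15) and count (23) but not the offset between peptides. The 5-residue step below is an assumption, chosen because it makes exactly three consecutive 15-mers share a common 5-mer, which would be consistent with the YHNGA observation. The toy sequences are hypothetical and are not the congopain prosequence.

```python
def overlapping_peptides(sequence, length=15, step=5):
    """Sliding-window peptides of the given length and offset.

    ASSUMPTION: the 5-residue step is illustrative; the abstract does not
    state the offset used in the study.
    """
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, step)]


# Hypothetical toy sequence with the YHNGA motif at positions 10-14.
toy = "A" * 10 + "YHNGA" + "C" * 10
peptides = overlapping_peptides(toy)
print(peptides)

# With this step, 23 peptides would span 15 + 22 * 5 = 125 residues.
print(len(overlapping_peptides("G" * 125)))
```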

  12. Abiotic Stress Resistance, a Novel Moonlighting Function of Ribosomal Protein RPL44 in the Halophilic Fungus Aspergillus glaucus

    PubMed Central

    Liu, Xiao-Dan; Xie, Lixia; Wei, Yi; Zhou, Xiaoyang; Jia, Baolei; Liu, Jinliang

    2014-01-01

    Ribosomal proteins are highly conserved components of basal cellular organelles, primarily involved in the translation of mRNA leading to protein synthesis. However, certain ribosomal proteins moonlight in the development and differentiation of organisms. In this study, the ribosomal protein L44 (RPL44), associated with salt resistance, was screened from the halophilic fungus Aspergillus glaucus (AgRPL44), and its activity was investigated in Saccharomyces cerevisiae and Nicotiana tabacum. Sequence alignment revealed that AgRPL44 is one of the proteins of the large ribosomal subunit 60S. Expression of AgRPL44 was upregulated via treatment with salt, sorbitol, or heavy metals to demonstrate its response to osmotic stress. A homologous sequence from the model fungus Magnaporthe oryzae, MoRPL44, was cloned and compared with AgRPL44 in a yeast expression system. The results indicated that yeast cells with overexpressed AgRPL44 were more resistant to salt, drought, and heavy metals than were yeast cells expressing MoRPL44 at a similar level of stress. When AgRPL44 was introduced into M. oryzae, the transformants displayed obviously enhanced tolerance to salt and drought, indicating the potential value of AgRPL44 for genetic applications. To verify the value of its application in plants, tobacco was transformed with AgRPL44, and the results were similar. Taken together, we conclude that AgRPL44 supports abiotic stress resistance and may have value for genetic application. PMID:24814782

  13. Comparative sequence analysis of Mycobacterium leprae and the new leprosy-causing Mycobacterium lepromatosis.

    PubMed

    Han, Xiang Y; Sizer, Kurt C; Thompson, Erika J; Kabanja, Juma; Li, Jun; Hu, Peter; Gómez-Valero, Laura; Silva, Francisco J

    2009-10-01

    Mycobacterium lepromatosis is a newly discovered leprosy-causing organism. Preliminary phylogenetic analysis of its 16S rRNA gene and a few other gene segments revealed significant divergence from Mycobacterium leprae, a well-known cause of leprosy, that justifies the status of M. lepromatosis as a new species. In this study we analyzed the sequences of 20 genes and pseudogenes (22,814 nucleotides). Overall, the level of matching of these sequences with M. leprae sequences was 90.9%, which substantiated the species-level difference; the levels of matching for the 16S rRNA genes and 14 protein-encoding genes were 98.0% and 93.1%, respectively, but the level of matching for five pseudogenes was only 79.1%. Five conserved protein-encoding genes were selected to construct phylogenetic trees and to calculate the numbers of synonymous substitutions (dS values) and nonsynonymous substitutions (dN values) in the two species. Robust phylogenetic trees constructed using concatenated alignment of these genes placed M. lepromatosis and M. leprae in a tight cluster with long terminal branches, implying that the divergence occurred long ago. The dS and dN values were also much higher than those for other closest pairs of mycobacteria. The dS values were 14 to 28% of the dS values for M. leprae and Mycobacterium tuberculosis, a more divergent pair of species. These results thus indicate that M. lepromatosis and M. leprae diverged approximately 10 million years ago. The M. lepromatosis pseudogenes analyzed that were also pseudogenes in M. leprae showed nearly neutral evolution, and their relative ages were similar to those of M. leprae pseudogenes, suggesting that they were pseudogenes before divergence. Taken together, the results described above indicate that M. lepromatosis and M. leprae diverged from a common ancestor after the massive gene inactivation event described previously for M. leprae.

  14. Fracture propagation through a layered shale and limestone sequence at Nash Point, South Wales: Implications on the development of fracture networks in layered sequences

    NASA Astrophysics Data System (ADS)

    Forbes Inskip, N.; Meredith, P. G.; Gudmundsson, A.

    2017-12-01

    While considerable effort has been expended on the study of fracture propagation in rocks in recent years, our understanding of how fractures propagate through sedimentary rocks composed of layers with different mechanical and elastic properties remains poor. Yet the mechanical layering is a key parameter controlling the propagation of fractures in sedimentary sequences. Here we report measurements of the contrasting properties of the Lower Lias at Nash Point, South Wales, which comprises a sequence of interbedded shale and limestone layers, and how those properties influence fracture propagation. The static Young's modulus (Estat) of both rock types has been measured parallel and normal to bedding. The shale is highly anisotropic, with Estat varying from 2.4 GPa, in the bedding-normal orientation, to 7.9 GPa, in the bedding-parallel orientation, yielding an anisotropy of 107%. By contrast the limestone has a very low anisotropy of 8%, with Estat values varying from 28.5 GPa, in the bedding-normal orientation, to 26.3 GPa in the bedding-parallel orientation. It follows that for a vertical fracture propagating in this sequence the modulus contrast is by a factor of about 12. This is important because the contrast in elastic properties is a key factor in controlling whether fractures arrest, deflect, or propagate across interfaces between layers in a sequence. Preliminary numerical modelling results (using a finite element modelling software) of induced fractures at Nash Point demonstrate a rotation of the maximum principal compressive stress across interfaces but also the concentration of tensile stress within the more competent (high Estat) limestone layers. The tensile strength (σT), using the Brazil-disk technique, and fracture toughness (KIc), using the semi-circular bend methodology, of both rock types have been measured. 
Measurements were made in the three principal orientations relative to bedding, Arrester, Divider, and Short-Transverse, and also at 15° intervals between these planes. Again, values for the shale show a high degree of anisotropy; with similar values in the Arrester and Divider orientations, but much lower values in the Short-Transverse orientation. σT and KIc values for the limestone are considerably higher than those for the shale and exhibit no significant anisotropy.
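
    The anisotropy percentages and the modulus contrast quoted above can be checked with a few lines of arithmetic. The sketch below assumes anisotropy is defined as the difference between the two moduli divided by their mean; the abstract does not state the definition, but this one reproduces the reported 107% and 8%. The function name is hypothetical.

```python
def anisotropy_percent(e_a, e_b):
    """Percentage anisotropy, assumed here to be |difference| / mean * 100."""
    mean = (e_a + e_b) / 2.0
    return abs(e_a - e_b) / mean * 100.0

# Static Young's modulus values from the abstract, in GPa.
shale_normal, shale_parallel = 2.4, 7.9
lime_normal, lime_parallel = 28.5, 26.3

shale_aniso = anisotropy_percent(shale_normal, shale_parallel)
lime_aniso = anisotropy_percent(lime_normal, lime_parallel)

# A vertical fracture crosses the layers, so the bedding-normal values
# are compared (an interpretation that matches the quoted factor of ~12).
contrast = lime_normal / shale_normal

print(f"shale {shale_aniso:.1f}%, limestone {lime_aniso:.1f}%, contrast {contrast:.1f}x")
```

    Under this assumed definition the shale comes out near 107%, the limestone near 8%, and the bedding-normal contrast near 12, in line with the figures in the abstract.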

  15. Estimation of pairwise sequence similarity of mammalian enhancers with word neighbourhood counts.

    PubMed

    Göke, Jonathan; Schulz, Marcel H; Lasserre, Julia; Vingron, Martin

    2012-03-01

    The identity of cells and tissues is to a large degree governed by transcriptional regulation. A major part is accomplished by the combinatorial binding of transcription factors at regulatory sequences, such as enhancers. Even though binding of transcription factors is sequence-specific, estimating the sequence similarity of two functionally similar enhancers is very difficult. However, a similarity measure for regulatory sequences is crucial to detect and understand functional similarities between two enhancers and will facilitate large-scale analyses like clustering, prediction and classification of genome-wide datasets. We present the standardized alignment-free sequence similarity measure N2, a flexible framework that is defined for word neighbourhoods. We explore the usefulness of adding reverse complement words as well as words including mismatches into the neighbourhood. On simulated enhancer sequences as well as functional enhancers in mouse development, N2 is shown to outperform previous alignment-free measures. N2 is flexible, faster than competing methods and less susceptible to single sequence noise and the occurrence of repetitive sequences. Experiments on the mouse enhancers reveal that enhancers active in different tissues can be separated by pairwise comparison using N2. N2 represents an improvement over previous alignment-free similarity measures without compromising speed, which makes it a good candidate for large-scale sequence comparison of regulatory sequences. The software is part of the open-source C++ library SeqAn (www.seqan.de) and a compiled version can be downloaded at http://www.seqan.de/projects/alf.html. Supplementary data are available at Bioinformatics online.

  16. Transcriptome sequencing and marker development in winged bean (Psophocarpus tetragonolobus; Leguminosae).

    PubMed

    Vatanparast, Mohammad; Shetty, Prateek; Chopra, Ratan; Doyle, Jeff J; Sathyanarayana, N; Egan, Ashley N

    2016-06-30

    Winged bean, Psophocarpus tetragonolobus (L.) DC., is similar to soybean in yield and nutritional value but more viable in tropical conditions. Here, we strengthen genetic resources for this orphan crop by producing a de novo transcriptome assembly and annotation of two Sri Lankan accessions (denoted herein as CPP34 [PI 491423] and CPP37 [PI 639033]), developing simple sequence repeat (SSR) markers, and identifying single nucleotide polymorphisms (SNPs) between geographically separated genotypes. A combined assembly based on 804,757 reads from two accessions produced 16,115 contigs with an N50 of 889 bp, over 90% of which has significant sequence similarity to other legumes. Combining contigs with singletons produced 97,241 transcripts. We identified 12,956 SSRs, including 2,594 repeats for which primers were designed and 5,190 high-confidence SNPs between Sri Lankan and Nigerian genotypes. The transcriptomic data sets generated here provide new resources for gene discovery and marker development in this orphan crop, and will be vital for future plant breeding efforts. We also analyzed the soybean trypsin inhibitor (STI) gene family, important plant defense genes, in the context of related legumes and found evidence for radiation of the Kunitz trypsin inhibitor (KTI) gene family within winged bean.
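
    The N50 statistic quoted for the assembly can be illustrated with a short sketch. It uses the common definition (the contig length at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembled bases); the abstract does not say which tool or variant of the definition was used, and the example lengths are made up.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (common definition)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one contig length")

# Illustrative lengths only, not data from the study.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```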

  17. Bacillus infantis sp. nov. and Bacillus idriensis sp. nov., isolated from a patient with neonatal sepsis.

    PubMed

    Ko, Kwan Soo; Oh, Won Sup; Lee, Mi Young; Lee, Jang Ho; Lee, Hyuck; Peck, Kyong Ran; Lee, Nam Yong; Song, Jae-Hoon

    2006-11-01

    Two Gram-positive bacilli, designated as strains SMC 4352-1T and SMC 4352-2T, were isolated sequentially from the blood of a newborn child with sepsis. They could not be identified by using conventional clinical microbiological methods. 16S rRNA gene sequencing and phylogenetic analysis revealed that both strains belonged to the genus Bacillus but clearly diverged from known Bacillus species. Strain SMC 4352-1T and strain SMC 4352-2T were found to be closely related to Bacillus firmus NCIMB 9366T (98.2% sequence similarity) and Bacillus cibi JG-30T (97.1% sequence similarity), respectively. They also displayed low DNA-DNA reassociation values (less than 40%) with respect to the most closely related Bacillus species. On the basis of their polyphasic characteristics, strain SMC 4352-1T and strain SMC 4352-2T represent two novel species of the genus Bacillus, for which the names Bacillus infantis sp. nov. (type strain SMC 4352-1T=KCCM 90025T=JCM 13438T) and Bacillus idriensis sp. nov. (type strain SMC 4352-2T=KCCM 90024T=JCM 13437T) are proposed.

  18. Streptococcus moroccensis sp. nov. and Streptococcus rifensis sp. nov., isolated from raw camel milk.

    PubMed

    Kadri, Zaina; Amar, Mohamed; Ouadghiri, Mouna; Cnockaert, Margo; Aerts, Maarten; El Farricha, Omar; Vandamme, Peter

    2014-07-01

    Two catalase- and oxidase-negative Streptococcus-like strains, LMG 27682(T) and LMG 27684(T), were isolated from raw camel milk in Morocco. Comparative 16S rRNA gene sequencing assigned these bacteria to the genus Streptococcus with Streptococcus rupicaprae 2777-2-07(T) as their closest phylogenetic neighbour (95.9% and 95.7% similarity, respectively). 16S rRNA gene sequence similarity between the two strains was 96.7%. Although strains LMG 27682(T) and LMG 27684(T) shared a DNA-DNA hybridization value that corresponded to the threshold level for species delineation (68%), the two strains could be distinguished by multiple biochemical tests, sequence analysis of the phenylalanyl-tRNA synthase (pheS), RNA polymerase (rpoA) and ATP synthase (atpA) genes and by their MALDI-TOF MS profiles. On the basis of these considerable phenotypic and genotypic differences, we propose to classify both strains as novel species of the genus Streptococcus, for which the names Streptococcus moroccensis sp. nov. (type strain, LMG 27682(T)  = CCMM B831(T)) and Streptococcus rifensis sp. nov. (type strain, LMG 27684(T)  = CCMM B833(T)) are proposed. © 2014 IUMS.

  19. Pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of Plasmodium vivax in human patients.

    PubMed

    Merino, Emilio F; Fernandez-Becerra, Carmen; Madeira, Alda M B N; Machado, Ariane L; Durham, Alan; Gruber, Arthur; Hall, Neil; del Portillo, Hernando A

    2003-07-21

    Plasmodium vivax is the most widely distributed human malaria, responsible for 70-80 million clinical cases each year and large socio-economical burdens for countries such as Brazil where it is the most prevalent species. Unfortunately, due to the impossibility of growing this parasite in continuous in vitro culture, research on P. vivax remains largely neglected. A pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of P. vivax was performed. To do so, 1,184 clones from a cDNA library constructed with parasites obtained from 10 different human patients in the Brazilian Amazon were sequenced. Sequences were automatically processed to remove contaminants and low quality reads. A total of 806 sequences with an average length of 586 bp met such criteria and their clustering revealed 666 distinct events. The consensus sequence of each cluster and the unique sequences of the singlets were used in similarity searches against different databases that included P. vivax, Plasmodium falciparum, Plasmodium yoelii, Plasmodium knowlesi, Apicomplexa and the GenBank non-redundant database. An E-value of <10^-30 was used to define a significant database match. ESTs were manually assigned a gene ontology (GO) terminology. A total of 769 ESTs could be assigned a putative identity based upon sequence similarity to known proteins in GenBank. Moreover, 292 ESTs were annotated and a GO terminology was assigned to 164 of them. These are the first ESTs reported for P. vivax and, as such, they represent a valuable resource to assist in the annotation of the P. vivax genome currently being sequenced. Moreover, since the GC-content of the P. vivax genome is strikingly different from that of P. falciparum, these ESTs will help in the validation of gene predictions for P. vivax and to create a gene index of this malaria parasite.

  20. Is the phonological similarity effect in working memory due to proactive interference?

    PubMed

    Baddeley, Alan D; Hitch, Graham J; Quinlan, Philip T

    2018-04-12

    Immediate serial recall of verbal material is highly sensitive to impairment attributable to phonological similarity. Although this has traditionally been interpreted as a within-sequence similarity effect, Engle (2007) proposed an interpretation based on interference from prior sequences, a phenomenon analogous to that found in the Peterson short-term memory (STM) task. We use the method of serial reconstruction to test this in an experiment contrasting the standard paradigm in which successive sequences are drawn from the same set of phonologically similar or dissimilar words and one in which the vowel sound on which similarity is based is switched from trial to trial, a manipulation analogous to that producing release from PI in the Peterson task. A substantial similarity effect occurs under both conditions although there is a small advantage from switching across similar sequences. There is, however, no evidence for the suggestion that the similarity effect will be absent from the very first sequence tested. Our results support the within-sequence similarity rather than a between-list PI interpretation. Reasons for the contrast with the classic Peterson short-term forgetting task are briefly discussed. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  1. Molecular analysis of the split cox1 gene from the Basidiomycota Agrocybe aegerita: relationship of its introns with homologous Ascomycota introns and divergence levels from common ancestral copies.

    PubMed

    Gonzalez, P; Barroso, G; Labarère, J

    1998-10-05

    The Basidiomycota Agrocybe aegerita (Aa) mitochondrial cox1 gene (6790 nucleotides), encoding a protein of 527aa (58377Da), is split by four large subgroup IB introns possessing site-specific endonucleases assumed to be involved in intron mobility. When compared to other fungal COX1 proteins, the Aa protein is closely related to the COX1 one of the Basidiomycota Schizophyllum commune (Sc). This clade reveals a relationship with the studied Ascomycota ones, with the exception of Schizosaccharomyces pombe (Sp) which ranges in an out-group position compared with both higher fungi divisions. When comparison is extended to other kingdoms, fungal COX1 sequences are found to be more related to algae and plant ones (more than 57.5% aa similarity) than to animal sequences (53.6% aa similarity), contrasting with the previously established close relationship between fungi and animals, based on comparisons of nuclear genes. The four Aa cox1 introns are homologous to Ascomycota or algae cox1 introns sharing the same location within the exonic sequences. The percentages of identity of the intronic nucleotide sequences suggest a possible acquisition by lateral transfers of ancestral copies or of their derived sequences. These identities extend over the whole intronic sequences, arguing in favor of a transfer of the complete intron rather than a transfer limited to the encoded ORF. The intron i4 shares 74% of identity, at the nucleotidic level, with the Podospora anserina (Pa) intron i14, and up to 90.5% of aa similarity between the encoded proteins, i.e. the highest values reported to date between introns of two phylogenetically distant species. This low divergence argues for a recent lateral transfer between the two species. On the contrary, the low sequence identities (below 36%) observed between Aa i1 and the homologous Sp i1 or Prototheca wickeramii (Pw) i1 suggest a long evolution time after the separation of these sequences. 
The introns i2 and i3 possessed intermediate percentages of identity with their homologous Ascomycota introns. This is the first report of the complete nucleotide sequence and molecular organization of a mitochondrial cox1 gene of any member of the Basidiomycota division.

  2. The Alveolate Perkinsus marinus: Biological Insights from EST Gene Discovery

    PubMed Central

    2010-01-01

    Background: Perkinsus marinus, a protozoan parasite of the eastern oyster Crassostrea virginica, has devastated natural and farmed oyster populations along the Atlantic and Gulf coasts of the United States. It is classified as a member of the Perkinsozoa, a recently established phylum considered close to the ancestor of ciliates, dinoflagellates, and apicomplexans, and a key taxon for understanding unique adaptations (e.g. parasitism) within the Alveolata. Despite intense parasite pressure, no disease-resistant oysters have been identified and no effective therapies have been developed to date. Results: To gain insight into the biological basis of the parasite's virulence and pathogenesis mechanisms, and to identify genes encoding potential targets for intervention, we generated >31,000 5' expressed sequence tags (ESTs) derived from four trophozoite libraries generated from two P. marinus strains. Trimming and clustering of the sequence tags yielded 7,863 unique sequences, some of which carry a spliced leader. Similarity searches revealed that 55% of these had hits in protein sequence databases, of which 1,729 had their best hit with proteins from the chromalveolates (E-value ≤ 1e-5). Some sequences are similar to those proven to be targets for effective intervention in other protozoan parasites, and include not only proteases, antioxidant enzymes, and heat shock proteins, but also those associated with relict plastids, such as acetyl-CoA carboxylase and methylerythritol phosphate pathway components, and those involved in glycan assembly, protein folding/secretion, and parasite-host interactions. Conclusions: Our transcriptome analysis of P. marinus, the first for any member of the Perkinsozoa, contributes new insight into its biology and taxonomic position.
It provides a very informative, albeit preliminary, glimpse into the expression of genes encoding functionally relevant proteins as potential targets for chemotherapy, and evidence for the presence of a relict plastid. Further, although P. marinus sequences display significant similarity to those from both apicomplexans and dinoflagellates, the presence of trans-spliced transcripts confirms the previously established affinities with the latter. The EST analysis reported herein, together with the recently completed sequence of the P. marinus genome and the development of transfection methodology, should result in improved intervention strategies against dermo disease. PMID:20374649

  3. Slice profile effects in 2D slice-selective MRI of hyperpolarized nuclei.

    PubMed

    Deppe, Martin H; Teh, Kevin; Parra-Robles, Juan; Lee, Kuan J; Wild, Jim M

    2010-02-01

    This work explores slice profile effects in 2D slice-selective gradient-echo MRI of hyperpolarized nuclei. Two different sequences were investigated: a Spoiled Gradient Echo sequence with variable flip angle (SPGR-VFA) and a balanced Steady-State Free Precession (SSFP) sequence. It is shown that in SPGR-VFA the distribution of flip angles across the slice present in any realistically shaped radiofrequency (RF) pulse leads to large excess signal from the slice edges in later RF views, which results in an undesired non-constant total transverse magnetization, potentially exceeding the initial value by almost 300% for the last RF pulse. A method to reduce this unwanted effect is demonstrated, based on dynamic scaling of the slice selection gradient. SSFP sequences with small to moderate flip angles (<40 degrees) are also shown to preserve the slice profile better than the most commonly used SPGR sequence with constant flip angle (SPGR-CFA). For higher flip angles, the slice profile in SSFP evolves in a manner similar to SPGR-CFA, with depletion of polarization in the center of the slice. Copyright 2009 Elsevier Inc. All rights reserved.

  4. Mitochondrial DNA variation and phylogenetic relationships among five tuna species based on sequencing of D-loop region.

    PubMed

    Kumar, Girish; Kocour, Martin; Kunal, Swaraj Priyaranjan

    2016-05-01

    In order to assess the DNA sequence variation and phylogenetic relationship among five tuna species (Auxis thazard, Euthynnus affinis, Katsuwonus pelamis, Thunnus tonggol, and T. albacares) out of all four tuna genera, partial sequences of the mitochondrial DNA (mtDNA) D-loop region were analyzed. The estimate of intra-specific sequence variation in studied species was low, ranging from 0.027 to 0.080 [Kimura's two parameter distance (K2P)], whereas values of inter-specific variation ranged from 0.049 to 0.491. The longtail tuna (T. tonggol) and yellowfin tuna (T. albacares) were found to share a close relationship (K2P = 0.049) while skipjack tuna (K. pelamis) was most divergent studied species. Phylogenetic analysis using Maximum-Likelihood (ML) and Neighbor-Joining (NJ) methods supported the monophyletic origin of Thunnus species. Similarly, phylogeny of Auxis and Euthynnus species substantiate the monophyly. However, results showed a distinct origin of K. pelamis from genus Thunnus as well as Auxis and Euthynnus. Thus, the mtDNA D-loop region sequence data supports the polyphyletic origin of tuna species.
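
    The K2P values reported above are Kimura two-parameter distances. A minimal sketch of the standard formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), is given below, where P and Q are the proportions of transitions and transversions among compared sites. The function name, the handling of gaps, and the toy sequences are assumptions for illustration; the study's own software and settings are not stated in the abstract.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (an assumed convention)
        compared += 1
        if a == b:
            continue
        # A substitution within purines or within pyrimidines is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent
    # for the model (non-positive argument).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Toy example: one transition (A<->G) in ten sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```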

  5. The BaMM web server for de-novo motif discovery and regulatory sequence analysis.

    PubMed

    Kiesel, Anja; Roth, Christian; Ge, Wanwan; Wess, Maximilian; Meier, Markus; Söding, Johannes

    2018-05-28

    The BaMM web server offers four tools: (i) de-novo discovery of enriched motifs in a set of nucleotide sequences, (ii) scanning a set of nucleotide sequences with motifs to find motif occurrences, (iii) searching with an input motif for similar motifs in our BaMM database with motifs for >1000 transcription factors, trained from the GTRD ChIP-seq database and (iv) browsing and keyword searching the motif database. In contrast to most other servers, we represent sequence motifs not by position weight matrices (PWMs) but by Bayesian Markov Models (BaMMs) of order 4, which we showed previously to perform substantially better in ROC analyses than PWMs or first order models. To address the inadequacy of P- and E-values as measures of motif quality, we introduce the AvRec score, the average recall over the TP-to-FP ratio between 1 and 100. The BaMM server is freely accessible without registration at https://bammmotif.mpibpc.mpg.de.

  6. DsaV methyltransferase and its isoschizomers contain a conserved segment that is similar to the segment in HhaI methyltransferase that is in contact with DNA bases.

    PubMed Central

    Gopal, J; Yebra, M J; Bhagwat, A S

    1994-01-01

    The methyltransferase (MTase) in the DsaV restriction-modification system methylates within 5'-CCNGG sequences. We have cloned the gene for this MTase and determined its sequence. The predicted sequence of the MTase protein contains sequence motifs conserved among all cytosine-5 MTases and is most similar to other MTases that methylate CCNGG sequences, namely M.ScrFI and M.SsoII. All three MTases methylate the internal cytosine within their recognition sequence. The 'variable' region within the three enzymes that methylate CCNGG can be aligned with the sequences of two enzymes that methylate CCWGG sequences. Remarkably, two segments within this region contain significant similarity with the region of M.HhaI that is known to contact DNA bases. These alignments suggest that many cytosine-5 MTases are likely to interact with DNA using a similar structural framework. PMID:7971279

  7. Novel methodologies for spectral classification of exon and intron sequences

    NASA Astrophysics Data System (ADS)

    Kwan, Hon Keung; Kwan, Benjamin Y. M.; Kwan, Jennifer Y. Y.

    2012-12-01

    Digital processing of a nucleotide sequence requires it to be mapped to a numerical sequence in which the choice of nucleotide to numeric mapping affects how well its biological properties can be preserved and reflected from nucleotide domain to numerical domain. Digital spectral analysis of nucleotide sequences unfolds a period-3 power spectral value which is more prominent in an exon sequence as compared to that of an intron sequence. The success of a period-3 based exon and intron classification depends on the choice of a threshold value. The main purposes of this article are to introduce novel codes for 1-sequence numerical representations for spectral analysis and compare them to existing codes to determine appropriate representation, and to introduce novel thresholding methods for more accurate period-3 based exon and intron classification of an unknown sequence. The main findings of this study are summarized as follows: Among sixteen 1-sequence numerical representations, the K-Quaternary Code I offers an attractive performance. A windowed 1-sequence numerical representation (with window length of 9, 15, and 24 bases) offers a possible speed gain over non-windowed 4-sequence Voss representation which increases as sequence length increases. A winner threshold value (chosen from the best among two defined threshold values and one other threshold value) offers a top precision for classifying an unknown sequence of specified fixed lengths. An interpolated winner threshold value applicable to an unknown and arbitrary length sequence can be estimated from the winner threshold values of fixed length sequences with a comparable performance. In general, precision increases as sequence length increases. The study contributes an effective spectral analysis of nucleotide sequences to better reveal embedded properties, and has potential applications in improved genome annotation.
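
    The period-3 spectral value mentioned above can be sketched with the 4-sequence Voss representation, in which each base gets a binary indicator sequence and the power at frequency N/3 is summed over the four indicators. This is only an illustration of that baseline idea in plain Python; it is not the K-Quaternary code or the thresholding methods introduced in the article, and the function name is hypothetical.

```python
import cmath

def period3_power(seq):
    """Sum of DFT power at frequency N/3 over the four Voss indicator sequences."""
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        # DFT coefficient of the indicator sequence for this base at
        # k = N/3, i.e. a phase step of 2*pi/3 per position.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * pos / 3)
            for pos, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total

# A strictly codon-periodic toy sequence gives a strong peak,
# a homopolymer of the same length gives essentially none.
print(period3_power("ATG" * 10), period3_power("A" * 30))
```

    A classifier in the spirit of the abstract would compare this value against a threshold; how that threshold is chosen is exactly what the article addresses and is not reproduced here.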

  8. Reassessment of the taxonomic position of Burkholderia andropogonis and description of Robbsia andropogonis gen. nov., comb. nov.

    PubMed

    Lopes-Santos, Lucilene; Castro, Daniel Bedo Assumpção; Ferreira-Tonin, Mariana; Corrêa, Daniele Bussioli Alves; Weir, Bevan Simon; Park, Duckchul; Ottoboni, Laura Maria Mariscal; Neto, Júlio Rodrigues; Destéfano, Suzete Aparecida Lanza

    2017-06-01

    The phylogenetic classification of the species Burkholderia andropogonis within the Burkholderia genus was reassessed using 16S rRNA gene phylogenetic analysis and multilocus sequence analysis (MLSA). Both phylogenetic trees revealed two main groups, named A and B, strongly supported by high bootstrap values (100%). Group A encompassed all of the Burkholderia species complex, while Group B only comprised B. andropogonis species, with low percentage similarities with other species of the genus, from 92 to 95% for 16S rRNA gene sequences and 83% for conserved gene sequences. Average nucleotide identity (ANI), tetranucleotide signature frequency, and percentage of conserved proteins (POCP) analyses were also carried out, and in the three analyses B. andropogonis showed lower values when compared to the other Burkholderia species complex, near 71% for ANI, from 0.484 to 0.724 for tetranucleotide signature frequency, and around 50% for POCP, reinforcing the distance observed in the phylogenetic analyses. Our findings provide an important insight into the taxonomy of B. andropogonis. It is clear from the results that this bacterial species exhibits genotypic differences and represents a new genus described herein as Robbsia andropogonis gen. nov., comb. nov.

  9. Cretaceous-Tertiary boundary in the Antarctic: Climatic cooling precedes biotic crisis

    NASA Technical Reports Server (NTRS)

    Stott, Lowell D.; Kennett, James P.

    1988-01-01

    Stable isotopic investigations were conducted on calcareous microfossils across two deep sea Cretaceous-Tertiary boundary sequences on Maud Rise, Weddell Sea, Antarctica. The boundary is taken at the level of massive extinctions in calcareous planktonic microfossils, and coincides with a sharp lithologic change from pure calcareous ooze to calcareous ooze with a large volcanic clay component. The uppermost Maestrichtian is marked by a long-term decrease in δ18O which spans most of the lower and middle A. mayaroensis Zone and represents a warming trend which culminated in surface water temperatures of about 16 °C. At approximately 3 meters below the K-T boundary this warming trend terminates abruptly and benthic and planktonic isotopic records exhibit a rapid increase in δ18O that continues up to the K-T boundary. The trend towards cooler surface water temperatures stops abruptly at the K-T boundary and δ18O values remain relatively stable through the Paleocene. Comparison of the Antarctic sequence with the previously documented deep sea records in the South Atlantic reveals shifts of similar magnitude in the latest Maestrichtian. It is indicated that the Southern Ocean underwent the most significant, and apparently permanent, climatic change. The latest Cretaceous oxygen isotopic shift recorded at Maud Rise and other deep sea sites is similar in magnitude to large positive δ18O shifts in the middle Eocene, at the Eocene/Oligocene boundary and in the middle Miocene that marked large scale climatic transitions which ultimately led to cryospheric development of the Antarctic. The climatic shift at the end of the Cretaceous represents one of the most significant climatic transitions recorded in the latest Phanerozoic and had a profound effect on global climate as well as oceanic circulation.

  10. Winnowing sequences from a database search.

    PubMed

    Berman, P; Zhang, Z; Wolf, Y I; Koonin, E V; Miller, W

    2000-01-01

    In database searches for sequence similarity, matches to a distinct sequence region (e.g., protein domain) are frequently obscured by numerous matches to another region of the same sequence. In order to cope with this problem, algorithms are developed to discard redundant matches. One model for this problem begins with a list of intervals, each with an associated score; each interval gives the range of positions in the query sequence that align to a database sequence, and the score is that of the alignment. If interval I is contained in interval J, and I's score is less than J's, then I is said to be dominated by J. The problem is then to identify each interval that is dominated by at least K other intervals, where K is a given level of "tolerable redundancy." An algorithm is developed to solve the problem in O(N log N) time and O(N*) space, where N is the number of intervals and N* is a precisely defined value that never exceeds N and is frequently much smaller. This criterion for discarding database hits has been implemented in the Blast program, as illustrated herein with examples. Several variations and extensions of this approach are also described.
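
    The dominance criterion described above can be sketched directly from its definition. The version below is a naive O(N^2) check written for clarity; it is not the O(N log N) algorithm of the paper, nor the Blast implementation. Containment is taken as inclusive, and the function name and tuple layout (start, end, score) are assumptions.

```python
def redundant_intervals(intervals, k):
    """Return the intervals dominated by at least k other intervals.

    Each interval is a (start, end, score) tuple. Interval I is dominated
    by J when I lies within J and I's score is lower than J's.
    """
    redundant = []
    for i, (s_i, e_i, score_i) in enumerate(intervals):
        dominators = 0
        for j, (s_j, e_j, score_j) in enumerate(intervals):
            if i != j and s_j <= s_i and e_i <= e_j and score_i < score_j:
                dominators += 1
        if dominators >= k:
            redundant.append((s_i, e_i, score_i))
    return redundant

# Toy hits on a query: three nested matches to one region and one
# weaker match to a distinct region that should survive winnowing.
hits = [(0, 100, 50), (10, 90, 40), (20, 80, 30), (200, 300, 10)]
print(redundant_intervals(hits, 1))
print(redundant_intervals(hits, 2))
```

    With K = 1 both nested hits are flagged as redundant; with K = 2 only the innermost one is, and the low-scoring hit to the separate region is never discarded.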

  11. Characterizing the D2 statistic: word matches in biological sequences.

    PubMed

    Forêt, Sylvain; Wilson, Susan R; Burden, Conrad J

    2009-01-01

    Word matches are often used in sequence comparison methods, either as a measure of sequence similarity or in the first search steps of algorithms such as BLAST or BLAT. The D2 statistic is the number of matches of words of k letters between two sequences. Recent advances have been made in the characterization of this statistic and in the approximation of its distribution. Here, these results are extended to the case of approximate word matches. We compute the exact value of the variance of the D2 statistic for the case of a uniform letter distribution, and introduce a method to provide accurate approximations of the variance in the remaining cases. This enables the distribution of D2 to be approximated for typical situations arising in biological research. We apply these results to the identification of cis-regulatory modules, and show that this method detects such sequences with a high accuracy. The ability to approximate the distribution of D2 for both exact and approximate word matches will enable the use of this statistic in a more precise manner for sequence comparison, database searches, and identification of transcription factor binding sites.
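
    For exact word matches, the D2 statistic defined above can be computed by counting the k-letter words in each sequence and summing the products of the counts for each shared word. The sketch below covers only this exact-match case; the approximate-match extension and the variance results of the paper are not reproduced, and the function name is an assumption.

```python
from collections import Counter

def d2_statistic(seq_a, seq_b, k):
    """Number of exact k-word matches between two sequences (all position pairs)."""
    counts_a = Counter(seq_a[i:i + k] for i in range(len(seq_a) - k + 1))
    counts_b = Counter(seq_b[i:i + k] for i in range(len(seq_b) - k + 1))
    # Each occurrence of a word in seq_a matches each occurrence of the
    # same word in seq_b, so the per-word contribution is a product.
    return sum(count * counts_b[word] for word, count in counts_a.items())

print(d2_statistic("ACGT", "ACGT", 2))
print(d2_statistic("AAAA", "AAA", 2))
```

    The second call shows why repetitive sequences inflate the raw count: three overlapping "AA" words in one sequence each match two in the other.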

  12. CLAST: CUDA implemented large-scale alignment search tool.

    PubMed

    Yano, Masahiro; Mori, Hiroshi; Akiyama, Yutaka; Yamada, Takuji; Kurokawa, Ken

    2014-12-11

    Metagenomics is a powerful methodology to study microbial communities, but it is highly dependent on nucleotide sequence similarity searching against sequence databases. Metagenomic analyses with next-generation sequencing technologies produce enormous numbers of reads from microbial communities, and many reads are derived from microbes whose genomes have not yet been sequenced, limiting the usefulness of existing sequence similarity search tools. Therefore, there is a clear need for a sequence similarity search tool that can rapidly detect weak similarity in large datasets. We developed a tool, which we named CLAST (CUDA implemented large-scale alignment search tool), that enables analyses of millions of reads and thousands of reference genome sequences, and runs on NVIDIA Fermi architecture graphics processing units. CLAST has four main advantages over existing alignment tools. First, CLAST was capable of identifying sequence similarities ~80.8 times faster than BLAST and 9.6 times faster than BLAT. Second, CLAST executes global alignment as the default (local alignment is also an option), enabling CLAST to assign reads to taxonomic and functional groups based on evolutionarily distant nucleotide sequences with high accuracy. Third, CLAST does not need a preprocessed sequence database like Burrows-Wheeler Transform-based tools, and this enables CLAST to incorporate large, frequently updated sequence databases. Fourth, CLAST requires <2 GB of main memory, making it possible to run CLAST on a standard desktop computer or server node. CLAST achieved very high speed (similar to the Burrows-Wheeler Transform-based Bowtie 2 for long reads) and sensitivity (equal to BLAST, BLAT, and FR-HIT) without the need for extensive database preprocessing or a specialized computing platform. 
Our results demonstrate that CLAST has the potential to be one of the most powerful and realistic approaches to analyze the massive amount of sequence data from next-generation sequencing technologies.

  13. Genome analysis of Excretory/Secretory proteins in Taenia solium reveals their Abundance of Antigenic Regions (AAR).

    PubMed

    Gomez, Sandra; Adalid-Peralta, Laura; Palafox-Fonseca, Hector; Cantu-Robles, Vito Adrian; Soberón, Xavier; Sciutto, Edda; Fragoso, Gladis; Bobes, Raúl J; Laclette, Juan P; Yauner, Luis del Pozo; Ochoa-Leyva, Adrián

    2015-05-19

    Excretory/Secretory (ES) proteins play an important role in the host-parasite interactions. Experimental identification of ES proteins is time-consuming and expensive. Alternative bioinformatics approaches are cost-effective and can be used to prioritize the experimental analysis of therapeutic targets for parasitic diseases. Here we predicted and functionally annotated the ES proteins in T. solium genome using an integration of bioinformatics tools. Additionally, we developed a novel measurement to evaluate the potential antigenicity of T. solium secretome using sequence length and number of antigenic regions of ES proteins. This measurement was formalized as the Abundance of Antigenic Regions (AAR) value. AAR value for secretome showed a similar value to that obtained for a set of experimentally determined antigenic proteins and was different to the calculated value for the non-ES proteins of T. solium genome. Furthermore, we calculated the AAR values for known helminth secretomes and they were similar to that obtained for T. solium. The results reveal the utility of AAR value as a novel genomic measurement to evaluate the potential antigenicity of secretomes. This comprehensive analysis of T. solium secretome provides functional information for future experimental studies, including the identification of novel ES proteins of therapeutic, diagnosis and immunological interest.

  14. Genome analysis of Excretory/Secretory proteins in Taenia solium reveals their Abundance of Antigenic Regions (AAR)

    PubMed Central

    Gomez, Sandra; Adalid-Peralta, Laura; Palafox-Fonseca, Hector; Cantu-Robles, Vito Adrian; Soberón, Xavier; Sciutto, Edda; Fragoso, Gladis; Bobes, Raúl J.; Laclette, Juan P.; Yauner, Luis del Pozo; Ochoa-Leyva, Adrián

    2015-01-01

    Excretory/Secretory (ES) proteins play an important role in the host-parasite interactions. Experimental identification of ES proteins is time-consuming and expensive. Alternative bioinformatics approaches are cost-effective and can be used to prioritize the experimental analysis of therapeutic targets for parasitic diseases. Here we predicted and functionally annotated the ES proteins in T. solium genome using an integration of bioinformatics tools. Additionally, we developed a novel measurement to evaluate the potential antigenicity of T. solium secretome using sequence length and number of antigenic regions of ES proteins. This measurement was formalized as the Abundance of Antigenic Regions (AAR) value. AAR value for secretome showed a similar value to that obtained for a set of experimentally determined antigenic proteins and was different to the calculated value for the non-ES proteins of T. solium genome. Furthermore, we calculated the AAR values for known helminth secretomes and they were similar to that obtained for T. solium. The results reveal the utility of AAR value as a novel genomic measurement to evaluate the potential antigenicity of secretomes. This comprehensive analysis of T. solium secretome provides functional information for future experimental studies, including the identification of novel ES proteins of therapeutic, diagnosis and immunological interest. PMID:25989346

  15. Volcanic Soils as Sources of Novel CO-Oxidizing Paraburkholderia and Burkholderia: Paraburkholderia hiiakae sp. nov., Paraburkholderia metrosideri sp. nov., Paraburkholderia paradisi sp. nov., Paraburkholderia peleae sp. nov., and Burkholderia alpina sp. nov. a Member of the Burkholderia cepacia Complex

    PubMed Central

    Weber, Carolyn F.; King, Gary M.

    2017-01-01

    Previous studies showed that members of the Burkholderiales were important in the succession of aerobic, molybdenum-dependent CO oxidizing-bacteria on volcanic soils. During these studies, four isolates were obtained from Kilauea Volcano (Hawai‘i, USA); one strain was isolated from Pico de Orizaba (Mexico) during a separate study. Based on 16S rRNA gene sequence similarities, the Pico de Orizaba isolate and the isolates from Kilauea Volcano were provisionally assigned to the genera Burkholderia and Paraburkholderia, respectively. Each of the isolates possessed a form I coxL gene that encoded the catalytic subunit of carbon monoxide dehydrogenase (CODH); none of the most closely related type strains possessed coxL or oxidized CO. Genome sequences for Paraburkholderia type strains facilitated an analysis of 16S rRNA gene sequence similarities and average nucleotide identities (ANI). ANI did not exceed 95% (the recommended cutoff for species differentiation) for any of the pairwise comparisons among 27 reference strains related to the new isolates. However, since the highest 16S rRNA gene sequence similarity among this set of reference strains was 98.93%, DNA-DNA hybridizations (DDH) were performed for two isolates whose 16S rRNA gene sequence similarities with their nearest phylogenetic neighbors were 98.96 and 99.11%. In both cases DDH values were <16%. Based on multiple variables, four of the isolates represent novel species within the Paraburkholderia: Paraburkholderia hiiakae sp. nov. (type strain I2T = DSM 28029T = LMG 27952T); Paraburkholderia paradisi sp. nov. (type strain WAT = DSM 28027T = LMG 27949T); Paraburkholderia peleae sp. nov. (type strain PP52-1T = DSM 28028T = LMG 27950T); and Paraburkholderia metrosideri sp. nov. (type strain DNBP6-1T = DSM 28030T = LMG 28140T). The remaining isolate represents the first CO-oxidizing member of the Burkholderia cepacia complex: Burkholderia alpina sp. nov. (type strain PO-04-17-38T = DSM 28031T = LMG 28138T). 
PMID:28270796

  16. Volcanic Soils as Sources of Novel CO-Oxidizing Paraburkholderia and Burkholderia: Paraburkholderia hiiakae sp. nov., Paraburkholderia metrosideri sp. nov., Paraburkholderia paradisi sp. nov., Paraburkholderia peleae sp. nov., and Burkholderia alpina sp. nov. a Member of the Burkholderia cepacia Complex.

    PubMed

    Weber, Carolyn F; King, Gary M

    2017-01-01

    Previous studies showed that members of the Burkholderiales were important in the succession of aerobic, molybdenum-dependent CO oxidizing-bacteria on volcanic soils. During these studies, four isolates were obtained from Kilauea Volcano (Hawai'i, USA); one strain was isolated from Pico de Orizaba (Mexico) during a separate study. Based on 16S rRNA gene sequence similarities, the Pico de Orizaba isolate and the isolates from Kilauea Volcano were provisionally assigned to the genera Burkholderia and Paraburkholderia, respectively. Each of the isolates possessed a form I coxL gene that encoded the catalytic subunit of carbon monoxide dehydrogenase (CODH); none of the most closely related type strains possessed coxL or oxidized CO. Genome sequences for Paraburkholderia type strains facilitated an analysis of 16S rRNA gene sequence similarities and average nucleotide identities (ANI). ANI did not exceed 95% (the recommended cutoff for species differentiation) for any of the pairwise comparisons among 27 reference strains related to the new isolates. However, since the highest 16S rRNA gene sequence similarity among this set of reference strains was 98.93%, DNA-DNA hybridizations (DDH) were performed for two isolates whose 16S rRNA gene sequence similarities with their nearest phylogenetic neighbors were 98.96 and 99.11%. In both cases DDH values were <16%. Based on multiple variables, four of the isolates represent novel species within the Paraburkholderia: Paraburkholderia hiiakae sp. nov. (type strain I2T = DSM 28029T = LMG 27952T); Paraburkholderia paradisi sp. nov. (type strain WAT = DSM 28027T = LMG 27949T); Paraburkholderia peleae sp. nov. (type strain PP52-1T = DSM 28028T = LMG 27950T); and Paraburkholderia metrosideri sp. nov. (type strain DNBP6-1T = DSM 28030T = LMG 28140T). The remaining isolate represents the first CO-oxidizing member of the Burkholderia cepacia complex: Burkholderia alpina sp. nov. (type strain PO-04-17-38T = DSM 28031T = LMG 28138T).

  17. The master regulator PhoP coordinates phosphate and nitrogen metabolism, respiration, cell differentiation and antibiotic biosynthesis: comparison in Streptomyces coelicolor and Streptomyces avermitilis.

    PubMed

    Martín, Juan F; Rodríguez-García, Antonio; Liras, Paloma

    2017-05-01

    Phosphate limitation is important for production of antibiotics and other secondary metabolites in Streptomyces. Phosphate control is mediated by the two-component system PhoR-PhoP. Following phosphate depletion, PhoP stimulates expression of genes involved in scavenging, transport and mobilization of phosphate, and represses the utilization of nitrogen sources. PhoP reduces expression of genes for aerobic respiration and activates nitrate respiration genes. PhoP activates genes for teichuronic acid formation and reduces expression of genes for phosphate-rich teichoic acid biosynthesis. In Streptomyces coelicolor, PhoP repressed several differentiation and pleiotropic regulatory genes, which affects development and, indirectly, antibiotic biosynthesis. A new bioinformatics analysis of the putative PhoP-binding sequences in Streptomyces avermitilis was made. Many sequences in the S. avermitilis genome showed high weight values and were classified according to the available genetic information. These genes encode phosphate scavenging proteins, phosphate transporters and nitrogen metabolism genes. Among the genes highlighted in the new studies were aveR, located in the avermectin gene cluster and encoding a LAL-type regulator, and afsS, which is regulated by PhoP and AfsR. The sequence logo for S. avermitilis PHO boxes is similar to that of S. coelicolor, with differences in the weight value for specific nucleotides in the sequence.

  18. An improved model for whole genome phylogenetic analysis by Fourier transform.

    PubMed

    Yin, Changchuan; Yau, Stephen S-T

    2015-10-07

    DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignments (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences have undergone rearrangements, as in many bacterial and viral genomes. It is also limited by its computational complexity for comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information contents of DNA sequences by Discrete Fourier Transform (DFT), but still with some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determinant factor and the 2D mapping reduces the nucleotide composition bias in distance measure, thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After the DFT power spectra are evenly scaled, the spectra are in the same dimensionality of the Fourier frequency space, then the Euclidean distances of full Fourier power spectra of the DNA sequences are used as the dissimilarity metrics. The improved DFT method, with increased computational performance by 2D numerical representation, is applicable to DNA sequences of any length range. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences including simulated and real datasets. 
The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure of DNA sequences. Due to its high efficiency and accuracy, the proposed DFT similarity measure is successfully applied on phylogenetic analysis for individual genes and large whole bacterial genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
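
    The pipeline described above (numerical mapping, DFT power spectrum, scaling to a common length, Euclidean distance) can be sketched roughly as follows. Two pieces are stand-ins and are assumptions of this example: the mapping uses the four classic binary indicator sequences rather than the authors' 2D mapping, and `stretch` is a naive nearest-index resampling rather than the improved even scaling algorithm. All function names are hypothetical.

```python
import cmath
import math

def power_spectrum(seq):
    """Summed DFT power spectrum of the four nucleotide indicator sequences.

    Assumption: binary indicator mapping, not the 2D mapping of the paper.
    """
    n = len(seq)
    spectrum = [0.0] * n
    for base in "ACGT":
        indicator = [1.0 if ch == base else 0.0 for ch in seq]
        for k in range(n):
            # Direct O(n^2) DFT; adequate for short illustrative inputs.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(indicator))
            spectrum[k] += abs(coeff) ** 2
    return spectrum

def stretch(spectrum, m):
    """Naively stretch a spectrum to length m (stand-in for even scaling)."""
    n = len(spectrum)
    return [spectrum[k * n // m] for k in range(m)]

def dft_distance(seq_a, seq_b):
    """Euclidean distance between length-matched DFT power spectra."""
    if not seq_a or not seq_b:
        raise ValueError("sequences must be non-empty")
    pa = power_spectrum(seq_a.upper())
    pb = power_spectrum(seq_b.upper())
    m = max(len(pa), len(pb))
    pa, pb = stretch(pa, m), stretch(pb, m)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))
```

    One property worth noting: a power spectrum does not change under circular shifts of the sequence, so `dft_distance("ACGTACGT", "CGTACGTA")` is zero up to floating-point error. That is part of why spectrum-based measures tolerate some rearrangements that trouble alignment.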

  19. SIMAP—the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage

    PubMed Central

    Arnold, Roland; Goldenberg, Florian; Mewes, Hans-Werner; Rattei, Thomas

    2014-01-01

    The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads. PMID:24165881

  20. Chloroplast Genome Evolution in Early Diverged Leptosporangiate Ferns

    PubMed Central

    Kim, Hyoung Tae; Chung, Myong Gi; Kim, Ki-Joong

    2014-01-01

    In this study, the chloroplast (cp) genome sequences from three early diverged leptosporangiate ferns were completed and analyzed in order to understand the evolution of the genome of the fern lineages. The complete cp genome sequence of Osmunda cinnamomea (Osmundales) was 142,812 base pairs (bp). The cp genome structure was similar to that of eusporangiate ferns. The gene/intron losses that frequently occurred in the cp genome of leptosporangiate ferns were not found in the cp genome of O. cinnamomea. In addition, putative RNA editing sites in the cp genome were rare in O. cinnamomea, even though the sites were frequently predicted to be present in leptosporangiate ferns. The complete cp genome sequence of Diplopterygium glaucum (Gleicheniales) was 151,007 bp and has a 9.7 kb inversion between the trnL-CAA and trnV-GCA genes when compared to O. cinnamomea. Several repeated sequences were detected around the inversion break points. The complete cp genome sequence of Lygodium japonicum (Schizaeales) was 157,142 bp and a deletion of the rpoC1 intron was detected. This intron loss was shared by all of the studied species of the genus Lygodium. The GC contents and the effective numbers of codons (ENCs) in ferns varied significantly when compared to seed plants. The ENC values of the early diverged leptosporangiate ferns showed intermediate levels between eusporangiate and core leptosporangiate ferns. However, our phylogenetic tree based on all of the cp gene sequences clearly indicated that the cp genome similarities between O. cinnamomea (Osmundales) and eusporangiate ferns are symplesiomorphies, rather than synapomorphies. Therefore, our data are in agreement with the view that Osmundales is a distinct early diverged lineage in the leptosporangiate ferns. PMID:24823358

  1. Chloroplast genome evolution in early diverged leptosporangiate ferns.

    PubMed

    Kim, Hyoung Tae; Chung, Myong Gi; Kim, Ki-Joong

    2014-05-01

    In this study, the chloroplast (cp) genome sequences from three early diverged leptosporangiate ferns were completed and analyzed in order to understand the evolution of the genome of the fern lineages. The complete cp genome sequence of Osmunda cinnamomea (Osmundales) was 142,812 base pairs (bp). The cp genome structure was similar to that of eusporangiate ferns. The gene/intron losses that frequently occurred in the cp genome of leptosporangiate ferns were not found in the cp genome of O. cinnamomea. In addition, putative RNA editing sites in the cp genome were rare in O. cinnamomea, even though the sites were frequently predicted to be present in leptosporangiate ferns. The complete cp genome sequence of Diplopterygium glaucum (Gleicheniales) was 151,007 bp and has a 9.7 kb inversion between the trnL-CAA and trnV-GCA genes when compared to O. cinnamomea. Several repeated sequences were detected around the inversion break points. The complete cp genome sequence of Lygodium japonicum (Schizaeales) was 157,142 bp and a deletion of the rpoC1 intron was detected. This intron loss was shared by all of the studied species of the genus Lygodium. The GC contents and the effective numbers of codons (ENCs) in ferns varied significantly when compared to seed plants. The ENC values of the early diverged leptosporangiate ferns showed intermediate levels between eusporangiate and core leptosporangiate ferns. However, our phylogenetic tree based on all of the cp gene sequences clearly indicated that the cp genome similarities between O. cinnamomea (Osmundales) and eusporangiate ferns are symplesiomorphies, rather than synapomorphies. Therefore, our data are in agreement with the view that Osmundales is a distinct early diverged lineage in the leptosporangiate ferns.

  2. Pseudoxanthomonas koreensis sp. nov. and Pseudoxanthomonas daejeonensis sp. nov.

    PubMed

    Yang, Deok-Chun; Im, Wan-Taek; Kim, Myung Kyum; Lee, Sung-Taik

    2005-03-01

    Gram-negative, non-spore-forming, rod-shaped bacteria, T7-09(T) and TR6-08(T), were isolated from soil from a ginseng field in South Korea and characterized to determine their taxonomic position. 16S rRNA gene sequence analysis showed that the two isolates shared 99.5 % sequence similarity. Strains T7-09(T) and TR6-08(T) were shown to belong to the Proteobacteria and showed the highest levels of sequence similarity to Pseudoxanthomonas broegbernensis DSM 12573(T) (98.1 %), Pseudoxanthomonas mexicana AMX 26B(T) (97.4-97.5 %), Pseudoxanthomonas japonensis 12-3(T) (96.5-96.6 %), Pseudoxanthomonas taiwanensis ATCC BAA-404(T) (95.7 %) and Xanthomonas campestris ATCC 33913(T) (96.3-96.5 %). The sequence similarity values with respect to any species with validly published names in related genera were less than 96.5 %. The detection of a quinone system with Q-8 as the predominant compound and a fatty acid profile with C(15 : 0) iso as the predominant acid supported the assignment of the novel isolates to the order 'Xanthomonadales'. The two isolates could be distinguished from the established species of the genus Pseudoxanthomonas by the presence of quantitative unsaturated fatty acid C(17 : 1) iso omega9c and by their unique biochemical profiles. The results of DNA-DNA hybridization clearly demonstrated that T7-09(T) and TR6-08(T) represent separate species. On the basis of these data, it is proposed that T7-09(T) (=KCTC 12208(T)=IAM 15116(T)) and TR6-08(T) (=KCTC 12207(T)=IAM 15115(T)) be classified as the type strains of two novel Pseudoxanthomonas species, for which the names Pseudoxanthomonas koreensis sp. nov. and Pseudoxanthomonas daejeonensis sp. nov., respectively, are proposed.
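
    The 16S rRNA gene similarity percentages quoted above are pairwise identities computed from aligned sequences. A minimal sketch of that calculation follows. It assumes the two sequences are already aligned to equal length and that columns containing a gap are skipped, which is one common convention among several; the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of a pairwise alignment.

    Assumption: gapped columns are ignored; other tools count them as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

    For example, `percent_identity("ACGT", "ACGA")` gives 75.0. Reported values can differ slightly between tools because of how gaps and ambiguous bases are treated, so thresholds such as those discussed in these records should be compared only within one convention.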

  3. High-Throughput Identification and Screening of Novel Methylobacterium Species Using Whole-Cell MALDI-TOF/MS Analysis

    PubMed Central

    Tani, Akio; Sahin, Nurettin; Matsuyama, Yumiko; Enomoto, Takashi; Nishimura, Naoki; Yokota, Akira; Kimbara, Kazuhide

    2012-01-01

    Methylobacterium species are ubiquitous α-proteobacteria that reside in the phyllosphere and are fed by methanol that is emitted from plants. In this study, we applied whole-cell matrix-assisted laser desorption/ionization time-of-flight mass spectrometry analysis (WC-MS) to evaluate the diversity of Methylobacterium species collected from a variety of plants. The WC-MS spectrum was reproducible through two weeks of cultivation on different media. WC-MS spectrum peaks of M. extorquens strain AM1 cells were attributed to ribosomal proteins, but peaks that were not attributable to ribosomal proteins were also found. We developed a simple method for rapid identification based on spectra similarity. Using all available type strains of Methylobacterium species, the method provided a certain threshold similarity value for species-level discrimination, although the genus contains some type strains that could not be easily discriminated solely by 16S rRNA gene sequence similarity. Next, we evaluated the WC-MS data of approximately 200 methylotrophs isolated from various plants with MALDI Biotyper software (Bruker Daltonics). Isolates representing each cluster were further identified by 16S rRNA gene sequencing. In most cases, the identification by WC-MS matched that by sequencing, and isolates with unique spectra represented possible novel species. The strains belonging to M. extorquens, M. adhaesivum, M. marchantiae, M. komagatae, M. brachiatum, M. radiotolerans, and novel lineages close to M. adhaesivum, many of which were isolated from bryophytes, were found to be the most frequent phyllospheric colonizers. The WC-MS technique provides high-throughput identification of known/novel species of bacteria, enabling the selection of novel species in a library and identification without 16S rRNA gene sequencing. PMID:22808262

  4. Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment.

    PubMed

    Ferragina, Paolo; Giancarlo, Raffaele; Greco, Valentina; Manzini, Giovanni; Valiente, Gabriel

    2007-07-13

    Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness is tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. 
The first one, performed via ROC (Receiver Operating Characteristic) analysis, aims at assessing the intrinsic ability of the methodology to discriminate and classify biological sequences and structures. A second set of experiments aims at assessing how well two commonly available classification algorithms, UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and NJ (Neighbor Joining), can use the methodology to perform their task, their performance being evaluated against gold standards and with the use of well known statistical indexes, i.e., the F-measure and the partition distance. Based on the experiments, several conclusions can be drawn and, from them, novel valuable guidelines for the use of USM on biological data. The main ones are reported next. UCD and NCD are indistinguishable, i.e., they yield nearly the same values of the statistical indexes we have used, across experiments and data sets, while CD is almost always worse than both. UPGMA seems to yield better classification results with respect to NJ, i.e., better values of the statistical indexes (10% difference or above), on a substantial fraction of experiments, compressors and USM approximation choices. The compression program PPMd, based on PPM (Prediction by Partial Matching), for generic data and Gencompress for DNA, are the best performers among the compression algorithms we have used, although the difference in performance, as measured by statistical indexes, between them and the other algorithms depends critically on the data set and may not be as large as expected. PPMd used with UCD or NCD and UPGMA, on sequence data is very close, although worse, in performance to the alignment methods (less than 2% difference on the F-measure). Yet, it scales well with data set size and it can work on data other than sequences. 
In summary, our quantitative analysis naturally complements the rich theory behind USM and supports the conclusion that the methodology is worth using because of its robustness, flexibility, scalability, and competitiveness with existing techniques. In particular, the methodology applies to all biological data in textual format. The software and data sets are available under the GNU GPL at the supplementary material web page.
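
    To make the compression-based approach above concrete, here is a minimal sketch of NCD in its commonly cited form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed size. The compressor here is zlib from the Python standard library, used purely as a convenient stand-in; it is an assumption of this example and makes no claim about the compressors evaluated in the study. The helper names and the synthetic test sequences are also hypothetical.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (stand-in compressor)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression dissimilarity of two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Synthetic data for illustration only: a random DNA-like string, a lightly
# edited copy of it, and an unrelated random string.
rng = random.Random(0)
seq_a = bytes(rng.choice(b"ACGT") for _ in range(1000))
near = bytearray(seq_a)
for i in range(0, len(near), 100):
    near[i] = ord("N")  # substitute one symbol every 100 positions
near = bytes(near)
far = bytes(rng.choice(b"ACGT") for _ in range(1000))

print(ncd(seq_a, near), ncd(seq_a, far))
```

    The lightly edited copy should score much closer to 0 than the unrelated string, because the compressor can reuse most of the first sequence when encoding the second. With a general-purpose compressor the value for identical inputs is small but not exactly 0, which is one reason the choice of compressor matters in practice.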

  5. Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment

    PubMed Central

    Ferragina, Paolo; Giancarlo, Raffaele; Greco, Valentina; Manzini, Giovanni; Valiente, Gabriel

    2007-01-01

    Background: Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness is tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. Results: We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. 
The first one, performed via ROC (Receiver Operating Characteristic) analysis, aims at assessing the intrinsic ability of the methodology to discriminate and classify biological sequences and structures. A second set of experiments aims at assessing how well two commonly available classification algorithms, UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and NJ (Neighbor Joining), can use the methodology to perform their task, their performance being evaluated against gold standards and with the use of well known statistical indexes, i.e., the F-measure and the partition distance. Based on the experiments, several conclusions can be drawn and, from them, novel valuable guidelines for the use of USM on biological data. The main ones are reported next. Conclusion: UCD and NCD are indistinguishable, i.e., they yield nearly the same values of the statistical indexes we have used, across experiments and data sets, while CD is almost always worse than both. UPGMA seems to yield better classification results with respect to NJ, i.e., better values of the statistical indexes (10% difference or above), on a substantial fraction of experiments, compressors and USM approximation choices. The compression program PPMd, based on PPM (Prediction by Partial Matching), for generic data and Gencompress for DNA, are the best performers among the compression algorithms we have used, although the difference in performance, as measured by statistical indexes, between them and the other algorithms depends critically on the data set and may not be as large as expected. PPMd used with UCD or NCD and UPGMA, on sequence data is very close, although worse, in performance to the alignment methods (less than 2% difference on the F-measure). Yet, it scales well with data set size and it can work on data other than sequences. 
In summary, our quantitative analysis naturally complements the rich theory behind USM and supports the conclusion that the methodology is worth using because of its robustness, flexibility, scalability, and competitiveness with existing techniques. In particular, the methodology applies to all biological data in textual format. The software and data sets are available under the GNU GPL at the supplementary material web page. PMID:17629909

  6. Evolutionary Roots and Diversification of the Genus Aeromonas.

    PubMed

    Sanglas, Ariadna; Albarral, Vicenta; Farfán, Maribel; Lorén, J G; Fusté, M C

    2017-01-01

    Despite the importance of diversification rates in the study of prokaryote evolution, they have not been quantitatively assessed for the majority of microorganism taxa. The investigation of evolutionary patterns in prokaryotes constitutes a challenge due to a very scarce fossil record, limited morphological differentiation and frequently complex taxonomic relationships, which make even species recognition difficult. Although the speciation models and speciation rates in eukaryotes have traditionally been established by analyzing the fossil record data, this is frequently incomplete, and not always available. More recently, several methods based on molecular sequence data have been developed to estimate speciation and extinction rates from phylogenies reconstructed from contemporary taxa. In this work, we determined the divergence time and temporal diversification of the genus Aeromonas by applying these methods widely used with eukaryotic taxa. Our analysis involved 150 Aeromonas strains using the concatenated sequences of two housekeeping genes (approximately 2,000 bp). Dating and diversification model analyses were performed using two different approaches: obtaining the consensus sequence from the concatenated sequences corresponding to all the strains belonging to the same species, or generating the species tree from multiple alignments of each gene. We used BEAST to perform a Bayesian analysis to estimate both the phylogeny and the divergence times. A global molecular clock cannot be assumed for any gene. From the chronograms obtained, we carried out a diversification analysis using several approaches. The results suggest that the genus Aeromonas began to diverge approximately 250 million years (Ma) ago. 
All methods used to determine Aeromonas diversification gave similar results, suggesting that the speciation process in this bacterial genus followed a rate-constant (Yule) diversification model, although there is a small probability that a slight deceleration occurred in recent times. We also determined the constant of diversification (λ) values, which in all cases were very similar, about 0.01 species/Ma, a value clearly lower than those described for different eukaryotes.
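
    As a back-of-the-envelope illustration of what a rate-constant (Yule, pure-birth) model implies, the expected number of lineages grows as n0 * exp(λt). The sketch below plugs in the two figures quoted above (λ of about 0.01 species/Ma and roughly 250 Ma). It is not the BEAST-based analysis described in the record, the function names are hypothetical, and the result is only the model's expectation under zero extinction, not a species count taken from the study.

```python
import math

def expected_lineages(rate_per_ma, time_ma, n0=1):
    """Expected lineage count under a pure-birth (Yule) process."""
    return n0 * math.exp(rate_per_ma * time_ma)

def doubling_time(rate_per_ma):
    """Time in Ma for the expected lineage count to double."""
    return math.log(2) / rate_per_ma

# Values quoted in the abstract; everything derived from them is illustrative.
print(expected_lineages(0.01, 250))  # expectation per starting lineage
print(doubling_time(0.01))           # about 69 Ma per doubling
```

    Under these assumptions one starting lineage yields an expectation of about 12 lineages after 250 Ma (exp(2.5)), and the count doubles roughly every 69 Ma.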

  7. Evolutionary Roots and Diversification of the Genus Aeromonas

    PubMed Central

    Sanglas, Ariadna; Albarral, Vicenta; Farfán, Maribel; Lorén, J. G.; Fusté, M. C.

    2017-01-01

    Despite the importance of diversification rates in the study of prokaryote evolution, they have not been quantitatively assessed for the majority of microorganism taxa. The investigation of evolutionary patterns in prokaryotes constitutes a challenge due to a very scarce fossil record, limited morphological differentiation and frequently complex taxonomic relationships, which make even species recognition difficult. Although the speciation models and speciation rates in eukaryotes have traditionally been established by analyzing the fossil record data, this is frequently incomplete, and not always available. More recently, several methods based on molecular sequence data have been developed to estimate speciation and extinction rates from phylogenies reconstructed from contemporary taxa. In this work, we determined the divergence time and temporal diversification of the genus Aeromonas by applying these methods widely used with eukaryotic taxa. Our analysis involved 150 Aeromonas strains using the concatenated sequences of two housekeeping genes (approximately 2,000 bp). Dating and diversification model analyses were performed using two different approaches: obtaining the consensus sequence from the concatenated sequences corresponding to all the strains belonging to the same species, or generating the species tree from multiple alignments of each gene. We used BEAST to perform a Bayesian analysis to estimate both the phylogeny and the divergence times. A global molecular clock cannot be assumed for any gene. From the chronograms obtained, we carried out a diversification analysis using several approaches. The results suggest that the genus Aeromonas began to diverge approximately 250 million years (Ma) ago. All methods used to determine Aeromonas diversification gave similar results, suggesting that the speciation process in this bacterial genus followed a rate-constant (Yule) diversification model, although there is a small probability that a slight deceleration occurred in recent times. We also determined the constant of diversification (λ) values, which in all cases were very similar, about 0.01 species/Ma, a value clearly lower than those described for different eukaryotes. PMID:28228750

  8. Explosion Source Similarity Analysis via SVD

    NASA Astrophysics Data System (ADS)

    Yedlin, Matthew; Ben Horin, Yochai; Margrave, Gary

    2016-04-01

    An important seismological ingredient for establishing a regional seismic nuclear discriminant is the similarity analysis of a sequence of explosion sources. To investigate source similarity, we are fortunate to have access to a sequence of 1805 three-component recordings of quarry blasts, shot from March 2002 to January 2015. The centroid of these blasts has an estimated location 36.3E and 29.9N. All blasts were detonated by JPMC (Jordan Phosphate Mines Co.). All data were recorded at the Israeli NDC, HFRI, located at 30.03N and 35.03E. Data were first winnowed based on the distribution of maximum amplitudes in the neighborhood of the P-wave arrival. The winnowed data were then detrended using the algorithm of Cleveland et al. (1990). The detrended data were bandpass filtered between 0.1 and 12 Hz using an eighth-order Butterworth filter. Finally, data were sorted based on maximum trace amplitude. Two similarity analysis approaches were used. First, for each component, the entire suite of traces was decomposed into its eigenvector representation, by employing singular value decomposition (SVD). The data were then reconstructed using 10 percent of the singular values, with the resulting enhancement of the S-wave and surface wave arrivals. The results of this first method are then compared to the second analysis method based on the eigenface decomposition analysis of Turk and Pentland (1991). While both methods yield similar results in enhancement of data arrivals and reduction of data redundancy, more analysis is required to calibrate the recorded data to charge size, a quantity that was not available for the current study. References: Cleveland, R. B., Cleveland, W. S., McRae, J. E., and Terpenning, I., STL: A seasonal-trend decomposition procedure based on loess, Journal of Official Statistics, 6, No. 1, 3-73, 1990. Turk, M. and Pentland, A., Eigenfaces for recognition, Journal of Cognitive Neuroscience, 3(1), 71-86, 1991.

  9. Further exploration of MRI techniques for liver T1rho quantification.

    PubMed

    Zhao, Feng; Yuan, Jing; Deng, Min; Lu, Pu-Xuan; Ahuja, Anil T; Wang, Yi-Xiang J

    2013-12-01

    With biliary duct ligation and CCl4 induced rat liver fibrosis models, recent studies showed that MR T1rho imaging is able to detect liver fibrosis, and the degree of fibrosis is correlated with the degree of elevation of the T1rho measurements, suggesting liver T1rho quantification may play an important role for liver fibrosis early detection and grading. It has also been reported it is feasible to obtain consistent liver T1rho measurement for human subjects at 3 Tesla (3 T), and preliminary clinical data suggest liver T1rho is increased in patients with cirrhosis. In these previous studies, T1rho imaging was used with the rotary-echo spin-lock pulse for T1rho preparation, and number of signal averaging (NSA) was 2. Due to the presence of inhomogeneous B0 field, artifacts may occur in the acquired T1rho-weighted images. The method described by Dixon et al. (Magn Reson Med 1996;36:90-4), which is a hard RF pulse with 135° flip angle and same RF phase as the spin-locking RF pulse is inserted right before and after the spin-locking RF pulse, has been proposed to reduce sensitivity to B0 field inhomogeneity in T1rho imaging. In this study, we compared the images scanned by rotary-echo spin-lock pulse method (sequence 1) and the pulse modified according to Dixon method (sequence 2). When the artifacts occurred in T1rho images, we repeated the same scan until satisfactory. We accepted images if artifact in liver was less than 10% of liver area by visual estimation. When NSA =2, the breath-holding duration for data acquisition of one slice scanning was 8 sec due to a delay time of 6,000 ms for magnetization restoration. If NSA =1, the duration was shortened to be 2 sec. In previous studies, manual region of interest (ROI) analysis of T1rho map was used. In this current study, histogram analysis was also applied to evaluate liver T1rho value on T1rho maps. MRI data acquisition was performed on a 3 T clinical scanner. 
There were 29 subjects with 61 examinations obtained. Liver T1rho values obtained by sequence 1 (NSA =2) and sequence 2 (NSA =2) showed similar values, i.e., 43.1±2.1 ms (range: 38.6-48.0 ms, n=40 scans) vs. 43.5±2.5 ms (range: 39.0-47.7 ms, n=12 scans, P=0.74) respectively. For the six volunteers scanned with both sequences in one session, the intraclass correlation coefficient (ICC) was 0.939. Overall, the success rate of obtaining satisfactory images per acquisition was slightly over 50% for both sequence 1 and sequence 2. Satisfactory images can usually be obtained by asking the volunteer subjects to better hold their breath. However, sequence 2 did not increase the scan success rate. For the nine subjects scanned by sequence 2 with both NSA =2 and NSA =1 during one session, the ICC was 0.274, demonstrating poor agreement. T1rho measurement by ROI method and histogram had an ICC of 0.901 (P>0.05), demonstrating very good agreement. We conclude that by including 135° flip angle before and after the spin-locking RF pulse, the rate of artifacts occurring did not decrease. On the other hand, sequence 1 and sequence 2 measured similar T1rho value in healthy liver. While reducing the breath-holding duration significantly, NSA =1 did not offer satisfactory signal-to-noise ratio. Histogram measurement can be adopted for future studies.

  10. Lactococcus petauri sp. nov., isolated from an abscess of a sugar glider

    PubMed Central

    Goodman, Laura B.; Lawton, Marie R.; Franklin-Guild, Rebecca J.; Anderson, Renee R.; Schaan, Lynn; Thachil, Anil J.; Wiedmann, Martin; Miller, Claire B.; Alcaine, Samuel D.; Kovac, Jasna

    2017-01-01

    A strain of lactic acid bacteria, designated 159469T, isolated from a facial abscess in a sugar glider, was characterized genetically and phenotypically. Cells of the strain were Gram-stain-positive, coccoid and catalase-negative. Morphological, physiological and phylogenetic data indicated that the isolate belongs to the genus Lactococcus. Strain 159469T was closely related to Lactococcus garvieae ATCC 43921T, showing 95.86 and 98.08 % sequence similarity in 16S rRNA gene and rpoB gene sequences, respectively. Furthermore, a pairwise average nucleotide identity blast (ANIb) value of 93.54 % and in silico DNA–DNA hybridization value of 50.7  % were determined for the genome of strain 159469T, when compared with the genome of the type strain of Lactococcus garvieae. Based on the data presented here, the isolate represents a novel species of the genus Lactococcus, for which the name Lactococcus petauri sp. nov. is proposed. The type strain is 159469T (=LMG 30040T=DSM 104842T). PMID:28945531

  11. Lactococcus petauri sp. nov., isolated from an abscess of a sugar glider.

    PubMed

    Goodman, Laura B; Lawton, Marie R; Franklin-Guild, Rebecca J; Anderson, Renee R; Schaan, Lynn; Thachil, Anil J; Wiedmann, Martin; Miller, Claire B; Alcaine, Samuel D; Kovac, Jasna

    2017-11-01

    A strain of lactic acid bacteria, designated 159469T, isolated from a facial abscess in a sugar glider, was characterized genetically and phenotypically. Cells of the strain were Gram-stain-positive, coccoid and catalase-negative. Morphological, physiological and phylogenetic data indicated that the isolate belongs to the genus Lactococcus. Strain 159469T was closely related to Lactococcus garvieae ATCC 43921T, showing 95.86 and 98.08 % sequence similarity in 16S rRNA gene and rpoB gene sequences, respectively. Furthermore, a pairwise average nucleotide identity blast (ANIb) value of 93.54 % and in silico DNA-DNA hybridization value of 50.7 % were determined for the genome of strain 159469T, when compared with the genome of the type strain of Lactococcus garvieae. Based on the data presented here, the isolate represents a novel species of the genus Lactococcus, for which the name Lactococcus petauri sp. nov. is proposed. The type strain is 159469T (=LMG 30040T=DSM 104842T).

  12. Isotope geochemistry of mercury in source rocks, mineral deposits and spring deposits of the California Coast Ranges, USA

    NASA Astrophysics Data System (ADS)

    Smith, Christopher N.; Kesler, Stephen E.; Blum, Joel D.; Rytuba, James J.

    2008-05-01

    We present here the first study of the isotopic composition of mercury in rocks, ore deposits, and active spring deposits from the California Coast Ranges, a part of Earth's crust with unusually extensive evidence of mercury mobility and enrichment. The Franciscan Complex and Great Valley Sequence, which form the bedrock in the California Coast Ranges, are intruded and overlain by Tertiary volcanic rocks including the Clear Lake Volcanic Sequence. These rocks contain two types of mercury deposits, hot-spring deposits that form at shallow depths (< 300 m) and silica-carbonate deposits that extend to depths of 1000 m. Active springs and geothermal areas continue to precipitate Hg and Au and are modern analogues to the fossil hydrothermal systems preserved in the ore deposits. The Franciscan Complex and Great Valley Sequence contain clastic sedimentary rocks with higher concentrations of mercury than volcanic rocks of the Clear Lake Volcanic Field. Mean mercury isotopic compositions (δ202Hg) for all three rock units are similar, although the range of values in Franciscan Complex rocks is greater than in either Great Valley or Clear Lake rocks. Hot spring and silica-carbonate mercury deposits have similar average mercury isotopic compositions that are indistinguishable from averages for the three rock units, although δ202Hg values for the mercury deposits have a greater variance than the country rocks. Precipitates from spring and geothermal waters in the area have similarly large variance and a mean δ202Hg value that is significantly lower than the ore deposits and rocks. These observations indicate that there is little or no isotopic fractionation (< ± 0.5‰) during release of mercury from its source rocks into hydrothermal solutions. Isotopic fractionation does appear to take place during transport and concentration of mercury in deposits, however, especially in their uppermost parts. Boiling of hydrothermal fluids, separation of a mercury-bearing CO2 vapor or reduction and volatilization of Hg(0) in the near-surface environment are likely the most important processes causing the observed Hg isotope fractionation. This should result in the release of mercury with low δ202Hg values into the atmosphere from the top of these hydrothermal systems. Estimates of mass balance suggest that residual Hg reservoirs are not measurably enriched in heavy Hg isotopes as a result of this process because only a small amount of Hg (< 4%) leaves actively ore-forming systems.

  13. Isotope geochemistry of mercury in source rocks, mineral deposits and spring deposits of the California Coast Ranges, USA

    USGS Publications Warehouse

    Smith, C.N.; Kesler, S.E.; Blum, J.D.; Rytuba, J.J.

    2008-01-01

    We present here the first study of the isotopic composition of mercury in rocks, ore deposits, and active spring deposits from the California Coast Ranges, a part of Earth's crust with unusually extensive evidence of mercury mobility and enrichment. The Franciscan Complex and Great Valley Sequence, which form the bedrock in the California Coast Ranges, are intruded and overlain by Tertiary volcanic rocks including the Clear Lake Volcanic Sequence. These rocks contain two types of mercury deposits, hot-spring deposits that form at shallow depths (< 300 m) and silica-carbonate deposits that extend to depths of 1000 m. Active springs and geothermal areas continue to precipitate Hg and Au and are modern analogues to the fossil hydrothermal systems preserved in the ore deposits. The Franciscan Complex and Great Valley Sequence contain clastic sedimentary rocks with higher concentrations of mercury than volcanic rocks of the Clear Lake Volcanic Field. Mean mercury isotopic compositions (δ202Hg) for all three rock units are similar, although the range of values in Franciscan Complex rocks is greater than in either Great Valley or Clear Lake rocks. Hot spring and silica-carbonate mercury deposits have similar average mercury isotopic compositions that are indistinguishable from averages for the three rock units, although δ202Hg values for the mercury deposits have a greater variance than the country rocks. Precipitates from spring and geothermal waters in the area have similarly large variance and a mean δ202Hg value that is significantly lower than the ore deposits and rocks. These observations indicate that there is little or no isotopic fractionation (< ± 0.5‰) during release of mercury from its source rocks into hydrothermal solutions. Isotopic fractionation does appear to take place during transport and concentration of mercury in deposits, however, especially in their uppermost parts. Boiling of hydrothermal fluids, separation of a mercury-bearing CO2 vapor or reduction and volatilization of Hg(0) in the near-surface environment are likely the most important processes causing the observed Hg isotope fractionation. This should result in the release of mercury with low δ202Hg values into the atmosphere from the top of these hydrothermal systems. Estimates of mass balance suggest that residual Hg reservoirs are not measurably enriched in heavy Hg isotopes as a result of this process because only a small amount of Hg (< 4%) leaves actively ore-forming systems. © 2008 Elsevier B.V. All rights reserved.

  14. Processing Dynamic Image Sequences from a Moving Sensor.

    DTIC Science & Technology

    1984-02-01

    65 Roadsign Image Sequence ..... ................ ... 70 Roadsign Sequence with Redundant Features .. ........ . 79 Roadsign Subimage...Selected Feature Error Values .. ........ 66 2c. Industrial Image Selected Feature Local Search Values. .. .... 67 3ab. Roadsign Image Error Values...72 3c. Roadsign Image Local Search Values ............. 73 4ab. Roadsign Redundant Feature Error Values. ............ 8 4c. Roadsign

  15. SME-type carbapenem-hydrolyzing class A beta-lactamases from geographically diverse Serratia marcescens strains.

    PubMed

    Queenan, A M; Torres-Viera, C; Gold, H S; Carmeli, Y; Eliopoulos, G M; Moellering, R C; Quinn, J P; Hindler, J; Medeiros, A A; Bush, K

    2000-11-01

    Three sets of carbapenem-resistant Serratia marcescens isolates have been identified in the United States: 1 isolate in Minnesota in 1985 (before approval of carbapenems for clinical use), 5 isolates in Los Angeles (University of California at Los Angeles [UCLA]) in 1992, and 19 isolates in Boston from 1994 to 1999. All isolates tested produced two beta-lactamases, an AmpC-type enzyme with pI values of 8.6 to 9.0 and one with a pI value of approximately 9.5. The enzyme with the higher pI in each strain hydrolyzed carbapenems and was not inhibited by EDTA, similar to the chromosomal class A SME-1 beta-lactamase isolated from the 1982 London strain S. marcescens S6. The genes encoding the carbapenem-hydrolyzing enzymes were cloned in Escherichia coli and sequenced. The enzyme from the Minnesota isolate had an amino acid sequence identical to that of SME-1. The isolates from Boston and UCLA produced SME-2, an enzyme with a single amino acid change relative to SME-1, a substitution from valine to glutamine at position 207. Purified SME enzymes from the U.S. isolates had beta-lactam hydrolysis profiles similar to that of the London SME-1 enzyme. Pulsed-field gel electrophoresis analysis revealed that the isolates showed some similarity but differed by at least three genetic events. In conclusion, a family of rare class A carbapenem-hydrolyzing beta-lactamases first described in London has now been identified in S. marcescens isolates across the United States.

  16. SME-Type Carbapenem-Hydrolyzing Class A β-Lactamases from Geographically Diverse Serratia marcescens Strains

    PubMed Central

    Queenan, Anne Marie; Torres-Viera, Carlos; Gold, Howard S.; Carmeli, Yehuda; Eliopoulos, George M.; Moellering, Robert C.; Quinn, John P.; Hindler, Janet; Medeiros, Antone A.; Bush, Karen

    2000-01-01

    Three sets of carbapenem-resistant Serratia marcescens isolates have been identified in the United States: 1 isolate in Minnesota in 1985 (before approval of carbapenems for clinical use), 5 isolates in Los Angeles (University of California at Los Angeles [UCLA]) in 1992, and 19 isolates in Boston from 1994 to 1999. All isolates tested produced two β-lactamases, an AmpC-type enzyme with pI values of 8.6 to 9.0 and one with a pI value of approximately 9.5. The enzyme with the higher pI in each strain hydrolyzed carbapenems and was not inhibited by EDTA, similar to the chromosomal class A SME-1 β-lactamase isolated from the 1982 London strain S. marcescens S6. The genes encoding the carbapenem-hydrolyzing enzymes were cloned in Escherichia coli and sequenced. The enzyme from the Minnesota isolate had an amino acid sequence identical to that of SME-1. The isolates from Boston and UCLA produced SME-2, an enzyme with a single amino acid change relative to SME-1, a substitution from valine to glutamine at position 207. Purified SME enzymes from the U.S. isolates had β-lactam hydrolysis profiles similar to that of the London SME-1 enzyme. Pulsed-field gel electrophoresis analysis revealed that the isolates showed some similarity but differed by at least three genetic events. In conclusion, a family of rare class A carbapenem-hydrolyzing β-lactamases first described in London has now been identified in S. marcescens isolates across the United States. PMID:11036019

  17. Functional metagenomics reveals novel β-galactosidases not predictable from gene sequences.

    PubMed

    Cheng, Jiujun; Romantsov, Tatyana; Engel, Katja; Doxey, Andrew C; Rose, David R; Neufeld, Josh D; Charles, Trevor C

    2017-01-01

    The techniques of metagenomics have allowed researchers to access the genomic potential of uncultivated microbes, but there remain significant barriers to determination of gene function based on DNA sequence alone. Functional metagenomics, in which DNA is cloned and expressed in surrogate hosts, can overcome these barriers, and make important contributions to the discovery of novel enzymes. In this study, a soil metagenomic library carried in an IncP cosmid was used for functional complementation for β-galactosidase activity in both Sinorhizobium meliloti (α-Proteobacteria) and Escherichia coli (γ-Proteobacteria) backgrounds. One β-galactosidase, encoded by six overlapping clones that were selected in both hosts, was identified as a member of glycoside hydrolase family 2. We could not identify ORFs obviously encoding possible β-galactosidases in 19 other sequenced clones that were only able to complement S. meliloti. Based on low sequence identity to other known glycoside hydrolases, yet not β-galactosidases, three of these ORFs were examined further. Biochemical analysis confirmed that all three encoded β-galactosidase activity. Lac36W_ORF11 and Lac161_ORF7 had conserved domains, but lacked similarities to known glycoside hydrolases. Lac161_ORF10 had neither conserved domains nor similarity to known glycoside hydrolases. Bioinformatic and structural modeling implied that Lac161_ORF10 protein represented a novel enzyme family with a five-bladed propeller glycoside hydrolase domain. By discovering founding members of three novel β-galactosidase families, we have reinforced the value of functional metagenomics for isolating novel genes that could not have been predicted from DNA sequence analysis alone.

  18. Predicting Hydrologic Function With Aquatic Gene Fragments

    NASA Astrophysics Data System (ADS)

    Good, S. P.; URycki, D. R.; Crump, B. C.

    2018-03-01

    Recent advances in microbiology techniques, such as genetic sequencing, allow for rapid and cost-effective collection of large quantities of genetic information carried within water samples. Here we posit that the unique composition of aquatic DNA material within a water sample contains relevant information about hydrologic function at multiple temporal scales. In this study, machine learning was used to develop discharge prediction models trained on the relative abundance of bacterial taxa classified into operational taxonomic units (OTUs) based on 16S rRNA gene sequences from six large arctic rivers. We term this approach "genohydrology," and show that OTU relative abundances can be used to predict river discharge at monthly and longer timescales. Based on a single DNA sample from each river, the average Nash-Sutcliffe efficiency (NSE) for predicted mean monthly discharge values throughout the year was 0.84, while the NSE for predicted discharge values across different return intervals was 0.67. These are considerable improvements over predictions based only on the area-scaled mean specific discharge of five similar rivers, which had average NSE values of 0.64 and -0.32 for seasonal and recurrence interval discharge values, respectively. The genohydrology approach demonstrates that genetic diversity within the aquatic microbiome is a large and underutilized data resource with benefits for prediction of hydrologic function.
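
    The Nash-Sutcliffe efficiency quoted in this abstract has a standard definition: one minus the ratio of the squared prediction error to the variance of the observations about their mean. The sketch below is only an illustration of that formula, not code from the study; the function name and the sample numbers are made up for the example.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the model against the observations.
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Squared deviation of the observations from their own mean.
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined when all observations are equal")
    return 1.0 - error / spread


# Hypothetical monthly discharge values, for illustration only.
obs = [1.0, 2.0, 3.0]
print(nash_sutcliffe(obs, obs))              # perfect prediction
print(nash_sutcliffe(obs, [2.0, 2.0, 2.0]))  # always predicting the mean
```

    An NSE of 1 means a perfect match, 0 means the prediction is no better than the observed mean, and negative values (such as the -0.32 reported for the area-scaled baseline) mean it is worse than the mean.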

  19. The Effects of Within-Sequence Acoustic Similarity on the Short-Term Retention of Consonants and Words

    ERIC Educational Resources Information Center

    Marcer, D.; And Others

    1977-01-01

    Compares the rates of forgetting of five-item sequences of acoustically similar and dissimilar consonants and words in the absence of proactive and retroactive interference in order to test whether within sequence similarity rather than stimulus length would have a greater influence on retention. (Author/RK)

  20. A space-efficient algorithm for local similarities.

    PubMed

    Huang, X Q; Hardison, R C; Miller, W

    1990-10-01

    Existing dynamic-programming algorithms for identifying similar regions of two sequences require time and space proportional to the product of the sequence lengths. Often this space requirement is more limiting than the time requirement. We describe a dynamic-programming local-similarity algorithm that needs only space proportional to the sum of the sequence lengths. The method can also find repeats within a single long sequence. To illustrate the algorithm's potential, we discuss comparison of a 73,360 nucleotide sequence containing the human beta-like globin gene cluster and a corresponding 44,594 nucleotide sequence for rabbit, a problem well beyond the capabilities of other dynamic-programming software.
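
    The space saving described here can be illustrated with a much simpler relative of the published method. The sketch below is not the authors' algorithm: it computes only the best local-similarity score (Smith-Waterman recurrence), keeps a single previous row of the dynamic-programming table, and does not recover the alignment itself. The scoring values (match, mismatch, linear gap penalty) are assumptions chosen for the example.

```python
def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b using two DP rows.

    Time is proportional to len(a) * len(b); extra space is
    proportional to the shorter sequence only.
    """
    if len(b) > len(a):
        a, b = b, a  # keep the DP row as short as possible
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best:
                best = curr[j]
        prev = curr
    return best


print(local_score("TTACGTTT", "GGACGGG"))  # shared region "ACG"
```

    Recovering the aligned regions in linear space requires an additional divide-and-conquer step, which this score-only sketch omits.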

  1. T cell receptor repertoires of mice and humans are clustered in similarity networks around conserved public CDR3 sequences

    PubMed Central

    Madi, Asaf; Poran, Asaf; Shifrut, Eric; Reich-Zeliger, Shlomit; Greenstein, Erez; Zaretsky, Irena; Arnon, Tomer; Laethem, Francois Van; Singer, Alfred; Lu, Jinghua; Sun, Peter D; Cohen, Irun R; Friedman, Nir

    2017-01-01

    Diversity of T cell receptor (TCR) repertoires, generated by somatic DNA rearrangements, is central to immune system function. However, the level of sequence similarity of TCR repertoires within and between species has not been characterized. Using network analysis of high-throughput TCR sequencing data, we found that abundant CDR3-TCRβ sequences were clustered within networks generated by sequence similarity. We discovered a substantial number of public CDR3-TCRβ segments that were identical in mice and humans. These conserved public sequences were central within TCR sequence-similarity networks. Annotated TCR sequences, previously associated with self-specificities such as autoimmunity and cancer, were linked to network clusters. Mechanistically, CDR3 networks were promoted by MHC-mediated selection, and were reduced following immunization, immune checkpoint blockade or aging. Our findings provide a new view of T cell repertoire organization and physiology, and suggest that the immune system distributes its TCR sequences unevenly, attending to specific foci of reactivity. DOI: http://dx.doi.org/10.7554/eLife.22057.001 PMID:28731407

  2. Polishing mechanism of light-initiated dental composite: Geometric optics approach.

    PubMed

    Chiang, Yu-Chih; Lai, Eddie Hsiang-Hua; Kunzelmann, Karl-Heinz

    2016-12-01

    For light-initiated dental hybrid composites, reinforcing particles are much stiffer than the matrix, which makes the surface rugged after inadequate polish and favors bacterial adhesion and biofilm redevelopment. The aim of the study was to investigate the polishing mechanism via the geometric optics approach. We defined the polishing abilities of six instruments using the obtained gloss values through the geometric optics approach (micro-Tri-gloss with 20°, 60°, and 85° measurement angles). The surface texture was validated using a field emission scanning electron microscope (FE-SEM). Based on the gloss values, we sorted polishing tools into three abrasive levels, and proposed polishing sequences to test the hypothesis that similar abrasive levels would leave equivalent gloss levels on dental composites. The three proposed, tested polishing sequences included: S1, Sof-Lex XT coarse disc, Sof-Lex XT fine disc, and OccluBrush; S2, Sof-Lex XT coarse disc, Prisma Gloss polishing paste, and OccluBrush; and S3, Sof-Lex XT coarse disc, Enhance finishing cups, and OccluBrush. S1 demonstrated significantly higher surface gloss than the other procedures (p < 0.05). The surface textures (FE-SEM micrographs) correlated well with the obtained gloss values. Nominally similar abrasive abilities did not result in equivalent polish levels, indicating that the polishing tools must be evaluated and cannot be judged based on their compositions or abrasive sizes. The geometric optic approach is an efficient and nondestructive method to characterize the polished surface of dental composites. Copyright © 2015. Published by Elsevier B.V.

  3. FSH: fast spaced seed hashing exploiting adjacent hashes.

    PubMed

    Girotto, Samuele; Comin, Matteo; Pizzi, Cinzia

    2018-01-01

    Patterns with wildcards in specified positions, namely spaced seeds, are increasingly used instead of k-mers in many bioinformatics applications that require indexing, querying and rapid similarity search, as they can provide better sensitivity. Many of these applications require computing the hash of each position in the input sequences with respect to the given spaced seed, or to multiple spaced seeds. While the hashing of k-mers can be rapidly computed by exploiting the large overlap between consecutive k-mers, spaced seed hashing is usually computed from scratch for each position in the input sequence, thus resulting in slower processing. The method proposed in this paper, fast spaced-seed hashing (FSH), exploits the similarity of the hash values of spaced seeds computed at adjacent positions in the input sequence. In our experiments we compute the hash for each position of metagenomics reads from several datasets, with respect to different spaced seeds. We also propose a generalized version of the algorithm for the simultaneous computation of multiple spaced seeds hashing. In the experiments, our algorithm can compute the hashing values of spaced seeds with a speedup, with respect to the traditional approach, of between 1.6× and 5.3×, depending on the structure of the spaced seed. Spaced seed hashing is a routine task for several bioinformatics applications. FSH allows this task to be performed efficiently and raises the question of whether other hashing can be exploited to further improve the speedup. This has the potential of major impact in the field, making spaced seed applications not only accurate, but also faster and more efficient. The software FSH is freely available for academic use at: https://bitbucket.org/samu661/fsh/overview.
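
    To make the contrast in this abstract concrete, the sketch below shows the two baselines it mentions: a rolling k-mer hash that reuses the previous value, and a spaced seed hash recomputed from scratch at every position. It does not implement FSH itself. The 2-bit nucleotide encoding, the seed notation ("1" for a position that counts, "0" for a wildcard) and the function names are assumptions made for the example.

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding


def kmer_hashes_rolling(seq, k):
    """Hash every k-mer, updating the previous hash with one new base."""
    mask = (1 << (2 * k)) - 1  # keeps only the last k bases
    h = 0
    hashes = []
    for i, base in enumerate(seq):
        h = ((h << 2) | ENCODE[base]) & mask
        if i >= k - 1:
            hashes.append(h)
    return hashes


def spaced_seed_hashes(seq, seed):
    """Hash every window from scratch, reading only the '1' positions."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    hashes = []
    for start in range(len(seq) - len(seed) + 1):
        h = 0
        for offset in care:
            h = (h << 2) | ENCODE[seq[start + offset]]
        hashes.append(h)
    return hashes


print(spaced_seed_hashes("ACGT", "101"))  # middle base of each window ignored
print(kmer_hashes_rolling("ACGT", 2))
```

    With a seed made only of "1" characters the two functions return the same values; once wildcards are present the simple shift-and-mask update no longer applies, which is the cost the abstract says FSH reduces.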

  4. Divergence of the phytochrome gene family predates angiosperm evolution and suggests that Selaginella and Equisetum arose prior to Psilotum.

    PubMed

    Kolukisaoglu, H U; Marx, S; Wiegmann, C; Hanelt, S; Schneider-Poetsch, H A

    1995-09-01

    Thirty-two partial phytochrome sequences from algae, mosses, ferns, gymnosperms, and angiosperms (11 of them newly released ones from our laboratory) were analyzed by distance and character-state approaches (PHYLIP, TREECON, PAUP). In addition, 12 full-length sequences were analyzed. Despite low bootstrap values at individual internal nodes, the inferred trees (neighbor-joining, Fitch, maximum parsimony) generally showed similar branching orders consistent with other molecular data. Lower plants formed two distinct groups. One basal group consisted of Selaginella, Equisetum, and mosses; the other consisted of a monophyletic cluster of frond-bearing pteridophytes. Psilotum was a member of the latter group and hence perhaps was not, as sometimes suggested, a close relative of the first vascular plants. The results further suggest that phytochrome gene duplication giving rise to a- and b- and later to c-types may have taken place within seedfern genomes. Distance matrices dated the separation of mono- and dicotyledons back to about 260 million years before the present (Myr B.P.) and the separation of Metasequoia and Picea to a fossil record-compatible value of 230 Myr B.P. The Ephedra sequence clustered with the c- or a-type and Metasequoia and Picea sequences clustered with the b-type lineage. The "paleoherb" Nymphaea branched off from the c-type lineage prior to the divergence of mono- and dicotyledons on the a- and b-type branches. Sequences of Piper (another "paleoherb") created problems in that they branched off from different phytochrome lineages at nodes contradicting distance from the inferred trees' origin.

  5. Mouse Vk gene classification by nucleic acid sequence similarity.

    PubMed

    Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

    1989-01-01

    Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.

  6. Lactate dehydrogenase predicts combined progression-free survival after sequential therapy with abiraterone and enzalutamide for patients with castration-resistant prostate cancer.

    PubMed

    Mori, Keiichiro; Kimura, Takahiro; Onuma, Hajime; Kimura, Shoji; Yamamoto, Toshihiro; Sasaki, Hiroshi; Miki, Jun; Miki, Kenta; Egawa, Shin

    2017-07-01

    An array of clinical issues remains to be resolved for castration-resistant prostate cancer (CRPC), including the sequence of drug use and drug cross-resistance. At present, no clear guidelines are available for the optimal sequence of use of novel agents like androgen-receptor axis-targeted (ARAT) agents, particularly enzalutamide, and abiraterone. This study retrospectively analyzed a total of 69 patients with CRPC treated with sequential therapy using enzalutamide followed by abiraterone or vice versa. The primary outcome measure was the comparative combined progression-free survival (PFS) comprising symptomatic and/or radiographic PFS. Patients were also compared for total prostate-specific antigen (PSA)-PFS, overall survival (OS), and PSA response. The predictors of combined PFS and OS were analyzed with a backward-stepwise multivariate Cox model. Of the 69 patients, 46 received enzalutamide first, followed by abiraterone (E-A group), and 23 received abiraterone, followed by enzalutamide (A-E group). The two groups were not significantly different with regard to basic data, except for hemoglobin values. In a comparison with the E-A group, the A-E group was shown to be associated with better combined PFS in Kaplan-Meier analysis (P = 0.043). Similar results were obtained for total PSA-PFS (P = 0.049), while OS did not differ between groups (P = 0.62). Multivariate analysis demonstrated that pretreatment lactate dehydrogenase (LDH) values and age were significant predictors of longer combined PFS (P < 0.05). Likewise, multivariate analysis demonstrated that pretreatment hemoglobin values and performance status were significant predictors of longer OS (P < 0.05). The results of this study suggested the A-E sequence had longer combined PFS and total PSA-PFS compared to the E-A sequence in patients with CRPC. LDH values in sequential therapy may serve as a predictor of longer combined PFS. © 2017 Wiley Periodicals, Inc.

  7. Evaluation of genetic diversity amongst Descurainia sophia L. genotypes by inter-simple sequence repeat (ISSR) marker.

    PubMed

    Saki, Sahar; Bagheri, Hedayat; Deljou, Ali; Zeinalabedini, Mehrshad

    2016-01-01

    Descurainia sophia is a valuable medicinal plant in the family Brassicaceae. To determine the range of diversity amongst D. sophia in Iran, 32 naturally distributed plants belonging to six natural populations of the Iranian plateau were investigated by inter-simple sequence repeat (ISSR) markers. The average percentage of polymorphism produced by 12 ISSR primers was 86%. The PIC values for primers ranged from 0.22 to 0.40 and Rp values ranged between 6.5 and 19.9. The relative genetic diversity of the populations was not high (Gst = 0.32). However, the value of gene flow revealed by the ISSR marker was high (Nm = 1.03). The UPGMA clustering method based on the Jaccard similarity coefficient grouped the genotypes into two major clusters. Graph results from the Neighbor-Net network, generated after a 1000-replicate bootstrap test using the Jaccard coefficient, and STRUCTURE analysis confirmed the UPGMA clustering. The first three PCAs represented 57.31% of the total variation. High levels of genetic diversity were observed within populations, which is useful in breeding and conservation programs. ISSR is found to be a suitable marker to study the genetic diversity of D. sophia.

  8. Transcriptome sequencing and marker development in winged bean (Psophocarpus tetragonolobus; Leguminosae)

    PubMed Central

    Vatanparast, Mohammad; Shetty, Prateek; Chopra, Ratan; Doyle, Jeff J.; Sathyanarayana, N.; Egan, Ashley N.

    2016-01-01

    Winged bean, Psophocarpus tetragonolobus (L.) DC., is similar to soybean in yield and nutritional value but more viable in tropical conditions. Here, we strengthen genetic resources for this orphan crop by producing a de novo transcriptome assembly and annotation of two Sri Lankan accessions (denoted herein as CPP34 [PI 491423] and CPP37 [PI 639033]), developing simple sequence repeat (SSR) markers, and identifying single nucleotide polymorphisms (SNPs) between geographically separated genotypes. A combined assembly based on 804,757 reads from two accessions produced 16,115 contigs with an N50 of 889 bp, over 90% of which has significant sequence similarity to other legumes. Combining contigs with singletons produced 97,241 transcripts. We identified 12,956 SSRs, including 2,594 repeats for which primers were designed and 5,190 high-confidence SNPs between Sri Lankan and Nigerian genotypes. The transcriptomic data sets generated here provide new resources for gene discovery and marker development in this orphan crop, and will be vital for future plant breeding efforts. We also analyzed the soybean trypsin inhibitor (STI) gene family, important plant defense genes, in the context of related legumes and found evidence for radiation of the Kunitz trypsin inhibitor (KTI) gene family within winged bean. PMID:27356763
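
    The N50 statistic quoted in this abstract (889 bp) can be computed from a list of contig lengths as in the following minimal Python sketch. It uses the common definition of N50 (the length L such that contigs of length at least L cover half or more of the assembly); the function name and the example lengths are illustrative and do not come from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the summed
    length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

    For example, n50([2, 3, 4, 5, 6, 7, 8, 9, 10]) returns 8, because the three longest contigs (10 + 9 + 8 = 27) already make up half of the total length of 54.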

  9. Identifying Genetic Signatures of Natural Selection Using Pooled Population Sequencing in Picea abies

    PubMed Central

    Chen, Jun; Källman, Thomas; Ma, Xiao-Fei; Zaina, Giusi; Morgante, Michele; Lascoux, Martin

    2016-01-01

    The joint inference of selection and past demography remains a costly and demanding task. We used next generation sequencing of two pools of 48 Norway spruce mother trees, one corresponding to the Fennoscandian domain, and the other to the Alpine domain, to assess nucleotide polymorphism at 88 nuclear genes. These genes are candidate genes for phenological traits, and most belong to the photoperiod pathway. Estimates of population genetic summary statistics from the pooled data are similar to previous estimates, suggesting that pooled sequencing is reliable. The nonsynonymous SNPs tended to have both lower frequency differences and lower FST values between the two domains than silent ones. These results suggest the presence of purifying selection. The divergence between the two domains based on synonymous changes was around 5 million yr, a time similar to a recent phylogenetic estimate of 6 million yr, but much larger than earlier estimates based on isozymes. Two approaches, one of them novel and that considers both FST and difference in allele frequencies between the two domains, were used to identify SNPs potentially under diversifying selection. SNPs from around 20 genes were detected, including genes previously identified as main targets for selection, such as PaPRR3 and PaGI. PMID:27172202

  10. Identifying Genetic Signatures of Natural Selection Using Pooled Population Sequencing in Picea abies.

    PubMed

    Chen, Jun; Källman, Thomas; Ma, Xiao-Fei; Zaina, Giusi; Morgante, Michele; Lascoux, Martin

    2016-07-07

    The joint inference of selection and past demography remains a costly and demanding task. We used next generation sequencing of two pools of 48 Norway spruce mother trees, one corresponding to the Fennoscandian domain, and the other to the Alpine domain, to assess nucleotide polymorphism at 88 nuclear genes. These genes are candidate genes for phenological traits, and most belong to the photoperiod pathway. Estimates of population genetic summary statistics from the pooled data are similar to previous estimates, suggesting that pooled sequencing is reliable. The nonsynonymous SNPs tended to have both lower frequency differences and lower FST values between the two domains than silent ones. These results suggest the presence of purifying selection. The divergence between the two domains based on synonymous changes was around 5 million yr, a time similar to a recent phylogenetic estimate of 6 million yr, but much larger than earlier estimates based on isozymes. Two approaches, one of them novel and that considers both FST and difference in allele frequencies between the two domains, were used to identify SNPs potentially under diversifying selection. SNPs from around 20 genes were detected, including genes previously identified as main targets for selection, such as PaPRR3 and PaGI. Copyright © 2016 Chen et al.

  11. Transfer of Bacillus halodenitrificans Denariaz et al. 1989 to the genus Virgibacillus as Virgibacillus halodenitrificans comb. nov.

    PubMed

    Yoon, Jung-Hoon; Oh, Tae-Kwang; Park, Yong-Ha

    2004-11-01

    A Gram-variable, endospore-forming moderately halophilic rod, strain SF-121, was isolated from a marine solar saltern of the Yellow Sea in Korea. The result of 16S rRNA gene sequence analysis showed that strain SF-121 has highest sequence similarity (99.7 %) with the type strain of Bacillus halodenitrificans. Phylogenetic analyses based on 16S rRNA gene sequences revealed that B. halodenitrificans DSM 10037(T) and strain SF-121 are more closely related to the genus Virgibacillus than to the genus Bacillus. Strain SF-121 and B. halodenitrificans DSM 10037(T) exhibited 16S rRNA gene similarity levels of 95.3-97.5 % with the type strains of Virgibacillus species and 94.0 % with the type strain of Bacillus subtilis. DNA-DNA relatedness and phenotypic data indicated that B. halodenitrificans DSM 10037(T) and strain SF-121 are members of the same species. B. halodenitrificans DSM 10037(T) and strain SF-121 exhibited DNA-DNA relatedness values of 9-11 % with the type strains of Virgibacillus carmonensis and Virgibacillus marismortui. On the basis of the phenotypic, chemotaxonomic, phylogenetic and genetic data, B. halodenitrificans should be reclassified in the genus Virgibacillus as Virgibacillus halodenitrificans comb. nov.

  12. Assessing the genetic relationships of Curcuma alismatifolia varieties using simple sequence repeat markers.

    PubMed

    Taheri, S; Abdullah, T L; Abdullah, N A P; Ahmad, Z; Karimi, E; Shabanimofrad, M R

    2014-09-05

    The genus Curcuma is a member of the ginger family (Zingiberaceae) that has recently become popular for use as flowering pot plants, both indoors and as patio and landscape plants. We used PCR-based molecular markers (SSRs) to elucidate genetic variation and relationships between five varieties of Curcuma (Curcuma alismatifolia) cultivated in Malaysia. Of the primers tested, 8 (of 17) SSR primers were selected for their reproducibility and high rates of polymorphism. The number of presumed alleles revealed by the SSR analysis ranged from two to six alleles, with a mean value of 3.25 alleles per locus. The values of HO and HE ranged from 0 to 0.8 (mean value of 0.2) and 0.1837 to 0.7755 (mean value of 0.5102), respectively. Eight SSR primers yielded 26 total amplified fragments and revealed high rates of polymorphism among the varieties studied. The polymorphic information content varied from 0.26 to 0.73. Dice's similarity coefficient was calculated for all pairwise comparisons and used to construct an unweighted pair group method with arithmetic average (UPGMA) dendrogram. Similarity coefficient values from 0.2105 to 0.6667 (with an average of 0.4386) were found among the five varieties examined. A cluster analysis of data using a UPGMA algorithm divided the five varieties/hybrids into 2 groups.
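
    Dice's similarity coefficient, used in this abstract for the pairwise comparisons, can be sketched in Python for marker data scored as presence/absence of bands. The 0/1 profile representation and the function names are assumptions for illustration; the study's actual scoring matrix is not given here. The closely related Jaccard coefficient is included for comparison.

```python
def dice(a, b):
    """Dice coefficient between two equal-length 0/1 band profiles.

    Defined as 2 * shared / (bands in a + bands in b).
    """
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    shared = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    return 2 * shared / total if total else 0.0


def jaccard(a, b):
    """Jaccard coefficient: shared bands / bands present in either profile."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return shared / either if either else 0.0
```

    A matrix of such pairwise values is what a UPGMA clustering step would take as input, after conversion to distances (for example 1 minus the similarity).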

  13. In-vitro Assessment of Knee MRI in the Presence of Metal Implants Comparing MAVRIC-SL and Conventional FSE Sequences at 1.5 and 3 Tesla Field Strength

    PubMed Central

    Liebl, Hans; Heilmeier, Ursula; Lee, Sonia; Nardo, Lorenzo; Patsch, Janina; Schuppert, Christopher; Han, Misung; Rondak, Ina-Christine; Banerjee, Suchandrima; Koch, Kevin; Link, Thomas M.; Krug, Roland

    2014-01-01

    PURPOSE: To assess lesion detection and artifact size reduction of a MAVRIC-SEMAC hybrid sequence (MAVRIC-SL) compared to standard sequences at 1.5T and 3T in porcine knee specimens with metal hardware. METHODS: Artificial cartilage and bone lesions of defined size were created in the proximity of titanium and steel screws with 2.5 mm diameter in 12 porcine knee specimens and were imaged at 1.5T and 3T MRI with MAVRIC-SL PD and STIR, standard FSE T2 PD and STIR and fat-saturated T2 FSE sequences. Three radiologists blinded to the lesion locations assessed lesion detection rates on randomized images for each sequence using ROC. Artifact length and width were measured. RESULTS: Metal artifact sizes were largest in the presence of steel screws at 3T (FSE T2 FS: 28.7 cm²) and 1.5T (16.03 cm²). MAVRIC-SL PD and STIR reduced artifact sizes at both 3T (1.43 cm²; 2.46 cm²) and 1.5T (1.16 cm²; 1.59 cm²) compared to FS T2 FSE sequences (27.57 cm²; 13.20 cm²). At 3T, ROC derived AUC values using MAVRIC-SL sequences were significantly higher compared to standard sequences (MAVRIC-PD: 0.87, versus FSE-T2-FS: 0.73 (p=0.025); MAVRIC-STIR: 0.9 versus T2-STIR: 0.78 (p=0.001) and versus FSE-T2-FS: 0.73 (p=0.026)). Similar values were observed at 1.5T. Comparison of 3T and 1.5T showed no significant differences (MAVRIC-SL PD: p=0.382; MAVRIC-SL STIR: p=0.071). CONCLUSION: MAVRIC-SL sequences provided superior lesion detection and reduced metal artifact size at both 1.5T and 3T compared to conventionally used FSE sequences. No significant disadvantage was found comparing MAVRIC-SL at 3T and 1.5T, though metal artifacts at 3T were larger. PMID:24912802

  14. Taxonomic evaluation of Streptomyces albus and related species using multilocus sequence analysis and proposals to emend the description of Streptomyces albus and describe Streptomyces pathocidini sp. nov.

    PubMed Central

    Doroghazi, J. R.; Ju, K.-S.; Metcalf, W. W.

    2014-01-01

    In phylogenetic analyses of the genus Streptomyces using 16S rRNA gene sequences, Streptomyces albus subsp. albus NRRL B-1811T forms a cluster with five other species having identical or nearly identical 16S rRNA gene sequences. Moreover, the morphological and physiological characteristics of these other species, including Streptomyces almquistii NRRL B-1685T, Streptomyces flocculus NRRL B-2465T, Streptomyces gibsonii NRRL B-1335T and Streptomyces rangoonensis NRRL B-12378T are quite similar. This cluster is of particular taxonomic interest because Streptomyces albus is the type species of the genus Streptomyces. The related strains were subjected to multilocus sequence analysis (MLSA) utilizing partial sequences of the housekeeping genes atpD, gyrB, recA, rpoB and trpB and confirmation of previously reported phenotypic characteristics. The five strains formed a coherent cluster supported by a 100 % bootstrap value in phylogenetic trees generated from sequence alignments prepared by concatenating the sequences of the housekeeping genes, and identical tree topology was observed using various different tree-making algorithms. Moreover, all but one strain, S. flocculus NRRL B-2465T, exhibited identical sequences for all of the five housekeeping gene loci sequenced, but NRRL B-2465T still exhibited an MLSA evolutionary distance of 0.005 from the other strains, a value that is lower than the 0.007 MLSA evolutionary distance threshold proposed for species-level relatedness. These data support a proposal to reclassify S. almquistii, S. flocculus, S. gibsonii and S. rangoonensis as later heterotypic synonyms of S. albus with NRRL B-1811T as the type strain. The MLSA sequence database also demonstrated utility for quickly and conclusively confirming that numerous strains within the ARS Culture Collection had been previously misidentified as subspecies of S. albus and that Streptomyces albus subsp. pathocidicus should be redescribed as a novel species, Streptomyces pathocidini sp. nov., with the type strain NRRL B-24287T. PMID:24277863

  15. Expressed sequence tag based identification and expression analysis of some cold inducible elements in seabuckthorn (Hippophae rhamnoides L.).

    PubMed

    Ghangal, Rajesh; Raghuvanshi, Saurabh; Sharma, Prakash C

    2012-02-01

    A cDNA library was constructed from the mature leaves of seabuckthorn (Hippophae rhamnoides). Expressed Sequence Tags (ESTs) were generated by single pass sequencing of 4500 cDNA clones. We submitted 3412 ESTs to dbEST of NCBI. Clustering of these ESTs yielded 1665 unigenes comprising 345 contigs and 1320 singletons. Out of 1665 unigenes, 1278 unigenes were annotated by similarity search while the remaining 387 unannotated unigenes were considered as organism specific. Gene Ontology (GO) analysis of the unigene dataset showed 691 unigenes related to biological processes, 727 to molecular functions and 588 to cellular component category. On the basis of similarity search and GO annotation, 43 unigenes were found responsive to biotic and abiotic stresses. To validate this observation, 13 genes that are known to be associated with cold stress tolerance from previous studies in Arabidopsis and 3 novel transcripts were examined by Real time RT-PCR to understand the change in expression pattern under cold/freeze stress. In silico study of occurrence of microsatellites in these ESTs revealed the presence of 62 Simple Sequence Repeats (SSRs), some of which are being explored to assess genetic diversity among seabuckthorn collections. This is the first report of generation of transcriptome data providing information about genes involved in managing plant abiotic stress in seabuckthorn, a plant known for its enormous medicinal and ecological value. Copyright © 2011 Elsevier Masson SAS. All rights reserved.

  16. Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

    PubMed

    Mackey, Aaron J; Pearson, William R

    2004-10-01

    Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.

  17. Diversity and community composition of methanogenic archaea in the rumen of Scottish upland sheep assessed by different methods.

    PubMed

    Snelling, Timothy J; Genç, Buğra; McKain, Nest; Watson, Mick; Waters, Sinéad M; Creevey, Christopher J; Wallace, R John

    2014-01-01

    Ruminal archaeomes of two mature sheep grazing in the Scottish uplands were analysed by different sequencing and analysis methods in order to compare the apparent archaeal communities. All methods revealed that the majority of methanogens belonged to the Methanobacteriales order containing the Methanobrevibacter, Methanosphaera and Methanobacteria genera. Sanger sequenced 1.3 kb 16S rRNA gene amplicons identified the main species of Methanobrevibacter present to be a SGMT Clade member Mbb. millerae (≥ 91% of OTUs); Methanosphaera comprised the remainder of the OTUs. The primers did not amplify ruminal Thermoplasmatales-related 16S rRNA genes. Illumina sequenced V6-V8 16S rRNA gene amplicons identified similar Methanobrevibacter spp. and Methanosphaera clades and also identified the Thermoplasmatales-related order as 13% of total archaea. Unusually, both methods concluded that Mbb. ruminantium and relatives from the same clade (RO) were almost absent. Sequences mapping to rumen 16S rRNA and mcrA gene references were extracted from Illumina metagenome data. Mapping of the metagenome data to 16S rRNA gene references produced taxonomic identification to Order level including 2-3% Thermoplasmatales, but was unable to discriminate to species level. Mapping of the metagenome data to mcrA gene references resolved 69% to unclassified Methanobacteriales. Only 30% of sequences were assigned to species level clades: of the sequences assigned to Methanobrevibacter, most mapped to SGMT (16%) and RO (10%) clades. The Sanger 16S amplicon and Illumina metagenome mcrA analyses showed similar species richness (Chao1 Index 19-35), while Illumina metagenome and amplicon 16S rRNA analysis gave lower richness estimates (10-18). The values of the Shannon Index were low in all methods, indicating low richness and uneven species distribution. Thus, although much information may be extracted from the other methods, Illumina amplicon sequencing of the V6-V8 16S rRNA gene would be the method of choice for studying rumen archaeal communities.
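
    The two diversity measures named in this abstract, the Chao1 richness estimate and the Shannon index, can be computed from a vector of per-OTU read counts as in the minimal Python sketch below. The bias-corrected form of Chao1 and the natural-log form of the Shannon index are assumed here; the abstract does not say which variants or which software the authors used, and the function names are illustrative.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one observation")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    S_obs is the number of observed OTUs, F1 the number of singletons
    (OTUs seen once) and F2 the number of doubletons (OTUs seen twice).
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

    A community dominated by one OTU gives a Shannon index close to zero, whereas two equally abundant OTUs give ln 2; Chao1 exceeds the observed OTU count only when at least two singletons are present.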

  18. Correlation between protein sequence similarity and x-ray diffraction quality in the protein data bank.

    PubMed

    Lu, Hui-Meng; Yin, Da-Chuan; Ye, Ya-Jing; Luo, Hui-Min; Geng, Li-Qiang; Li, Hai-Sheng; Guo, Wei-Hong; Shang, Peng

    2009-01-01

    As the most widely utilized technique to determine the 3-dimensional structure of protein molecules, X-ray crystallography can provide structure of the highest resolution among the developed techniques. The resolution obtained via X-ray crystallography is known to be influenced by many factors, such as the crystal quality, diffraction techniques, and X-ray sources, etc. In this paper, the authors found that the protein sequence could also be one of the factors. We extracted information of the resolution and the sequence of proteins from the Protein Data Bank (PDB), classified the proteins into different clusters according to the sequence similarity, and statistically analyzed the relationship between the sequence similarity and the best resolution obtained. The results showed that there was a pronounced correlation between the sequence similarity and the obtained resolution. These results indicate that protein structure itself is one variable that may affect resolution when X-ray crystallography is used.

  19. Taxonomic evaluation of selected Ganoderma species and database sequence validation

    PubMed Central

    Jargalmaa, Suldbold; Eimes, John A.; Park, Myung Soo; Park, Jae Young; Oh, Seung-Yoon

    2017-01-01

    Species in the genus Ganoderma include several ecologically important and pathogenic fungal species whose medicinal and economic value is substantial. Due to the highly similar morphological features within the Ganoderma, identification of species has relied heavily on DNA sequencing using BLAST searches, which are only reliable if the GenBank submissions are accurately labeled. In this study, we examined 113 specimens collected from 1969 to 2016 from various regions in Korea using morphological features and multigene analysis (internal transcribed spacer, translation elongation factor 1-α, and the second largest subunit of RNA polymerase II). These specimens were identified as four Ganoderma species: G. sichuanense, G. cf. adspersum, G. cf. applanatum, and G. cf. gibbosum. With the exception of G. sichuanense, these species were difficult to distinguish based solely on morphological features. However, phylogenetic analysis at three different loci yielded concordant phylogenetic information, and supported the four species distinctions with high bootstrap support. A survey of over 600 Ganoderma sequences available on GenBank revealed that 65% of sequences were either misidentified or ambiguously labeled. Here, we suggest corrected annotations for GenBank sequences based on our phylogenetic validation and provide updated global distribution patterns for these Ganoderma species. PMID:28761785

  20. Identification of genes associated with reproduction in the Mud Crab (Scylla olivacea) and their differential expression following serotonin stimulation.

    PubMed

    Kornthong, Napamanee; Cummins, Scott F; Chotwiwatthanakun, Charoonroj; Khornchatri, Kanjana; Engsusophon, Attakorn; Hanna, Peter J; Sobhon, Prasert

    2014-01-01

    The central nervous system (CNS) is often intimately involved in reproduction control and is therefore a target organ for transcriptomic investigations to identify reproduction-associated genes. In this study, 454 transcriptome sequencing was performed on pooled brain and ventral nerve cord of the female mud crab (Scylla olivacea) following serotonin injection (5 µg/g BW). A total of 197,468 sequence reads was obtained with an average length of 828 bp. Approximately 38.7% of 2,183 isotigs matched with significant similarity (E value < 1e-4) to sequences within the Genbank non-redundant (nr) database, with most significant matches being to crustacean and insect sequences. Approximately 32 putative neuropeptide genes were identified from nonmatching blast sequences. In addition, we identified full-length transcripts for crustacean reproductive-related genes, namely farnesoic acid o-methyltransferase (FAMeT), estrogen sulfotransferase (ESULT) and prostaglandin F synthase (PGFS). Following serotonin injection, which would normally initiate reproductive processes, we found up-regulation of FAMeT, ESULT and PGFS expression in the female CNS and ovary. Our data here provides an invaluable new resource for understanding the molecular role of the CNS on reproduction in S. olivacea.

  1. Finding functional features in Saccharomyces genomes by phylogenetic footprinting.

    PubMed

    Cliften, Paul; Sudarsanam, Priya; Desikan, Ashwin; Fulton, Lucinda; Fulton, Bob; Majors, John; Waterston, Robert; Cohen, Barak A; Johnston, Mark

    2003-07-04

    The sifting and winnowing of DNA sequence that occur during evolution cause nonfunctional sequences to diverge, leaving phylogenetic footprints of functional sequence elements in comparisons of genome sequences. We searched for such footprints among the genome sequences of six Saccharomyces species and identified potentially functional sequences. Comparison of these sequences allowed us to revise the catalog of yeast genes and identify sequence motifs that may be targets of transcriptional regulatory proteins. Some of these conserved sequence motifs reside upstream of genes with similar functional annotations or similar expression patterns or those bound by the same transcription factor and are thus good candidates for functional regulatory sequences.

  2. Species classifier choice is a key consideration when analysing low-complexity food microbiome data.

    PubMed

    Walsh, Aaron M; Crispie, Fiona; O'Sullivan, Orla; Finnegan, Laura; Claesson, Marcus J; Cotter, Paul D

    2018-03-20

    The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads. Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there were no significant differences between the compositional results generated by the different sequencers (p = 0.693, R² = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R² = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the greatest false-positive rate. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy was improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample. Again, the outputs from functional profiling analysis using SUPER-FOCUS were generally accordant between the platforms at different sequencing depths. Finally, and expectedly, metagenome assembly completeness was significantly lower on the MiSeq than either on the NextSeq (p = 0.03) or the Proton (p = 0.011), and it improved with increased sequencing depth. Our results demonstrate a remarkable similarity in the results generated by the three sequencing platforms at different sequencing depths, and, in fact, the choice of bioinformatics methodology had a more evident impact on results than the choice of sequencer did.

  3. Hsp90 and environmental stress transform the adaptive value of natural genetic variation.

    PubMed

    Jarosz, Daniel F; Lindquist, Susan

    2010-12-24

    How can species remain unaltered for long periods yet also undergo rapid diversification? By linking genetic variation to phenotypic variation via environmental stress, the Hsp90 protein-folding reservoir might promote both stasis and change. However, the nature and adaptive value of Hsp90-contingent traits remain uncertain. In ecologically and genetically diverse yeasts, we find such traits to be both common and frequently adaptive. Most are based on preexisting variation, with causative polymorphisms occurring in coding and regulatory sequences alike. A common temperature stress alters phenotypes similarly. Both selective inhibition of Hsp90 and temperature stress increase correlations between genotype and phenotype. This system broadly determines the adaptive value of standing genetic variation and, in so doing, has influenced the evolution of current genomes.

  4. Dual-Layer Video Encryption using RSA Algorithm

    NASA Astrophysics Data System (ADS)

    Chadha, Aman; Mallik, Sushmit; Chadha, Ankit; Johar, Ravdeep; Mani Roja, M.

    2015-04-01

    This paper proposes a video encryption algorithm using RSA and Pseudo Noise (PN) sequence, aimed at applications requiring sensitive video information transfers. The system is primarily designed to work with files encoded using the Audio Video Interleaved (AVI) codec, although it can be easily ported for use with Moving Picture Experts Group (MPEG) encoded files. The audio and video components of the source separately undergo two layers of encryption to ensure a reasonable level of security. Encryption of the video component involves applying the RSA algorithm followed by the PN-based encryption. Similarly, the audio component is first encrypted using PN and further subjected to encryption using the Discrete Cosine Transform. Combining these techniques, we put forth an efficient system that is invulnerable to security breaches and attacks and has favorable values of parameters such as encryption/decryption speed, encryption/decryption ratio and visual degradation. For applications requiring encryption of sensitive data wherein stringent security requirements are of prime concern, the system is found to yield negligible similarities in visual perception between the original and the encrypted video sequence. For applications wherein visual similarity is not of major concern, we limit the encryption task to a single level of encryption which is accomplished by using RSA, thereby quickening the encryption process. Although some similarity between the original and encrypted video is observed in this case, it is not enough to comprehend the happenings in the video.

  5. A new method to improve network topological similarity search: applied to fold recognition

    PubMed Central

    Lhota, John; Hauptman, Ruth; Hart, Thomas; Ng, Clara; Xie, Lei

    2015-01-01

    Motivation: Similarity search is the foundation of bioinformatics. It plays a key role in establishing structural, functional and evolutionary relationships between biological sequences. Although the power of the similarity search has increased steadily in recent years, a high percentage of sequences remain uncharacterized in the protein universe. Thus, new similarity search strategies are needed to efficiently and reliably infer the structure and function of new sequences. The existing paradigm for studying protein sequence, structure, function and evolution has been established based on the assumption that the protein universe is discrete and hierarchical. Cumulative evidence suggests that the protein universe is continuous. As a result, conventional sequence homology search methods may not be able to detect novel structural, functional and evolutionary relationships between proteins from weak and noisy sequence signals. To overcome the limitations in existing similarity search methods, we propose a new algorithmic framework, Enrichment of Network Topological Similarity (ENTS), to improve the performance of large scale similarity searches in bioinformatics. Results: We apply ENTS to a challenging unsolved problem: protein fold recognition. Our rigorous benchmark studies demonstrate that ENTS considerably outperforms state-of-the-art methods. As the concept of ENTS can be applied to any similarity metric, it may provide a general framework for similarity search on any set of biological entities, given their representation as a network. Availability and implementation: Source code freely available upon request. Contact: lxie@iscb.org. PMID:25717198

  6. Recognizing the Albian-Cenomanian (OAE1d) sequence boundary using plant carbon isotopes: Dakota Formation, Western Interior Basin, USA

    USGS Publications Warehouse

    Grocke, D.R.; Ludvigson, Greg A.; Witzke, B.L.; Robinson, S.A.; Joeckel, R.M.; Ufnar, David F.; Ravn, R.L.

    2006-01-01

    Analysis of bulk sedimentary organic matter and charcoal from an Albian-Cenomanian fluvial-estuarine succession (Dakota Formation) at Rose Creek Pit (RCP), Nebraska, reveals a negative excursion of ~3‰ in late Albian strata. Overlying Cenomanian strata have δ13C values of -24‰ to -23‰ that are similar to pre-excursion values. The absence of an intervening positive excursion (as exists in marine records of the Albian-Cenomanian boundary) likely results from a depositional hiatus. The corresponding positive δ13C event and proposed depositional hiatus are concordant with a regionally identified sequence boundary in the Dakota Formation (D2), as well as a major regressive phase throughout the globe at the Albian-Cenomanian boundary. Data from RCP confirm suggestions that some positive carbon-isotope excursions in the geologic record are coincident with regressive sea-level phases. We estimate using isotopic correlation that the D2 sequence boundary at RCP was on the order of 0.5 m.y. in duration. Therefore, interpretations of isotopic events and associated environmental phenomena, such as oceanic anoxic events, in the shallow-marine and terrestrial record may be influenced by stratigraphic incompleteness. Further investigation of terrestrial δ13C records may be useful in recognizing and constraining sea-level changes in the geologic record. © 2006 Geological Society of America.

  7. Phylogenomic relationship of feijoa (Acca sellowiana (O.Berg) Burret) with other Myrtaceae based on complete chloroplast genome sequences.

    PubMed

    Machado, Lilian de Oliveira; Vieira, Leila do Nascimento; Stefenon, Valdir Marcos; Oliveira Pedrosa, Fábio de; Souza, Emanuel Maltempi de; Guerra, Miguel Pedro; Nodari, Rubens Onofre

    2017-04-01

    Given their distribution, importance, and richness, Myrtaceae species comprise a model system for studying the evolution of tropical plant diversity. In addition, chloroplast (cp) genome sequencing is an efficient tool for phylogenetic relationship studies. Feijoa [Acca sellowiana (O. Berg) Burret; CN: pineapple-guava] is a Myrtaceae species that occurs naturally in southern Brazil and northern Uruguay. Feijoa is known for its exquisite perfume and flavorful fruits, pharmacological properties, ornamental value and increasing economic relevance. In the present work, we reported the complete cp genome of feijoa. The feijoa cp genome is a circular molecule of 159,370 bp with a quadripartite structure containing two single copy regions, a Large Single Copy region (LSC 88,028 bp) and a Small Single Copy region (SSC 18,598 bp) separated by Inverted Repeat regions (IRs 26,372 bp). The genome structure, gene order, GC content and codon usage are similar to those of typical angiosperm cp genomes. When compared to other cp genome sequences of Myrtaceae, feijoa showed closest relationship with pitanga (Eugenia uniflora L.). Furthermore, a comparison of pitanga synonymous (Ks) and nonsynonymous (Ka) substitution rates revealed extremely low values. Maximum Likelihood and Bayesian Inference analyses produced phylogenomic trees identical in topology. These trees supported monophyly of three Myrtoideae clades.

  8. Evaluation and Selection of Best Priority Sequencing Rule in Job Shop Scheduling using Hybrid MCDM Technique

    NASA Astrophysics Data System (ADS)

    Kiran Kumar, Kalla; Nagaraju, Dega; Gayathri, S.; Narayanan, S.

    2017-05-01

    Priority Sequencing Rules provide the guidance for the order in which the jobs are to be processed at a workstation. The application of different priority rules in job shop scheduling gives different order of scheduling. More experimentation needs to be conducted before a final choice is made to know the best priority sequencing rule. Hence, a comprehensive method of selecting the right choice is essential from a managerial decision-making perspective. This paper considers seven different priority sequencing rules in job shop scheduling. For evaluation and selection of the best priority sequencing rule, a set of eight criteria is considered. The aim of this work is to demonstrate the methodology of evaluating and selecting the best priority sequencing rule by using a hybrid multi criteria decision making (MCDM) technique, i.e., analytical hierarchy process (AHP) with technique for order preference by similarity to ideal solution (TOPSIS). The criteria weights are calculated by using AHP whereas the relative closeness values of all priority sequencing rules are computed based on TOPSIS with the help of data acquired from the shop floor of a manufacturing firm. Finally, from the findings of this work, the priority sequencing rules are ranked from most important to least important. The comprehensive methodology presented in this paper is essential for the management of a workstation to choose the best priority sequencing rule among the available alternatives for processing the jobs with maximum benefit.
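
    The TOPSIS step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `topsis`, the vector normalisation, the two criteria and all scores are hypothetical, and the weights are simply given rather than derived with AHP.

```python
import math

def topsis(matrix, weights, benefit):
    """Return the relative closeness of each alternative to the ideal solution.

    matrix  : rows = alternatives, columns = criteria (raw scores)
    weights : one weight per criterion (assumed given, e.g. from AHP)
    benefit : True where larger is better, False where smaller is better
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal (best) and anti-ideal (worst) value for every criterion.
    columns = list(zip(*v))
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    closeness = []
    for row in v:
        d_best = math.dist(row, best)    # Euclidean distance to the ideal
        d_worst = math.dist(row, worst)  # Euclidean distance to the anti-ideal
        closeness.append(d_worst / (d_best + d_worst))
    return closeness

# Hypothetical scores for three unnamed rules on two made-up criteria:
# column 0 = mean flow time (lower is better), column 1 = utilisation (higher is better).
scores = topsis(
    matrix=[[1.0, 9.0], [5.0, 5.0], [9.0, 1.0]],
    weights=[0.5, 0.5],
    benefit=[False, True],
)
# Rank alternatives from highest to lowest relative closeness.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

    In this toy input the first rule is best on both criteria, so its relative closeness is 1 and it ranks first; the sketch does not guard against identical alternatives or all-zero columns, which would cause a division by zero.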

  9. Transcriptomic analysis of Siberian ginseng (Eleutherococcus senticosus) to discover genes involved in saponin biosynthesis.

    PubMed

    Hwang, Hwan-Su; Lee, Hyoshin; Choi, Yong Eui

    2015-03-14

    Eleutherococcus senticosus, Siberian ginseng, is a highly valued woody medicinal plant belonging to the family Araliaceae. E. senticosus produces a rich variety of saponins such as oleanane-type, noroleanane-type, 29-hydroxyoleanan-type, and lupane-type saponins. Genomic or transcriptomic approaches have not been used to investigate the saponin biosynthetic pathway in this plant. In this study, de novo sequencing was performed to select candidate genes involved in the saponin biosynthetic pathway. A half-plate 454 pyrosequencing run produced 627,923 high-quality reads with an average sequence length of 422 bases. De novo assembly generated 72,811 unique sequences, including 15,217 contigs and 57,594 singletons. Approximately 48,300 (66.3%) unique sequences were annotated using BLAST similarity searches. All of the mevalonate pathway genes for saponin biosynthesis starting from acetyl-CoA were isolated. Moreover, 206 reads of cytochrome P450 (CYP) and 145 reads of uridine diphosphate glycosyltransferase (UGT) sequences were isolated. Based on methyl jasmonate (MeJA) treatment and real-time PCR (qPCR) analysis, 3 CYPs and 3 UGTs were finally selected as candidate genes involved in the saponin biosynthetic pathway. The identified sequences associated with saponin biosynthesis will facilitate the study of the functional genomics of saponin biosynthesis and genetic engineering of E. senticosus.

  10. Regulatory elements of the floral homeotic gene AGAMOUS identified by phylogenetic footprinting and shadowing.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hong, R. L., Hamaguchi, L., Busch, M. A., and Weigel, D.

    2003-06-01

    OAK-B135 In Arabidopsis thaliana, cis-regulatory sequences of the floral homeotic gene AGAMOUS (AG) are located in the second intron. This 3 kb intron contains binding sites for two direct activators of AG, LEAFY (LFY) and WUSCHEL (WUS), along with other putative regulatory elements. We have used phylogenetic footprinting and the related technique of phylogenetic shadowing to identify putative cis-regulatory elements in this intron. Among 29 Brassicaceae, several other motifs, but not the LFY and WUS binding sites previously identified, are largely invariant. Using reporter gene analyses, we tested six of these motifs and found that they are all functionally important for activity of AG regulatory sequences in A. thaliana. Although there is little obvious sequence similarity outside the Brassicaceae, the intron from cucumber AG has at least partial activity in A. thaliana. Our studies underscore the value of the comparative approach as a tool that complements gene-by-gene promoter dissection, but also highlight that sequence-based studies alone are insufficient for a complete identification of cis-regulatory sites.

  11. Effects of tonal language background on tests of temporal sequencing in children.

    PubMed

    Mukari, Siti Zamratol-Mai S; Yu, Xuan; Ishak, Wan Syafira; Mazlan, Rafidah

    2015-01-01

    The aims of the present study were to determine the effects of language background on the performance of the pitch pattern sequence test (PPST) and duration pattern sequence test (DPST). As temporal order sequencing may be affected by age and working memory, these factors were also studied. Performance of tonal and non-tonal language speakers on PPST and DPST was compared. Twenty-eight native Mandarin (tonal language) speakers and twenty-nine native Malay (non-tonal language) speakers between seven and nine years old participated in this study. The results revealed that relative to native Malay speakers, native Mandarin speakers demonstrated better scores on the PPST in both humming and verbal labeling responses. However, a similar language effect was not apparent in the DPST. An age effect was only significant in the PPST (verbal labeling). Finally, no significant effect of working memory was found on the PPST and the DPST. These findings suggest that the PPST is affected by tonal language background, and highlight the importance of developing different normative values for tonal and non-tonal language speakers.

  12. Transcriptome analysis of eyestalk and hemocytes in the ridgetail white prawn Exopalaemon carinicauda: assembly, annotation and marker discovery.

    PubMed

    Li, Jitao; Li, Jian; Chen, Ping; Liu, Ping; He, Yuying

    2015-01-01

    The ridgetail white prawn Exopalaemon carinicauda is one of the major economic mariculture species in eastern China. The deficiency of genomic and transcriptomic data is becoming the bottleneck of further research on its desirable traits. In the present study, 454 pyrosequencing was undertaken to investigate the transcriptome profiles of E. carinicauda. A collection of 1,028,710 sequence reads (459.59 Mb) obtained from cDNA prepared from eyestalk and hemocytes was assembled into 162,056 expressed sequence tags (ESTs). Of these, 29.88% of 48,428 contigs and 70.12% of 113,628 singlets possessed high similarities to sequences in the GenBank non-redundant database, with most significant (E value < 1e-10) unigene matches occurring with crustacean and insect sequences. KEGG analysis of unigenes identified putative members of biological pathways related to growth and immunity. In addition, we obtained a total of 125,112 putative SNPs and 13,467 microsatellites. These results will contribute to the understanding of the genome makeup and provide useful information for future functional genomic research in E. carinicauda.

  13. Draft genome sequence of type strain HBR26T and description of Rhizobium aethiopicum sp. nov.

    DOE PAGES

    Aserse, Aregu Amsalu; Woyke, Tanja; Kyrpides, Nikos C.; ...

    2017-01-26

    Rhizobium aethiopicum sp. nov. is a newly proposed species within the genus Rhizobium. This species includes six rhizobial strains, which were isolated from root nodules of the legume plant Phaseolus vulgaris growing in soils of Ethiopia. The species fixes nitrogen effectively in symbiosis with the host plant P. vulgaris, and is composed of aerobic, Gram-negative staining, rod-shaped bacteria. The genome of type strain HBR26T of R. aethiopicum sp. nov. was one of the rhizobial genomes sequenced as a part of the DOE JGI 2014 Genomic Encyclopedia project designed for soil and plant-associated and newly described type strains. The genome sequence is arranged in 62 scaffolds and consists of 6,557,588 bp, with a 61% G + C content and 6221 protein-coding and 86 RNA genes. The genome of HBR26T contains repABC genes (plasmid replication genes) homologous to the genes found in five different Rhizobium etli CFN42T plasmids, suggesting that HBR26T may have five additional replicons other than the chromosome. In the genome of HBR26T, the nodulation genes nodB, nodC, nodS, nodI, nodJ and nodD are located in the same module, and organized in a similar way as nod genes found in the genome of other known common bean-nodulating rhizobial species. The nodA gene is found in a different scaffold, but it is also very similar to nodA genes of other bean-nodulating rhizobial strains. Though HBR26T is distinct on the phylogenetic tree and based on ANI analysis (the highest value 90.2% ANI with CFN42T) from other bean-nodulating species, these nod genes and most nitrogen-fixing genes found in the genome of HBR26T share high identity with the corresponding genes of known bean-nodulating rhizobial species (96-100% identity). This suggests that symbiotic genes might be shared between bean-nodulating rhizobia through horizontal gene transfer. R. aethiopicum sp. nov. was grouped into the genus Rhizobium but was distinct from all recognized species of that genus by phylogenetic analyses of combined sequences of the housekeeping genes recA and glnII. The closest reference type strains for HBR26T were R. etli CFN42T (94% similarity of the combined recA and glnII sequences) and Rhizobium bangladeshense BLR175T (93%). Genomic ANI calculation based on protein-coding genes also revealed that the closest reference strains were R. bangladeshense BLR175T and R. etli CFN42T with ANI values 91.8 and 90.2%, respectively. Nevertheless, the ANI values between HBR26T and BLR175T or CFN42T are far lower than the cutoff value of ANI (≥ 96%) between strains in the same species, confirming that HBR26T belongs to a novel species. Thus, on the basis of phylogenetic, comparative genomic analyses and ANI results, we formally propose the creation of R. aethiopicum sp. nov. with strain HBR26T (=HAMBI 3550T =LMG 29711T) as the type strain. The genome assembly and annotation data is deposited in the DOE JGI portal and also available at European Nucleotide Archive under accession numbers FMAJ01000001-FMAJ01000062.

  15. Cloning and sequencing of the allophycocyanin genes from Spirulina maxima (Cyanophyta)

    NASA Astrophysics Data System (ADS)

    Qin, Song; Hiroyuki, Kojima; Yoshikazu, Kawata; Shin-Ichi, Yano; Zeng, Cheng-Kui

    1998-03-01

    The genes coding for the α- and β-subunits of allophycocyanin (apcA and apcB) from the cyanophyte Spirulina maxima were cloned and sequenced. The results revealed 44.4% nucleotide sequence similarity and 30.4% deduced amino acid sequence similarity between them. The amino acid sequence identities between S. maxima and S. platensis are 99.4% for the α subunit and 100% for the β subunit.

  16. Next-Generation Sequence Analysis of the Genome of RFHVMn, the Macaque Homolog of Kaposi's Sarcoma (KS)-Associated Herpesvirus, from a KS-Like Tumor of a Pig-Tailed Macaque

    PubMed Central

    Bruce, A. Gregory; Ryan, Jonathan T.; Thomas, Mathew J.; Peng, Xinxia; Grundhoff, Adam; Tsai, Che-Chung

    2013-01-01

    The complete sequence of retroperitoneal fibromatosis-associated herpesvirus Macaca nemestrina (RFHVMn), the pig-tailed macaque homolog of Kaposi's sarcoma-associated herpesvirus (KSHV), was determined by next-generation sequence analysis of a Kaposi's sarcoma (KS)-like macaque tumor. Colinearity of genes was observed with the KSHV genome, and the core herpesvirus genes had strong sequence homology to the corresponding KSHV genes. RFHVMn lacked homologs of open reading frame 11 (ORF11) and KSHV ORFs K5 and K6, which appear to have been generated by duplication of ORFs K3 and K4 after the divergence of KSHV and RFHV. RFHVMn contained positional homologs of all other unique KSHV genes, although some showed limited sequence similarity. RFHVMn contained a number of candidate microRNA genes. Although there was little sequence similarity with KSHV microRNAs, one candidate contained the same seed sequence as the positional homolog, kshv-miR-K12-10a, suggesting functional overlap. RNA transcript splicing was highly conserved between RFHVMn and KSHV, and strong sequence conservation was noted in specific promoters and putative origins of replication, predicting important functional similarities. Sequence comparisons indicated that RFHVMn and KSHV developed in long-term synchrony with the evolution of their hosts, and both viruses phylogenetically group within the RV1 lineage of Old World primate rhadinoviruses. RFHVMn is the closest homolog of KSHV to be completely sequenced and the first sequenced RV1 rhadinovirus homolog of KSHV from a nonhuman Old World primate. The strong genetic and sequence similarity between RFHVMn and KSHV, coupled with similarities in biology and pathology, demonstrate that RFHVMn infection in macaques offers an important and relevant model for the study of KSHV in humans. PMID:24109218

  17. Some special values of vertices of trees on the suborbital graphs

    NASA Astrophysics Data System (ADS)

    Deǧer, A. H.; Akbaba, Ü.

    2018-01-01

    In the present study, the action of a congruence subgroup of SL(2, ℤ) on ℚ̂ is examined. From this action and its properties, vertices of paths of minimal length on the suborbital graph F_{u,N} give rise to some special sequence values, that are alternate sequences such as identity, Fibonacci and Lucas sequences. These types of vertices also give rise to special continued fractions, hence from recurrence relations for continued fractions, values of these vertices and values of special sequences were associated.
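
    The general link between continued-fraction recurrences and the Fibonacci sequence can be illustrated with a short sketch. It only shows the standard recurrence p_k = a_k p_{k-1} + p_{k-2}, q_k = a_k q_{k-1} + q_{k-2} for convergents; it is not the paper's construction on the suborbital graph, and the function name `convergents` is a hypothetical choice.

```python
from fractions import Fraction

def convergents(partial_quotients):
    """Yield the convergents p_k/q_k of the continued fraction [a0; a1, a2, ...]."""
    p_prev, q_prev = 1, 0                # p_{-1}, q_{-1}
    p, q = partial_quotients[0], 1       # p_0, q_0
    yield Fraction(p, q)
    for a in partial_quotients[1:]:
        # Standard recurrence for numerators and denominators.
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        yield Fraction(p, q)

# With every partial quotient equal to 1, numerators and denominators
# are consecutive Fibonacci numbers.
fib_ratios = list(convergents([1] * 6))
print(fib_ratios)
```

    For the all-ones continued fraction the convergents are 1, 2, 3/2, 5/3, 8/5 and 13/8, i.e. ratios of consecutive Fibonacci numbers.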

  18. Using SQL Databases for Sequence Similarity Searching and Analysis.

    PubMed

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.

  19. Sequencing and comparative genomic analysis of 1227 Felis catus cDNA sequences enriched for developmental, clinical and nutritional phenotypes

    PubMed Central

    2012-01-01

    Background: The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results: We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's GenBank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from, other mammalian genomes. Conclusions: The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742

  20. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment

    PubMed Central

    2013-01-01

    Background Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. Results In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Conclusion Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. 
Since the run time is linear and the sequences are converted into a compact clustering model, we are able to identify conserved regions fast or even interactively using a standard PC. Our method has many potential applications such as finding characteristic signature sequences for families of organisms and studying conserved and variable regions in, for example, 16S rRNA. PMID:24564200

  1. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment.

    PubMed

    Nagar, Anurag; Hahsler, Michael

    2013-01-01

    Next Generation Sequencing techniques are producing enormous amounts of biological sequence data and analysis becomes a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. 
    Since the run time is linear and the sequences are converted into a compact clustering model, we are able to identify conserved regions fast or even interactively using a standard PC. Our method has many potential applications such as finding characteristic signature sequences for families of organisms and studying conserved and variable regions in, for example, 16S rRNA.
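
    The idea of comparing segments by p-mer frequency counts instead of by alignment can be sketched as follows. This is only an illustration: the abstract does not state which distance is applied to the count profiles, so the Manhattan distance, the function names and the toy segments are all assumptions.

```python
from collections import Counter


def pmer_counts(segment: str, p: int = 3) -> Counter:
    """Count every overlapping p-mer (substring of length p) in a segment."""
    return Counter(segment[i:i + p] for i in range(len(segment) - p + 1))


def pmer_distance(a: str, b: str, p: int = 3) -> int:
    """Manhattan distance between two p-mer count profiles (assumed measure)."""
    ca, cb = pmer_counts(a, p), pmer_counts(b, p)
    # A Counter returns 0 for p-mers it has never seen.
    return sum(abs(ca[k] - cb[k]) for k in ca.keys() | cb.keys())


seg1 = "ACGTACGTAC"
seg2 = "ACGTACGTAC"  # identical to seg1
seg3 = "ACGTTCGTAC"  # one substitution relative to seg1

print(pmer_distance(seg1, seg2))  # identical segments are at distance 0
print(pmer_distance(seg1, seg3))  # one substitution touches at most p p-mers
```

    A single substitution changes at most p overlapping p-mers, so the profile distance is bounded by 2p. This is why a small count distance can stand in for a small edit distance without computing an alignment.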

  2. Image quality assessment of silent T2 PROPELLER sequence for brain imaging in infants.

    PubMed

    Kim, Hyun Gi; Choi, Jin Wook; Yoon, Soo Han; Lee, Sieun

    2018-02-01

    Infants are vulnerable to high acoustic noise. Acoustic noise generated by MR scanning can be reduced by a silent sequence. The purpose of this study is to compare the image quality of the conventional and silent T2 PROPELLER sequences for brain imaging in infants. A total of 36 scans were acquired from 24 infants using a 3 T MR scanner. Each patient underwent both conventional and silent T2 PROPELLER sequences. Acoustic noise level was measured. Quantitative and qualitative assessments were performed with the images taken with each sequence. The sound pressure level of the conventional T2 PROPELLER imaging sequence was 92.1 dB and that of the silent T2 PROPELLER imaging sequence was 73.3 dB (reduction of 20%). On quantitative assessment, the two sequences (conventional vs silent T2 PROPELLER) did not show a significant difference in relative contrast (0.069 vs 0.068, p value = 0.536) or signal-to-noise ratio (75.4 vs 114.8, p value = 0.098). Qualitative assessment of overall image quality (p value = 0.572), grey-white differentiation (p value = 0.986), shunt-related artefact (p value > 0.999), motion artefact (p value = 0.801) and myelination degree in different brain regions (p values ≥ 0.092) did not show a significant difference between the two sequences. The silent T2 PROPELLER sequence reduced acoustic noise and generated image quality comparable to that of the conventional sequence. Advances in knowledge: This is the first report to compare silent T2 PROPELLER images with conventional T2 PROPELLER images in children.

  3. Purification, characterization and sequence analysis of Omp50, a new porin isolated from Campylobacter jejuni.

    PubMed Central

    Bolla, J M; Dé, E; Dorez, A; Pagès, J M

    2000-01-01

    A novel pore-forming protein identified in Campylobacter was purified by ion-exchange chromatography and named Omp50 according to both its molecular mass and its outer membrane localization. We observed a pore-forming ability of Omp50 after re-incorporation into artificial membranes. The protein induced cation-selective channels with major conductance values of 50-60 pS in 1 M NaCl. N-terminal sequencing allowed us to identify the predicted coding sequence Cj1170c from the Campylobacter jejuni genome database as the corresponding gene in the NCTC 11168 genome sequence. The gene, designated omp50, consists of a 1425 bp open reading frame encoding a deduced 453-amino acid protein with a calculated pI of 5.81 and a molecular mass of 51169.2 Da. The protein possessed a 20-amino acid leader sequence. No significant similarity was found between Omp50 and porin protein sequences already determined. Moreover, the protein showed only weak sequence identity with the major outer-membrane protein (MOMP) of Campylobacter, correlating with the absence of antigenic cross-reactivity between these two proteins. Omp50 is expressed in C. jejuni and Campylobacter lari but not in Campylobacter coli. The gene, however, was detected in all three species by PCR. According to its conformation and functional properties, the protein would belong to the family of outer-membrane monomeric porins. PMID:11104668

  4. How good are indirect tests at detecting recombination in human mtDNA?

    PubMed

    White, Daniel James; Bryant, David; Gemmell, Neil John

    2013-07-08

    Empirical proof of human mitochondrial DNA (mtDNA) recombination in somatic tissues was obtained in 2004; however, a lack of irrefutable evidence exists for recombination in human mtDNA at the population level. Our inability to demonstrate convincingly a signal of recombination in population data sets of human mtDNA sequence may be due, in part, to the ineffectiveness of current indirect tests. Previously, we tested some well-established indirect tests of recombination (linkage disequilibrium vs. distance using D′ and r², Homoplasy Test, Pairwise Homoplasy Index, Neighborhood Similarity Score, and Max χ²) on sequence data derived from the only empirically confirmed case of human mtDNA recombination thus far and demonstrated that some methods were unable to detect recombination. Here, we assess the performance of these six well-established tests and explore what characteristics specific to human mtDNA sequence may affect their efficacy by simulating sequence under various parameters with levels of recombination (ρ) that vary around an empirically derived estimate for human mtDNA (population parameter ρ = 5.492). No test performed infallibly under any of our scenarios, and error rates varied across tests, whereas detection rates increased substantially with ρ values > 5.492. Under a model of evolution that incorporates parameters specific to human mtDNA, including rate heterogeneity, population expansion, and ρ = 5.492, successful detection rates are limited to a range of 7-70% across tests with an acceptable level of false-positive results: the neighborhood similarity score incompatibility test performed best overall under these parameters. Population growth seems to have the greatest impact on recombination detection probabilities across all models tested, likely due to its impact on sequence diversity. The implications of our findings on our current understanding of mtDNA recombination in humans are discussed.
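
    Two of the indirect tests named above track how the linkage disequilibrium statistics D′ and r² change with the distance between sites. As a rough sketch, the two statistics for a single pair of biallelic sites can be computed from the textbook definitions as below. The 0/1 allele coding, the function name and the toy haplotypes are assumptions for illustration and are not taken from the study.

```python
def ld_stats(haplotypes):
    """Return (D, |D'|, r^2) for two biallelic sites coded 0/1.

    haplotypes: list of (allele_at_site_1, allele_at_site_2) tuples.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1, site 1
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1, site 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # raw disequilibrium coefficient
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    # D' rescales D by the largest magnitude possible at these allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime, d * d / denom


# Only two haplotypes present: complete association.
print(ld_stats([(1, 1)] * 4 + [(0, 0)] * 4))
# Three of the four possible haplotypes present: |D'| stays at 1, r^2 drops.
print(ld_stats([(1, 1)] * 2 + [(1, 0)] * 2 + [(0, 0)] * 4))
```

    In the second toy data set |D′| remains 1 because one of the four possible haplotypes is still absent, while r² falls to one third; the two statistics therefore respond differently to the same data.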

  5. Serratia aquatilis sp. nov., isolated from drinking water systems.

    PubMed

    Kämpfer, Peter; Glaeser, Stefanie P

    2016-01-01

    A cream-white-pigmented, oxidase-negative bacterium (strain 2015-2462-01T), isolated from a drinking water system, was investigated in detail to determine its taxonomic position. Cells of the isolate were rod-shaped and stained Gram-negative. A comparison of the 16S rRNA gene sequence of strain 2015-2462-01T with sequences of the type strains of closely related species of the genus Serratia revealed highest similarity to Serratia fonticola (98.4 %), Serratia proteamaculans (97.8 %), Serratia liquefaciens and Serratia grimesii (both 97.7 %). 16S rRNA gene sequence similarities to all other Serratia species were below 97.4 %. Multilocus sequence analysis (MLSA) on the basis of concatenated partial gyrB, rpoB, infB and atpD gene sequences showed a clear distinction of strain 2015-2462-01T from the type strains of the closest related Serratia species. The fatty acid profile of the strain consisted of C16 : 1 ω7c, C16 : 0; C14 : 0 and C14 : 0 3-OH/iso-C16 : 1 I as major components. DNA-DNA hybridizations between 2015-2462-01T and S. fonticola ATCC 29844T resulted in a relatedness value of 27 % (reciprocal 20 %). This DNA-DNA hybridization result in combination with the MLSA results and the differential biochemical properties indicated that strain 2015-2462-01T represents a novel species of the genus Serratia, for which the name Serratia aquatilis sp. nov. is proposed. The type strain is 2015-2462-01T ( = LMG 29119T = CCM 8626T).

  6. How Good Are Indirect Tests at Detecting Recombination in Human mtDNA?

    PubMed Central

    White, Daniel James; Bryant, David; Gemmell, Neil John

    2013-01-01

    Empirical proof of human mitochondrial DNA (mtDNA) recombination in somatic tissues was obtained in 2004; however, a lack of irrefutable evidence exists for recombination in human mtDNA at the population level. Our inability to demonstrate convincingly a signal of recombination in population data sets of human mtDNA sequence may be due, in part, to the ineffectiveness of current indirect tests. Previously, we tested some well-established indirect tests of recombination (linkage disequilibrium vs. distance using D′ and r², Homoplasy Test, Pairwise Homoplasy Index, Neighborhood Similarity Score, and Max χ²) on sequence data derived from the only empirically confirmed case of human mtDNA recombination thus far and demonstrated that some methods were unable to detect recombination. Here, we assess the performance of these six well-established tests and explore what characteristics specific to human mtDNA sequence may affect their efficacy by simulating sequence under various parameters with levels of recombination (ρ) that vary around an empirically derived estimate for human mtDNA (population parameter ρ = 5.492). No test performed infallibly under any of our scenarios, and error rates varied across tests, whereas detection rates increased substantially with ρ values > 5.492. Under a model of evolution that incorporates parameters specific to human mtDNA, including rate heterogeneity, population expansion, and ρ = 5.492, successful detection rates are limited to a range of 7-70% across tests with an acceptable level of false-positive results: the neighborhood similarity score incompatibility test performed best overall under these parameters. Population growth seems to have the greatest impact on recombination detection probabilities across all models tested, likely due to its impact on sequence diversity. The implications of our findings on our current understanding of mtDNA recombination in humans are discussed. PMID:23665874

  7. Abundance of Dioxygenase Genes Similar to Ralstonia sp. Strain U2 nagAc Is Correlated with Naphthalene Concentrations in Coal Tar-Contaminated Freshwater Sediments

    PubMed Central

    Dionisi, Hebe M.; Chewning, Christopher S.; Morgan, Katherine H.; Menn, Fu-Min; Easter, James P.; Sayler, Gary S.

    2004-01-01

    We designed a real-time PCR assay able to recognize dioxygenase large-subunit gene sequences with more than 90% similarity to the Ralstonia sp. strain U2 nagAc gene (nagAc-like gene sequences) in order to study the importance of organisms carrying these genes in the biodegradation of naphthalene. Sequencing of PCR products indicated that this real-time PCR assay was specific and able to detect a variety of nagAc-like gene sequences. One to 100 ng of contaminated-sediment total DNA in 25-μl reaction mixtures produced an amplification efficiency of 0.97 without evident PCR inhibition. The assay was applied to surficial freshwater sediment samples obtained in or in close proximity to a coal tar-contaminated Superfund site. Naphthalene concentrations in the analyzed samples varied between 0.18 and 106 mg/kg of dry weight sediment. The assay for nagAc-like sequences indicated the presence of (4.1 ± 0.7) × 10³ to (2.9 ± 0.3) × 10⁵ copies of nagAc-like dioxygenase genes per μg of DNA extracted from sediment samples. These values corresponded to (1.2 ± 0.6) × 10⁵ to (5.4 ± 0.4) × 10⁷ copies of this target per g of dry weight sediment when losses of DNA during extraction were taken into account. There was a positive correlation between naphthalene concentrations and nagAc-like gene copies per microgram of DNA (r = 0.89) and per gram of dry weight sediment (r = 0.77). These results provide evidence of the ecological significance of organisms carrying nagAc-like genes in the biodegradation of naphthalene. PMID:15240274

  8. SIMAP--a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters.

    PubMed

    Rattei, Thomas; Tischler, Patrick; Götz, Stefan; Jehl, Marc-André; Hoser, Jonathan; Arnold, Roland; Conesa, Ana; Mewes, Hans-Werner

    2010-01-01

    The prediction of protein function as well as the reconstruction of evolutionary genesis employing sequence comparison at large is still the most powerful tool in sequence analysis. Due to the exponential growth of the number of known protein sequences and the subsequent quadratic growth of the similarity matrix, the computation of the Similarity Matrix of Proteins (SIMAP) becomes a computationally intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space by including databases such as ENSEMBL as well as the integration of metagenomes based on their consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP and the data access and query functions have been improved. SIMAP assists biologists to query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. Access to SIMAP is freely provided through the web portal for individuals (http://mips.gsf.de/simap/) and for programmatic access through DAS (http://webclu.bio.wzw.tum.de/das/) and Web-Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).
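
    The quadratic growth of the similarity matrix mentioned above follows from simple counting: an all-against-all comparison of n sequences has n(n - 1)/2 unordered pairs. The short calculation below illustrates this. The 23 million figure is the non-redundant count quoted in the abstract; the other two values of n and the function name are chosen only for illustration.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered sequence pairs in an all-against-all comparison."""
    return n * (n - 1) // 2


for n in (1_000_000, 23_000_000, 46_000_000):
    print(f"{n:>11,d} sequences -> {pairwise_comparisons(n):.3e} pairs")
```

    Doubling the number of sequences roughly quadruples the number of pairs, which is the usual argument for pre-computing such a matrix once and updating it incrementally.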

  9. The mitochondrial genome of Malus domestica and the import-driven hypothesis of mitochondrial genome expansion in seed plants.

    PubMed

    Goremykin, Vadim V; Lockhart, Peter J; Viola, Roberto; Velasco, Riccardo

    2012-08-01

    Mitochondrial genomes of spermatophytes are the largest of all organellar genomes. Their large size has been attributed to various factors; however, the relative contribution of these factors to mitochondrial DNA (mtDNA) expansion remains undetermined. We estimated their relative contribution in Malus domestica (apple). The mitochondrial genome of apple has a size of 396 947 bp and a one to nine ratio of coding to non-coding DNA, close to the corresponding average values for angiosperms. We determined that 71.5% of the apple mtDNA sequence was highly similar to sequences of its nuclear DNA. Using nuclear gene exons, nuclear transposable elements and chloroplast DNA as markers of promiscuous DNA content in mtDNA, we estimated that approximately 20% of the apple mtDNA consisted of DNA sequences imported from other cell compartments, mostly from the nucleus. Similar marker-based estimates of promiscuous DNA content in the mitochondrial genomes of other species ranged between 21.2 and 25.3% of the total mtDNA length for grape, between 23.1 and 38.6% for rice, and between 47.1 and 78.4% for maize. All these estimates are conservative, because they underestimate the import of non-functional DNA. We propose that the import of promiscuous DNA is a core mechanism for mtDNA size expansion in seed plants. In apple, maize and grape this mechanism contributed far more to genome expansion than did homologous recombination. In rice the estimated contribution of both mechanisms was found to be similar. © 2012 The Authors. The Plant Journal © 2012 Blackwell Publishing Ltd.

  10. SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

    PubMed

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-05-01

    Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. 
    The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.

  11. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    PubMed Central

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-01-01

    Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. 
    The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616

  12. Computational Identification Of CDR3 Sequence Archetypes Among Immunoglobulin Sequences in Chronic Lymphocytic Leukemia

    PubMed Central

    Messmer, Bradley T; Raphael, Benjamin J; Aerni, Sarah J; Widhopf, George F; Rassenti, Laura Z; Gribben, John G; Kay, Neil E; Kipps, Thomas J

    2009-01-01

    The leukemia cells of unrelated patients with chronic lymphocytic leukemia (CLL) display a restricted repertoire of immunoglobulin (Ig) gene rearrangements with preferential usage of certain Ig gene segments. We developed a computational method to rigorously quantify biases in Ig sequence similarity in large patient databases and to identify groups of patients with unusual levels of sequence similarity. We applied our method to sequences from 1577 CLL patients through the CLL Research Consortium (CRC), and identified 67 similarity groups into which roughly 20% of all patients could be assigned. Immunoglobulin light chain class was highly correlated within all groups and light chain gene usage was similar within sets. Surprisingly, over 40% of the identified groups were composed of somatically mutated genes. This study significantly expands the evidence that antigen selection shapes the Ig repertoire in CLL. PMID:18640719

  13. Computational identification of CDR3 sequence archetypes among immunoglobulin sequences in chronic lymphocytic leukemia.

    PubMed

    Messmer, Bradley T; Raphael, Benjamin J; Aerni, Sarah J; Widhopf, George F; Rassenti, Laura Z; Gribben, John G; Kay, Neil E; Kipps, Thomas J

    2009-03-01

    The leukemia cells of unrelated patients with chronic lymphocytic leukemia (CLL) display a restricted repertoire of immunoglobulin (Ig) gene rearrangements with preferential usage of certain Ig gene segments. We developed a computational method to rigorously quantify biases in Ig sequence similarity in large patient databases and to identify groups of patients with unusual levels of sequence similarity. We applied our method to sequences from 1577 CLL patients through the CLL Research Consortium (CRC), and identified 67 similarity groups into which roughly 20% of all patients could be assigned. Immunoglobulin light chain class was highly correlated within all groups and light chain gene usage was similar within sets. Surprisingly, over 40% of the identified groups were composed of somatically mutated genes. This study significantly expands the evidence that antigen selection shapes the Ig repertoire in CLL.

  14. Family values in the age of genomics: comparative analyses of temperate bacteriophage HK022.

    PubMed

    Weisberg, R A; Gottesmann, M E; Hendrix, R W; Little, J W

    1999-01-01

    HK022 is a temperate coliphage related to phage lambda. Its chromosome has been completely sequenced, and several aspects of its life cycle have been intensively studied. In the overall arrangement, expression, and function of most of its genes, HK022 broadly resembles lambda and other members of the lambda family. Upon closer view, significant differences emerge. The differences reveal alternative strategies used by related phages to cope with similar problems and illuminate previously unknown regulatory and structural motifs. HK022 prophages protect lysogens from superinfection by producing a sequence-specific RNA binding protein that prematurely terminates nascent transcripts of infecting phage. It uses a novel RNA-based mechanism to antiterminate its own early transcription. The HK022 protein shell is strengthened by a complex pattern of covalent subunit interlinking to form a unitary structure that resembles chain-mail armour. Its integrase and repressor proteins are similar to those of lambda, but the differences provide insights into the evolution of biological specificity and the elements needed for construction of a stable genetic switch.

  15. An approach to large scale identification of non-obvious structural similarities between proteins

    PubMed Central

    Cherkasov, Artem; Jones, Steven JM

    2004-01-01

    Background A new sequence independent bioinformatics approach allowing genome-wide search for proteins with similar three dimensional structures has been developed. By utilizing the numerical output of the sequence threading it establishes putative non-obvious structural similarities between proteins. When applied to the testing set of proteins with known three dimensional structures the developed approach was able to recognize structurally similar proteins with high accuracy. Results The method has been developed to identify pathogenic proteins with low sequence identity and high structural similarity to host analogues. Such protein structure relationships would be hypothesized to arise through convergent evolution or through ancient horizontal gene transfer events, now undetectable using current sequence alignment techniques. The pathogen proteins, which could mimic or interfere with host activities, would represent candidate virulence factors. The developed approach utilizes the numerical outputs from the sequence-structure threading. It identifies the potential structural similarity between a pair of proteins by correlating the threading scores of the corresponding two primary sequences against the library of the standard folds. This approach allowed up to 64% sensitivity and 99.9% specificity in distinguishing protein pairs with high structural similarity. Conclusion Preliminary results obtained by comparison of the genomes of Homo sapiens and several strains of Chlamydia trachomatis have demonstrated the potential usefulness of the method in the identification of bacterial proteins with known or potential roles in virulence. PMID:15147578
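
    The core step described above, correlating the threading scores of two sequences against a common fold library, can be sketched with a plain Pearson correlation. The abstract does not specify the correlation measure or any score values, so the choice of Pearson's r, the five-fold library and all numbers below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("a constant score vector has no defined correlation")
    return cov / (norm_x * norm_y)


# Hypothetical threading scores of three proteins against the same five folds.
protein_a = [2.1, 8.4, 1.0, 5.5, 0.7]
protein_b = [1.9, 7.9, 1.4, 6.0, 0.5]  # scores well on the same folds as A
protein_c = [6.2, 1.1, 7.5, 0.9, 5.8]  # prefers the folds A scores poorly on

print(round(pearson(protein_a, protein_b), 3))
print(round(pearson(protein_a, protein_c), 3))
```

    Under this sketch, a pair whose score profiles rise and fall together across the fold library would be flagged as a candidate for structural similarity even when the two sequences themselves share little identity.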

  16. De novo transcriptome sequencing of axolotl blastema for identification of differentially expressed genes during limb regeneration

    PubMed Central

    2013-01-01

    Background Salamanders are unique among vertebrates in their ability to completely regenerate amputated limbs through the mediation of blastema cells located at the stump ends. This regeneration is nerve-dependent because blastema formation and regeneration does not occur after limb denervation. To obtain the genomic information of blastema tissues, de novo transcriptomes from both blastema tissues and denervated stump ends of Ambystoma mexicanum (axolotls) 14 days post-amputation were sequenced and compared using Solexa DNA sequencing. Results The sequencing done for this study produced 40,688,892 reads that were assembled into 307,345 transcribed sequences. The N50 of transcribed sequence length was 562 bases. A similarity search with known proteins identified 39,200 different genes to be expressed during limb regeneration with a cut-off E-value exceeding 10⁻⁵. We annotated assembled sequences by using gene descriptions, gene ontology, and clusters of orthologous group terms. Targeted searches using these annotations showed that the majority of the genes were in the categories of essential metabolic pathways, transcription factors and conserved signaling pathways, and novel candidate genes for regenerative processes. We discovered and confirmed numerous sequences of the candidate genes by using quantitative polymerase chain reaction and in situ hybridization. Conclusion The results of this study demonstrate that de novo transcriptome sequencing allows gene expression analysis in a species lacking genome information and provides the most comprehensive mRNA sequence resources for axolotls. The characterization of the axolotl transcriptome can help elucidate the molecular mechanisms underlying blastema formation during limb regeneration. PMID:23815514
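
    The N50 statistic quoted above is the length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal sketch of the calculation follows; the function name and the toy length lists are illustrative only and do not come from the axolotl assembly.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


print(n50([100, 200, 300, 400, 500]))   # 500 + 400 = 900 of 1500 bases
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))    # 8 + 8 = 16 of 32 bases
```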

  17. [Parametrial infiltration of cervix carcinoma: diagnostic value of contrast-enhanced fat-suppressed T1-weighted SE sequences at 1.5 tesla].

    PubMed

    Scheidler, J; Heuck, A; Wencke, K; Kimmig, R; Müller-Lisse, U; Reiser, M

    1997-04-01

    To determine whether contrast-enhanced and fat-suppressed sequences contribute to the MR imaging diagnosis of parametrial invasion. 21 patients with carcinoma of the cervix were prospectively examined with a phased-array coil and a 1.5T MR-scanner using the following sequences: transverse T2-weighted turbo spin echo (T2-TSE), T1-weighted spin echo (T1-SE) and fat suppressed T1-weighted SE sequences before and after Gd-DTPA. The sequences were evaluated separately for the presence of parametrial invasion. Image quality and diagnostic confidence were classified on a scale of 0-10 (nondiagnostic-excellent). Findings were compared to the results of the pathohistological examination. Sensitivity, specificity and diagnostic accuracy were highest for T2-TSE sequences (100%, 79% and 86%, respectively). Contrast-enhanced T1-SE sequences with fat-suppression (71%, 79%, and 76%) showed no improvement compared to T2-TSE. Unenhanced fat-suppressed T1-SE (100%, 30%, and 56%) and unenhanced T1-SE (100%, 7%, and 38%) as well as contrast-enhanced T1-SE (86%, 20%, and 47%) were significantly worse than T2-TSE. With similar image quality (p < 0.05) diagnostic confidence was higher on T2-TSE than on any of the other sequences (p < 0.001). Considering the cost-effectiveness of the examination, for the MR diagnosis of parametrial invasion the use of fat-suppressed contrast-enhanced sequences can be abandoned in favour of T2-weighted TSE sequences.
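
    Sensitivity, specificity and diagnostic accuracy are all derived from the same 2 × 2 table of imaging result against histology. The sketch below shows the arithmetic. The abstract reports only percentages, so the patient counts used here are hypothetical: they are one split of 21 patients that happens to reproduce the rounded figures reported for two of the sequences, not data taken from the study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from a 2 x 2 confusion table."""
    sensitivity = tp / (tp + fn)            # invaded cases correctly detected
    specificity = tn / (tn + fp)            # non-invaded cases correctly excluded
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def as_percent(values):
    """Round a tuple of proportions to whole percentages."""
    return tuple(round(100 * v) for v in values)


# Hypothetical counts: 7 patients with parametrial invasion, 14 without.
print(as_percent(diagnostic_metrics(tp=7, fn=0, tn=11, fp=3)))  # (100, 79, 86)
print(as_percent(diagnostic_metrics(tp=5, fn=2, tn=11, fp=3)))  # (71, 79, 76)
```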

  18. Isolation of laccase gene-specific sequences from white rot and brown rot fungi by PCR.

    PubMed Central

    D'Souza, T M; Boominathan, K; Reddy, C A

    1996-01-01

    Degenerate primers corresponding to the consensus sequences of the copper-binding regions in the N-terminal domains of known basidiomycete laccases were used to isolate laccase gene-specific sequences from strains representing nine genera of wood rot fungi. All except three gave the expected PCR product of about 200 bp. Computer searches of the databases identified the sequence of each of the PCR products analyzed as a laccase gene sequence, suggesting the specificity of the primers. PCR products of the white rot fungi Ganoderma lucidum, Phlebia brevispora, and Trametes versicolor showed 65 to 74% nucleotide sequence similarity to each other; the similarity in deduced amino acid sequences was 83 to 91%. The PCR products of Lentinula edodes and Lentinus tigrinus, on the other hand, showed relatively low nucleotide and amino acid similarities (58 to 64 and 62 to 81%, respectively); however, these similarities were still much higher than when compared with the corresponding regions in the laccases of the ascomycete fungi Aspergillus nidulans and Neurospora crassa. A few of the white rot fungi, as well as Gloeophyllum trabeum, a brown rot fungus, gave a 144-bp PCR fragment which had a nucleotide sequence similarity of 60 to 71%. Demonstration of laccase activity in G. trabeum and several other brown rot fungi was of particular interest because these organisms were not previously shown to produce laccases. PMID:8837429

  19. Rapid Diagnostics of Onboard Sequences

    NASA Technical Reports Server (NTRS)

    Starbird, Thomas W.; Morris, John R.; Shams, Khawaja S.; Maimone, Mark W.

    2012-01-01

    Keeping track of sequences onboard a spacecraft is challenging. When reviewing Event Verification Records (EVRs) of sequence executions on the Mars Exploration Rover (MER), operators often found themselves wondering which version of a named sequence the EVR corresponded to. The lack of this information drastically impacts the operators' diagnostic capabilities as well as their situational awareness with respect to the commands the spacecraft has executed, since the EVRs do not provide argument values or explanatory comments. Having this information immediately available can be instrumental in diagnosing critical events and can significantly enhance the overall safety of the spacecraft. This software provides auditing capability that can eliminate that uncertainty while diagnosing critical conditions. Furthermore, the RESTful interface provides a simple way for sequencing tools to automatically retrieve binary compiled sequence SCMFs (Space Command Message Files) on demand. It also enables developers to change the underlying database, while maintaining the same interface to the existing applications. The logging capabilities are also beneficial to operators when they are trying to recall how they solved a similar problem many days ago: this software enables automatic recovery of SCMF and RML (Robot Markup Language) sequence files directly from the command EVRs, eliminating the need for people to find and validate the corresponding sequences. To address the lack of auditing capability for sequences onboard a spacecraft during earlier missions, extensive logging support was added on the Mars Science Laboratory (MSL) sequencing server. This server is responsible for generating all MSL binary SCMFs from RML input sequences. The sequencing server logs every SCMF it generates into a MySQL database, as well as the high-level RML file and dictionary name inputs used to create the SCMF.
The SCMF is then indexed by a hash value that is automatically included in all command EVRs by the onboard flight software. Both the binary SCMF result and the RML input file can be retrieved simply by specifying the hash to a RESTful web interface. This interface enables command-line tools as well as large, sophisticated programs to download the SCMFs and RMLs on demand from the database, enabling a vast array of tools to be built on top of it. One such command-line tool can retrieve and display RML files, or annotate a list of EVRs by interleaving them with the original sequence commands. This software has been integrated with the MSL sequencing pipeline, where it will serve sequences useful in diagnostics, debugging, and situational awareness throughout the mission.
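
    The hash-indexed logging and EVR annotation described in this record can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class and function names are hypothetical, an in-memory dictionary stands in for the MySQL database, SHA-256 stands in for the unspecified hash, and the RML string is a placeholder rather than real RML syntax.

```python
import hashlib


class SequenceLog:
    """In-memory stand-in for the sequencing server's database (hypothetical)."""

    def __init__(self):
        self._records = {}

    def log(self, rml_text, dictionary_name, scmf_bytes):
        # Assumption: SHA-256 over the binary SCMF; the record does not name the hash.
        key = hashlib.sha256(scmf_bytes).hexdigest()
        self._records[key] = {
            "rml": rml_text,
            "dictionary": dictionary_name,
            "scmf": scmf_bytes,
        }
        return key

    def lookup(self, key):
        # Returns None when the hash was never logged.
        return self._records.get(key)


def annotate_evrs(evrs, log):
    """Interleave each EVR line with the RML source its hash points to."""
    lines = []
    for evr in evrs:
        lines.append(evr["text"])
        record = log.lookup(evr["hash"])
        if record is not None:
            lines.append("    # source: " + record["rml"])
    return lines
```

    A lookup by hash then plays the role of the web request: the EVR carries the key, and the log returns both the compiled SCMF and the RML input that produced it.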

  20. High throughput SNP discovery and genotyping in grapevine (Vitis vinifera L.) by combining a re-sequencing approach and SNPlex technology

    PubMed Central

    Lijavetzky, Diego; Cabezas, José Antonio; Ibáñez, Ana; Rodríguez, Virginia; Martínez-Zapater, José M

    2007-01-01

    Background Single-nucleotide polymorphisms (SNPs) are the most abundant type of DNA sequence polymorphisms. Their higher availability and stability when compared to simple sequence repeats (SSRs) provide enhanced possibilities for genetic and breeding applications such as cultivar identification, construction of genetic maps, the assessment of genetic diversity, the detection of genotype/phenotype associations, or marker-assisted breeding. In addition, the efficiency of these activities can be improved thanks to the ease with which SNP genotyping can be automated. Expressed sequence tags (EST) sequencing projects in grapevine are allowing for the in silico detection of multiple putative sequence polymorphisms within and among a reduced number of cultivars. In parallel, the sequence of the grapevine cultivar Pinot Noir is also providing thousands of polymorphisms present in this highly heterozygous genome. Still, the general application of those SNPs requires further validation since their use could be restricted to those specific genotypes. Results In order to develop a large SNP set of wide application in grapevine we followed a systematic re-sequencing approach in a group of 11 grape genotypes corresponding to ancient unrelated cultivars as well as wild plants. Using this approach, we have sequenced 230 gene fragments, which represents the analysis of over 1 Mb of grape DNA sequence. This analysis has allowed the discovery of 1573 SNPs with an average of one SNP every 64 bp (one SNP every 47 bp in non-coding regions and every 69 bp in coding regions). Nucleotide diversity in grape (π = 0.0051) was found to be similar to values observed in highly polymorphic plant species such as maize. The average number of haplotypes per gene sequence was estimated as six, with three haplotypes representing over 83% of the analyzed sequences.
Short-range linkage disequilibrium (LD) studies within the analyzed sequences indicate the existence of a rapid decay of LD within the selected grapevine genotypes. To validate the use of the detected polymorphisms in genetic mapping, cultivar identification and genetic diversity studies we have used the SNPlex™ genotyping technology in a sample of grapevine genotypes and segregating progenies. Conclusion These results provide accurate values for nucleotide diversity in coding sequences and a first estimate of short-range LD in grapevine. Using SNPlex™ genotyping we have shown the application of a set of discovered SNPs as molecular markers for cultivar identification, linkage mapping and genetic diversity studies. Thus, the combination of a highly efficient re-sequencing approach and the SNPlex™ high-throughput genotyping technology provides a powerful tool for grapevine genetic analysis. PMID:18021442
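
    As a rough illustration of the nucleotide diversity statistic (π) quoted in this record, the sketch below computes the average number of pairwise differences per site over a set of aligned sequences. The function names and the toy alignment are hypothetical; the record does not state which estimator or software the authors used, so this is the textbook unweighted form only.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site over equal-length aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


def segregating_sites(seqs):
    """Number of alignment columns that contain more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


# Toy alignment (made up): three sequences of 8 bp with SNPs at columns 4 and 8.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pi = nucleotide_diversity(toy)   # 4 differences / (3 pairs * 8 sites)
snps = segregating_sites(toy)    # 2
```

    On real data, gaps and missing calls would need explicit handling, which this sketch leaves out.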

  1. Actinomyces haliotis sp. nov., a bacterium isolated from the gut of an abalone, Haliotis discus hannai.

    PubMed

    Hyun, Dong-Wook; Shin, Na-Ri; Kim, Min-Soo; Kim, Pil Soo; Kim, Joon Yong; Whon, Tae Woong; Bae, Jin-Woo

    2014-02-01

    A novel, Gram-staining-positive, facultatively anaerobic, non-motile and coccus-shaped bacterium, strain WL80(T), was isolated from the gut of an abalone, Haliotis discus hannai, collected from the northern coast of Jeju in Korea. Optimal growth occurred at 30 °C, pH 7-8 and with 1% (w/v) NaCl. Phylogenetic analyses based on the 16S rRNA gene sequence revealed that strain WL80(T) fell within the cluster of the genus Actinomyces, with highest sequence similarity to the type strains of Actinomyces radicidentis (98.8% similarity) and Actinomyces urogenitalis (97.0% similarity). The major cellular fatty acids were C18 : 1ω9c and C16 : 0. Menaquinone-10 (H4) was the major respiratory quinone. The genomic DNA G+C content of the isolate was 70.4 mol%. DNA-DNA hybridization values with closely related strains indicated less than 7.6% genomic relatedness. The results of physiological, biochemical, chemotaxonomic and genotypic analyses indicated that strain WL80(T) represents a novel species of the genus Actinomyces, for which the name Actinomyces haliotis sp. nov. is proposed. The type strain is WL80(T) ( = KACC 17211(T) = JCM 18848(T)).

  2. Rare variant testing across methods and thresholds using the multi-kernel sequence kernel association test (MK-SKAT).

    PubMed

    Urrutia, Eugene; Lee, Seunggeun; Maity, Arnab; Zhao, Ni; Shen, Judong; Li, Yun; Wu, Michael C

    Analysis of rare genetic variants has focused on region-based analysis wherein a subset of the variants within a genomic region is tested for association with a complex trait. Two important practical challenges have emerged. First, it is difficult to choose which test to use. Second, it is unclear which group of variants within a region should be tested. Both depend on the unknown true state of nature. Therefore, we develop the Multi-Kernel SKAT (MK-SKAT) which tests across a range of rare variant tests and groupings. Specifically, we demonstrate that several popular rare variant tests are special cases of the sequence kernel association test which compares pair-wise similarity in trait value to similarity in the rare variant genotypes between subjects as measured through a kernel function. Choosing a particular test is equivalent to choosing a kernel. Similarly, choosing which group of variants to test also reduces to choosing a kernel. Thus, MK-SKAT uses perturbation to test across a range of kernels. Simulations and real data analyses show that our framework controls type I error while maintaining high power across settings: MK-SKAT loses power when compared to the kernel for a particular scenario but has much greater power than poor choices.

  3. Fold independent structural comparisons of protein-ligand binding sites for exploring functional relationships.

    PubMed

    Gold, Nicola D; Jackson, Richard M

    2006-02-03

    The rapid growth in protein structural data and the emergence of structural genomics projects have increased the need for automatic structure analysis and tools for function prediction. Small molecule recognition is critical to the function of many proteins; therefore, determination of ligand binding site similarity is important for understanding ligand interactions and may allow their functional classification. Here, we present a binding sites database (SitesBase) that, given a known protein-ligand binding site, allows rapid retrieval of other binding sites with similar structure independent of overall sequence or fold similarity. However, each match is also annotated with sequence similarity and fold information to aid interpretation of structure and functional similarity. Similarity in ligand binding sites can indicate common binding modes and recognition of similar molecules, allowing potential inference of function for an uncharacterised protein or providing additional evidence of common function where sequence or fold similarity is already known. Alternatively, the resource can provide valuable information for detailed studies of molecular recognition including structure-based ligand design and in understanding ligand cross-reactivity. Here, we show examples of atomic similarity between superfamily or more distant fold relatives as well as between seemingly unrelated proteins. Assignment of unclassified proteins to structural superfamilies is also undertaken and in most cases substantiates assignments made using sequence similarity. Correct assignment is also possible where sequence similarity fails to find significant matches, illustrating the potential use of binding site comparisons for newly determined proteins.

  4. Association mining of dependency between time series

    NASA Astrophysics Data System (ADS)

    Hafez, Alaaeldin

    2001-03-01

    Time series analysis is considered a crucial component of strategic control over a broad variety of disciplines in business, science and engineering. Time series data is a sequence of observations collected over intervals of time. Each time series describes a phenomenon as a function of time. Analysis of time series data includes discovering trends (or patterns) in a time series sequence. In the last few years, data mining has emerged and been recognized as a new technology for data analysis. Data mining is the process of discovering potentially valuable patterns, associations, trends, sequences and dependencies in data. Data mining techniques can discover information that many traditional business analysis and statistical techniques fail to deliver. In this paper, we adapt and innovate data mining techniques to analyze time series data. By using data mining techniques, maximal frequent patterns are discovered and used in predicting future sequences or trends, where trends describe the behavior of a sequence. In order to include different types of time series (e.g. irregular and non-systematic), we consider past frequent patterns of the same time sequences (local patterns) and of other dependent time sequences (global patterns). We use the word 'dependent' instead of the word 'similar' to emphasize real-life time series where two time series sequences could be completely different (in values, shapes, etc.), but still react to the same conditions in a dependent way. In this paper, we propose the Dependence Mining Technique that could be used in predicting time series sequences. The proposed technique consists of three phases: (a) for all time series sequences, generate their trend sequences, (b) discover maximal frequent trend patterns and generate pattern vectors (to keep information of frequent trend patterns), and (c) use trend pattern vectors to predict future time series sequences.
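
    The first two phases described in this record (deriving trend sequences, then finding frequent trend patterns) can be sketched in simplified form. This is not the authors' Dependence Mining Technique: the U/D/S trend alphabet, the function names, and the use of fixed-length patterns instead of maximal frequent patterns are all assumptions made to keep the example short.

```python
from collections import Counter


def trend_sequence(values, tol=0.0):
    """Map a numeric series to a string of U (up), D (down), S (steady) symbols."""
    trend = []
    for prev, cur in zip(values, values[1:]):
        if cur - prev > tol:
            trend.append("U")
        elif prev - cur > tol:
            trend.append("D")
        else:
            trend.append("S")
    return "".join(trend)


def frequent_patterns(trend_seqs, length, min_support):
    """Return patterns of the given length found in at least min_support series."""
    counts = Counter()
    for seq in trend_seqs:
        # A set ensures each series contributes at most once per pattern.
        seen = {seq[i:i + length] for i in range(len(seq) - length + 1)}
        counts.update(seen)
    return {p: c for p, c in counts.items() if c >= min_support}


series = [[1, 2, 3, 2, 2], [5, 6, 7, 6, 5], [9, 8, 9, 10, 9]]
trends = [trend_sequence(s) for s in series]   # ["UUDS", "UUDD", "DUUD"]
common = frequent_patterns(trends, length=2, min_support=3)
```

    Here the three series differ in their values but share the trend patterns "UU" and "UD", which is the kind of dependency the abstract distinguishes from similarity.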

  5. Multi-parametric MRI findings of granulomatous prostatitis developing after intravesical bacillus calmette-guérin therapy.

    PubMed

    Gottlieb, Josh; Princenthal, Robert; Cohen, Martin I

    2017-07-01

    To evaluate the multi-parametric MRI (mpMRI) findings in patients with biopsy-proven granulomatous prostatitis and prior Bacillus Calmette-Guérin (BCG) exposure. MRI was performed in six patients with pathologically proven granulomatous prostatitis and a prior history of bladder cancer treated with intravesical BCG therapy. Multi-parametric prostate MRI images were recorded on a GE 750W or Philips Achieva 3.0 Tesla MRI scanner with high-resolution, small-field-of-view imaging consisting of axial T2, axial T1, coronal T2, sagittal T2, axial multiple b-value diffusion (multiple values up to 1200 or 1400), and dynamic contrast-enhanced 3D axial T1 with fat suppression sequence. Two different patterns of MR findings were observed. Five of the six patients had a low mean ADC value <1000 (decreased signal on ADC map images) and isointense signal on high-b-value imaging (b = 1200 or 1400), consistent with nonspecific granulomatous prostatitis. The other pattern seen in one of the six patients was decreased signal on the ADC map images with increased signal on the high-b-value sequence, revealing true restricted diffusion indistinguishable from aggressive prostate cancer. This patient had biopsy-confirmed acute BCG prostatitis. Our study suggests that patients with known BCG exposure and PI-RADS v2 scores ≤3, showing similar mpMRI findings as demonstrated, may not require prostate biopsy.

  6. Local alignment of two-base encoded DNA sequence

    PubMed Central

    Homer, Nils; Merriman, Barry; Nelson, Stanley F

    2009-01-01

    Background DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two-base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, thereby facilitating genome re-sequencing efforts based on this form of sequence data. PMID:19508732
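
    To see why decoding and alignment benefit from being done together, it helps to look at how a two-base code behaves under a single measurement error. The sketch below assumes the common XOR-based color-space scheme (A=0, C=1, G=2, T=3, color = XOR of adjacent base codes); the abstract itself does not spell out the encoding, and the function names are hypothetical.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_two_base(seq):
    """Encode a base sequence as colors: one color per adjacent base pair."""
    codes = [BASE_TO_CODE[b] for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode_two_base(first_base, colors):
    """Decode colors back to bases, given the known first base."""
    code = BASE_TO_CODE[first_base]
    bases = [first_base]
    for color in colors:
        code ^= color  # each color determines the next base from the previous one
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


colors = encode_two_base("ATGGCA")          # [3, 1, 0, 3, 1]
clean = decode_two_base("A", colors)        # "ATGGCA"

# A single miscalled color corrupts every base after it when decoded naively.
noisy = colors[:1] + [2] + colors[2:]
garbled = decode_two_base("A", noisy)       # "ATCCGT"
```

    Because one color error shifts all downstream bases, decoding first and aligning afterwards would turn a single measurement error into a long run of apparent mismatches, which is the problem the simultaneous approach is meant to avoid.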

  7. Exploring Dance Movement Data Using Sequence Alignment Methods

    PubMed Central

    Chavoshi, Seyed Hossein; De Baets, Bernard; Neutens, Tijs; De Tré, Guy; Van de Weghe, Nico

    2015-01-01

    Despite the abundance of research on knowledge discovery from moving object databases, only a limited number of studies have examined the interaction between moving point objects in space over time. This paper describes a novel approach for measuring similarity in the interaction between moving objects. The proposed approach consists of three steps. First, we transform movement data into sequences of successive qualitative relations based on the Qualitative Trajectory Calculus (QTC). Second, sequence alignment methods are applied to measure the similarity between movement sequences. Finally, movement sequences are grouped based on similarity by means of an agglomerative hierarchical clustering method. The applicability of this approach is tested using movement data from samba and tango dancers. PMID:26181435

  8. The limits of protein sequence comparison?

    PubMed Central

    Pearson, William R; Sierk, Michael L

    2010-01-01

    Modern sequence alignment algorithms are used routinely to identify homologous proteins, proteins that share a common ancestor. Homologous proteins always share similar structures and often have similar functions. Over the past 20 years, sequence comparison has become both more sensitive, largely because of profile-based methods, and more reliable, because of more accurate statistical estimates. As sequence and structure databases become larger, and comparison methods become more powerful, reliable statistical estimates will become even more important for distinguishing similarities that are due to homology from those that are due to analogy (convergence). The newest sequence alignment methods are more sensitive than older methods, but more accurate statistical estimates are needed for their full power to be realized. PMID:15919194

  9. Evaluation of an automated repetitive sequence-based PCR system for subtyping Enterobacter sakazakii.

    PubMed

    Healy, B; Mullane, N; Collin, V; Mailler, S; Iversen, C; Chatellier, S; Storrs, M; Fanning, S

    2008-07-01

    Enterobacter sakazakii is regarded as a ubiquitous organism that can be isolated from a wide range of foods and environments. Infection in at-risk infants has been epidemiologically linked to the consumption of contaminated powdered infant formula. Preventing the dissemination of this pathogen in a powdered infant formula manufacturing facility is an important step in ensuring consumer confidence in a given brand together with the protection of the health status of a vulnerable population. In this study we report the application of a repetitive sequence-based PCR typing method to subtype a previously well-characterized collection of E. sakazakii isolates of diverse origin. While both methods successfully discriminated between the collection of isolates, repetitive sequence-based PCR identified 65 types, whereas pulsed-field gel electrophoresis identified 110 types showing ≥95% similarity. The method was quick and easy to perform, and our data demonstrated the utility and value of this approach to monitor in-process contamination, which could potentially contribute to a reduction in the transmission of E. sakazakii.

  10. Lactobacillus allii sp. nov. isolated from scallion kimchi.

    PubMed

    Jung, Min Young; Lee, Se Hee; Lee, Moeun; Song, Jung Hee; Chang, Ji Yoon

    2017-12-01

    A novel strain of lactic acid bacteria, WiKim39T, was isolated from a scallion kimchi sample consisting of fermented chili peppers and vegetables. The isolate was a Gram-positive, rod-shaped, non-motile, catalase-negative and facultatively anaerobic lactic acid bacterium. Phylogenetic analysis of the 16S rRNA gene sequence showed that strain WiKim39T belonged to the genus Lactobacillus, and shared 97.1-98.2 % pair-wise sequence similarities with related type strains, Lactobacillus nodensis, Lactobacillus insicii, Lactobacillus versmoldensis, Lactobacillus tucceti and Lactobacillus furfuricola. The G+C content of the strain based on its genome sequence was 35.3 mol%. The ANI values between WiKim39T and the closest relatives were lower than 80 %. Based on the phenotypic, biochemical, and phylogenetic analyses, strain WiKim39T represents a novel species of the genus Lactobacillus, for which the name Lactobacillus allii sp. nov. is proposed. The type strain is WiKim39T (=KCTC 21077T=JCM 31938T).

  11. Lactobacillus allii sp. nov. isolated from scallion kimchi

    PubMed Central

    Jung, Min Young; Lee, Se Hee; Lee, Moeun; Song, Jung Hee; Chang, Ji Yoon

    2017-01-01

    A novel strain of lactic acid bacteria, WiKim39T, was isolated from a scallion kimchi sample consisting of fermented chili peppers and vegetables. The isolate was a Gram-positive, rod-shaped, non-motile, catalase-negative and facultatively anaerobic lactic acid bacterium. Phylogenetic analysis of the 16S rRNA gene sequence showed that strain WiKim39T belonged to the genus Lactobacillus, and shared 97.1–98.2 % pair-wise sequence similarities with related type strains, Lactobacillus nodensis, Lactobacillus insicii, Lactobacillus versmoldensis, Lactobacillus tucceti and Lactobacillus furfuricola. The G+C content of the strain based on its genome sequence was 35.3 mol%. The ANI values between WiKim39T and the closest relatives were lower than 80 %. Based on the phenotypic, biochemical, and phylogenetic analyses, strain WiKim39T represents a novel species of the genus Lactobacillus, for which the name Lactobacillus allii sp. nov. is proposed. The type strain is WiKim39T (=KCTC 21077T=JCM 31938T). PMID:29043955

  12. Towards a Logical Distinction Between Swarms and Aftershock Sequences

    NASA Astrophysics Data System (ADS)

    Gardine, M.; Burris, L.; McNutt, S.

    2007-12-01

    The distinction between swarms and aftershock sequences has, up to this point, been fairly arbitrary and non-uniform. Typically 0.5 to 1 order of magnitude difference between the mainshock and largest aftershock has been a traditional choice, but there are many exceptions. Seismologists have generally assumed that the mainshock carries most of the energy, but this is only true if it is sufficiently large compared to the size and numbers of aftershocks. Here we present a systematic division based on energy of the aftershock sequence compared to the energy of the largest event of the sequence. It is possible to calculate the amount of aftershock energy assumed to be in the sequence using the b-value of the frequency-magnitude relation with a fixed choice of magnitude separation (M-mainshock minus M-largest aftershock). Assuming that the energy of an aftershock sequence is less than the energy of the mainshock, the b-value at which the aftershock energy exceeds that of the mainshock energy determines the boundary between aftershock sequences and swarms. The amount of energy for various choices of b-value is also calculated using different values of magnitude separation. When the minimum b-value at which the sequence energy exceeds that of the largest event/mainshock is plotted against the magnitude separation, a linear trend emerges. Values plotting above this line represent swarms and values plotting below it represent aftershock sequences. This scheme has the advantage that it represents a physical quantity - energy - rather than only statistical features of earthquake distributions. As such it may be useful to help distinguish swarms from mainshock/aftershock sequences and to better determine the underlying causes of earthquake swarms.

  13. An early illness recognition framework using a temporal Smith Waterman algorithm and NLP.

    PubMed

    Hajihashemi, Zahra; Popescu, Mihail

    2013-01-01

    In this paper we propose a framework for detecting health patterns based on non-wearable sensor sequence similarity and natural language processing (NLP). In TigerPlace, an aging-in-place facility in Columbia, MO, we deployed 47 sensor networks together with a nursing electronic health record (EHR) system to provide early illness recognition. The proposed framework utilizes sensor sequence similarity and NLP on EHR nursing comments to automatically notify the physician when health problems are detected. The reported methodology is inspired by genomic sequence annotation using similarity algorithms such as Smith-Waterman (SW). Similarly, for each sensor sequence, we associate health concepts extracted from the nursing notes using MetaMap, an NLP tool provided by the Unified Medical Language System (UMLS). Since sensor sequences, unlike genomic ones, have an associated time dimension, we propose a temporal variant of SW (TSW) to account for time. The main challenges presented by our framework are finding the most suitable time sequence similarity and aggregation of the retrieved UMLS concepts. On a pilot dataset from three TigerPlace residents, with a total of 1685 sensor days and 626 nursing records, we obtained an average precision of 0.64 and a recall of 0.37.
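
    For orientation, the standard Smith-Waterman local alignment score that this record builds on can be sketched as follows. This is the classic algorithm with a linear gap penalty, not the temporal variant (TSW) the authors propose; the scoring parameters and the sensor event labels are illustrative assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two sequences of hashable symbols."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 is what makes the alignment local rather than global.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# Works on strings (genomic style) and on lists of event labels (sensor style).
dna_score = smith_waterman_score("XXACGTYY", "ZZACGTWW")
day_a = ["bed", "bath", "kitchen"]
day_b = ["tv", "bed", "bath"]
sensor_score = smith_waterman_score(day_a, day_b)
```

    A temporal variant would additionally weigh when each event occurred, which this sketch deliberately ignores.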

  14. Clustering and visualizing similarity networks of membrane proteins.

    PubMed

    Hu, Geng-Ming; Mai, Te-Lun; Chen, Chi-Ming

    2015-08-01

    We proposed a fast and unsupervised clustering method, minimum span clustering (MSC), for analyzing the sequence-structure-function relationship of biological networks, and demonstrated its validity in clustering the sequence/structure similarity networks (SSN) of 682 membrane protein (MP) chains. The MSC clustering of MPs based on their sequence information was found to be consistent with their tertiary structures and functions. For the largest seven clusters predicted by MSC, the consistency in chain function within the same cluster is found to be 100%. From analyzing the edge distribution of SSN for MPs, we found a characteristic threshold distance for the boundary between clusters, over which SSN of MPs could be properly clustered by an unsupervised sparsification of the network distance matrix. The clustering results of MPs from both MSC and the unsupervised sparsification methods are consistent with each other, and have high intracluster similarity and low intercluster similarity in sequence, structure, and function. Our study showed a strong sequence-structure-function relationship of MPs. We discussed evidence of convergent evolution of MPs and suggested applications in finding structural similarities and predicting biological functions of MP chains based on their sequence information. © 2015 Wiley Periodicals, Inc.

  15. Predicted secondary structure similarity in the absence of primary amino acid sequence homology: hepatitis B virus open reading frames.

    PubMed Central

    Schaeffer, E; Sninsky, J J

    1984-01-01

    Proteins that are related evolutionarily may have diverged at the level of primary amino acid sequence while maintaining similar secondary structures. Computer analysis has been used to compare the open reading frames of the hepatitis B virus to those of the woodchuck hepatitis virus at the level of amino acid sequence, and to predict the relative hydrophilic character and the secondary structure of putative polypeptides. Similarity is seen at the levels of relative hydrophilicity and secondary structure, in the absence of sequence homology. These data reinforce the proposal that these open reading frames encode viral proteins. Computer analysis of this type can be more generally used to establish structural similarities between proteins that do not share obvious sequence homology as well as to assess whether an open reading frame is fortuitous or codes for a protein. PMID:6585835

  16. β-hairpin-mediated nucleation of polyglutamine amyloid formation

    PubMed Central

    Kar, Karunakar; Hoop, Cody L.; Drombosky, Kenneth W.; Baker, Matthew A.; Kodali, Ravindra; Arduini, Irene; van der Wel, Patrick C. A.; Horne, W. Seth; Wetzel, Ronald

    2013-01-01

    The conformational preferences of polyglutamine (polyQ) sequences are of major interest because of their central importance in the expanded CAG repeat diseases that include Huntington’s disease (HD). Here we explore the response of various biophysical parameters to the introduction of β-hairpin motifs within polyQ sequences. These motifs (trpzip, disulfide, D-Pro-Gly, Coulombic attraction, L-Pro-Gly) enhance formation rates and stabilities of amyloid fibrils with degrees of effectiveness well-correlated with their known abilities to enhance β-hairpin formation in other peptides. These changes led to decreases in the critical nucleus for amyloid formation from a value of n* = 4 for a simple, unbroken Q23 sequence to approximate unitary n* values for similar length polyQs containing β-hairpin motifs. At the same time, the morphologies, secondary structures, and bioactivities of the resulting fibrils were essentially unchanged from simple polyQ aggregates. In particular, the signature pattern of SSNMR 13C Gln resonances that appears to be unique to polyQ amyloid is replicated exactly in fibrils from a β-hairpin polyQ. Importantly, while β-hairpin motifs do produce enhancements in the equilibrium constant for nucleation in aggregation reactions, these Kn* values remain quite low (~10⁻¹⁰) and there is no evidence for significant embellishment of β-structure within the monomer ensemble. The results indicate an important role for β-turns in the nucleation mechanism and structure of polyQ amyloid and have implications for the nature of the toxic species in expanded CAG repeat diseases. PMID:23353826

  17. [Ultrastructural observation on nymphal Armillifer sp. by scanning electron microscopy and phylogenetic analysis based on 18S rRNA].

    PubMed

    Li, Jian; Shi, Yun-Liang; Shi, Wei; Fang, Fang; Zhou, Qing-An; Li, Wen-Wen; He, Guo-Sheng; Huang, Wei-Yi

    2012-04-30

    To observe the ultrastructure of nymphal Armillifer sp. isolated from Macaca fascicularis by scanning electron microscopy (SEM), and to analyze the phylogenetic relationships based on 18S rRNA gene sequences. The parasite samples stored in 70% alcohol were fixed with glutaraldehyde and osmium peroxide. Ultrastructural characters of those samples were observed under SEM. Amplification and sequencing of the 18S rRNA gene were performed following the extraction of total genomic DNA. Sequence analysis was performed based on multiple alignment using ClustalX1.83, while phylogenetic analysis was made by the Neighbor-Joining method using MEGA4.0. The nymphs were cylindrical in shape, the body slightly claviform and tapering to the posterior end. Abdominal annuli gradually widened from the anterior to the posterior parts, and the 12th-13th abdominal annuli were similar in width. The annuli were more closely spaced in the anterior half of the body, whereas in the posterior half there were distinct gaps between them. The circular mouth was located ventrally in the middle of the head. Folds were seen on the inner margin of the mouth, with a pair of curved hooks on both sides above it, which were arranged practically in a straight line. Two pairs of large sensory papillae were observed symmetrically over the last thoracic annulus of the cephalothorax lying below the outer hook, and the first abdominal annulus was near the median ventral line. The number of abdominal annuli was 29, not including 2 incomplete terminal annuli. Rounded sensory papillae were distributed over the whole body surface, except the dorsal side of the head and the ventral part of the terminal annulus. An agglomerate-like anal opening was observed at the end of the ventral abdominal annuli and was distinctly sub-terminal. These morphological features demonstrated that the nymphs were highly similar to those of Armillifer moniliformis Diesing, 1835.
A fragment of the 18S rRNA gene sequence (1 836 bp) was obtained by PCR combined with sequencing, and was deposited in the GenBank database under accession number HM048870. The phylogenetic tree indicated that A. moniliformis, A. agkistrodon and A. armillatus were in the same clade with a bootstrap value of 95%, and that A. moniliformis and A. agkistrodon formed a clade of their own with a bootstrap value of 75%. The nymphs isolated from Macaca fascicularis are provisionally identified as A. moniliformis.

  18. Dispersion of the HIV-1 Epidemic in Men Who Have Sex with Men in the Netherlands: A Combined Mathematical Model and Phylogenetic Analysis.

    PubMed

    Bezemer, Daniela; Cori, Anne; Ratmann, Oliver; van Sighem, Ard; Hermanides, Hillegonda S; Dutilh, Bas E; Gras, Luuk; Rodrigues Faria, Nuno; van den Hengel, Rob; Duits, Ashley J; Reiss, Peter; de Wolf, Frank; Fraser, Christophe

    2015-11-01

    The HIV-1 subtype B epidemic amongst men who have sex with men (MSM) is resurgent in many countries despite the widespread use of effective combination antiretroviral therapy (cART). In this combined mathematical and phylogenetic study of observational data, we aimed to find out the extent to which the resurgent epidemic is the result of newly introduced strains or of growth of already circulating strains. As of November 2011, the ATHENA observational HIV cohort of all patients in care in the Netherlands since 1996 included HIV-1 subtype B polymerase sequences from 5,852 patients. Patients who were diagnosed between 1981 and 1995 were included in the cohort if they were still alive in 1996. The ten most similar sequences to each ATHENA sequence were selected from the Los Alamos HIV Sequence Database, and a phylogenetic tree was created of a total of 8,320 sequences. Large transmission clusters that included ≥10 ATHENA sequences were selected, with a local support value ≥ 0.9 and median pairwise patristic distance below the fifth percentile of distances in the whole tree. Time-varying reproduction numbers of the large MSM-majority clusters were estimated through mathematical modeling. We identified 106 large transmission clusters, including 3,061 (52%) ATHENA and 652 Los Alamos sequences. Half of the HIV sequences from MSM registered in the cohort in the Netherlands (2,128 of 4,288) were included in 91 large MSM-majority clusters. Strikingly, at least 54 (59%) of these 91 MSM-majority clusters were already circulating before 1996, when cART was introduced, and have persisted to the present. Overall, 1,226 (35%) of the 3,460 diagnoses among MSM since 1996 were found in these 54 long-standing clusters. The reproduction numbers of all large MSM-majority clusters were around the epidemic threshold value of one over the whole study period. A tendency towards higher numbers was visible in recent years, especially in the more recently introduced clusters. 
The mean age of MSM at diagnosis increased by 0.45 years/year within clusters, but new clusters appeared with lower mean age. Major strengths of this study are the high proportion of HIV-positive MSM with a sequence in this study and the combined application of phylogenetic and modeling approaches. Main limitations are the assumption that the sampled population is representative of the overall HIV-positive population and the assumption that the diagnosis interval distribution is similar between clusters. The resurgent HIV epidemic amongst MSM in the Netherlands is driven by several large, persistent, self-sustaining, and, in many cases, growing sub-epidemics shifting towards new generations of MSM. Many of the sub-epidemics have been present since the early epidemic, to which new sub-epidemics are being added.

  19. An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics

    PubMed Central

    Du, Ruofei; Mercante, Donald; Fang, Zhide

    2013-01-01

    In functional metagenomics, BLAST homology search is a common method to classify metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate the change in abundance to environmental perturbation, clinical variation, and so on. However, the short read length coupled with next-generation sequencing technologies poses a barrier in this approach, essentially because similarity significance cannot be discerned by searching with short reads. Consequently, artificial functional families are produced, and those with a large number of assigned reads dramatically decrease the accuracy of the functional profile. There is no method available to address this problem, and this paper aims to fill that gap. We revealed that BLAST similarity scores of homologues for short reads from the coding sequences of COG protein members are distributed differently from the scores of those derived elsewhere. We showed that, by choosing an appropriate score cut-off, we are able to filter out most artificial families while preserving sufficient information to build the functional profile. We also showed that, by applying BLAST and RPS-BLAST together, some artificial families with large read counts can be further identified after the score cut-off filtration. Evaluated on three experimental metagenomic datasets with different coverages, the proposed method is robust against read coverage and consistently outperforms the other E-value cut-off methods currently used in the literature. PMID:23516532
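
    The score cut-off filtering described in this record can be illustrated with a minimal sketch. The hit tuples, COG identifiers, scores and the cut-off of 40.0 below are all hypothetical values chosen for illustration; the record does not state the cut-off the authors used, and this is not their implementation.

```python
from collections import Counter

def functional_profile(hits, score_cutoff):
    """Count reads per family, keeping only hits whose score meets the cut-off."""
    return Counter(family for _read, family, score in hits if score >= score_cutoff)

# Hypothetical (read id, assigned family, similarity score) tuples.
hits = [
    ("read1", "COG0001", 52.0),
    ("read2", "COG0001", 48.5),
    ("read3", "COG0002", 31.2),  # falls below the cut-off and is dropped
    ("read4", "COG0003", 55.1),
]

profile = functional_profile(hits, score_cutoff=40.0)
```

    In this toy input the family supported only by a low-scoring read disappears from the profile, which is the effect the record attributes to an appropriately chosen cut-off.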

  20. Marinospirillum insulare sp. nov., a novel halophilic helical bacterium isolated from kusaya gravy.

    PubMed

    Satomi, M; Kimura, B; Hayashi, M; Okuzumi, M; Fujii, T

    2004-01-01

    A novel species that belongs to the genus Marinospirillum is described on the basis of phenotypic characteristics, phylogenetic analysis of 16S rRNA and gyrB gene sequences and DNA-DNA hybridization. Four strains of helical, halophilic, Gram-negative, heterotrophic bacteria were isolated from kusaya gravy, which is fermented brine that is used for the production of traditional dried fish in the Izu Islands of Japan. All of the new isolates were motile by means of bipolar tuft flagella, of small cell size, coccoid-body-forming and aerophilic; it was concluded that they belong to the same bacterial species, based on DNA-DNA hybridization values (>70% DNA relatedness). DNA G+C contents of the new strains were 42-43 mol% and they had isoprenoid quinone Q-8 as the major component. Phylogenetic analysis of 16S rRNA gene sequences indicated that the new isolates were members of the genus Marinospirillum; sequence similarity of the new isolates to Marinospirillum minutulum, Marinospirillum megaterium and Marinospirillum alkaliphilum was 98.5, 98.2 and 95.2%, respectively. Phylogenetic analysis based on the gyrB gene indicated that the new isolates had enough phylogenetic distance from M. minutulum and M. megaterium to be regarded as different species, with 84.7 and 78.7% sequence similarity, respectively. DNA-DNA hybridization showed that the new isolates had <36% DNA relatedness to M. minutulum and M. megaterium, supporting the phylogenetic conclusion. Thus, a novel species is proposed: Marinospirillum insulare sp. nov. (type strain, KT=LMG 21802T=NBRC 100033T).

  1. Isolation of laccase gene-specific sequences from white rot and brown rot fungi by PCR

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    D'Souza, T.M.; Boominathan, K.; Reddy, C.A.

    1996-10-01

    Degenerate primers corresponding to the consensus sequences of the copper-binding regions in the N-terminal domains of known basidiomycete laccases were used to isolate laccase gene-specific sequences from strains representing nine genera of wood rot fungi. All except three gave the expected PCR product of about 200 bp. Computer searches of the databases identified the sequence of each of the PCR products analyzed as a laccase gene sequence, suggesting the specificity of the primers. PCR products of the white rot fungi Ganoderma lucidum, Phlebia brevispora, and Trametes versicolor showed 65 to 74% nucleotide sequence similarity to each other; the similarity in deduced amino acid sequences was 83 to 91%. The PCR products of Lentinula edodes and Lentinus tigrinus, on the other hand, showed relatively low nucleotide and amino acid similarities (58 to 64 and 62 to 81%, respectively); however, these similarities were still much higher than when compared with the corresponding regions in the laccases of the ascomycete fungi Aspergillus nidulans and Neurospora crassa. A few of the white rot fungi, as well as Gloeophyllum trabeum, a brown rot fungus, gave a 144-bp PCR fragment which had a nucleotide sequence similarity of 60 to 71%. Demonstration of laccase activity in G. trabeum and several other brown rot fungi was of particular interest because these organisms were not previously shown to produce laccases. 36 refs., 6 figs., 2 tabs.

  2. De novo assembly and characterization of the transcriptome in the desiccation-tolerant moss Syntrichia caninervis

    PubMed Central

    2014-01-01

    Background Syntrichia caninervis is a desiccation-tolerant moss and the dominant bryophyte of the Biological Soil Crusts (BSCs) found in the Mojave and Gurbantunggut deserts. Next generation high throughput sequencing technologies offer an efficient and economic choice for characterizing non-model organism transcriptomes with little or no prior molecular information available. Results In this study, we employed next generation, high-throughput, Illumina RNA-Seq to analyze the poly-(A) + mRNA from hydrated, dehydrating and desiccated S. caninervis gametophores. Approximately 58.0 million paired-end short reads were obtained and 92,240 unigenes were assembled with an average size of 493 bp, N50 value of 662 bp and a total size of 45.48 Mbp. Sequence similarity searches against five public databases (NR, Swiss-Prot, COSMOSS, KEGG and COG) found 54,125 unigenes (58.7%) with significant similarity to an existing sequence (E-value ≤ 1e-5) and could be annotated. Gene Ontology (GO) annotation assigned 24,183 unigenes to the three GO terms: Biological Process, Cellular Component or Molecular Function. GO comparison between P. patens and S. caninervis demonstrated similar sequence enrichment across all three GO categories. 29,370 deduced polypeptide sequences were assigned Pfam domain information and categorized into 4,212 Pfam domains/families. Using the PlantTFDB, 778 unigenes were predicted to be involved in the regulation of transcription and were classified into 49 transcription factor families. Annotated unigenes were mapped to the KEGG pathways and further annotated using MapMan. Comparative genomics revealed that 44% of protein families are shared in common by S. caninervis, P. patens and Arabidopsis thaliana and that 80% are shared by both moss species. Conclusions This study is one of the first comprehensive transcriptome analyses of the moss S. caninervis. 
Our data extends our knowledge of bryophyte transcriptomes, provides an insight to plants adapted to the arid regions of central Asia, and continues the development of S. caninervis as a model for understanding the molecular aspects of desiccation-tolerance. PMID:25086984

  3. Molecular phylogeny and species separation of five morphologically similar Holosticha-complex ciliates (Protozoa, Ciliophora) using ARDRA riboprinting and multigene sequence data

    NASA Astrophysics Data System (ADS)

    Gao, Feng; Yi, Zhenzhen; Gong, Jun; Al-Rasheid, Khaled A. S.; Song, Weibo

    2010-05-01

    To separate and redefine the ambiguous Holosticha-complex, a confusing group of hypotrichous ciliates, six strains belonging to five morphospecies of three genera, Holosticha heterofoissneri, Anteholosticha sp. pop1, Anteholosticha sp. pop2, A. manca, A. gracilis and Nothoholosticha fasciola, were analyzed using 12 restriction enzymes on the basis of amplified ribosomal DNA restriction analysis. Nine of the 12 enzymes could digest the DNA products, four ( Hinf I, Hind III, Msp I, Taq I) yielded species-specific restriction patterns, and Hind III and Taq I produced different patterns for two Anteholosticha sp. populations. Distinctly different restriction digestion haplotypes and similarity indices can be used to separate the species. The secondary structures of the five species were predicted based on the ITS2 transcripts and there were several minor differences among species, while two Anteholosticha sp. populations were identical. In addition, phylogenies based on the SSrRNA gene sequences were reconstructed using multiple algorithms, which grouped them generally into four clades, and exhibited that the genus Anteholosticha should be a convergent assemblage. The fact that Holosticha species clustered with the oligotrichs and choreotrichs, though with very low support values, indicated that the topology may be very divergent and unreliable when the number of sequence data used in the analyses is too low.

  4. Arthrobacter ruber sp. nov., isolated from glacier ice.

    PubMed

    Liu, Qing; Xin, Yu-Hua; Chen, Xiu-Ling; Liu, Hong-Can; Zhou, Yu-Guang; Chen, Wen-Xin

    2018-05-01

    A Gram-stain-positive strain designated MDB1-42 T was isolated from ice collected from Midui glacier in Tibet, PR China. Strain MDB1-42 T was catalase-positive, oxidase-negative and grew optimally at 25-28 °C and pH 7.0. Phylogenetic analysis based on 16S rRNA gene sequences revealed that MDB1-42 T represented a member of the genus Arthrobacter. The highest level of 16S rRNA gene sequence similarity (99.86 %) was found with Arthrobacter agilis NBRC 15319 T . Multilocus sequence analysis revealed low similarity of 91.93 % between MDB1-42 T and Arthrobacter agilis NBRC 15319 T . Average nucleotide identity and digital DNA-DNA hybridization values between MDB1-42 T and the most closely related strain, Arthrobacter agilis DSM 20550 T , were 81.36 and 24.5 %, respectively. The genomic DNA G+C content was 69.0 mol%. The major cellular fatty acids of MDB1-42 T were anteiso-C15 : 0 and anteiso-C17:0. The polar lipids were phosphatidylglycerol, diphosphatidylglycerol, phosphatidylinositol, one unidentified glycolipid and one unidentified lipid. The predominant menaquinone was MK-9(H2). On the basis of results obtained using a polyphasic approach, a novel species Arthrobacter ruber sp. nov. is proposed, with MDB1-42 T (=CGMCC 1.9772 T =NBRC 113088 T ) as the type strain.

  5. Characterization of the Campylobacter jejuni cryptic plasmid pTIW94 recovered from wild birds in the southeastern United States.

    PubMed

    Hiett, Kelli L; Rothrock, Michael J; Seal, Bruce S

    2013-09-01

    The complete nucleotide sequence was determined for a cryptic plasmid, pTIW94, recovered from several Campylobacter jejuni isolates from wild birds in the southeastern United States. pTIW94 is a circular molecule of 3860 nucleotides, with a G+C content (31.0%) similar to that of many Campylobacter spp. genomes. A typical origin of replication, with iteron sequences, was identified upstream of DNA sequences that demonstrated similarity to replication initiation proteins. A total of five open reading frames (ORFs) were identified; two of the five ORFs demonstrated significant similarity to plasmid pCC2228-2 found within Campylobacter coli. These two ORFs were similar to essential replication proteins RepA (100%; 26/26 aa identity) and RepB (95%; 327/346 aa identity). A third identified ORF demonstrated significant similarity (99%; 421/424 aa identity) to the MOB protein from C. coli 67-8, originally recovered from swine. The other two identified ORFs were either similar to hypothetical proteins from other Campylobacter spp., or exhibited no significant similarity to any DNA or protein sequence in the GenBank database. Promoter regions (-35 and -10 signal sites), ribosomal binding sites upstream of ORFs, and stem-loop structures were also identified within the plasmid. These results demonstrate that pTIW94 represents a previously un-reported small cryptic plasmid with unique sequences as well as highly similar sequences to other small plasmids found within Campylobacter spp., and that this cryptic plasmid is present among Campylobacter spp. recovered from different genera of wild birds. Copyright © 2013. Published by Elsevier Inc.

  6. Actinomycetospora rhizophila sp. nov., an actinomycete isolated from rhizosphere soil of a peace lily (Spathi phyllum Kochii).

    PubMed

    He, Hairong; Zhang, Yuejing; Ma, Zhaoxu; Li, Chuang; Liu, Chongxi; Zhou, Ying; Li, Lianjie; Wang, Xiangjing; Xiang, Wensheng

    2015-05-01

    A novel actinomycete, designated strain NEAU-B-8(T), was isolated from the rhizosphere soil of a peace lily (Spathi phyllum Kochii) collected from Heilongjiang province, north-east China. Key morphological and physiological characteristics as well as chemotaxonomic features of strain NEAU-B-8(T) were congruent with the description of the genus Actinomycetospora , such as the major fatty acids, the whole-cell hydrolysates, the predominant menaquinone and the phospholipid profile. The 16S rRNA gene sequence analysis revealed that strain NEAU-B-8(T) shared the highest sequence similarities with Actinomycetospora lutea JCM 17982(T) (99.3% 16S rRNA gene sequence similarity), Actinomycetospora chlora TT07I-57(T) (98.4 %), Actinomycetospora straminea IY07-55(T) (98.3%) and Actinomycetospora chibensis TT04-21(T) (98.2%); similarities to type strains of other species of this genus were lower than 98%. The phylogenetic tree based on 16S rRNA gene sequences showed that strain NEAU-B-8(T) formed a distinct branch with A. lutea JCM 17982(T) that was supported by a high bootstrap value of 97% in the neighbour-joining tree and was also recovered with the maximum-likelihood algorithm. However, the DNA-DNA relatedness between strain NEAU-B-8(T) and A. lutea JCM 17982(T) was found to be 50.6 ± 1.2%. Meanwhile, strain NEAU-B-8(T) differs from other most closely related strains in phenotypic properties, such as maximum NaCl tolerance, hydrolysis of aesculin and decomposition of urea. On the basis of the morphological, physiological, chemotaxonomic, phylogenetic and DNA-DNA hybridization data, we conclude that strain NEAU-B-8(T) represents a novel species of the genus Actinomycetospora , named Actinomycetospora rhizophila sp. nov. The type strain is NEAU-B-8(T). ( = CGMCC 4.7134(T) =DSM 46673(T)). © 2015 IUMS.

  7. Galliscardovia ingluviei gen. nov., sp. nov., a thermophilic bacterium of the family Bifidobacteriaceae isolated from the crop of a laying hen (Gallus gallus f. domestica).

    PubMed

    Pechar, R; Killer, J; Švejstil, R; Salmonová, H; Geigerová, M; Bunešová, V; Rada, V; Benada, O

    2017-07-01

    Bacteria with potential probiotic applications are not yet sufficiently explored, even for animals with economic importance. Therefore, we decided to isolate and identify representatives of the family Bifidobacteriaceae, which inhabit the crop of laying hens. During the study, a fructose-6-phosphate phosphoketolase-positive strain, RP51T, with a regular/slightly irregular and sometimes an S-shaped slightly curved rod-like shape, was isolated from the crop of a 13-month-old Hisex Brown hybrid laying hen. The best growth of the Gram-stain-positive bacterium, which was isolated using Bifidobacterium-selective mTPY agar, was found under strictly anaerobic conditions; however, an ability to grow under microaerophilic and aerobic conditions was also observed. Sequencing of the almost complete 16S rRNA gene (1444 bp) showed Alloscardovia omnicolens CCUG 31649T and Bombiscardovia coagulans BLAPIII/AGVT to be the most closely related species with similarities of 93.4 and 93.1 %, respectively. Lower sequence similarities were determined with other scardovial genera and other representatives of the genus Bifidobacterium. Taxonomic relationships with A. omnicolens and other members of the family Bifidobacteriaceae were also demonstrated, based on the sequences of dnaK, fusA, hsp60 and rplB gene fragments. Low sequence similarities of phylogenetic markers to related scardovial genera and bifidobacteria along with unique features of the bacterial strain investigated within the family Bifidobacteriaceae (including the lowest DNA G+C value (44.3 mol%), a unique spectrum of cellular fatty acids and polar lipids, cellular morphology, the wide temperature range for growth (15-49 °C) and habitat) clearly indicate that strain RP51T is a representative of a novel genus within the family Bifidobacteriaceae for which the name Galliscardovia ingluviei gen. nov., sp. nov. (RP51T=DSM 100235T=LMG 28778T=CCM 8606T) is proposed.

  8. A Novel Laccase with Potent Antiproliferative and HIV-1 Reverse Transcriptase Inhibitory Activities from Mycelia of Mushroom Coprinus comatus

    PubMed Central

    Zhao, Shuang; Rong, Cheng-Bo; Kong, Chang; Liu, Yu; Xu, Feng; Miao, Qian-Jiang; Wang, Shou-Xian; Wang, He-Xiang

    2014-01-01

    A novel laccase was isolated and purified from fermentation mycelia of mushroom Coprinus comatus with an isolation procedure including three ion-exchange chromatography steps on DEAE-cellulose, CM-cellulose, and Q-Sepharose and one gel-filtration step by fast protein liquid chromatography on Superdex 75. The purified enzyme was a monomeric protein with a molecular weight of 64 kDa. It possessed a unique N-terminal amino acid sequence of AIGPVADLKV, which has considerably high sequence similarity with that of other fungal laccases, but is different from that of C. comatus laccases reported. The enzyme manifested an optimal pH value of 2.0 and an optimal temperature of 60°C using 2,2′-azinobis(3-ethylbenzothiazolone-6-sulfonic acid) diammonium salt (ABTS) as the substrate. The laccase displayed, at pH 2.0 and 37°C, a Km value of 1.59 mM towards ABTS. It potently suppressed proliferation of tumor cell lines HepG2 and MCF7, and inhibited human immunodeficiency virus type 1 (HIV-1) reverse transcriptase (RT) with IC50 values of 3.46 μM, 4.95 μM, and 5.85 μM, respectively, signifying that it is an antipathogenic protein. PMID:25540778

  9. Mesorhizobium wenxiniae sp. nov., isolated from chickpea (Cicer arietinum L.) in China.

    PubMed

    Zhang, Junjie; Guo, Chen; Chen, Wenfeng; de Lajudie, Philippe; Zhang, Zhiyan; Shang, Yimin; Wang, En Tao

    2018-06-01

    Three chickpea rhizobial strains (WYCCWR 10195 T =S1-3-7, WYCCWR 10198=S1-4-3 and WYCCWR 10200=S1-5-1) isolated from Northwest China formed a group affiliated to Mesorhizobium based on 16S rRNA gene sequence comparison. To clarify their species status, multilocus sequence analysis was performed and average nucleotide identity (ANI) values of whole genome sequences between the novel group and the type strains of the related species were calculated. Similarities of 95.7-96.6 % in the concatenated sequences of atpD-recA-glnII and 91.9-93.1 % of ANI values to the closest-related species Mesorhizobium muleiense, Mesorhizobium mediterraneum and Mesorhizobium temperatum demonstrated that the novel group is a unique genospecies. The most abundant fatty acid in cells of WYCCWR 10195 T was C19 : 0 cyclo ω8c (51.4 %), followed by C18 : 1 ω7c 11-methyl (9.5 %) and C16 : 0 (9.3 %). Its genome size was 6.37 Mbp, comprising 6633 predicted genes with a DNA G+C content of 61.9 mol%. The similarities of 99.0-99.8 % for the nodC gene and 98.3-99.44 % for the nifH gene to those of the chickpea rhizobial species and nodulation with Cicer arietinum L. confirmed the strains of the new genospecies as symbiovar ciceri. The weak utilization of most of the tested sugars/organic acids and non-utilization of l(+)-rhamnose, l-cysteine and l-glycine as sole carbon source, tolerance to 1 % (w/v) NaCl, resistance to 5 µg ml -1 chloromycetin and non-hydrolysis of l-tyrosine distinguished the novel group from the related species and supported this group as a novel species, for which the name Mesorhizobium wenxiniae sp. nov. is proposed, with WYCCWR 10195 T (=S1-3-7=HAMBI 3692 T =LMG 30254 T ) as the type strain.

  10. Influence of time and length size feature selections for human activity sequences recognition.

    PubMed

    Fang, Hongqing; Chen, Long; Srinivasan, Raghavendiran

    2014-01-01

    In this paper, the Viterbi algorithm, based on a hidden Markov model, is applied to recognize activity sequences from observed sensor events. Alternative selections of the time feature values of sensor events and of the activity length size feature values are tested, and the resulting activity sequence recognition performance of the Viterbi algorithm is evaluated. The results show that selecting larger time feature values of sensor events and/or smaller activity length size feature values yields relatively better activity sequence recognition performance. © 2013 ISA. Published by ISA. All rights reserved.
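
    For readers unfamiliar with the decoding step named in this record, the following is a minimal sketch of Viterbi decoding on a hidden Markov model. The two activities, the two sensor events and every probability in the toy model are hypothetical; the record does not give the authors' states, features or parameters.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a sequence of observations."""
    def log(p):
        # Log space avoids numeric underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Hypothetical toy model: two activities observed through two sensor events.
states = ["sleep", "cook"]
start_p = {"sleep": 0.6, "cook": 0.4}
trans_p = {"sleep": {"sleep": 0.7, "cook": 0.3}, "cook": {"sleep": 0.4, "cook": 0.6}}
emit_p = {"sleep": {"bed": 0.9, "stove": 0.1}, "cook": {"bed": 0.2, "stove": 0.8}}

decoded = viterbi(["bed", "bed", "stove"], states, start_p, trans_p, emit_p)
# decoded == ["sleep", "sleep", "cook"]
```

    The time and length features discussed in the record would enter such a model through the observation symbols or the state definitions; how the authors encode them is not stated here.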

  11. Optimization of parameter values for complex pulse sequences by simulated annealing: application to 3D MP-RAGE imaging of the brain.

    PubMed

    Epstein, F H; Mugler, J P; Brookeman, J R

    1994-02-01

    A number of pulse sequence techniques, including magnetization-prepared gradient echo (MP-GRE), segmented GRE, and hybrid RARE, employ a relatively large number of variable pulse sequence parameters and acquire the image data during a transient signal evolution. These sequences have recently been proposed and/or used for clinical applications in the brain, spine, liver, and coronary arteries. Thus, the need for a method of deriving optimal pulse sequence parameter values for this class of sequences now exists. Due to the complexity of these sequences, conventional optimization approaches, such as applying differential calculus to signal difference equations, are inadequate. We have developed a general framework for adapting the simulated annealing algorithm to pulse sequence parameter value optimization, and applied this framework to the specific case of optimizing the white matter-gray matter signal difference for a T1-weighted variable flip angle 3D MP-RAGE sequence. Using our algorithm, the values of 35 sequence parameters, including the magnetization-preparation RF pulse flip angle and delay time, 32 flip angles in the variable flip angle gradient-echo acquisition sequence, and the magnetization recovery time, were derived. Optimized 3D MP-RAGE achieved up to a 130% increase in white matter-gray matter signal difference compared with optimized 3D RF-spoiled FLASH with the same total acquisition time. The simulated annealing approach was effective at deriving optimal parameter values for a specific 3D MP-RAGE imaging objective, and may be useful for other imaging objectives and sequences in this general class.
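
    The general idea of simulated annealing over a vector of sequence parameters can be sketched as follows. This is a generic textbook variant, not the authors' framework: the objective is a hypothetical stand-in rather than an MR signal model, and the cooling schedule, step size and iteration counts are arbitrary assumptions.

```python
import math
import random

def simulated_annealing(objective, initial, step=1.0, t_start=1.0, t_end=1e-3,
                        cooling=0.95, iters_per_temp=50, seed=0):
    """Maximize objective(params) over a list of floats.

    Returns (best_params, best_score), the best point visited during the search.
    """
    rng = random.Random(seed)
    current = list(initial)
    current_score = objective(current)
    best, best_score = list(current), current_score
    temp = t_start
    while temp > t_end:
        for _ in range(iters_per_temp):
            # Perturb one randomly chosen parameter.
            candidate = list(current)
            i = rng.randrange(len(candidate))
            candidate[i] += rng.uniform(-step, step)
            score = objective(candidate)
            delta = score - current_score
            # Always accept improvements; accept worse moves with Boltzmann probability.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                current, current_score = candidate, score
                if current_score > best_score:
                    best, best_score = list(current), current_score
        temp *= cooling  # geometric cooling schedule (an assumption)
    return best, best_score

def toy_objective(params):
    # Hypothetical stand-in with a single peak where every parameter equals 3.0.
    return -sum((p - 3.0) ** 2 for p in params)

start = [0.0] * 5
best_params, best_score = simulated_annealing(toy_objective, start)
```

    In the setting of the record, the objective would be the simulated white matter-gray matter signal difference and the parameter vector would hold the 35 sequence parameters; occasionally accepting worse moves is what lets the search escape local optima that defeat calculus-based approaches.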

  12. Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains.

    PubMed

    Xia, Li C; Ai, Dongmei; Cram, Jacob A; Liang, Xiaoyi; Fuhrman, Jed A; Sun, Fengzhu

    2015-09-21

    Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in dynamics of biological systems. However, slow permutation procedures to evaluate the statistical significance of local trend scores have limited its applications to high-throughput time series data analysis, e.g., data from the next generation sequencing technology based studies. By extending the theories for the tail probability of the range of sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at time points >20 with no delay and >30 with delay of at most three time steps) in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations, making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large scale all-versus-all comparisons possible. We also propose a hybrid approach by integrating the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Plymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data and find interesting organism co-occurrence dynamic patterns. The software tool is integrated into the eLSA software package that now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.

  13. Lactobacillus heilongjiangensis sp. nov., isolated from Chinese pickle.

    PubMed

    Gu, Chun Tao; Li, Chun Yan; Yang, Li Jie; Huo, Gui Cheng

    2013-11-01

    A Gram-stain-positive bacterial strain, S4-3(T), was isolated from traditional pickle in Heilongjiang Province, China. The bacterium was characterized by a polyphasic approach, including 16S rRNA gene sequence analysis, pheS gene sequence analysis, rpoA gene sequence analysis, dnaK gene sequence analysis, fatty acid methyl ester (FAME) analysis, determination of DNA G+C content, DNA-DNA hybridization and an analysis of phenotypic features. Strain S4-3(T) showed 97.9-98.7 % 16S rRNA gene sequence similarities, 84.4-94.1 % pheS gene sequence similarities and 94.4-96.9 % rpoA gene sequence similarities to the type strains of Lactobacillus nantensis, Lactobacillus mindensis, Lactobacillus crustorum, Lactobacillus futsaii, Lactobacillus farciminis and Lactobacillus kimchiensis. dnaK gene sequence similarities between S4-3(T) and Lactobacillus nantensis LMG 23510(T), Lactobacillus mindensis LMG 21932(T), Lactobacillus crustorum LMG 23699(T), Lactobacillus futsaii JCM 17355(T) and Lactobacillus farciminis LMG 9200(T) were 95.4, 91.5, 90.4, 91.7 and 93.1 %, respectively. Based upon the data obtained in the present study, a novel species, Lactobacillus heilongjiangensis sp. nov., is proposed and the type strain is S4-3(T) ( = LMG 26166(T) = NCIMB 14701(T)).

  14. Comparison of amyloid plaque contrast generated by T2-, T2*-, and susceptibility-weighted imaging methods in transgenic mouse models of Alzheimer’s disease

    PubMed Central

    Chamberlain, Ryan; Reyes, Denise; Curran, Geoffrey L.; Marjanska, Malgorzata; Wengenack, Thomas M.; Poduslo, Joseph F.; Garwood, Michael; Jack, Clifford R.

    2009-01-01

    One of the hallmark pathologies of Alzheimer’s disease (AD) is amyloid plaque deposition. Plaques appear hypointense on T2- and T2*-weighted MR images probably due to the presence of endogenous iron, but no quantitative comparison of various imaging techniques has been reported. We estimated the T1, T2, T2*, and proton density values of cortical plaques and normal cortical tissue and analyzed the plaque contrast generated by a collection of T2-, T2*-, and susceptibility-weighted imaging (SWI) methods in ex vivo transgenic mouse specimens. The proton density and T1 values were similar for both cortical plaques and normal cortical tissue. The T2 and T2* values were similar in cortical plaques, which indicates that the iron content of cortical plaques may not be as large as previously thought. Ex vivo plaque contrast was increased compared to a previously reported spin echo sequence by summing multiple echoes and by performing SWI; however, gradient echo and susceptibility weighted imaging was found to be impractical for in vivo imaging due to susceptibility interface-related signal loss in the cortex. PMID:19253386

  15. New Measurement for Correlation of Co-evolution Relationship of Subsequences in Protein.

    PubMed

    Gao, Hongyun; Yu, Xiaoqing; Dou, Yongchao; Wang, Jun

    2015-12-01

    Many computational tools have been developed to measure the co-evolution of protein residues. Most of them focus only on co-evolution between pairs of residues in a protein sequence. However, more than two residues may participate in co-evolution, and some co-evolved residues are clustered in several distinct regions of the primary structure. Therefore, the co-evolution among adjacent residues and the correlation between the distinct regions offer insights into the function and evolution of the protein and its residues. A subsequence is used to represent the multiple adjacent residues in one distinct region. In this paper, the co-evolution relationship within each subsequence is represented by a mutual information matrix (MIM). Then, Pearson's correlation coefficient (the R value) is used to measure the similarity of two MIMs. MSAs from the Catalytic Site Atlas (CSA) database are used for testing. The R value characterizes a specific class of residues. In contrast to individual pairwise co-evolved residues, adjacent residues without high individual MI values are found, since the co-evolutionary relationship among them is similar to that among another set of adjacent residues. These subsequences possess some flexibility in the composition of side chains, such as the catalyzed environment.
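
    The two quantities named in this record, a mutual information matrix per subsequence and Pearson's R between two such matrices, can be sketched as below. Comparing the matrices through their flattened upper triangles is an assumption made here, since the record does not say how the two MIMs are paired up, and the four-sequence alignment is a hypothetical toy input.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    pa, pb, pab = Counter(col_a), Counter(col_b), Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), c in pab.items()
    )

def mim_upper(msa, start, length):
    """Upper-triangle entries of the MI matrix for columns start..start+length-1."""
    cols = [[seq[i] for seq in msa] for i in range(start, start + length)]
    return [mutual_information(cols[i], cols[j])
            for i, j in combinations(range(length), 2)]

def pearson_r(x, y):
    """Pearson's correlation coefficient; NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy alignment: columns 3-5 repeat the coupling pattern of
# columns 0-2 with a different alphabet.
msa = ["AAACCC", "AABCCD", "BBADDC", "BBBDDD"]
mim_first = mim_upper(msa, 0, 3)
mim_second = mim_upper(msa, 3, 3)
r_value = pearson_r(mim_first, mim_second)
```

    Because mutual information ignores which residues are involved and only registers how the columns co-vary, the two toy subsequences give identical MI patterns and an R value of 1, even though they share no residues.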

  16. Short Communication: Analysis of Minor Populations of Human Immunodeficiency Virus by Primer Identification and Insertion-Deletion and Carry Forward Correction Pipelines.

    PubMed

    Hughes, Paul; Deng, Wenjie; Olson, Scott C; Coombs, Robert W; Chung, Michael H; Frenkel, Lisa M

    2016-03-01

    Accurate analysis of minor populations of drug-resistant HIV requires analysis of a sufficient number of viral templates. We assessed the effect of experimental conditions on the analysis of HIV pol 454 pyrosequences generated from plasma using (1) the "Insertion-deletion (indel) and Carry Forward Correction" (ICC) pipeline, which clusters sequence reads using a nonsubstitution approach and can correct for indels and carry forward errors, and (2) the "Primer Identification (ID)" method, which facilitates construction of a consensus sequence to correct for sequencing errors and allelic skewing. The Primer ID and ICC methods produced similar estimates of viral diversity, but differed in the number of sequence variants generated. Sequence preparation for ICC was comparably simple, but was limited by an inability to assess the number of templates analyzed and allelic skewing. The more costly Primer ID method corrected for allelic skewing and provided the number of viral templates analyzed, which revealed that amplifiable HIV templates varied across specimens and did not correlate with clinical viral load. This latter observation highlights the value of the Primer ID method, which by determining the number of templates amplified, enables more accurate assessment of minority species in the virus population, which may be relevant to prescribing effective antiretroviral therapy.

  17. FRESCO: Referential compression of highly similar sequences.

    PubMed

    Wandelt, Sebastian; Leser, Ulf

    2013-01-01

    In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitude faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition, we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios well beyond the state of the art, for instance, 4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.
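
    The general idea of referential compression, storing only the differences between an input and a known reference, can be sketched in a few lines of Python. This is a generic illustration and not FRESCO's algorithm: the k-mer index, the greedy longest-match strategy, the output format, and all function names are assumptions.

```python
def build_index(reference, k):
    """Map every k-mer of the reference to the positions where it occurs."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def compress(text, reference, k=4):
    """Encode text as (ref_pos, length) matches plus single literal characters."""
    index = build_index(reference, k)
    entries = []
    i = 0
    while i < len(text):
        best_pos, best_len = -1, 0
        # Extend every candidate match that shares the k-mer starting at i.
        for pos in index.get(text[i:i + k], []):
            length = 0
            while (i + length < len(text)
                   and pos + length < len(reference)
                   and text[i + length] == reference[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= k:
            entries.append((best_pos, best_len))  # referential match
            i += best_len
        else:
            entries.append(text[i])               # literal character
            i += 1
    return entries


def decompress(entries, reference):
    """Rebuild the original text from matches and literals."""
    parts = []
    for entry in entries:
        if isinstance(entry, tuple):
            pos, length = entry
            parts.append(reference[pos:pos + length])
        else:
            parts.append(entry)
    return "".join(parts)
```

    An input identical to the reference collapses to a single match entry, which is why highly similar sequences compress so well under this scheme; each difference costs only a literal and a new match.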

  18. Complete Chloroplast Genome Sequence of Tartary Buckwheat (Fagopyrum tataricum) and Comparative Analysis with Common Buckwheat (F. esculentum)

    PubMed Central

    Cho, Kwang-Soo; Yun, Bong-Kyoung; Yoon, Young-Ho; Hong, Su-Young; Mekapogu, Manjulatha; Kim, Kyung-Hee; Yang, Tae-Jin

    2015-01-01

    We report the chloroplast (cp) genome sequence of tartary buckwheat (Fagopyrum tataricum) obtained by next-generation sequencing technology and compared this with the previously reported common buckwheat (F. esculentum ssp. ancestrale) cp genome. The cp genome of F. tataricum has a total sequence length of 159,272 bp, which is 327 bp shorter than the common buckwheat cp genome. The cp gene content, order, and orientation are similar to those of common buckwheat, but with some structural variation at tandem and palindromic repeat frequencies and junction areas. A total of seven InDels (around 100 bp) were found within the intergenic sequences and the ycf1 gene. Copy number variation of the 21-bp tandem repeat varied in F. tataricum (four repeats) and F. esculentum (one repeat), and the InDel of the ycf1 gene was 63 bp long. Nucleotide and amino acid have highly conserved coding sequence with about 98% homology and four genes—rpoC2, ycf3, accD, and clpP—have high synonymous (Ks) value. PCR based InDel markers were applied to diverse genetic resources of F. tataricum and F. esculentum, and the amplicon size was identical to that expected in silico. Therefore, these InDel markers are informative biomarkers to practically distinguish raw or processed buckwheat products derived from F. tataricum and F. esculentum. PMID:25966355

  19. Transcriptome analysis of Houttuynia cordata Thunb. by Illumina paired-end RNA sequencing and SSR marker discovery.

    PubMed

    Wei, Lin; Li, Shenghua; Liu, Shenggui; He, Anna; Wang, Dan; Wang, Jie; Tang, Yulian; Wu, Xianjin

    2014-01-01

    Houttuynia cordata Thunb. is an important traditional medical herb in China and other Asian countries, with high medicinal and economic value. However, a lack of available genomic information has become a limitation for research on this species. Thus, we carried out high-throughput transcriptomic sequencing of H. cordata to generate an enormous transcriptome sequence dataset for gene discovery and molecular marker development. Illumina paired-end sequencing technology produced over 56 million sequencing reads from H. cordata mRNA. Subsequent de novo assembly yielded 63,954 unigenes, 39,982 (62.52%) and 26,122 (40.84%) of which had significant similarity to proteins in the NCBI nonredundant protein and Swiss-Prot databases (E-value < 10^-5), respectively. Of these annotated unigenes, 30,131 and 15,363 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. In addition, 24,434 (38.21%) unigenes were mapped onto 128 pathways using the KEGG pathway database and 17,964 (44.93%) unigenes showed homology to Vitis vinifera (Vitaceae) genes in BLASTx analysis. Furthermore, 4,800 cDNA SSRs were identified as potential molecular markers. Fifty primer pairs were randomly selected to detect polymorphism among 30 samples of H. cordata; 43 (86%) produced fragments of expected size, suggesting that the unigenes were suitable for specific primer design and of high quality, and the SSR marker could be widely used in marker-assisted selection and molecular breeding of H. cordata in the future. This is the first application of Illumina paired-end sequencing technology to investigate the whole transcriptome of H. cordata and to assemble RNA-seq reads without a reference genome. These data should help researchers investigating the evolution and biological processes of this species. The SSR markers developed can be used for construction of high-resolution genetic linkage maps and for gene-based association analyses in H. cordata. This work will enable future functional genomic research and research into the distinctive active constituents of this genus.

  20. ProteomeVis: a web app for exploration of protein properties from structure to sequence evolution across organisms' proteomes.

    PubMed

    Razban, Rostam M; Gilson, Amy I; Durfee, Niamh; Strobelt, Hendrik; Dinkla, Kasper; Choi, Jeong-Mo; Pfister, Hanspeter; Shakhnovich, Eugene I

    2018-05-08

    Protein evolution spans time scales and its effects span the length of an organism. A web app named ProteomeVis is developed to provide a comprehensive view of protein evolution in the S. cerevisiae and E. coli proteomes. ProteomeVis interactively creates protein chain graphs, where edges between nodes represent structure and sequence similarities within user-defined ranges, to study the long time scale effects of protein structure evolution. The short time scale effects of protein sequence evolution are studied by sequence evolutionary rate (ER) correlation analyses with protein properties that span from the molecular to the organismal level. We demonstrate the utility and versatility of ProteomeVis by investigating the distribution of edges per node in organismal protein chain universe graphs (oPCUGs) and putative ER determinants. S. cerevisiae and E. coli oPCUGs are scale-free with scaling constants of 1.79 and 1.56, respectively. Both scaling constants can be explained by a previously reported theoretical model describing protein structure evolution (Dokholyan et al., 2002). Protein abundance most strongly correlates with ER among properties in ProteomeVis, with Spearman correlations of -0.49 (p-value < 10^-10) and -0.46 (p-value < 10^-10) for S. cerevisiae and E. coli, respectively. This result is consistent with previous reports that found protein expression to be the most important ER determinant (Zhang and Yang, 2015). ProteomeVis is freely accessible at http://proteomevis.chem.harvard.edu. Supplementary data are available at Bioinformatics. Contact: shakhnovich@chemistry.harvard.edu.

  1. Structural flexibility and protein adaptation to temperature: Molecular dynamics analysis of malate dehydrogenases of marine molluscs.

    PubMed

    Dong, Yun-Wei; Liao, Ming-Ling; Meng, Xian-Liang; Somero, George N

    2018-02-06

    Orthologous proteins of species adapted to different temperatures exhibit differences in stability and function that are interpreted to reflect adaptive variation in structural "flexibility." However, quantifying flexibility and comparing flexibility across proteins has remained a challenge. To address this issue, we examined temperature effects on cytosolic malate dehydrogenase (cMDH) orthologs from differently thermally adapted congeners of five genera of marine molluscs whose field body temperatures span a range of ∼60 °C. We describe consistent patterns of convergent evolution in adaptation of function [temperature effects on K_M of cofactor (NADH)] and structural stability (rate of heat denaturation of activity). To determine how these differences depend on flexibilities of overall structure and of regions known to be important in binding and catalysis, we performed molecular dynamics simulation (MDS) analyses. MDS analyses revealed a significant negative correlation between adaptation temperature and heat-induced increase of backbone atom movements [root mean square deviation (rmsd) of main-chain atoms]. Root mean square fluctuations (RMSFs) of movement by individual amino acid residues varied across the sequence in a qualitatively similar pattern among orthologs. Regions of sequence involved in ligand binding and catalysis, termed mobile regions 1 and 2 (MR1 and MR2), respectively, showed the largest values for RMSF. Heat-induced changes in RMSF values across the sequence and, importantly, in MR1 and MR2 were greatest in cold-adapted species. MDS methods are shown to provide powerful tools for examining adaptation of enzymes by providing a quantitative index of protein flexibility and identifying sequence regions where adaptive change in flexibility occurs.
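
    The two flexibility measures named in this abstract, rmsd and RMSF, can be written out as a short Python sketch. It assumes that every trajectory frame has already been superimposed on the reference structure, which a real analysis would do first; the data layout and function names are hypothetical and do not come from the paper.

```python
import math


def rmsd(frame, reference):
    """Root mean square deviation of one frame from a reference structure.

    Both arguments are lists of (x, y, z) tuples in the same atom order.
    """
    squared = sum((p[d] - q[d]) ** 2
                  for p, q in zip(frame, reference)
                  for d in range(3))
    return math.sqrt(squared / len(frame))


def rmsf(trajectory):
    """Per-atom root mean square fluctuation about the mean position.

    trajectory[f][a] is the (x, y, z) position of atom a in frame f.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for a in range(n_atoms):
        # Time-averaged position of atom a.
        mean = [sum(frame[a][d] for frame in trajectory) / n_frames
                for d in range(3)]
        # Mean squared displacement from that average.
        msf = sum((frame[a][d] - mean[d]) ** 2
                  for frame in trajectory
                  for d in range(3)) / n_frames
        result.append(math.sqrt(msf))
    return result
```

    The rmsd gives one number per frame for the whole structure, while the RMSF gives one number per atom or residue, which is what allows individual regions such as MR1 and MR2 to be compared.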

  2. Functional Evolution of PLP-dependent Enzymes based on Active-Site Structural Similarities

    PubMed Central

    Catazaro, Jonathan; Caprez, Adam; Guru, Ashu; Swanson, David; Powers, Robert

    2014-01-01

    Families of distantly related proteins typically have very low sequence identity, which hinders evolutionary analysis and functional annotation. Slowly evolving features of proteins, such as an active site, are therefore valuable for annotating putative and distantly related proteins. To date, a complete evolutionary analysis of the functional relationship of an entire enzyme family based on active-site structural similarities has not yet been undertaken. Pyridoxal-5’-phosphate (PLP) dependent enzymes are primordial enzymes that diversified in the last universal ancestor. Using the Comparison of Protein Active Site Structures (CPASS) software and database, we show that the active site structures of PLP-dependent enzymes can be used to infer evolutionary relationships based on functional similarity. The enzymes successfully clustered together based on substrate specificity, function, and three-dimensional fold. This study demonstrates the value of using active site structures for functional evolutionary analysis and the effectiveness of CPASS. PMID:24920327

  3. Functional evolution of PLP-dependent enzymes based on active-site structural similarities.

    PubMed

    Catazaro, Jonathan; Caprez, Adam; Guru, Ashu; Swanson, David; Powers, Robert

    2014-10-01

    Families of distantly related proteins typically have very low sequence identity, which hinders evolutionary analysis and functional annotation. Slowly evolving features of proteins, such as an active site, are therefore valuable for annotating putative and distantly related proteins. To date, a complete evolutionary analysis of the functional relationship of an entire enzyme family based on active-site structural similarities has not yet been undertaken. Pyridoxal-5'-phosphate (PLP) dependent enzymes are primordial enzymes that diversified in the last universal ancestor. Using the comparison of protein active site structures (CPASS) software and database, we show that the active site structures of PLP-dependent enzymes can be used to infer evolutionary relationships based on functional similarity. The enzymes successfully clustered together based on substrate specificity, function, and three-dimensional-fold. This study demonstrates the value of using active site structures for functional evolutionary analysis and the effectiveness of CPASS. © 2014 Wiley Periodicals, Inc.

  4. Association between the genetic similarity of the open reading frame 5 sequence of Porcine reproductive and respiratory syndrome virus and the similarity in clinical signs of Porcine reproductive and respiratory syndrome in Ontario swine herds.

    PubMed

    Rosendal, Thomas; Dewey, Cate; Friendship, Robert; Wootton, Sarah; Young, Beth; Poljak, Zvonimir

    2014-10-01

    A study of Ontario swine farms positive for Porcine reproductive and respiratory syndrome virus (PRRSV) tested the association between genetic similarity of the virus and similarity of clinical signs reported by the herd owner. Herds were included if a positive result of polymerase chain reaction for PRRSV at the Animal Health Laboratory at the University of Guelph, Guelph, Ontario, was found between September 2004 and August 2007. Nucleotide-sequence similarity and clinical similarity, as determined from a telephone survey, were calculated for all pairs of herds. The Mantel test indicated that clinical similarity and sequence similarity were weakly correlated for most clinical signs. The generalized additive model indicated that virus homology with 2 vaccine viruses affected the association between sequence similarity and clinical similarity. When the data for herds with vaccine-like virus were removed from the dataset there was a significant association between virus similarity and similarity of the reported presence of abortion, stillbirth, preweaning mortality, and sow/boar mortality. Ownership similarity was also found to be associated with virus similarity and with similarity of the reported presence of sows being off-feed, nursery respiratory disease, nursery mortality, finisher respiratory disease, and finisher mortality. These results indicate that clinical signs of PRRS are associated with PRRSV genotype and that herd ownership is associated with both of these.
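
    The Mantel test mentioned in this abstract compares two pairwise matrices computed over the same set of items, here herds. A minimal Python sketch follows. The Pearson statistic, the one-sided permutation p-value, the default number of permutations, and the function names are assumptions for illustration; the abstract does not state which variant the authors used.

```python
import math
import random


def _pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(m1, m2, permutations=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value).

    m1 and m2 are symmetric n x n matrices (lists of lists) over the same items.
    """
    n = len(m1)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [m1[i][j] for i, j in pairs]

    def statistic(order):
        # Relabel the rows and columns of m2 together, then correlate.
        y = [m2[order[i]][order[j]] for i, j in pairs]
        return _pearson(x, y)

    observed = statistic(list(range(n)))
    rng = random.Random(seed)
    at_least_as_large = 1  # the observed labelling counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        if statistic(order) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)
```

    Rows and columns are permuted together because the entries of a pairwise matrix are not independent observations, so an ordinary correlation test on the entries would overstate significance.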

  5. Association between the genetic similarity of the open reading frame 5 sequence of Porcine reproductive and respiratory syndrome virus and the similarity in clinical signs of Porcine reproductive and respiratory syndrome in Ontario swine herds

    PubMed Central

    Rosendal, Thomas; Dewey, Cate; Friendship, Robert; Wootton, Sarah; Young, Beth; Poljak, Zvonimir

    2014-01-01

    A study of Ontario swine farms positive for Porcine reproductive and respiratory syndrome virus (PRRSV) tested the association between genetic similarity of the virus and similarity of clinical signs reported by the herd owner. Herds were included if a positive result of polymerase chain reaction for PRRSV at the Animal Health Laboratory at the University of Guelph, Guelph, Ontario, was found between September 2004 and August 2007. Nucleotide-sequence similarity and clinical similarity, as determined from a telephone survey, were calculated for all pairs of herds. The Mantel test indicated that clinical similarity and sequence similarity were weakly correlated for most clinical signs. The generalized additive model indicated that virus homology with 2 vaccine viruses affected the association between sequence similarity and clinical similarity. When the data for herds with vaccine-like virus were removed from the dataset there was a significant association between virus similarity and similarity of the reported presence of abortion, stillbirth, preweaning mortality, and sow/boar mortality. Ownership similarity was also found to be associated with virus similarity and with similarity of the reported presence of sows being off-feed, nursery respiratory disease, nursery mortality, finisher respiratory disease, and finisher mortality. These results indicate that clinical signs of PRRS are associated with PRRSV genotype and that herd ownership is associated with both of these. PMID:25355993

  6. Detection of Plasmodium sp. in capybara.

    PubMed

    dos Santos, Leonilda Correia; Curotto, Sandra Mara Rotter; de Moraes, Wanderlei; Cubas, Zalmir Silvino; Costa-Nascimento, Maria de Jesus; de Barros Filho, Ivan Roque; Biondo, Alexander Welker; Kirchgatter, Karin

    2009-07-07

    In the present study, we have microscopically and molecularly surveyed blood samples from 11 captive capybaras (Hydrochaeris hydrochaeris) from the Sanctuary Zoo for Plasmodium sp. infection. One animal presented positive on blood smear by light microscopy. Polymerase chain reaction was carried out accordingly using a nested genus-specific protocol, which uses oligonucleotides from conserved sequences flanking a variable sequence region in the small subunit ribosomal RNA (ssrRNA) of all Plasmodium organisms. This revealed three positive animals. Products from two samples were purified and sequenced. The results showed less than 1% divergence between the two capybara sequences. When compared with GenBank sequences, a 55% similarity was obtained to Toxoplasma gondii and a higher similarity (73-77.2%) was found to ssrRNAs from Plasmodium species that infect reptile, avian, rodents, and human beings. The most similar Plasmodium sequence was from Plasmodium mexicanum that infects lizards of North America, where around 78% identity was found. This work is the first report of Plasmodium in capybaras, and due to the low similarity with other Plasmodium species, we suggest it is a new species, which, in the future could be denominated "Plasmodium hydrochaeri".

  7. Value-based genomics.

    PubMed

    Gong, Jun; Pan, Kathy; Fakih, Marwan; Pal, Sumanta; Salgia, Ravi

    2018-03-20

    Advancements in next-generation sequencing have greatly enhanced the development of biomarker-driven cancer therapies. The affordability and availability of next-generation sequencers have allowed for the commercialization of next-generation sequencing platforms that have found widespread use for clinical-decision making and research purposes. Despite the greater availability of tumor molecular profiling by next-generation sequencing at our doorsteps, the achievement of value-based care, or improving patient outcomes while reducing overall costs or risks, in the era of precision oncology remains a looming challenge. In this review, we highlight available data through a pre-established and conceptualized framework for evaluating value-based medicine to assess the cost (efficiency), clinical benefit (effectiveness), and toxicity (safety) of genomic profiling in cancer care. We also provide perspectives on future directions of next-generation sequencing from targeted panels to whole-exome or whole-genome sequencing and describe potential strategies needed to attain value-based genomics.

  8. Value-based genomics

    PubMed Central

    Gong, Jun; Pan, Kathy; Fakih, Marwan; Pal, Sumanta; Salgia, Ravi

    2018-01-01

    Advancements in next-generation sequencing have greatly enhanced the development of biomarker-driven cancer therapies. The affordability and availability of next-generation sequencers have allowed for the commercialization of next-generation sequencing platforms that have found widespread use for clinical-decision making and research purposes. Despite the greater availability of tumor molecular profiling by next-generation sequencing at our doorsteps, the achievement of value-based care, or improving patient outcomes while reducing overall costs or risks, in the era of precision oncology remains a looming challenge. In this review, we highlight available data through a pre-established and conceptualized framework for evaluating value-based medicine to assess the cost (efficiency), clinical benefit (effectiveness), and toxicity (safety) of genomic profiling in cancer care. We also provide perspectives on future directions of next-generation sequencing from targeted panels to whole-exome or whole-genome sequencing and describe potential strategies needed to attain value-based genomics. PMID:29644010

  9. 3D printed phantoms mimicking cortical bone for the assessment of ultrashort echo time magnetic resonance imaging.

    PubMed

    Rai, Robba; Manton, David; Jameson, Michael G; Josan, Sonal; Barton, Michael B; Holloway, Lois C; Liney, Gary P

    2018-02-01

    Human cortical bone has a rapid T2* decay, and it can be visualized using ultrashort echo time (UTE) techniques in magnetic resonance imaging (MRI). These sequences operate at the limits of gradient and transmit-receive signal performance. Development of multicompartment anthropomorphic phantoms that can mimic human cortical bone can assist with quality assurance and optimization of UTE sequences. The aims of this study were to (a) characterize the MRI signal properties of a photopolymer resin that can be 3D printed, (b) develop multicompartment phantoms based on the resin, and (c) demonstrate the feasibility of using these phantoms to mimic human anatomy in the assessment of UTE sequences. A photopolymer resin (Prismlab China Ltd, Shanghai, China) was imaged on a 3 Tesla MRI system (Siemens Skyra) to characterize its MRI properties with emphasis on T2* signal and longevity. Two anthropomorphic phantoms, using the 3D printed resin to simulate skeletal anatomy, were developed and imaged using UTE sequences. A skull phantom was developed and used to assess the feasibility of using the resin to develop a complex model with realistic morphological human characteristics. A tibia model was also developed to assess the suitability of the resin at mimicking a simple multicompartment anatomical model and imaged using a three-dimensional UTE sequence (PETRA). Image quality measurements of signal-to-noise ratio (SNR) and contrast factor were calculated and these were compared to in vivo values. The T2* and T1 (mean ± standard deviation) of the photopolymer resin were found to be 411 ± 19 μs and 74.39 ± 13.88 ms, respectively, and demonstrated no statistically significant change during 4 months of monitoring. The resin had a similar T2* decay to human cortical bone; however, it had lower T1 values. The bone water concentration of the resin was 59% relative to an external water reference phantom, and this was higher than in vivo values reported for human cortical bone. The multicompartment anthropomorphic head phantom was successfully produced and able to simulate realistic air cavities, bony anatomy, and soft tissue. Image quality assessment in the tibia phantom using the PETRA sequence showed the suitability of the resin to mimic human anatomy with high SNR and contrast making it suitable for tissue segmentation. A solid resin material, which can be 3D printed, has been found to have similar magnetic resonance signal properties to human cortical bone. Phantoms replicating skeletal anatomy were successfully produced using this resin and demonstrated their use for image quality and segmentation assessment of ultrashort echo time sequences. © 2017 American Association of Physicists in Medicine.

  10. Comparison of Diffusion-Weighted Imaging in the Human Brain Using Readout-Segmented EPI and PROPELLER Turbo Spin Echo With Single-Shot EPI at 7 T MRI.

    PubMed

    Kida, Ikuhiro; Ueguchi, Takashi; Matsuoka, Yuichiro; Zhou, Kun; Stemmer, Alto; Porter, David

    2016-07-01

    The purpose of the present study was to compare periodically rotated overlapping parallel lines with enhanced reconstruction-type turbo spin echo diffusion-weighted imaging (pTSE-DWI) and readout-segmented echo planar imaging (rsEPI-DWI) with single-shot echo planar imaging (ssEPI-DWI) in a 7 T human MR system. We evaluated the signal-to-noise ratio (SNR), image distortion, and apparent diffusion coefficient values in the human brain. Six healthy volunteers were included in this study. The study protocol was approved by our institutional review board. All measurements were performed at 7 T using pTSE-DWI, rsEPI-DWI, and ssEPI-DWI sequences. The spatial resolution was 1.2 × 1.2 mm in-plane with a 3-mm slice thickness. Signal-to-noise ratio was measured using 2 scans. The ssEPI-DWI sequence showed significant image blurring, whereas pTSE-DWI and rsEPI-DWI sequences demonstrated high image quality with low geometrical distortion compared with reference T2-weighted, turbo spin echo images. Signal loss in ventral regions near the air-filled paranasal sinus/nasal cavity was found in ssEPI-DWI and rsEPI-DWI but not pTSE-DWI. The apparent diffusion coefficient values for ssEPI-DWI were 824 ± 17 × 10^-6 and 749 ± 25 × 10^-6 mm^2/s in the gray matter and white matter, respectively; the values obtained for pTSE-DWI were 798 ± 21 × 10^-6 and 865 ± 40 × 10^-6 mm^2/s; and the values obtained for rsEPI-DWI were 730 ± 12 × 10^-6 and 722 ± 25 × 10^-6 mm^2/s. The pTSE-DWI images showed no additional distortion in comparison to the T2-weighted images, but had a lower SNR than ssEPI-DWI and rsEPI-DWI. The rsEPI-DWI sequence provided high-quality images with minor distortion and a similar SNR to ssEPI-DWI. Our results suggest that the benefits of the rsEPI-DWI and pTSE-DWI sequences, in terms of SNR, image quality, and image distortion, appear to outweigh those of ssEPI-DWI. Thus, pTSE-DWI and rsEPI-DWI at 7 T have great potential use for clinical diagnoses. However, it is noteworthy that both sequences are limited by the scan time required. In addition, pTSE-DWI has limitations on the number of slices due to specific absorption rate. Overall, rsEPI-DWI is a favorable imaging sequence, taking into account the SNR and image quality at 7 T.

  11. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.

    PubMed

    Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H

    2017-04-15

    Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. Copyright © 2017 Sinclair et al.

  12. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

    PubMed Central

    Sinclair, Robert M.; Ravantti, Janne J.

    2017-01-01

    Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. PMID:28122979

  13. Human Treponema pallidum 11q/j isolate belongs to subsp. endemicum but contains two loci with a sequence in TP0548 and TP0488 similar to subsp. pertenue and subsp. pallidum, respectively

    PubMed Central

    Mikalová, Lenka; Strouhal, Michal; Oppelt, Jan; Grange, Philippe Alain; Janier, Michel; Benhaddou, Nadjet; Dupin, Nicolas; Šmajs, David

    2017-01-01

    Background: Treponema pallidum subsp. endemicum (TEN) is the causative agent of endemic syphilis (bejel). An unusual human TEN 11q/j isolate was obtained from a syphilis-like primary genital lesion from a patient that returned to France from Pakistan. Methodology/Principal findings: The TEN 11q/j isolate was characterized using nested PCR followed by Sanger sequencing and/or direct Illumina sequencing. Altogether, 44 chromosomal regions were analyzed. Overall, the 11q/j isolate clustered with TEN strains Bosnia A and Iraq B as expected from previous TEN classification of the 11q/j isolate. However, the 11q/j sequence in a 505 bp-long region at the TP0488 locus was similar to Treponema pallidum subsp. pallidum (TPA) strains, but not to TEN Bosnia A and Iraq B sequences, suggesting a recombination event at this locus. Similarly, the 11q/j sequence in a 613 bp-long region at the TP0548 locus was similar to Treponema pallidum subsp. pertenue (TPE) strains, but not to TEN sequences. Conclusions/Significance: A detailed analysis of two recombinant loci found in the 11q/j clinical isolate revealed that the recombination event occurred just once, in the TP0488, with the donor sequence originating from a TPA strain. Since TEN Bosnia A and Iraq B were found to contain TPA-like sequences at the TP0548 locus, the recombination at TP0548 took place in a treponeme that was an ancestor to both TEN Bosnia A and Iraq B. The sequence of 11q/j isolate in TP0548 represents an ancestral TEN sequence that is similar to yaws-causing treponemes. In addition to the importance of the 11q/j isolate for reconstruction of the TEN phylogeny, this case emphasizes the possible role of TEN strains in development of syphilis-like lesions. PMID:28263990

  14. The HMMER Web Server for Protein Sequence Similarity Search.

    PubMed

    Prakash, Ananth; Jeffryes, Matt; Bateman, Alex; Finn, Robert D

    2017-12-08

    Protein sequence similarity search is one of the most commonly used bioinformatics methods for identifying evolutionarily related proteins. In general, sequences that are evolutionarily related share some degree of similarity, and sequence-search algorithms use this principle to identify homologs. The requirement for a fast and sensitive sequence search method led to the development of the HMMER software, which in the latest version (v3.1) uses a combination of sophisticated acceleration heuristics and mathematical and computational optimizations to enable the use of profile hidden Markov models (HMMs) for sequence analysis. The HMMER Web server provides a common platform by linking the HMMER algorithms to databases, thereby enabling the search for homologs, as well as providing sequence and functional annotation by linking external databases. This unit describes three basic protocols and two alternate protocols that explain how to use the HMMER Web server using various input formats and user defined parameters. © 2017 by John Wiley & Sons, Inc.

  15. Omori's Law Applied to Mining-Induced Seismicity and Re-entry Protocol Development

    NASA Astrophysics Data System (ADS)

    Vallejos, J. A.; McKinnon, S. D.

    2010-02-01

    This paper describes a detailed study of the Modified Omori's law n(t) = K/(c + t)^p applied to 163 mining-induced aftershock sequences from four different mine environments in Ontario, Canada. We demonstrate, using a rigorous statistical analysis, that this equation can be adequately used to describe the decay rate of mining-induced aftershock sequences. The parameters K, p and c are estimated using a uniform method that employs the maximum likelihood procedure and the Anderson-Darling statistic. To estimate consistent decay parameters, the method considers only the time interval that satisfies power-law behavior. The p value differs from sequence to sequence, with most (98%) ranging from 0.4 to 1.6. The parameter K can be satisfactorily expressed by K = κN_1, where κ is an activity ratio and N_1 is the measured number of events occurring during the first hour after the principal event. The average κ values are in a well-defined range. Theoretically κ ≤ 0.8, and empirically κ ∈ [0.3-0.5]. These two findings enable us to develop a real-time event rate re-entry protocol 1 h after the principal event. Despite the fact that the Omori formula is temporally self-similar, we found a characteristic time T_MC at the maximum curvature point, which is a function of Omori's law parameters. For a time sequence obeying an Omori process, T_MC marks the transition from highest to lowest event rate change. Using solely the aftershock decay rate, therefore, we recommend T_MC as a preliminary estimate of the time at which it may be considered appropriate to re-enter an area affected by a blast or large event. We found that T_MC can be estimated without specifying a p value by the expression T_MC = a N_1^b, where a and b are two parameters dependent on local conditions. Both parameters presented well-constrained empirical ranges for the sites analyzed: a ∈ [0.3-0.5] and b ∈ [0.5-0.7]. These findings provide concise and well-justified guidelines for event rate re-entry protocol development.
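
    The decay law and the relation between K and the first-hour event count can be illustrated numerically. The following is a minimal sketch, not the authors' estimation procedure: the function names and the values chosen for κ, the first-hour count, c and p are hypothetical, picked only to fall inside the ranges quoted in the abstract, and time is assumed to be measured in hours.

```python
import math


def omori_rate(t, K, c, p):
    """Modified Omori event rate n(t) = K / (c + t)**p."""
    return K / (c + t) ** p


def omori_cumulative(t, K, c, p):
    """Expected number of events in [0, t], i.e. the integral of n(t)."""
    if math.isclose(p, 1.0):
        return K * math.log((c + t) / c)
    return K * ((c + t) ** (1.0 - p) - c ** (1.0 - p)) / (1.0 - p)


def productivity(kappa, n_first_hour):
    """K expressed through the activity ratio: K = kappa * N_1."""
    return kappa * n_first_hour


# Hypothetical inputs (assumptions, not values reported in the abstract).
kappa = 0.4          # inside the empirical range [0.3, 0.5]
n_first_hour = 50    # N_1: events counted in the first hour
c, p = 0.1, 1.0      # illustrative decay parameters

K = productivity(kappa, n_first_hour)
for t in (1.0, 2.0, 4.0, 8.0):
    rate = omori_rate(t, K, c, p)
    total = omori_cumulative(t, K, c, p)
    print(f"t = {t:4.1f} h  rate = {rate:7.3f} events/h  cumulative = {total:7.2f}")
```

    The sketch only evaluates the formulas. Fitting K, c and p to an observed sequence would need the maximum likelihood procedure mentioned above, which is not reproduced here.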

  16. Exploiting sequence similarity to validate the sensitivity of SNP arrays in detecting fine-scaled copy number variations.

    PubMed

    Wong, Gerard; Leckie, Christopher; Gorringe, Kylie L; Haviv, Izhak; Campbell, Ian G; Kowalczyk, Adam

    2010-04-15

    High-density single nucleotide polymorphism (SNP) genotyping arrays are efficient and cost effective platforms for the detection of copy number variation (CNV). To ensure accuracy in probe synthesis and to minimize production costs, short oligonucleotide probe sequences are used. The use of short probe sequences limits the specificity of binding targets in the human genome. The specificity of these short probeset sequences has yet to be fully analysed against a normal reference human genome. Sequence similarity can artificially elevate or suppress copy number measurements, and hence reduce the reliability of affected probe readings. For the purpose of detecting narrow CNVs reliably down to the width of a single probeset, sequence similarity is an important issue that needs to be addressed. We surveyed the Affymetrix Human Mapping SNP arrays for probeset sequence similarity against the reference human genome. Utilizing sequence similarity results, we identified a collection of fine-scaled putative CNVs between gender from autosomal probesets whose sequence matches various loci on the sex chromosomes. To detect these variations, we utilized our statistical approach, Detecting REcurrent Copy number change using rank-order Statistics (DRECS), and showed that its performance was superior and more stable than the t-test in detecting CNVs. Through the application of DRECS on the HapMap population datasets with multi-matching probesets filtered, we identified biologically relevant SNPs in aberrant regions across populations with known association to physical traits, such as height, covered by the span of a single probe. This provided empirical confirmation of the existence of naturally occurring narrow CNVs as well as the sensitivity of the Affymetrix SNP array technology in detecting them. The MATLAB implementation of DRECS is available at http://ww2.cs.mu.oz.au/~gwong/DRECS/index.html.

  17. Marker genes that are less conserved in their sequences are useful for predicting genome-wide similarity levels between closely related prokaryotic strains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lan, Yemin; Rosen, Gail; Hershberg, Ruth

    The 16s rRNA gene is so far the most widely used marker for taxonomical classification and separation of prokaryotes. Since it is universally conserved among prokaryotes, it is possible to use this gene to classify a broad range of prokaryotic organisms. At the same time, it has often been noted that the 16s rRNA gene is too conserved to separate between prokaryotes at finer taxonomic levels. In this paper, we examine how well levels of similarity of 16s rRNA and 73 additional universal or nearly universal marker genes correlate with genome-wide levels of gene sequence similarity. We demonstrate that the percent identity of 16s rRNA predicts genome-wide levels of similarity very well for distantly related prokaryotes, but not for closely related ones. In closely related prokaryotes, we find that there are many other marker genes for which levels of similarity are much more predictive of genome-wide levels of gene sequence similarity. Finally, we show that the identities of the markers that are most useful for predicting genome-wide levels of similarity within closely related prokaryotic lineages vary greatly between lineages. However, the most useful markers are always those that are least conserved in their sequences within each lineage. In conclusion, our results show that by choosing markers that are less conserved in their sequences within a lineage of interest, it is possible to better predict genome-wide gene sequence similarity between closely related prokaryotes than is possible using the 16s rRNA gene. We point readers towards a database we have created (POGO-DB) that can be used to easily establish which markers show lowest levels of sequence conservation within different prokaryotic lineages.

  18. Marker genes that are less conserved in their sequences are useful for predicting genome-wide similarity levels between closely related prokaryotic strains

    DOE PAGES

    Lan, Yemin; Rosen, Gail; Hershberg, Ruth

    2016-05-03

    The 16s rRNA gene is so far the most widely used marker for taxonomical classification and separation of prokaryotes. Since it is universally conserved among prokaryotes, it is possible to use this gene to classify a broad range of prokaryotic organisms. At the same time, it has often been noted that the 16s rRNA gene is too conserved to separate between prokaryotes at finer taxonomic levels. In this paper, we examine how well levels of similarity of 16s rRNA and 73 additional universal or nearly universal marker genes correlate with genome-wide levels of gene sequence similarity. We demonstrate that the percent identity of 16s rRNA predicts genome-wide levels of similarity very well for distantly related prokaryotes, but not for closely related ones. In closely related prokaryotes, we find that there are many other marker genes for which levels of similarity are much more predictive of genome-wide levels of gene sequence similarity. Finally, we show that the identities of the markers that are most useful for predicting genome-wide levels of similarity within closely related prokaryotic lineages vary greatly between lineages. However, the most useful markers are always those that are least conserved in their sequences within each lineage. In conclusion, our results show that by choosing markers that are less conserved in their sequences within a lineage of interest, it is possible to better predict genome-wide gene sequence similarity between closely related prokaryotes than is possible using the 16s rRNA gene. We point readers towards a database we have created (POGO-DB) that can be used to easily establish which markers show lowest levels of sequence conservation within different prokaryotic lineages.
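
    Percent identity, the similarity measure referred to above, can be computed from a pairwise alignment in a few lines. This is a minimal sketch under stated assumptions: the abstract does not say how the authors defined the denominator, so the choice made here (count only columns where neither sequence has a gap) is one common convention and not necessarily theirs. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the gap-free columns of a pairwise alignment.

    Both inputs must already be aligned, i.e. have the same length.
    Assumption: columns containing a gap in either sequence are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x.upper() == y.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 9 gap-free columns, 8 of them identical.
print(round(percent_identity("ACGT-ACGTA", "ACGTTACCTA"), 1))
```

    Other conventions divide by the full alignment length or by the shorter sequence length, so values from different tools are comparable only when the denominator is the same.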

  19. Evolutionary Consequences of DNA Methylation in a Basal Metazoan

    PubMed Central

    Dixon, Groves B.; Bay, Line K.; Matz, Mikhail V.

    2016-01-01

    Gene body methylation (gbM) is an ancestral and widespread feature in Eukarya, yet its adaptive value and evolutionary implications remain unresolved. The occurrence of gbM within protein-coding sequences is particularly puzzling, because methylation causes cytosine hypermutability and hence is likely to produce deleterious amino acid substitutions. We investigate this enigma using an evolutionarily basal group of Metazoa, the stony corals (order Scleractinia, class Anthozoa, phylum Cnidaria). We show that patterns of coral gbM are similar to other invertebrate species, predicting wide and active transcription and slower sequence evolution. We also find a strong correlation between gbM and codon bias, resulting from systematic replacement of CpG bearing codons. We conclude that gbM has strong effects on codon evolution and speculate that this may influence establishment of optimal codons. PMID:27189563

  20. Lactobacillus futsaii sp. nov., isolated from fu-tsai and suan-tsai, traditional Taiwanese fermented mustard products.

    PubMed

    Chao, Shiou-Huei; Kudo, Yuko; Tsai, Ying-Chieh; Watanabe, Koichi

    2012-03-01

    Three Gram-stain-positive strains were isolated from fermented mustard and were rod-shaped, non-motile, asporogenous, facultatively anaerobic, homofermentative and did not exhibit catalase activity. Comparative analyses of 16S rRNA, pheS and rpoA gene sequences demonstrated that the novel strains were members of the genus Lactobacillus. On the basis of 16S rRNA gene sequence analysis, the type strains of Lactobacillus crustorum (98.7% similarity), Lactobacillus farciminis (98.9%) and Lactobacillus mindensis (97.9%) were the closest neighbours. However, DNA-DNA reassociation values with these strains were less than 50%. Phenotypic and genotypic features demonstrated that these isolates represent a novel species of the genus Lactobacillus, for which the name Lactobacillus futsaii sp. nov. is proposed; the type strain is YM 0097(T) (=JCM 17355(T)=BCRC 80278(T)).

  1. The Paragon Algorithm, a next generation search engine that uses sequence temperature values and feature probabilities to identify peptides from tandem mass spectra.

    PubMed

    Shilov, Ignat V; Seymour, Sean L; Patel, Alpesh A; Loboda, Alex; Tang, Wilfred H; Keating, Sean P; Hunter, Christie L; Nuwaysir, Lydia M; Schaeffer, Daniel A

    2007-09-01

    The Paragon Algorithm, a novel database search engine for the identification of peptides from tandem mass spectrometry data, is presented. Sequence Temperature Values are computed using a sequence tag algorithm, allowing the degree of implication by an MS/MS spectrum of each region of a database to be determined on a continuum. Counter to conventional approaches, features such as modifications, substitutions, and cleavage events are modeled with probabilities rather than by discrete user-controlled settings to consider or not consider a feature. The use of feature probabilities in conjunction with Sequence Temperature Values allows for a very large increase in the effective search space with only a very small increase in the actual number of hypotheses that must be scored. The algorithm has a new kind of user interface that removes the user expertise requirement, presenting control settings in the language of the laboratory that are translated to optimal algorithmic settings. To validate this new algorithm, a comparison with Mascot is presented for a series of analogous searches to explore the relative impact of increasing search space probed with Mascot by relaxing the tryptic digestion conformance requirements from trypsin to semitrypsin to no enzyme and with the Paragon Algorithm using its Rapid mode and Thorough mode with and without tryptic specificity. Although they performed similarly for small search space, dramatic differences were observed in large search space. With the Paragon Algorithm, hundreds of biological and artifact modifications, all possible substitutions, and all levels of conformance to the expected digestion pattern can be searched in a single search step, yet the typical cost in search time is only 2-5 times that of conventional small search space. Despite this large increase in effective search space, there is no drastic loss of discrimination that typically accompanies the exploration of large search space.

  2. Impact of commercial precooking of common bean (Phaseolus vulgaris) on the generation of peptides, after pepsin-pancreatin hydrolysis, capable to inhibit dipeptidyl peptidase-IV.

    PubMed

    Mojica, Luis; Chen, Karen; de Mejía, Elvira González

    2015-01-01

    The objective of this research was to determine the bioactive properties of the released peptides from commercially available precooked common beans (Phaseolus vulgaris). Bioactive properties and peptide profiles were evaluated in protein hydrolysates of raw and commercially precooked common beans. Five varieties (Black, Pinto, Red, Navy, and Great Northern) were selected for protein extraction, protein and peptide molecular mass profiles, and peptide sequences. Potential bioactivities of hydrolysates, including antioxidant capacity and inhibition of α-amylase, α-glucosidase, dipeptidyl peptidase-IV (DPP-IV), and angiotensin converting enzyme I (ACE) were analyzed after digestion with pepsin/pancreatin. Hydrolysates from Navy beans were the most potent inhibitors of DPP-IV with no statistical differences between precooked and raw (IC50 = 0.093 and 0.095 mg protein/mL, respectively). α-Amylase inhibition was higher for raw Red, Navy and Great Northern beans (36%, 31%, 27% relative to acarbose (rel ac)/mg protein, respectively). α-Glucosidase inhibition among all bean hydrolysates did not show significant differences; however, inhibition values were above 40% rel ac/mg protein. IC50 values for ACE were not significantly different among all bean hydrolysates (range 0.20 to 0.34 mg protein/mL), except for Red bean that presented higher IC50 values. Peptide molecular mass profile ranged from 500 to 3000 Da. A total of 11 and 17 biologically active peptide sequences were identified in raw and precooked beans, respectively. Peptide sequences YAGGS and YAAGS from raw Great Northern and precooked Pinto showed similar amino acid sequences and same potential ACE inhibition activity. Processing did not affect the bioactive properties of released peptides from precooked beans. Commercially precooked beans could contribute to the intake of bioactive peptides and promote health. © 2014 Institute of Food Technologists®

  3. Potency of Bacillus thuringiensis isolates from bareng Tenes-Malang City as a biological control agent for suppressing third instar of Aedes aegypti larvae

    NASA Astrophysics Data System (ADS)

    Lutfiana, Nihayatul; Gama, Zulfaidah Penata

    2017-11-01

    Dengue is a mosquito-borne viral disease that is transmitted by female Aedes mosquitoes. The number of dengue fever cases has increased in many geographic regions, including Indonesia; one affected area is Bareng Tenes, Malang City, East Java Province. The objective of this research was to assess the potency of B. thuringiensis isolates from Bareng Tenes, Malang, as biological agents to control third instar Ae. aegypti larvae and to identify the potential B. thuringiensis isolates based on 16S rDNA sequence. B. thuringiensis was isolated from water and soil from 12 sites in the Bareng Tenes area. Bacterial isolation was performed using B. thuringiensis selective media. Isolates with phenotypic characters similar to B. thuringiensis were used in toxicity tests against third instar Ae. aegypti larvae. The LC50-96h value was determined using probit regression. The most effective isolate was identified based on the 16S rDNA sequence, then aligned to the reference isolate using the BLAST program. A phylogeny tree was constructed using the Maximum Likelihood method. This study showed that among 22 isolates, only BA02b, BS04a, and BA03a had phenotypic characters similar to B. thuringiensis. The toxicity test against third instar Ae. aegypti larvae indicated that the BA02b and BA03a isolates were potential agents to control Ae. aegypti larvae. The BA02b isolate was the most effective (LC50-96h = 2.75 × 10^7 cells/mL). Based on 16S rDNA sequence, BA02b was identified as Bacillus thuringiensis var. israelensis BGSC4Q2 (99% similarity).

  4. Acinetobacter kookii sp. nov., isolated from soil.

    PubMed

    Choi, Ji Young; Ko, Gwangpyo; Jheong, Weonghwa; Huys, Geert; Seifert, Harald; Dijkshoorn, Lenie; Ko, Kwan Soo

    2013-12-01

    Two Gram-stain-negative, non-fermentative bacterial strains, designated 11-0202(T) and 11-0607, were isolated from soil in South Korea, and four others, LUH 13522, LUH 8638, LUH 10268 and LUH 10288, were isolated from a beet field in Germany, soil in the Netherlands, and sediment of integrated fish farms in Malaysia and Thailand, respectively. Based on 16S rRNA, rpoB and gyrB gene sequences, they are considered to represent a novel species of the genus Acinetobacter. Their 16S rRNA gene sequences showed greatest pairwise similarity to Acinetobacter beijerinckii NIPH 838(T) (97.9-98.4 %). They shared highest rpoB and gyrB gene sequence similarity with Acinetobacter johnsonii DSM 6963(T) and Acinetobacter bouvetii 4B02(T) (85.4-87.6 and 78.1-82.7 %, respectively). Strain 11-0202(T) displayed low DNA-DNA reassociation values (<40 %) with the most closely related species of the genus Acinetobacter. The six strains utilized azelate, 2,3-butanediol, ethanol and dl-lactate as sole carbon sources. Cellular fatty acid analyses showed similarities to profiles of related species of the genus Acinetobacter: summed feature 3 (C16 : 1ω7c, C16 : 1ω6c; 24.3-27.2 %), C18 : 1ω9c (19.9-22.1 %), C16 : 0 (15.2-22.0 %) and C12 : 0 (9.2-14.2 %). On the basis of the current findings, it is concluded that the six strains represent a novel species, for which the name Acinetobacter kookii sp. nov. is proposed. The type strain is 11-0202(T) ( = KCTC 32033(T) = JCM 18512(T)).

  5. TU-H-CAMPUS-IeP2-01: Quantitative Evaluation of PROPELLER DWI Using QIBA Diffusion Phantom

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yung, J; Ai, H; Liu, H

    Purpose: The purpose of this study is to determine the quantitative variability of apparent diffusion coefficient (ADC) values when varying imaging parameters in a diffusion-weighted (DW) fast spin echo (FSE) sequence with Periodically Rotated Overlapping ParallEL Lines with Enhanced Reconstruction (PROPELLER) k-space trajectory. Methods: Using a 3T MRI scanner, a NIST traceable, quantitative magnetic resonance imaging (MRI) diffusion phantom (High Precision Devices, Inc, Boulder, Colorado) consisting of 13 vials filled with various concentrations of polymer polyvinylpyrrolidone (PVP) in aqueous solution was imaged with a standard Quantitative Imaging Biomarkers Alliance (QIBA) DWI spin echo, echo planar imaging (SE EPI) acquisition. The same phantom was then imaged with a DWI PROPELLER sequence at varying echo train lengths (ETL) of 8, 20, and 32, as well as b-values of 400, 900, and 2000. QIBA DWI phantom analysis software was used to generate ADC maps and create region of interests (ROIs) for quantitative measurements of each vial. Mean and standard deviations of the ROIs were compared. Results: The SE EPI sequence generated ADC values that showed very good agreement with the known ADC values of the phantom (r^2 = 0.9995, slope = 1.0061). The ADC values measured from the PROPELLER sequences were inflated, but were highly correlated with an r^2 range from 0.8754 to 0.9880. The PROPELLER sequence with an ETL=20 and b-value of 0 and 2000 showed the closest agreement (r^2 = 0.9034, slope = 0.9880). Conclusion: The DW PROPELLER sequence is promising for quantitative evaluation of ADC values. A drawback of the PROPELLER sequence is the longer acquisition time. The 180° refocusing pulses may also cause the observed increase in ADC values compared to the standard SE EPI DW sequence. However, the FSE sequence offers an advantage with in-plane motion and geometric distortion which will be investigated in future studies.

  6. A Paradox within the Time Value of Money: A Critical Thinking Exercise for Finance Students

    ERIC Educational Resources Information Center

    Delaney, Charles J.; Rich, Steven P.; Rose, John T.

    2016-01-01

    This study presents a paradox within the time value of money (TVM), namely, that the interest-principal sequence embedded in the payment stream of an amortized loan is exactly the opposite of the interest-principal sequence implicit in the present value of a matching annuity. We examine this inverse sequence, both mathematically and intuitively,…
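
    The inverse relationship described above can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' derivation (the abstract is truncated): it reads the "interest-principal sequence implicit in the present value" as the split of each annuity payment into its discounted value and the discount removed, and the loan amount, rate and term are hypothetical.

```python
def amortization_schedule(principal, rate, n_periods):
    """Level-payment loan: return (payment, [(interest, principal_repaid), ...])."""
    payment = principal * rate / (1.0 - (1.0 + rate) ** -n_periods)
    balance = principal
    rows = []
    for _ in range(n_periods):
        interest = balance * rate      # interest accrued on the outstanding balance
        repaid = payment - interest    # remainder of the payment reduces principal
        rows.append((interest, repaid))
        balance -= repaid
    return payment, rows


def annuity_present_values(payment, rate, n_periods):
    """Matching annuity: return [(discount, present_value), ...] per payment."""
    rows = []
    for k in range(1, n_periods + 1):
        pv = payment / (1.0 + rate) ** k
        rows.append((payment - pv, pv))
    return rows


# Hypothetical example: 1000 borrowed at 5% per period over 5 periods.
payment, loan = amortization_schedule(1000.0, 0.05, 5)
annuity = annuity_present_values(payment, 0.05, 5)

# Pair loan period k with annuity payment n - k + 1 (the reversed order).
for k, ((interest, repaid), (discount, pv)) in enumerate(zip(loan, reversed(annuity)), start=1):
    print(f"period {k}: loan interest {interest:7.2f} principal {repaid:7.2f} | "
          f"annuity discount {discount:7.2f} PV {pv:7.2f}")
```

    In this reading, the principal repaid in loan period k equals the present value of annuity payment n - k + 1, so the two columns match only after one sequence is reversed.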

  7. On irregularities of distribution of real sequences

    PubMed Central

    Chung, F. R. K.; Graham, R. L.

    1981-01-01

    A natural measure of the amount of unavoidable clustering that must occur in any bounded infinite sequence of real numbers is studied. We determine the extreme value for this measure and exhibit sequences that achieve this value. PMID:16593046

  8. Amino terminal sequence of heavy and light chains from ratfish immunoglobulin.

    PubMed

    De Ioannes, A E; Aguila, H L

    1989-01-01

    The ratfish, Callorhinchus callorhinchus, a representative of the Holocephali, has a natural serum hemagglutinin (Mr 960,000), composed of heavy (Mr 71,000), light (Mr 22,500), and J (Mr 16,000) chains. To approach the mechanisms that generate diversity at this level of evolution, the amino terminal sequence of the heavy and light chains was determined by automated microsequencing. The chains are unblocked and have modest internal sequence heterogeneity. The heavy chains show sequence similarity with the terminal region of the heavy chain from the horned shark, Heterodontus francisci, and other species. In contrast to the heavy chain, the ratfish light chains display low sequence similarity with their shark kappa counterparts. However, their similarity with the variable region of the chicken lambda light chains is about 75%.

  9. Analysis of xylem formation in pine by cDNA sequencing

    NASA Technical Reports Server (NTRS)

    Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.

    1998-01-01

    Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.

  10. Genetic variations in merozoite surface antigen genes of Babesia bovis detected in Vietnamese cattle and water buffaloes.

    PubMed

    Yokoyama, Naoaki; Sivakumar, Thillaiampalam; Tuvshintulga, Bumduuren; Hayashida, Kyoko; Igarashi, Ikuo; Inoue, Noboru; Long, Phung Thang; Lan, Dinh Thi Bich

    2015-03-01

    The genes that encode merozoite surface antigens (MSAs) in Babesia bovis are genetically diverse. In this study, we analyzed the genetic diversity of B. bovis MSA-1, MSA-2b, and MSA-2c genes in Vietnamese cattle and water buffaloes. Blood DNA samples from 258 cattle and 49 water buffaloes reared in the Thua Thien Hue province of Vietnam were screened with a B. bovis-specific diagnostic PCR assay. The B. bovis-positive DNA samples (23 cattle and 16 water buffaloes) were then subjected to PCR assays to amplify the MSA-1, MSA-2b, and MSA-2c genes. Sequencing analyses showed that the Vietnamese MSA-1 and MSA-2b sequences are genetically diverse, whereas MSA-2c is relatively conserved. The nucleotide identity values for these MSA gene sequences were similar in the cattle and water buffaloes. Consistent with the sequencing data, the Vietnamese MSA-1 and MSA-2b sequences were dispersed across several clades in the corresponding phylogenetic trees, whereas the MSA-2c sequences occurred in a single clade. Cattle- and water-buffalo-derived sequences also often clustered together on the phylogenetic trees. The Vietnamese MSA-1, MSA-2b, and MSA-2c sequences were then screened for recombination with automated methods. Of the seven recombination events detected, five and two were associated with the MSA-2b and MSA-2c recombinant sequences, respectively, whereas no MSA-1 recombinants were detected among the sequences analyzed. Recombination between the sequences derived from cattle and water buffaloes was very common, and the resultant recombinant sequences were found in both host animals. These data indicate that the genetic diversity of the MSA sequences does not differ between cattle and water buffaloes in Vietnam. They also suggest that recombination between the B. bovis MSA sequences in both cattle and water buffaloes might contribute to the genetic variation in these genes in Vietnam. Copyright © 2015 Elsevier B.V. All rights reserved.

  11. Thioclava electrotropha sp. nov., a versatile electrode and sulfur-oxidizing bacterium from marine sediments.

    PubMed

    Chang, Rachel; Bird, Lina; Barr, Casey; Osburn, Magdalena; Wilbanks, Elizabeth; Nealson, Kenneth; Rowe, Annette

    2018-05-01

    A taxonomic and physiologic characterization was carried out on Thioclava strain ElOx9(T), which was isolated from a bacterial consortium enriched on electrodes poised at electron donating potentials. The isolate is Gram-negative, catalase-positive and oxidase-positive; the cells are motile short rods. The bacterium is facultatively anaerobic with the ability to utilize nitrate as an electron acceptor. Autotrophic growth with H2 and S0 (oxidized to sulfate) was observed. The isolate also grows heterotrophically with organic acids and sugars. Growth was observed at salinities from 0 to 10% NaCl and at temperatures from 15 to 41 °C. Phylogenetic analysis based on 16S rRNA gene sequences indicated that the strain belongs in the genus Thioclava; it had the highest sequence similarity of 98.8 % to Thioclava atlantica 13D2W-2(T), followed by Thioclava dalianensis DLFJ1-1(T) with 98.5 % similarity, Thioclava pacifica TL 2(T) with 97.7 % similarity, and then Thioclava indica DT23-4(T) with 96.9 %. All other sequence similarities to characterized strains were below 97 %. The digital DNA-DNA hybridization values estimated against T. atlantica 13D2W-2(T), T. dalianensis DLFJ1-1(T), T. pacifica TL 2(T) and T. indica DT23-4(T) were 15.8±2.1, 16.7±2.1, 14.3±1.9 and 18.3±2.1 %, respectively. The corresponding average nucleotide identity values between these strains were determined to be 65.1, 67.8, 68.4 and 64.4 %, respectively. The G+C content of the chromosomal DNA is 63.4 mol%. Based on these results, a novel species Thioclava electrotropha sp. nov. is proposed, with the type strain ElOx9(T) (=DSM 103712(T) =ATCC TSD-100(T)).

  12. Actinoplanes rhizophilus sp. nov., an actinomycete isolated from the rhizosphere of Sansevieria trifasciata Prain.

    PubMed

    He, Hairong; Xing, Jia; Liu, Chongxi; Li, Chuang; Ma, Zhaoxu; Li, Jiansong; Xiang, Wensheng; Wang, Xiangjing

    2015-12-01

    A novel actinomycete, designated strain NEAU-A-2T, was isolated from the rhizosphere soil of Sansevieria trifasciata Prain collected from Heilongjiang province, north-east China. The taxonomic status of this organism was established using a polyphasic approach. The isolate formed irregular sporangia containing motile spores on the substrate mycelium. The whole-cell sugars were xylose and galactose. The predominant menaquinones were MK-9(H10), MK-9(H2), MK-10(H2) and MK-10(H4). The major fatty acids were iso-C15 : 0, iso-C16 : 0 and anteiso-C15 : 0. The polar lipids were diphosphatidylglycerol, phosphatidylmonomethylethanolamine, phosphatidylethanolamine, phosphatidylinositol, three unidentified phospholipids and an unidentified glycolipid. 16S rRNA gene sequence similarity studies showed that strain NEAU-A-2T belongs to the genus Actinoplanes with the highest sequence similarities to Actinoplanes globisporus NBRC 13912T (97.7 % 16S rRNA gene sequence similarity), Actinoplanes ferrugineus IMSNU 22125T (97.5 %), Actinoplanes toevensis MN07-A0368T (97.2 %) and Actinoplanes rishiriensis NBRC 108556T (97.2 %); similarities to type strains of other species of this genus were < 97 %. Two tree-making algorithms showed that strain NEAU-A-2T formed a distinct clade with A. globisporus NBRC 13912T and A. rishiriensis NBRC 108556T. However, low DNA-DNA relatedness values allowed the isolate to be differentiated from the above-mentioned two species of the genus Actinoplanes. Moreover, strain NEAU-A-2T could also be distinguished from the most closely related species by morphological and physiological characteristics. Therefore, in conclusion, isolate NEAU-A-2T represents a novel species of the genus Actinoplanes, for which the name Actinoplanes rhizophilus sp. nov. is proposed. The type strain is NEAU-A-2T ( = CGMCC 4.7133T = DSM 46672T).

  13. SW#db: GPU-Accelerated Exact Sequence Similarity Database Search.

    PubMed

    Korpar, Matija; Šošić, Martin; Blažeka, Dino; Šikić, Mile

    2015-01-01

    In recent years we have witnessed growth in sequencing yield and in the number of samples sequenced and, as a result, growth of publicly maintained sequence databases. This increase in data has placed high demands on protein similarity search algorithms, which face two opposing goals: keeping running times acceptable while maintaining a high enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, alignment of a query to the whole database is usually too slow. Therefore, the majority of protein similarity search methods apply heuristics to reduce the number of candidate sequences in the database before doing the exact local alignment. However, there is still a need for the alignment of a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times, as a standalone tool, are comparable to the running times of BLAST, it is primarily intended to be used for the exact local alignment phase in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and was 4-5 times faster than SSEARCH, 6-25 times faster than CUDASW++ and more than 20 times faster than SSW at the time of writing, using multiple queries on Swiss-Prot and UniRef90 databases.
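
    The quadratic cost of exact local alignment mentioned in this record can be seen in a minimal Smith-Waterman scoring sketch. This is not the SW#db implementation: the function name, the match/mismatch scores and the linear gap penalty are illustrative assumptions (production tools typically use substitution matrices and affine gaps).

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of a and b.

    Assumed scoring scheme (not taken from SW#db): fixed match/mismatch
    scores and a linear gap penalty. Time is O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment
                prev[j - 1] + sub,   # extend with a match or mismatch
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

    Every query residue is compared with every database residue, which is why tools in this area either prefilter candidates or parallelize the table fill.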

  14. Four distinct types of E.C. 1.2.1.30 enzymes can catalyze the reduction of carboxylic acids to aldehydes.

    PubMed

    Stolterfoht, Holly; Schwendenwein, Daniel; Sensen, Christoph W; Rudroff, Florian; Winkler, Margit

    2017-09-10

    Increasing demand for chemicals from renewable resources calls for the development of new biotechnological methods for the reduction of oxidized bio-based compounds. Enzymatic carboxylate reduction is highly selective, both in terms of chemo- and product selectivity, but not many carboxylate reductase enzymes (CARs) have been identified on the sequence level to date. Thus far, their phylogeny is unexplored and very little is known about their structure-function-relationship. CARs minimally contain an adenylation domain, a phosphopantetheinylation domain and a reductase domain. We have recently identified new enzymes of fungal origin, using similarity searches against genomic sequences from organisms in which aldehydes were detected upon incubation with carboxylic acids. Analysis of sequences with known CAR functionality and CAR enzymes recently identified in our laboratory suggests that the three-domain architecture mentioned above is modular. The construction of a distance tree with a subsequent 1000-replicate bootstrap analysis showed that the CAR sequences included in our study fall into four distinct subgroups (one of bacterial origin and three of fungal origin, respectively), each with a bootstrap value of 100%. The multiple sequence alignment of all experimentally confirmed CAR protein sequences revealed fingerprint sequences of residues which are likely to be involved in substrate and co-substrate binding and one of the three catalytic substeps, respectively. The fingerprint sequences broaden our understanding of the amino acids that might be essential for the reduction of organic acids to the corresponding aldehydes in CAR proteins. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Sequence of mammalian fossils, including hominoid teeth, from the Bubing Basin caves, South China.

    PubMed

    Wang, Wei; Potts, Richard; Baoyin, Yuan; Huang, Weiwen; Cheng, Hai; Edwards, R Lawrence; Ditchfield, Peter

    2007-04-01

    A Plio-Pleistocene to Holocene faunal sequence has been recovered from four carefully excavated caves in the Bubing Basin, adjacent to the larger Bose Basin of South China. The caves vary in elevation; we suggest that the higher caves were formed and filled with sediments prior to the lower caves. The highest deposits, which are from Mohui Cave, contain hominoid teeth and other fossilized remains of mammalian taxa most similar to late Pliocene and early Pleistocene faunas. Wuyun Cave (approximately 50 m lower in elevation than Mohui) contains a late middle Pleistocene fauna, which is supported by U-series age constraints from 350 to 200 ka. Lower Pubu Cave (approximately 23 m below Wuyun) is assigned to the late Pleistocene, while the Cunkong Cave (the lowest, approximately 2 m lower elevation than Lower Pubu) preserves a Holocene fauna. The four faunal assemblages indicate species-level changes in Ailuropoda, Stegodon, and Sus, the appearance of Elephas, the local disappearance of Stegodon, and the migration of Equus hemionus to South China. These initial results of our work call into question the continued value of the Stegodon/Ailuropoda Fauna, a category long used to characterize the Pleistocene faunas of South China. Excavation of karstic caves of varying elevation within the basins of South China holds promise for defining local sequences of mammalian fossils that can be used to investigate faunal variations related to climate change, biogeographic events, and evolutionary change over the past two million years. Stable isotopic analysis of a small sample of mammalian teeth from Bubing Basin caves is consistent with 100% C3 vegetation in the Bubing/Bose region, with certain δ13C values consistent with a canopied woodland or forest. A preliminary assessment of the hominoid teeth indicates the presence of diverse molar and premolar morphologies including dental remains of Gigantopithecus blacki and a sample with similarities to the teeth reported from Longgupo.

  16. Shewanella gelidii sp. nov., isolated from the red algae Gelidium amansii, and emended description of Shewanella waksmanii.

    PubMed

    Wang, Yan; Chen, Hongli; Liu, Zhenhua; Ming, Hong; Zhou, Chenyan; Zhu, Xinshu; Zhang, Peng; Jing, Changqin; Feng, Huigen

    2016-08-01

    A novel Gram-stain-negative, straight or slightly curved rod-shaped, non-spore-forming, facultatively anaerobic bacterium with a single polar flagellum, designated RZB5-4T, was isolated from a sample of the red algae Gelidium amansii collected from the coastal region of Rizhao, PR China (119.625° E 35.517° N). The organism grew optimally between 24 and 28 °C, at pH 7.0 and in the presence of 2-3 % (w/v) NaCl. The strain required seawater or artificial seawater for growth, and NaCl alone did not support growth. Strain RZB5-4T contained C16 : 1ω7c and/or C16 : 1ω6c, C16 : 0 and iso-C15 : 0 as the dominant fatty acids. The respiratory quinones detected in strain RZB5-4T were ubiquinone 7, ubiquinone 8, menaquinone 7 and methylmenaquinone 7. The polar lipids of strain RZB5-4T comprised phosphatidylethanolamine, phosphatidylglycerol, phosphatidylmonomethylethanolamine, one unidentified glycolipid, one unidentified phospholipid and one unknown lipid. The DNA G+C content of strain RZB5-4T was 47 mol %. Phylogenetic analysis based on 16S rRNA and gyrase B (gyrB) gene sequences showed that strain RZB5-4T belonged to the genus Shewanella, clustering with Shewanella waksmanii ATCC BAA-643T. Strain RZB5-4T exhibited the highest 16S rRNA gene sequence similarity value (96.6 %) and the highest gyrB gene sequence similarity value (80.7 %), respectively, to S. waksmanii ATCC BAA-643T. On the basis of polyphasic analyses, strain RZB5-4T represents a novel species of the genus Shewanella, for which the name Shewanella gelidii sp. nov. is proposed. The type strain is RZB5-4T (=JCM 30804T=KCTC 42663T=MCCC 1K00697T).

  17. Stakelama algicida sp. nov., novel algicidal species of the family Sphingomonadaceae isolated from seawater.

    PubMed

    Kristyanto, Sylvia; Chaudhary, Dhiraj Kumar; Kim, Jaisoo

    2018-01-01

    We conducted a taxonomic study of two algicidal bacteria, designated strains Yeonmyeong 1-13T and Yeonmyeong 1-11, isolated from seawater off Geoje Island in the South Sea, Republic of Korea. The two novel strains were yellow-pigmented, halotolerant, Gram-stain-negative, strictly aerobic, non-spore-forming, rod-shaped bacteria. Both strains were able to grow at 5-39 °C, pH 5.0-10.0 and 0-11 % (w/v) NaCl concentration. Based on the 16S rRNA gene sequence analysis, strains Yeonmyeong 1-13T and Yeonmyeong 1-11 belonged to the genus Stakelama and are closely related to Stakelama pacifica JLT832T (98.37 % and 98.22 % sequence similarity, respectively). The pairwise sequence similarity between strains Yeonmyeong 1-13T and Yeonmyeong 1-11 was observed to be 99.50 %. In both strains, the only respiratory quinone was ubiquinone-10; the major polar lipids were phosphatidylethanolamine, phosphatidylglycerol, diphosphatidylglycerol, phosphatidylcholine and sphingoglycolipid; the major fatty acids were C18 : 1ω7c, C16 : 0 and C14 : 0 2-OH. DNA G+C content values of strains Yeonmyeong 1-13T and Yeonmyeong 1-11 were 65.1 and 64.9 mol%, respectively. The DNA-DNA relatedness between Yeonmyeong 1-13T and S. pacifica DSM 25059T was 28.7 %, which falls below the threshold value of 70 % for the strain to be considered as novel. The morphological, physiological, chemotaxonomic and phylogenetic analyses clearly distinguished strain Yeonmyeong 1-13T from its closest phylogenetic neighbours. Thus, strains Yeonmyeong 1-13T and Yeonmyeong 1-11 represent a novel species of the genus Stakelama, for which the name Stakelama algicida sp. nov. is proposed. The type strain is Yeonmyeong 1-13T (=KEMB 9005-324T =JCM 31498T).

  18. Photobacterium damselae ssp. piscicida: detection by direct amplification of 16S rRNA gene sequences and genotypic variation as determined by amplified fragment length polymorphism (AFLP).

    PubMed

    Kvitt, H; Ucko, M; Colorni, A; Batargias, C; Zlotkin, A; Knibb, W

    2002-04-05

    A PCR protocol for the rapid diagnosis of fish 'pasteurellosis' based on 16S rRNA gene sequences was developed. The procedure combines low annealing temperature that detects low titers of Photobacterium damselae but also related species, and high annealing temperature for the specific identification of P. damselae directly from infected fish. The PCR protocol was validated on 19 piscine isolates of P. damselae ssp. piscicida from different geographic regions (Japan, Italy, Spain, Greece and Israel), on spontaneously infected sea bream Sparus aurata and sea bass Dicentrarchus labrax, and on closely related American Type Culture Collection (ATCC) reference strains. PCR using high annealing temperature (64 °C) discriminated between P. damselae and closely related reference strains, including P. histaminum. Sixteen isolates of P. damselae ssp. piscicida, 2 P. damselae ssp. piscicida reference strains and 1 P. damselae ssp. damselae reference strain were subjected to Amplified Fragment Length Polymorphism (AFLP) analysis, and a similarity matrix was produced. Accordingly, the Japanese isolates of P. damselae ssp. piscicida were distinguished from the Mediterranean/European isolates at a cut-off value of 83% similarity. A further subclustering at a cut-off value of 97% allowed discrimination between the Israeli P. damselae ssp. piscicida isolates and the other Mediterranean/European isolates. The combination of PCR direct amplification and AFLP provides a 2-step procedure, where P. damselae is rapidly identified at genus level on the basis of its 16S rRNA gene sequence and then grouped into distinct clusters on the basis of AFLP polymorphisms. The first step of direct amplification is highly sensitive and has immediate practical consequences, offering fish farmers a rapid diagnosis, while the AFLP is more specific and detects intraspecific variation which, in our study, also reflected geographic correspondence. Because of its superior discriminative properties, AFLP can be an important tool for epidemiological and taxonomic studies of this highly homogeneous genus.

  19. Previously unknown and highly divergent ssDNA viruses populate the oceans.

    PubMed

    Labonté, Jessica M; Suttle, Curtis A

    2013-11-01

    Single-stranded DNA (ssDNA) viruses are economically important pathogens of plants and animals, and are widespread in oceans; yet, the diversity and evolutionary relationships among marine ssDNA viruses remain largely unknown. Here we present the results from a metagenomic study of composite samples from temperate (Saanich Inlet, 11 samples; Strait of Georgia, 85 samples) and subtropical (46 samples, Gulf of Mexico) seawater. Most sequences (84%) had no evident similarity to sequenced viruses. In total, 608 putative complete genomes of ssDNA viruses were assembled, almost doubling the number of ssDNA viral genomes in databases. These comprised 129 genetically distinct groups, each represented by at least one complete genome that had no recognizable similarity to each other or to other virus sequences. Given that the seven recognized families of ssDNA viruses have considerable sequence homology within them, this suggests that many of these genetic groups may represent new viral families. Moreover, nearly 70% of the sequences were similar to one of these genomes, indicating that most of the sequences could be assigned to a genetically distinct group. Most sequences fell within 11 well-defined gene groups, each sharing a common gene. Some of these encoded putative replication and coat proteins that had similarity to sequences from viruses infecting eukaryotes, suggesting that these were likely from viruses infecting eukaryotic phytoplankton and zooplankton.

  20. Discrimination of germline V genes at different sequencing lengths and mutational burdens: A new tool for identifying and evaluating the reliability of V gene assignment.

    PubMed

    Zhang, Bochao; Meng, Wenzhao; Prak, Eline T Luning; Hershberg, Uri

    2015-12-01

    Immune repertoires are collections of lymphocytes that express diverse antigen receptor gene rearrangements consisting of Variable (V), (Diversity (D) in the case of heavy chains) and Joining (J) gene segments. Clonally related cells typically share the same germline gene segments and have highly similar junctional sequences within their third complementarity determining regions. Identifying clonal relatedness of sequences is a key step in the analysis of immune repertoires. The V gene is the most important for clone identification because it has the longest sequence and the greatest number of sequence variants. However, accurate identification of a clone's germline V gene source is challenging because there is a high degree of similarity between different germline V genes. This difficulty is compounded in antibodies, which can undergo somatic hypermutation. Furthermore, high-throughput sequencing experiments often generate partial sequences and have significant error rates. To address these issues, we describe a novel method to estimate which germline V genes (or alleles) cannot be discriminated under different conditions (read lengths, sequencing errors or somatic hypermutation frequencies). Starting with any set of germline V genes, this method measures their similarity using different sequencing lengths and calculates their likelihood of unambiguous assignment under different levels of mutation. Hence, one can identify, under different experimental and biological conditions, the germline V genes (or alleles) that cannot be uniquely identified and bundle them together into groups of specific V genes with highly similar sequences. Copyright © 2015 Elsevier B.V. All rights reserved.

  1. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM.

    PubMed

    Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

    2015-01-01

    Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. This will offer an important complement to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.

  2. Diversity of virus-host systems in hypersaline Lake Retba, Senegal.

    PubMed

    Sime-Ngando, Télesphore; Lucas, Soizick; Robin, Agnès; Tucker, Kimberly Pause; Colombet, Jonathan; Bettarel, Yvan; Desmond, Elie; Gribaldo, Simonetta; Forterre, Patrick; Breitbart, Mya; Prangishvili, David

    2011-08-01

    Remarkable morphological diversity of virus-like particles was observed by transmission electron microscopy in a hypersaline water sample from Lake Retba, Senegal. The majority of particles morphologically resembled hyperthermophilic archaeal DNA viruses isolated from extreme geothermal environments. Some hypersaline viral morphotypes have not been previously observed in nature, and less than 1% of observed particles had a head-and-tail morphology, which is typical for bacterial DNA viruses. Culture-independent analysis of the microbial diversity in the sample suggested the dominance of extremely halophilic archaea. Few of the 16S sequences corresponded to known archaeal genera (Haloquadratum, Halorubrum and Natronomonas), whereas the majority represented novel archaeal clades. Three sequences corresponded to a new basal lineage of the haloarchaea. Bacteria belonged to four major phyla, consistent with the known diversity in saline environments. Metagenomic sequencing of DNA from the purified virus-like particles revealed very few similarities to the NCBI non-redundant database at either the nucleotide or amino acid level. Some of the identifiable virus sequences were most similar to previously described haloarchaeal viruses, but no sequence similarities were found to archaeal viruses from extreme geothermal environments. A large proportion of the sequences had similarity to previously sequenced viral metagenomes from solar salterns. © 2010 Society for Applied Microbiology and Blackwell Publishing Ltd.

  3. Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores.

    PubMed

    Bastien, Olivier; Maréchal, Eric

    2008-08-07

    Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value²) following the TULIP theorem. Simulated Z-value distributions are known to fit a Gumbel law. This remarkable property was not demonstrated and had no obvious biological support. We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information reflected by the magnitude of their alignment scores, 2) whose components are the amino acids that can independently be damaged by random DNA mutations. From these assumptions, we deduced that information shared at each amino acid position evolved with a constant rate, corresponding to the information hazard rate, and that pairwise sequence alignment scores should follow a Gumbel distribution, whose parameters could find some theoretical rationale. In particular, one parameter corresponds to the information hazard rate. Extreme value distribution of alignment scores, assessed from high scoring segments pairs following the Karlin-Altschul model, can also be deduced from the Reliability Theory applied to molecular sequences. It reflects the redundancy of information between homologous sequences, under functional conservative pressure. This model also provides a link between concepts of biological sequence analysis and of systems biology.
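
    The Z-value idea described in this record (an observed score compared against a Monte-Carlo distribution of scores from shuffled sequences, with 1/Z² as an upper bound on the P-value) might be sketched as follows. The toy identity score, the choice to shuffle only the second sequence, the number of shuffles and all function names are assumptions for illustration, not the authors' procedure.

```python
import random
import statistics


def identity_score(a, b):
    """Toy score (assumption): identical positions in an ungapped comparison."""
    return sum(x == y for x, y in zip(a, b))


def z_value(a, b, score=identity_score, n_shuffles=200, seed=0):
    """Z-value of score(a, b) against scores of a versus shuffled copies of b."""
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    observed = score(a, b)
    chars = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(chars)             # preserves the composition of b
        null_scores.append(score(a, "".join(chars)))
    mu = statistics.mean(null_scores)
    sigma = statistics.pstdev(null_scores)
    if sigma == 0:
        raise ValueError("null score distribution has zero variance")
    return (observed - mu) / sigma


def p_value_upper_bound(z):
    """Upper bound 1/Z^2 on the P-value, as quoted in the abstract (capped at 1)."""
    if z <= 0:
        return 1.0
    return min(1.0, 1.0 / (z * z))


if __name__ == "__main__":
    z = z_value("ACGT" * 10, "ACGT" * 10)
    print(z, p_value_upper_bound(z))
```

    In practice the score function would be a real alignment score rather than an identity count; the bound is conservative because it makes no assumption about the shape of the null distribution.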

  4. Three-dimensional T1rho-weighted MRI at 1.5 Tesla.

    PubMed

    Borthakur, Arijitt; Wheaton, Andrew; Charagundla, Sridhar R; Shapiro, Erik M; Regatte, Ravinder R; Akella, Sarma V S; Kneeland, J Bruce; Reddy, Ravinder

    2003-06-01

    To design and implement a magnetic resonance imaging (MRI) pulse sequence capable of performing three-dimensional T1rho-weighted MRI on a 1.5-T clinical scanner, and determine the optimal sequence parameters, both theoretically and experimentally, so that the energy deposition by the radiofrequency pulses in the sequence, measured as the specific absorption rate (SAR), does not exceed safety guidelines for imaging human subjects. A three-pulse cluster was pre-encoded to a three-dimensional gradient-echo imaging sequence to create a three-dimensional, T1rho-weighted MRI pulse sequence. Imaging experiments were performed on a GE clinical scanner with a custom-built knee-coil. We validated the performance of this sequence by imaging articular cartilage of a bovine patella and comparing T1rho values measured by this sequence to those obtained with a previously tested two-dimensional imaging sequence. Using a previously developed model for SAR calculation, the imaging parameters were adjusted such that the energy deposition by the radiofrequency pulses in the sequence did not exceed safety guidelines for imaging human subjects. The actual temperature increase due to the sequence was measured in a phantom by an MRI-based temperature mapping technique. Following these experiments, the performance of this sequence was demonstrated in vivo by obtaining T1rho-weighted images of the knee joint of a healthy individual. Calculated T1rho of articular cartilage in the specimen was similar for both the three-dimensional and two-dimensional methods (84 ± 2 msec and 80 ± 3 msec, respectively). The temperature increase in the phantom resulting from the sequence was 0.015 °C, which is well below the established safety guidelines. Images of the human knee joint in vivo demonstrate a clear delineation of cartilage from surrounding tissues. We developed and implemented a three-dimensional T1rho-weighted pulse sequence on a 1.5-T clinical scanner. Copyright 2003 Wiley-Liss, Inc.

  5. A survey and evaluations of histogram-based statistics in alignment-free sequence comparison.

    PubMed

    Luczak, Brian B; James, Benjamin T; Girgis, Hani Z

    2017-12-06

    Since the dawn of the bioinformatics field, sequence alignment scores have been the main method for comparing sequences. However, alignment algorithms are quadratic, requiring long execution time. As alternatives, scientists have developed tens of alignment-free statistics for measuring the similarity between two sequences. We surveyed tens of alignment-free k-mer statistics. Additionally, we evaluated 33 statistics and multiplicative combinations between the statistics and/or their squares. These statistics are calculated on two k-mer histograms representing two sequences. Our evaluations using global alignment scores revealed that the majority of the statistics are sensitive and capable of finding similar sequences to a query sequence. Therefore, any of these statistics can filter out dissimilar sequences quickly. Further, we observed that multiplicative combinations of the statistics are highly correlated with the identity score. Furthermore, combinations involving sequence length difference or Earth Mover's distance, which takes the length difference into account, are always among the highest correlated paired statistics with identity scores. Similarly, paired statistics including length difference or Earth Mover's distance are among the best performers in finding the K-closest sequences. Interestingly, similar performance can be obtained using histograms of shorter words, resulting in reducing the memory requirement and increasing the speed remarkably. Moreover, we found that simple single statistics are sufficient for processing next-generation sequencing reads and for applications relying on local alignment. Finally, we measured the time requirement of each statistic. The survey and the evaluations will help scientists with identifying efficient alternatives to the costly alignment algorithm, saving thousands of computational hours. The source code of the benchmarking tool is available as Supplementary Materials. © The Author 2017. Published by Oxford University Press.
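
    As a rough illustration of statistics "calculated on two k-mer histograms", the sketch below builds k-mer count histograms and computes two simple measures, Manhattan distance and cosine similarity. These two were chosen here only as familiar examples; whether and how they figure among the 33 statistics evaluated in the paper is not stated in the abstract, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_histogram(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def manhattan_distance(h1, h2):
    """Sum of absolute count differences over all k-mers seen in either histogram."""
    return sum(abs(h1[w] - h2[w]) for w in set(h1) | set(h2))


def cosine_similarity(h1, h2):
    """Cosine of the angle between the two count vectors (0.0 if either is empty)."""
    dot = sum(h1[w] * h2[w] for w in h1)   # Counter returns 0 for missing k-mers
    norm1 = sqrt(sum(v * v for v in h1.values()))
    norm2 = sqrt(sum(v * v for v in h2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


if __name__ == "__main__":
    h_query = kmer_histogram("ACGTACGT", 2)
    h_other = kmer_histogram("ACGTTCGT", 2)
    print(manhattan_distance(h_query, h_other), cosine_similarity(h_query, h_other))
```

    Both measures run in time linear in the sequence lengths, which is the practical appeal of alignment-free comparison over quadratic alignment.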

  6. A theoretical method to compute sequence dependent configurational properties in charged polymers and proteins.

    PubMed

    Sawle, Lucas; Ghosh, Kingshuk

    2015-08-28

    A general formalism to compute configurational properties of proteins and other heteropolymers with an arbitrary sequence of charges and non-uniform excluded volume interaction is presented. A variational approach is utilized to predict average distance between any two monomers in the chain. The presented analytical model, for the first time, explicitly incorporates the role of sequence charge distribution to determine relative sizes between two sequences that vary not only in total charge composition but also in charge decoration (even when charge composition is fixed). Furthermore, the formalism is general enough to allow variation in excluded volume interactions between two monomers. Model predictions are benchmarked against the all-atom Monte Carlo studies of Das and Pappu [Proc. Natl. Acad. Sci. U. S. A. 110, 13392 (2013)] for 30 different synthetic sequences of polyampholytes. These sequences possess an equal number of glutamic acid (E) and lysine (K) residues but differ in the patterning within the sequence. Without any fit parameter, the model captures the strong sequence dependence of the simulated values of the radius of gyration with a correlation coefficient of R² = 0.9. The model is then applied to real proteins to compare the unfolded state dimensions of 540 orthologous pairs of thermophilic and mesophilic proteins. The excluded volume parameters are assumed similar under denatured conditions, and only electrostatic effects encoded in the sequence are accounted for. With these assumptions, thermophilic proteins are found, with high statistical significance, to have more compact disordered ensembles than their mesophilic counterparts. The method presented here, due to its analytical nature, is capable of such high-throughput analysis of multiple proteins and will have broad applications in proteomic studies as well as in other heteropolymeric systems.

  7. Query-seeded iterative sequence similarity searching improves selectivity 5–20-fold

    PubMed Central

    Li, Weizhong; Lopez, Rodrigo

    2017-01-01

    Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic of a protein family. But models are subject to contamination; once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of alignment errors during psiblast PSSM contamination suggested a simple strategy for dramatically reducing PSSM contamination. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM, HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequence, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increase target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity. PMID:27923999

  8. Towards a comprehensive barcode library for arctic life - Ephemeroptera, Plecoptera, and Trichoptera of Churchill, Manitoba, Canada

    PubMed Central

    2009-01-01

    Background: This study reports progress in assembling a DNA barcode reference library for Ephemeroptera, Plecoptera, and Trichoptera ("EPTs") from a Canadian subarctic site, which is the focus of a comprehensive biodiversity inventory using DNA barcoding. These three groups of aquatic insects exhibit a moderate level of species diversity, making them ideal for testing the feasibility of DNA barcoding for routine biotic surveys. We explore the correlation between the morphological species delineations, DNA barcode-based haplotype clusters delimited by a sequence threshold (2%), and a threshold-free approach to biodiversity quantification--phylogenetic diversity. Results: A DNA barcode reference library is built for 112 EPT species for the focal region, consisting of 2277 COI sequences. Close correspondence was found between EPT morphospecies and haplotype clusters as designated using a standard threshold value. Similarly, the shapes of taxon accumulation curves based upon haplotype clusters were very similar to those generated using phylogenetic diversity accumulation curves, but were much more computationally efficient. Conclusion: The results of this study will facilitate other lines of research on northern EPTs and also bode well for rapidly conducting initial biodiversity assessments in unknown EPT faunas. PMID:20003245

  9. mrtailor: a tool for PDB-file preparation for the generation of external restraints.

    PubMed

    Gruene, Tim

    2013-09-01

    Model building starting from, for example, a molecular-replacement solution with low sequence similarity introduces model bias, which can be difficult to detect, especially at low resolution. The program mrtailor removes low-similarity regions from a template PDB file according to sequence similarity between the target sequence and the template sequence and maps the target sequence onto the PDB file. The modified PDB file can be used to generate external restraints for low-resolution refinement with reduced model bias and can be used as a starting point for model building and refinement. The program can call ProSMART [Nicholls et al. (2012), Acta Cryst. D68, 404-417] directly in order to create external restraints suitable for REFMAC5 [Murshudov et al. (2011), Acta Cryst. D67, 355-367]. Both a command-line version and a GUI exist.

  10. The nucleotide sequence of 5S rRNA from a cellular slime mold Dictyostelium discoideum.

    PubMed Central

    Hori, H; Osawa, S; Iwabuchi, M

    1980-01-01

    The nucleotide sequence of ribosomal 5S rRNA from a cellular slime mold Dictyostelium discoideum is GUAUACGGCCAUACUAGGUUGGAAACACAUCAUCCCGUUCGAUCUGAUAAGUAAAUCGACCUCAGGCCUUCCAAGUACUCUGGUUGGAGACAACAGGGGAACAUAGGGUGCUGUAUACU. A model for the secondary structure of this 5S rRNA is proposed. The sequence is more similar to those of animals (62% similarity on average) than to those of yeasts (56%). PMID:7465421
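
    The abstract reports similarity percentages without stating how they were computed. One common convention, assumed here purely for illustration, is percent identity over the aligned columns of two pre-aligned sequences, skipping columns that contain a gap. The function name `percent_identity` and the short toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither
    sequence has a gap. Inputs must already be aligned."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned RNA fragments, not taken from the paper.
identity = percent_identity("GUAU-ACG", "GUACGACG")
print(round(identity, 1))
```

    Other conventions divide by the alignment length or by the shorter sequence length, so reported percentages are only comparable when the denominator is the same.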

  11. Diversity of Micromonospora strains isolated from nitrogen fixing nodules and rhizosphere of Pisum sativum analyzed by multilocus sequence analysis.

    PubMed

    Carro, Lorena; Spröer, Cathrin; Alonso, Pilar; Trujillo, Martha E

    2012-03-01

    It was recently reported that Micromonospora inhabits the intracellular tissues of nitrogen fixing nodules of the wild legume Lupinus angustifolius. To determine if Micromonospora populations are also present in nitrogen fixing nodules of cultivated legumes such as Pisum sativum, we carried out the isolation of this actinobacterium from P. sativum plants collected in two man-managed fields in the region of Castilla and León (Spain). In this work, we describe the isolation of 93 Micromonospora strains recovered from nitrogen fixing nodules and the rhizosphere of P. sativum. The genomic diversity of the strains was analyzed by amplified ribosomal DNA restriction analysis (ARDRA). Forty-six isolates and 34 reference strains were further analyzed using a multilocus sequence analysis scheme developed to address the phylogeny of the genus Micromonospora and to evaluate the species distribution in the two studied habitats. The MLSA results were evaluated by DNA-DNA hybridization to determine their usefulness for the delineation of Micromonospora at the species level. In most cases, DDH values below 70% were obtained with strains that shared a sequence similarity of 98.5% or less. Thus, MLSA studies clearly supported the established taxonomy of the genus Micromonospora and indicated that genomic species could be delineated as groups of strains that share > 98.5% sequence similarity based on the 5 genes selected. The species diversity of the strains isolated from both the rhizosphere and nodules was very high and in many cases the new strains could not be related to any of the currently described species. Copyright © 2011 Elsevier GmbH. All rights reserved.

  12. Lactobacillus shenzhenensis sp. nov., isolated from a fermented dairy beverage.

    PubMed

    Zou, Yuanqiang; Liu, Feng; Fang, Chengxiang; Wan, Daiwei; Yang, Rentao; Su, Qingqing; Yang, Ruifu; Zhao, Jiao

    2013-05-01

    Two Lactobacillus strains, designated LY-73(T) and LY-30B, were isolated from a dairy beverage, sold in Shenzhen market, China. The two isolates were Gram-positive, non-spore-forming, non-motile, facultatively anaerobic rods that were heterofermentative and did not exhibit catalase activity. Sequencing of the 16S rRNA, pheS and rpoA genes revealed that the two isolates shared 99.5, 99.8 and 99.9 % sequence similarity, which indicates that they belong to the same species. Phylogenetic analysis demonstrated clustering of the two isolates with the genus Lactobacillus. Strain LY-73(T) showed highest 16S rRNA gene sequence similarities with Lactobacillus harbinensis KACC 12409(T) (97.73%), Lactobacillus perolens DSM 12744(T) (96.96 %) and Lactobacillus selangorensis DSM 13344(T) (93.10 %). Comparative analyses of their rpoA and pheS gene sequences indicated that the novel strains were significantly different from other Lactobacillus species. Low DNA-DNA reassociation values (50.5 %) were obtained between strain LY-73(T) and its phylogenetically closest neighbours. The G+C contents of the DNA of the two novel isolates were 56.1 and 56.5 mol%. Straight-chain unsaturated fatty acids C18 : 1ω9c (78.85 and 74.29 %) were the dominant components, and the cell-wall peptidoglycan was of the l-Lys-d-Asp type. Based on phenotypic characteristics, and chemotaxonomic and genotypic data, the novel strains represent a novel species of the genus Lactobacillus, for which the name Lactobacillus shenzhenensis sp. nov. is proposed, with LY-73(T) ( = CCTCC M 2011481(T) = KACC 16878(T)) as the type strain.

  13. Characterization of New Isolates of Apricot vein clearing-associated virus and of a New Prunus-Infecting Virus: Evidence for Recombination as a Driving Force in Betaflexiviridae Evolution.

    PubMed

    Marais, Armelle; Faure, Chantal; Mustafayev, Eldar; Candresse, Thierry

    2015-01-01

    Double stranded RNAs from Prunus samples gathered from various surveys were analyzed by a deep-sequencing approach. Contig annotations revealed the presence of a potential new viral species in an Azerbaijani almond tree (Prunus amygdalus) and its genome sequence was completed. Its genomic organization is similar to that of the recently described Apricot vein clearing associated virus (AVCaV) for which two new isolates were also characterized, in a similar fashion, from two Japanese plums (Prunus salicina) from a French germplasm collection. The amino acid identity levels between the four proteins encoded by the genome of the new virus and those of AVCaV fall clearly outside the species demarcation criteria. The new virus should therefore be considered as a new species for which the name of Caucasus prunus virus (CPrV) has been proposed. Phylogenetic relationships and nucleotide comparisons suggested that together with AVCaV, CPrV could define a new genus (proposed name: Prunevirus) in the family Betaflexiviridae. A molecular test targeting both members of the new genus was developed, allowing the detection of additional AVCaV isolates, and therefore extending the known geographical distribution and the host range of AVCaV. Moreover, the phylogenetic trees reconstructed with the amino acid sequences of replicase, movement and coat proteins of representative Betaflexiviridae members suggest that Citrus leaf blotch virus (CLBV, type member of the genus Citrivirus) may have evolved from a recombination event involving a Prunevirus, further highlighting the importance of recombination as a driving force in Betaflexiviridae evolution. The sequences reported in the present manuscript have been deposited in the GenBank database under accession numbers KM507061-KM504070.

  14. Characterization of New Isolates of Apricot vein clearing-associated virus and of a New Prunus-Infecting Virus: Evidence for Recombination as a Driving Force in Betaflexiviridae Evolution

    PubMed Central

    Marais, Armelle; Faure, Chantal; Mustafayev, Eldar; Candresse, Thierry

    2015-01-01

    Double stranded RNAs from Prunus samples gathered from various surveys were analyzed by a deep-sequencing approach. Contig annotations revealed the presence of a potential new viral species in an Azerbaijani almond tree (Prunus amygdalus) and its genome sequence was completed. Its genomic organization is similar to that of the recently described Apricot vein clearing associated virus (AVCaV) for which two new isolates were also characterized, in a similar fashion, from two Japanese plums (Prunus salicina) from a French germplasm collection. The amino acid identity levels between the four proteins encoded by the genome of the new virus and those of AVCaV fall clearly outside the species demarcation criteria. The new virus should therefore be considered as a new species for which the name of Caucasus prunus virus (CPrV) has been proposed. Phylogenetic relationships and nucleotide comparisons suggested that together with AVCaV, CPrV could define a new genus (proposed name: Prunevirus) in the family Betaflexiviridae. A molecular test targeting both members of the new genus was developed, allowing the detection of additional AVCaV isolates, and therefore extending the known geographical distribution and the host range of AVCaV. Moreover, the phylogenetic trees reconstructed with the amino acid sequences of replicase, movement and coat proteins of representative Betaflexiviridae members suggest that Citrus leaf blotch virus (CLBV, type member of the genus Citrivirus) may have evolved from a recombination event involving a Prunevirus, further highlighting the importance of recombination as a driving force in Betaflexiviridae evolution. The sequences reported in the present manuscript have been deposited in the GenBank database under accession numbers KM507061-KM504070. PMID:26086395

  15. Phylogeny of the family Moraxellaceae by 16S rDNA sequence analysis, with special emphasis on differentiation of Moraxella species.

    PubMed

    Pettersson, B; Kodjo, A; Ronaghi, M; Uhlén, M; Tønjum, T

    1998-01-01

    Thirty-three strains previously classified into 11 species in the bacterial family Moraxellaceae were subjected to phylogenetic analysis based on 16S rRNA sequences. The family Moraxellaceae formed a distinct clade consisting of four phylogenetic groups as judged from branch lengths, bootstrap values and signature nucleotides. Group I contained the classical moraxellae and strains of the coccal moraxellae, previously known as Branhamella, with 16S rRNA similarity of ≥ 95%. A further division of group I into five tentative clusters is discussed. Group II consisted of two strains representing Moraxella atlantae and Moraxella osloensis. These strains were only distantly related to each other (93.4%) and also to the other members of the Moraxellaceae (≤ 93%). Therefore, reasons for reclassification of these species into separate and new genera are discussed. Group III harboured strains of the genus Psychrobacter and strain 752/52 of [Moraxella] phenylpyruvica. This strain of [M.] phenylpyruvica formed an early branch from the group III line of descent. Interestingly, a distant relationship was found between Psychrobacter phenylpyruvicus strain ATCC 23333T (formerly classified as [M.] phenylpyruvica) and [M.] phenylpyruvica strain 752/52, exhibiting less than 96% nucleotide similarity between their 16S rRNA sequences. The establishment of a new genus for [M.] phenylpyruvica strain 752/52 is therefore suggested. Group IV contained only two strains of the genus Acinetobacter. Strategies for the development of diagnostic probes and distinctive sequences for 16S rRNA-based species-specific assays within group I are suggested. Although these findings add to the classificatory placements within the Moraxellaceae, analysis of a more comprehensive selection of strains is still needed to obtain a complete classification system within this family.

  16. Xylella taiwanensis sp. nov., causing pear leaf scorch disease.

    PubMed

    Su, C-C; Deng, W-L; Jan, F-J; Chang, C-J; Huang, H; Shih, H-T; Chen, J

    2016-11-01

    A Gram-stain-negative, nutritionally fastidious bacterium (PLS229T) causing pear leaf scorch was identified in Taiwan and previously grouped into Xylella fastidiosa. Yet, significant variations between PLS229T and Xylella fastidiosa were noted. In this study, PLS229T was evaluated phenotypically and genotypically against representative strains of Xylella fastidiosa, including strains of the currently known subspecies of Xylella fastidiosa, Xylella fastidiosa subsp. multiplex and 'Xylella fastidiosa subsp. pauca'. Because of the difficulty of in vitro culture characterization, emphases were made to utilize the available whole-genome sequence information. The average nucleotide identity (ANI) values, an alternative for DNA-DNA hybridization relatedness, between PLS229T and Xylella fastidiosa were 83.4-83.9 %, significantly lower than the bacterial species threshold of 95 %. In contrast, sequence similarity of 16S rRNA genes was greater than 98 %, higher than the 97 % threshold to justify if two bacterial strains belong to different species. The uniqueness of PLS229T was also evident by observing only about 87 % similarity in the sequence of the 16S-23S internal transcribed spacer (ITS) between PLS229T and strains of Xylella fastidiosa, discovering significant single nucleotide polymorphisms at 18 randomly selected housekeeping gene loci, observing a distinct fatty acid profile for PLS229T compared with Xylella fastidiosa, and PLS229T having different observable phenotypes, such as different susceptibility to antibiotics. A phylogenetic tree derived from 16S rRNA gene sequences showed a distinct PLS229T phyletic lineage positioning it between Xylella fastidiosa and members of the genus Xanthomonas. On the basis of these data, a novel species, Xylella taiwanensis sp. nov. is proposed. The type strain is PLS229T (=BCRC 80915T=JCM 31187T).
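
    The reasoning in this abstract rests on two thresholds: an ANI below 95 % indicates separate species, while a 16S rRNA similarity above 97 % does not by itself separate them. The sketch below only restates that comparison in code; the function name `species_evidence` is hypothetical, and the 16S value of 98.0 stands in for the abstract's "greater than 98 %".

```python
ANI_SPECIES_THRESHOLD = 95.0  # percent, threshold cited in the abstract
RRNA_16S_THRESHOLD = 97.0     # percent, threshold cited in the abstract


def species_evidence(ani_values, rrna_16s_similarity):
    """Report which marker, taken alone, supports a distinct species."""
    return {
        # Every ANI comparison must fall below the species threshold.
        "ani_supports_distinct": max(ani_values) < ANI_SPECIES_THRESHOLD,
        # 16S supports separation only when similarity is below 97 %.
        "rrna_16s_supports_distinct": rrna_16s_similarity < RRNA_16S_THRESHOLD,
    }


# ANI range from the abstract; 98.0 is an illustrative stand-in for "> 98 %".
evidence = species_evidence([83.4, 83.9], 98.0)
print(evidence)
```

    The two markers disagree here, which matches the abstract's point that the conserved 16S rRNA gene lacks the resolution that whole-genome ANI provides.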

  17. The 2007 Nazko, British Columbia, earthquake sequence: Injection of magma deep in the crust beneath the Anahim volcanic belt

    USGS Publications Warehouse

    Cassidy, J.F.; Balfour, N.; Hickson, C.; Kao, H.; White, Rickie; Caplan-Auerbach, J.; Mazzotti, S.; Rogers, Gary C.; Al-Khoubbi, I.; Bird, A.L.; Esteban, L.; Kelman, M.; Hutchinson, J.; McCormack, D.

    2011-01-01

    On 9 October 2007, an unusual sequence of earthquakes began in central British Columbia about 20 km west of the Nazko cone, the most recent (circa 7200 yr) volcanic center in the Anahim volcanic belt. Within 25 hr, eight earthquakes of magnitude 2.3-2.9 occurred in a region where no earthquakes had previously been recorded. During the next three weeks, more than 800 microearthquakes were located (and many more detected), most at a depth of 25-31 km and within a radius of about 5 km. After about two months, almost all activity ceased. The clear P- and S-wave arrivals indicated that these were high-frequency (volcanic-tectonic) earthquakes and the b value of 1.9 that we calculated is anomalous for crustal earthquakes but consistent with volcanic-related events. Analysis of receiver functions at a station immediately above the seismicity indicated a Moho near 30 km depth. Precise relocation of the seismicity using a double-difference method suggested a horizontal migration at the rate of about 0.5 km/d, with almost all events within the lowermost crust. Neither harmonic tremor nor long-period events were observed; however, some spasmodic bursts were recorded and determined to be colocated with the earthquake hypocenters. These observations are all very similar to a deep earthquake sequence recorded beneath Lake Tahoe, California, in 2003-2004. Based on these remarkable similarities, we interpret the Nazko sequence as an indication of an injection of magma into the lower crust beneath the Anahim volcanic belt. This magma injection fractures rock, producing high-frequency, volcanic-tectonic earthquakes and spasmodic bursts.

  18. 'Candidatus Phytoplasma noviguineense', a novel taxon associated with Bogia coconut syndrome and banana wilt disease on the island of New Guinea.

    PubMed

    Miyazaki, Akio; Shigaki, Toshiro; Koinuma, Hiroaki; Iwabuchi, Nozomu; Rauka, Gou Bue; Kembu, Alfred; Saul, Josephine; Watanabe, Kiyoto; Nijo, Takamichi; Maejima, Kensaku; Yamaji, Yasuyuki; Namba, Shigetou

    2018-01-01

    Bogia coconut syndrome (BCS) is one of the lethal yellowing (LY)-type diseases associated with phytoplasma presence that are seriously threatening coconut cultivation worldwide. It has recently emerged, and is rapidly spreading in northern parts of the island of New Guinea. BCS-associated phytoplasmas collected in different regions were compared in terms of 16S rRNA gene sequences, revealing high identity among them represented by strain BCS-BoR. Comparative analysis of the 16S rRNA gene sequences revealed that BCS-BoR shared less than a 97.5 % similarity with other species of 'Candidatus Phytoplasma', with a maximum value of 96.08 % (with strain LY; GenBank accession no. U18747). This result indicates the necessity and propriety of a novel taxon for BCS phytoplasmas according to the recommendations of the IRPCM. Phylogenetic analysis was also conducted on 16S rRNA gene sequences, resulting in a monophyletic cluster composed of BCS-BoR and other LY-associated phytoplasmas. Other phytoplasmas on the island of New Guinea associated with banana wilt and arecanut yellow leaf diseases showed high similarities to BCS-BoR and were closely related to BCS phytoplasmas. Based on the uniqueness of their 16S rRNA gene sequences, a novel taxon 'Ca. Phytoplasma noviguineense' is proposed for these phytoplasmas found on the island of New Guinea, with strain BCS-BoR (GenBank accession no. LC228755) as the reference strain. The novel taxon is described in detail, including information on the symptoms of associated diseases and additional genetic features of the secY gene and rp operon.

  19. Molecular characterization of a novel rhabdovirus infecting blackcurrant identified by high-throughput sequencing.

    PubMed

    Wu, L-P; Yang, T; Liu, H-W; Postman, J; Li, R

    2018-05-01

    A large contig with sequence similarities to several nucleorhabdoviruses was identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genome sequence of this new nucleorhabdovirus is 14,432 nucleotides long. Its genomic organization is very similar to those of unsegmented plant rhabdoviruses, containing six open reading frames in the order 3'-N-P-P3-M-G-L-5'. The virus, which is provisionally named "black currant-associated rhabdovirus", is 41-52% identical in its genome nucleotide sequence to other nucleorhabdoviruses and may represent a new species in the genus Nucleorhabdovirus.

  20. Lactobacillus rodentium sp. nov., from the digestive tract of wild rodents.

    PubMed

    Killer, J; Havlík, J; Vlková, E; Rada, V; Pechar, R; Benada, O; Kopečný, J; Kofroňová, O; Sechovcová, H

    2014-05-01

    Three strains of regular, long, Gram-stain-positive bacterial rods were isolated using TPY, M.R.S. and Rogosa agar under anaerobic conditions from the digestive tract of wild mice (Mus musculus). All 16S rRNA gene sequences of these isolates were most similar to sequences of Lactobacillus gasseri ATCC 33323T and Lactobacillus johnsonii ATCC 33200T (97.3% and 97.2% sequence similarities, respectively). The novel strains shared 99.2-99.6% 16S rRNA gene sequence similarities. Type strains of L. gasseri and L. johnsonii were also most related to the newly isolated strains according to rpoA (83.9-84.0% similarities), pheS (84.6-87.8%), atpA (86.2-87.7%), hsp60 (89.4-90.4%) and tuf (92.7-93.6%) gene sequence similarities. Phylogenetic studies based on 16S rRNA, hsp60, rpoA, atpA and pheS gene sequences, other genotypic and many phenotypic characteristics (results of API 50 CHL, Rapid ID 32A and API ZYM biochemical tests; cellular fatty acid profiles; cellular polar lipid profiles; end products of glucose fermentation) showed that these bacterial strains represent a novel species within the genus Lactobacillus. The name Lactobacillus rodentium sp. nov. is proposed to accommodate this group of new isolates. The type strain is MYMRS/TLU1T (=DSM 24759T=CCM 7945T).

  1. Research in Stochastic Processes.

    DTIC Science & Technology

    1982-12-01

    constant high level boundary. References 1. Jurg Husler, Extreme values of non-stationary sequences and the extremal index, Center for Stochastic...A. Weron, Oct. 82. 20. "Extreme values of non-stationary sequences and the extremal index." Jurg Husler, Oct. 82. 21. "A finitely additive white noise...string model, Y. Miyahara, Carleton University and Nagoya University. Sept. 22 On extreme values of non-stationary sequences, J. Husler, University of

  2. Adhesive Proteins of Stalked and Acorn Barnacles Display Homology with Low Sequence Similarities

    PubMed Central

    Jonker, Jaimie-Leigh; Abram, Florence; Pires, Elisabete; Varela Coelho, Ana; Grunwald, Ingo; Power, Anne Marie

    2014-01-01

    Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins ‘sticky’ has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7–16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18–26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa). PMID:25295513

  3. Adhesive proteins of stalked and acorn barnacles display homology with low sequence similarities.

    PubMed

    Jonker, Jaimie-Leigh; Abram, Florence; Pires, Elisabete; Varela Coelho, Ana; Grunwald, Ingo; Power, Anne Marie

    2014-01-01

    Barnacle adhesion underwater is an important phenomenon to understand for the prevention of biofouling and potential biotechnological innovations, yet so far, identifying what makes barnacle glue proteins 'sticky' has proved elusive. Examination of a broad range of species within the barnacles may be instructive to identify conserved adhesive domains. We add to extensive information from the acorn barnacles (order Sessilia) by providing the first protein analysis of a stalked barnacle adhesive, Lepas anatifera (order Lepadiformes). It was possible to separate the L. anatifera adhesive into at least 10 protein bands using SDS-PAGE. Intense bands were present at approximately 30, 70, 90 and 110 kilodaltons (kDa). Mass spectrometry for protein identification was followed by de novo sequencing which detected 52 peptides of 7-16 amino acids in length. None of the peptides matched published or unpublished transcriptome sequences, but some amino acid sequence similarity was apparent between L. anatifera and closely-related Dosima fascicularis. Antibodies against two acorn barnacle proteins (ab-cp-52k and ab-cp-68k) showed cross-reactivity in the adhesive glands of L. anatifera. We also analysed the similarity of adhesive proteins across several barnacle taxa, including Pollicipes pollicipes (a stalked barnacle in the order Scalpelliformes). Sequence alignment of published expressed sequence tags clearly indicated that P. pollicipes possesses homologues for the 19 kDa and 100 kDa proteins in acorn barnacles. Homology aside, sequence similarity in amino acid and gene sequences tended to decline as taxonomic distance increased, with minimum similarities of 18-26%, depending on the gene. The results indicate that some adhesive proteins (e.g. 100 kDa) are more conserved within barnacles than others (20 kDa).

  4. High-frequency spectral falloff of earthquakes, fractal dimension of complex rupture, b value, and the scaling of strength on faults

    USGS Publications Warehouse

    Frankel, A.

    1991-01-01

    The high-frequency falloff ω^-γ of earthquake displacement spectra and the b value of aftershock sequences are attributed to the character of spatially varying strength along fault zones. I assume that the high frequency energy of a main shock is produced by a self-similar distribution of subevents, where the number of subevents with radii greater than R is proportional to R^-D, D being the fractal dimension. In the model, an earthquake is composed of a hierarchical set of smaller earthquakes. The static stress drop is parameterized to be proportional to R^η, and strength is assumed to be proportional to static stress drop. I find that a distribution of subevents with D = 2 and stress drop independent of seismic moment (η = 0) produces a main shock with an ω^-2 falloff, if the subevent areas fill the rupture area of the main shock. By equating subevents to "islands" of high stress of a random, self-similar stress field on a fault, I relate D to the scaling of strength on a fault, such that D = 2 - η. Thus D = 2 corresponds to constant stress drop scaling (η = 0) and scale-invariant fault strength. A self-similar model of aftershock rupture zones on a fault is used to determine the relationship between the b value, the size distribution of aftershock rupture zones, and the scaling of strength on a fault. -from Author

  5. Effective Identification of Similar Patients Through Sequential Matching over ICD Code Embedding.

    PubMed

    Nguyen, Dang; Luo, Wei; Venkatesh, Svetha; Phung, Dinh

    2018-04-11

    Evidence-based medicine often involves the identification of patients with similar conditions, which are often captured in ICD (International Classification of Diseases (World Health Organization 2013)) code sequences. With no satisfying prior solutions for matching ICD-10 code sequences, this paper presents a method which effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages the recent progress in representation learning of individual ICD-10 codes, and it explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that our proposed method achieves significantly higher matching performance compared with state-of-the-art methods ignoring the sequential order. Our method better identifies similar patients in a number of clinical outcomes including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, our method can be adapted to work with other codified sequence data.

  6. Comparison of breathhold, navigator-triggered, and free-breathing diffusion-weighted MRI for focal hepatic lesions.

    PubMed

    Choi, Ji Soo; Kim, Myeong-Jin; Chung, Yong Eun; Kim, Kyung Ah; Choi, Jin-Young; Lim, Joon Seok; Park, Mi-Suk; Kim, Ki Whang

    2013-07-01

    To compare the breathhold, navigator-triggered, and free-breathing techniques in diffusion-weighted magnetic resonance imaging (MRI) for the evaluation of focal liver lesions on a 3.0T system. Fifty-two patients (36 men, 16 women; mean age, 56.4 years) with focal liver lesions underwent breathhold, navigator-triggered, and free-breathing diffusion-weighted imaging (DWI) of the liver on a 3.0 Tesla (T) system. All sequences were performed with b values of 50 and 800 s/mm² and identical parameters except for signal averages (two for navigator-triggered, one for breathhold, and four for free-breathing) and repetition time (3389 ms for navigator-triggered, 1500 ms for breathhold, and 4400 ms for free-breathing). A total of 74 lesions (50 malignant, 24 benign) were evaluated. The signal-to-noise ratios (SNR) of the liver and lesions, contrast-to-noise ratios (CNR) of each lesion, and ADC values of the liver and lesions were compared for each DWI sequence. The detection sensitivity and characterization accuracy were also compared. The SNRs of the liver and lesions were significantly lower for breathhold DWI than for non-breathhold DWI (navigator-triggered and free-breathing DWI) for all b values. The CNRs of the lesions were also significantly lower for breathhold DWI than for non-breathhold DWI. The ADC values of the liver and focal lesions measured using the three DWI techniques were not significantly different and showed good correlation. For lesion detection and characterization, there were no significant differences between breathhold and non-breathhold DWI. Both breathhold and non-breathhold DWI are comparable for the detection or characterization of focal liver lesions at 3.0T; however, non-breathhold DWI provides higher SNR and CNR than breathhold DWI. In addition, although free-breathing and navigator-triggered DWI sequences show similar performance for 3.0T liver imaging, free-breathing DWI is more time efficient than navigator-triggered DWI. Copyright © 2013 Wiley Periodicals, Inc.
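
    The abstract compares ADC values obtained from two b values (50 and 800 s/mm²) without giving the fitting procedure. A minimal sketch, assuming the standard mono-exponential model S(b) = S0 · exp(-b · ADC) and a two-point estimate, is shown below; the function name `adc_two_point` and the signal intensities are hypothetical.

```python
import math


def adc_two_point(signal_low_b, signal_high_b, b_low=50.0, b_high=800.0):
    """Two-point ADC estimate in mm^2/s, assuming S(b) = S0 * exp(-b * ADC).
    b values are in s/mm^2; the defaults are the ones quoted in the abstract."""
    if signal_low_b <= 0 or signal_high_b <= 0:
        raise ValueError("signal intensities must be positive")
    if b_high <= b_low:
        raise ValueError("b_high must exceed b_low")
    return math.log(signal_low_b / signal_high_b) / (b_high - b_low)


# Synthetic signals generated from a known ADC of 1.0e-3 mm^2/s (illustrative).
true_adc = 1.0e-3
s50 = 1000.0
s800 = s50 * math.exp(-(800.0 - 50.0) * true_adc)
estimated_adc = adc_two_point(s50, s800)
print(f"{estimated_adc:.6f}")
```

    Because the estimate depends only on the ratio of the two signals, a uniform change in SNR between acquisition techniques need not shift the ADC, which is consistent with the abstract's finding that ADC values did not differ significantly across techniques.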

  7. Genome-wide comparisons of phylogenetic similarities between partial genomic regions and the full-length genome in Hepatitis E virus genotyping.

    PubMed

    Wang, Shuai; Wei, Wei; Luo, Xuenong; Cai, Xuepeng

    2014-01-01

    Besides the complete genome, different partial genomic sequences of Hepatitis E virus (HEV) have been used in genotyping studies, making it difficult to compare the results based on them. No commonly agreed partial region for HEV genotyping has been determined. In this study, we used a statistical method to evaluate the phylogenetic performance of each partial genomic sequence genome-wide, comparing evolutionary distances between genomic regions and the full-length genomes of 101 HEV isolates to identify short genomic regions that can reproduce HEV genotype assignments based on full-length genomes. Several genomic regions, especially one genomic region at the 3' terminus of the papain-like cysteine protease domain, were found to have relatively high phylogenetic correlations with the full-length genome. Phylogenetic analyses confirmed the identical performances between these regions and the full-length genome in genotyping, in which the HEV isolates involved could be divided into reasonable genotypes. This analysis may be of value in developing a partial sequence-based consensus classification of HEV species.

  8. Unraveling systematic inventory of Echinops (Asteraceae) with special reference to nrDNA ITS sequence-based molecular typing of Echinops abuzinadianus.

    PubMed

    Ali, M A; Al-Hemaid, F M; Lee, J; Hatamleh, A A; Gyulai, G; Rahman, M O

    2015-10-02

    The present study explored the systematic inventory of Echinops L. (Asteraceae) of Saudi Arabia, with special reference to the molecular typing of Echinops abuzinadianus Chaudhary, an endemic species to Saudi Arabia, based on the internal transcribed spacer (ITS) sequences (ITS1-5.8S-ITS2) of nuclear ribosomal DNA. A sequence similarity search using BLAST and a phylogenetic analysis of the ITS sequence of E. abuzinadianus revealed a high level of sequence similarity with E. glaberrimus DC. (section Ritropsis). The novel primary sequence and the secondary structure of ITS2 of E. abuzinadianus could potentially be used for molecular genotyping.

  9. Description and quantitative modeling of oolitic reservoir analogs within the lower Kansas City Group (Pennsylvanian), southeastern Kansas

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    French, J.A.; Watney, W.L.

    A significant number of petroleum reservoirs within the Kansas City Group in central and western Kansas are dominantly oolitic grainstones that cap 10- to 30-m-thick, shallowing-upward, carbonate-rich depositional sequences. Coeval units that occur at and near the surface in southeastern Kansas contain similar porous lithofacies that have been examined in detail via cores, outcrops, and an extensive log database to better understand the equivalent reservoirs. These studies suggest that individual oolitic, reservoir-quality units in the Bethany Falls Limestone (equivalent to the K zone in the subsurface) developed at several relative sea level stands that occurred during development of a highstand systems tract within this depositional sequence. As many as three grain-rich parasequences may occur at a given location. The occurrence of multiple parasequences indicates a relatively complex history of K-zone deposition, which likely resulted in significant effects on reservoir architecture. Two-dimensional forward modeling of this sequence with our interactive, PC-based software has revealed that limited combinations of parameters such as shelf configuration, eustasy, sedimentation rates, and subsidence rates generate strata successions similar to those observed. Sensitivity analysis coupled with regional characterization of processes suggest ranges of values that these parameters could have had during deposition of these units. The ultimate goal of this modeling is to improve our ability to predict facies development in areas of potential and known hydrocarbon accumulations.

  10. Tuning Selectivity of Fluorescent Carbon Nanotube-Based Neurotransmitter Sensors.

    PubMed

    Mann, Florian A; Herrmann, Niklas; Meyer, Daniel; Kruss, Sebastian

    2017-06-28

    Detection of neurotransmitters is an analytical challenge and essential to understand neuronal networks in the brain and associated diseases. However, most methods do not provide sufficient spatial, temporal, or chemical resolution. Near-infrared (NIR) fluorescent single-walled carbon nanotubes (SWCNTs) have been used as building blocks for sensors/probes that detect catecholamine neurotransmitters, including dopamine. This approach provides a high spatial and temporal resolution, but it is not understood if these sensors are able to distinguish dopamine from similar catecholamine neurotransmitters, such as epinephrine or norepinephrine. In this work, the organic phase (DNA sequence) around SWCNTs was varied to create sensors with different selectivity and sensitivity for catecholamine neurotransmitters. Most DNA-functionalized SWCNTs responded to catecholamine neurotransmitters, but both dissociation constants (Kd) and limits of detection were highly dependent on functionalization (sequence). Kd values span a range of 2.3 nM (SWCNT-(GC)15 + norepinephrine) to 9.4 μM (SWCNT-(AT)15 + dopamine) and limits of detection are mostly in the single-digit nM regime. Additionally, sensors of different SWCNT chirality show different fluorescence increases. Moreover, certain sensors (e.g., SWCNT-(GT)10) distinguish between different catecholamines, such as dopamine and norepinephrine at low concentrations (50 nM). These results show that SWCNTs functionalized with certain DNA sequences are able to discriminate between catecholamine neurotransmitters or to detect them in the presence of interfering substances of similar structure. Such sensors will be useful to measure and study neurotransmitter signaling in complex biological settings.

  11. Cysteine-191 in aspartate aminotransferases appears to be conserved due to the lack of a neutral mutation pathway to the functional equivalent, alanine-191.

    PubMed

    Gloss, L M; Spencer, D E; Kirsch, J F

    1996-02-01

    It was previously suggested that the conserved Cys-191 of aspartate aminotransferases (AATases) is conserved, not because it is essential, but because it is frozen in the sequence, with no neutral corridor to traverse to the similar phenotype of Ala-191 (Gloss et al., Biochemistry 31:32-39, 1992). This hypothesis has now been tested by additional mutations. All possible one-base mutations from Cys were made at position 191. All of these variants display kinetic parameters (kcat and kcat/KM values) that differ from the wild-type enzyme by 30% or more. The non-conserved cysteines that are predominantly Ala in other AATase sequences (Cys-82, Cys-192, and Cys-401) were mutated to Ser to test the corollary that a neutral Cys->Ala corridor does exist for these positions. These Cys->Ser mutations yielded enzymes with wild-type-like kinetic parameters. The pKa values of the internal aldimines of the mutants, Cys-191->Ser, Phe, Tyr, and Trp are higher than that of wild type by 0.6-0.8 pH units. The stabilities to urea denaturation of the Cys-191 mutants are similar to that of wild type, while those of the non-conserved cysteines show greater variation. Examination of the three-dimensional environment of the five cysteines showed that the van der Waals contacts of Cys-191 are more conserved than are those of the non-conserved cysteines. These data provide further support for the above hypothesis.

  12. Streptococcus himalayensis sp. nov., isolated from the respiratory tract of Marmota himalayana.

    PubMed

    Niu, Lina; Lu, Shan; Lai, Xin-He; Hu, Shoukui; Chen, Cuixia; Zhang, Gui; Yang, Jing; Jin, Dong; Wang, Yi; Lan, Ruiting; Lu, Gang; Xie, Yingping; Ye, Changyun; Xu, Jianguo

    2017-02-01

    Five strains of Gram-positive-staining, catalase-negative, coccus-shaped, chain-forming organisms isolated separately from the respiratory tracts of five Marmota himalayana animals in the Qinghai-Tibet Plateau of China were subjected to phenotypic and molecular taxonomic analyses. Comparative analysis of the 16S rRNA gene indicated that these singular organisms represent a new member of the genus Streptococcus, being phylogenetically closest to Streptococcus marmotae DSM 101995T (98.4 % similarity). The groEL, sodA and rpoB sequence analysis showed interspecies similarity values between HTS2T and Streptococcus marmotae DSM 101995T, its closest phylogenetic relative based on 16S rRNA gene sequences, of 98.2, 78.8 and 93.7 %, respectively. A whole-genome phylogenetic tree built from 82 core genes of genomes from 16 species of the genus Streptococcus validated that HTS2T forms a distinct subline and exhibits specific phylogenetic affinity with S. marmotae. In silico DNA-DNA hybridization of HTS2T showed an estimated DNA reassociation value of 40.5 % with Streptococcus marmotae DSM 101995T. On the basis of their phenotypic characteristics and phylogenetic findings, it is proposed that the five isolates be classified as representatives of a novel species of the genus Streptococcus, Streptococcus himalayensis sp. nov. The type strain is HTS2T (=DSM 101997T=CGMCC 1.15533T). The genome of Streptococcus himalayensis sp. nov. strain HTS2T contains 2195 genes with a size of 2 275 471 bp and a mean DNA G+C content of 41.3 mol%.

  13. Nucleotide sequence of the Saccharomyces cerevisiae PUT4 proline-permease-encoding gene: similarities between CAN1, HIP1 and PUT4 permeases.

    PubMed

    Vandenbol, M; Jauniaux, J C; Grenson, M

    1989-11-15

    The complete nucleotide (nt) sequence of the PUT4 gene, whose product is required for high-affinity proline active transport in the yeast Saccharomyces cerevisiae, is presented. The sequence contains a single long open reading frame of 1881 nt, encoding a polypeptide with a calculated Mr of 68,795. The predicted protein is strongly hydrophobic and exhibits six potential glycosylation sites. Its hydropathy profile suggests the presence of twelve membrane-spanning regions flanked by hydrophilic N- and C-terminal domains. The N terminus does not resemble signal sequences found in secreted proteins. These features are characteristic of integral membrane proteins catalyzing translocation of ligands across cellular membranes. Protein sequence comparisons indicate strong resemblance to the arginine and histidine permeases of S. cerevisiae, but no marked sequence similarity to the proline permease of Escherichia coli or to other known prokaryotic or eukaryotic transport proteins. The strong similarity between the three yeast amino acid permeases suggests a common ancestor for the three proteins.

  14. Intrinsic and extrinsic approaches for detecting genes in a bacterial genome.

    PubMed Central

    Borodovsky, M; Rudd, K E; Koonin, E V

    1994-01-01

    The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins. PMID:7984428
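
    The percentages quoted in this abstract can be re-derived from the ORF counts it reports. The following is a minimal arithmetic check in Python; the variable names and the `percent` helper are illustrative assumptions, and only the counts themselves come from the abstract.

```python
# Counts taken from the abstract above; names are illustrative only.
genemark_orfs = 354      # putative expressed ORFs predicted by GeneMark
blast_orfs = 208         # ORFs with significant BLASTX/TBLASTN similarity
supported_by_both = 182  # ORFs supported by both GeneMark and BLAST
conserved_families = 73  # putative new genes in ancient conserved families


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


print(percent(supported_by_both, genemark_orfs))   # 51.4
print(percent(supported_by_both, blast_orfs))      # 87.5
print(percent(conserved_families, genemark_orfs))  # 20.6
```

    All three values agree with the figures stated in the abstract.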

  15. SARNAclust: Semi-automatic detection of RNA protein binding motifs from immunoprecipitation data

    PubMed Central

    Dotu, Ivan; Adamson, Scott I.; Coleman, Benjamin; Fournier, Cyril; Ricart-Altimiras, Emma; Eyras, Eduardo

    2018-01-01

    RNA-protein binding is critical to gene regulation, controlling fundamental processes including splicing, translation, localization and stability, and aberrant RNA-protein interactions are known to play a role in a wide variety of diseases. However, molecular understanding of RNA-protein interactions remains limited; in particular, identification of RNA motifs that bind proteins has long been challenging, especially when such motifs depend on both sequence and structure. Moreover, although RNA binding proteins (RBPs) often contain more than one binding domain, algorithms capable of identifying more than one binding motif simultaneously have not been developed. In this paper we present a novel pipeline to determine binding peaks in crosslinking immunoprecipitation (CLIP) data, to discover multiple possible RNA sequence/structure motifs among them, and to experimentally validate such motifs. At the core is a new semi-automatic algorithm SARNAclust, the first unsupervised method to identify and deconvolve multiple sequence/structure motifs simultaneously. SARNAclust computes similarity between sequence/structure objects using a graph kernel, providing the ability to isolate the impact of specific features through the bulge graph formalism. Application of SARNAclust to synthetic data shows its capability of clustering 5 motifs at once with a V-measure value of over 0.95, while GraphClust achieves only a V-measure of 0.083 and RNAcontext cannot detect any of the motifs. When applied to existing eCLIP sets, SARNAclust finds known motifs for SLBP and HNRNPC and novel motifs for several other RBPs such as AGGF1, AKAP8L and ILF3. We demonstrate an experimental validation protocol, a targeted Bind-n-Seq-like high-throughput sequencing approach that relies on RNA inverse folding for oligo pool design, that can validate the components within the SLBP motif. Finally, we use this protocol to experimentally interrogate the SARNAclust motif predictions for protein ILF3. Our results support a newly identified partially double-stranded UUUUUGAGA motif similar to that known for the splicing factor HNRNPC. PMID:29596423

  16. String Mining in Bioinformatics

    NASA Astrophysics Data System (ADS)

    Abouelhoda, Mohamed; Ghanem, Moustafa

    Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying the biological sequences, DNA, RNA, and proteins, on the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string- or pattern mining specific to biological strings. For a long time, however, this point of view has been explicitly embraced in neither the data mining nor the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word "data-mining" is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].
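
    The idea of "spotting common segments" between two sequences can be illustrated with a textbook dynamic-programming search for the longest common substring. This is only a sketch under stated assumptions: the function name is hypothetical, the quadratic-time method is a generic one, and it is not claimed to be the approach the authors describe.

```python
def longest_common_segment(a, b):
    """Return the longest contiguous segment shared by strings a and b.

    Classic O(len(a) * len(b)) dynamic programme: cur[j] holds the length
    of the longest common suffix of a[:i] and b[:j].
    """
    prev = [0] * (len(b) + 1)
    best_len, best_end = 0, 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


print(longest_common_segment("GATTACA", "TTAC"))  # TTAC
```

    Index structures such as suffix trees or suffix arrays solve the same problem far more efficiently on genome-scale inputs; the quadratic version is shown only because it is short.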

  17. String Mining in Bioinformatics

    NASA Astrophysics Data System (ADS)

    Abouelhoda, Mohamed; Ghanem, Moustafa

    Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying the biological sequences, DNA, RNA, and proteins, on the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string- or pattern mining specific to biological strings. For a long time, however, this point of view has been explicitly embraced in neither the data mining nor the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word “data-mining” is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].

  18. Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison.

    PubMed

    Dai, Qi; Yang, Yanchun; Wang, Tianming

    2008-10-15

    Many proposed statistical measures can efficiently compare biological sequences to further infer their structures, functions and evolutionary information. They are related in spirit because all the ideas for sequence comparison try to use the information on the k-word distributions, Markov model or both. Motivated by adding k-word distributions to the Markov model directly, we investigated two novel statistical measures for sequence comparison, called wre.k.r and S2.k.r. The proposed measures were tested by similarity search, evaluation on functionally related regulatory sequences and phylogenetic analysis. This offers a systematic and quantitative experimental assessment of our measures. Moreover, we compared our results with those of alignment-based and alignment-free methods. We grouped our experiments into two sets. The first one, performed via ROC (receiver operating characteristic) analysis, aims at assessing the intrinsic ability of our statistical measures to search for similar sequences from a database and discriminate functionally related regulatory sequences from unrelated sequences. The second one aims at assessing how well our statistical measures perform in phylogenetic analysis. The experimental assessment demonstrates that our similarity measures, which incorporate k-word distributions into the Markov model, are more efficient.
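
    As background for the k-word distributions mentioned above, the sketch below counts overlapping k-words in two sequences and compares their relative frequencies with a squared Euclidean distance. It is a generic alignment-free baseline, not the wre.k.r or S2.k.r measures of the abstract (their definitions are not given here), and all function names are hypothetical.

```python
from collections import Counter


def kword_freqs(seq, k):
    """Relative frequencies of all overlapping k-words in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:  # sequence shorter than k
        return {}
    return {word: n / total for word, n in counts.items()}


def kword_distance(seq_a, seq_b, k):
    """Squared Euclidean distance between two k-word frequency vectors."""
    fa, fb = kword_freqs(seq_a, k), kword_freqs(seq_b, k)
    words = set(fa) | set(fb)
    return sum((fa.get(w, 0.0) - fb.get(w, 0.0)) ** 2 for w in words)


# Identical sequences are at distance 0; sequences with no shared
# 2-words are further apart.
print(kword_distance("ACGTACGT", "ACGTACGT", 2))
print(kword_distance("AAAA", "CCCC", 2))
```

    A Markov-model-based measure would additionally compare observed k-word counts with the counts expected under a background model, which this sketch omits.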

  19. Lysobacter spongiicola sp. nov., isolated from a deep-sea sponge.

    PubMed

    Romanenko, Lyudmila A; Uchino, Masataka; Tanaka, Naoto; Frolova, Galina M; Mikhailov, Valery V

    2008-02-01

    An aerobic, Gram-negative bacterium, strain KMM 329(T), was isolated from a deep-sea sponge specimen from the Philippine Sea and subjected to a polyphasic taxonomic investigation. Comparative 16S rRNA gene sequence analysis showed that strain KMM 329(T) clustered with the species of the genus Lysobacter. The highest level of 16S rRNA gene sequence similarity (97.0 %) was found with respect to Lysobacter concretionis KCTC 12205(T); lower values (96.4-95.2 %) were obtained with respect to the other recognized Lysobacter species. The value for DNA-DNA relatedness between strain KMM 329(T) and L. concretionis KCTC 12205(T) was 47 %. Branched fatty acids 16 : 0 iso, 15 : 0 iso, 11 : 0 iso 3-OH and 17 : 1 iso were found to be predominant. Strain KMM 329(T) had a DNA G+C content of 69.0 mol%. On the basis of the phenotypic, chemotaxonomic, DNA-DNA hybridization and phylogenetic data, strain KMM 329(T) represents a novel species of the genus Lysobacter, for which the name Lysobacter spongiicola sp. nov. is proposed. The type strain is KMM 329(T) (=NRIC 0728(T) =JCM 14760(T)).

  20. Nitrogen-fixing and cellulose-producing Gluconacetobacter kombuchae sp. nov., isolated from Kombucha tea.

    PubMed

    Dutta, Debasree; Gachhui, Ratan

    2007-02-01

    A few members of the family Acetobacteraceae are cellulose-producers, while only six members fix nitrogen. Bacterial strain RG3T, isolated from Kombucha tea, displays both of these characteristics. A high bootstrap value in the 16S rRNA gene sequence-based phylogenetic analysis supported the position of this strain within the genus Gluconacetobacter, with Gluconacetobacter hansenii LMG 1527T as its nearest neighbour (99.1 % sequence similarity). It could utilize ethanol, fructose, arabinose, glycerol, sorbitol and mannitol, but not galactose or xylose, as sole sources of carbon. Single amino acids such as L-alanine, L-cysteine and L-threonine served as carbon and nitrogen sources for growth of strain RG3T. Strain RG3T produced cellulose in both nitrogen-free broth and enriched medium. The ubiquinone present was Q-10 and the DNA base composition was 55.8 mol% G+C. It exhibited low values of 5.2-27.77 % DNA-DNA relatedness to the type strains of related gluconacetobacters, which placed it within a separate taxon, for which the name Gluconacetobacter kombuchae sp. nov. is proposed, with the type strain RG3T (=LMG 23726T=MTCC 6913T).

  1. Burkholderia monticola sp. nov., isolated from mountain soil.

    PubMed

    Baek, Inwoo; Seo, Boram; Lee, Imchang; Yi, Hana; Chun, Jongsik

    2015-02-01

    An ivory/yellow, Gram-stain-negative, short-rod-shaped, aerobic bacterial strain, designated JC2948(T), was isolated from a soil sample taken from Gwanak Mountain, Republic of Korea. 16S rRNA gene sequence analysis indicated that strain JC2948(T) belongs to the genus Burkholderia. The test strain showed highest sequence similarities to Burkholderia tropica LMG 22274(T) (97.6 %), Burkholderia acidipaludis NBRC 101816(T) (97.5 %), Burkholderia tuberum LMG 21444(T) (97.5 %), Burkholderia sprentiae LMG 27175(T) (97.4 %), Burkholderia terricola LMG 20594(T) (97.3 %) and Burkholderia diazotrophica LMG 26031(T) (97.1 %). Based on average nucleotide identity (ANI) values, the new isolate represents a novel genomic species as it shows less than 90 % ANI values with other closely related species. Also, other physiological and biochemical comparisons allowed the phenotypic differentiation of strain JC2948(T) from other members of the genus Burkholderia. Therefore, we suggest that this strain should be classified as the type strain of a novel species of the genus Burkholderia. The name Burkholderia monticola sp. nov. (type strain, JC2948(T) = JCM 19904(T) = KACC 17924(T)) is proposed. © 2015 IUMS.

  2. Selection, Characterization and Interaction Studies of a DNA Aptamer for the Detection of Bifidobacterium bifidum

    PubMed Central

    Hu, Lujun; Wang, Linlin; Lu, Wenwei; Zhao, Jianxin; Zhang, Hao; Chen, Wei

    2017-01-01

    A whole-bacterium-based SELEX (Systematic Evolution of Ligands by Exponential Enrichment) procedure was adopted in this study for the selection of an ssDNA aptamer that binds to Bifidobacterium bifidum. After 12 rounds of selection targeted against B. bifidum, 30 sequences were obtained and divided into seven families according to primary sequence homology and similarity of secondary structure. Four FAM (fluorescein amidite) labeled aptamer sequences from different families were selected for further characterization by flow cytometric analysis. The results reveal that the aptamer sequence CCFM641-5 demonstrated high-affinity and specificity for B. bifidum compared with the other sequences tested, and the estimated Kd value was 10.69 ± 0.89 nM. Additionally, sequence truncation experiments of the aptamer CCFM641-5 led to the conclusion that the 5′-primer and 3′-primer binding sites were essential for aptamer-target binding. In addition, the possible component of the target B. bifidum, bound by the aptamer CCFM641-5, was identified as a membrane protein by treatment with proteinase. Furthermore, to prove the potential application of the aptamer CCFM641-5, a colorimetric bioassay of the sandwich-type structure was used to detect B. bifidum. The assay had a linear range of 10^4 to 10^7 cfu/mL (R^2 = 0.9834). Therefore, the colorimetric bioassay appears to be a promising method for the detection of B. bifidum based on the aptamer CCFM641-5. PMID:28441340

  3. The Swiss-Army-Knife Approach to the Nearly Automatic Analysis for Microearthquake Sequences.

    NASA Astrophysics Data System (ADS)

    Kraft, T.; Simon, V.; Tormann, T.; Diehl, T.; Herrmann, M.

    2017-12-01

    Many Swiss earthquake sequences have been studied using relative location techniques, which often made it possible to constrain the active fault planes and shed light on the tectonic processes that drove the seismicity. Yet, in the majority of cases the number of located earthquakes was too small to infer the details of the space-time evolution of the sequences, or their statistical properties. Therefore, it has mostly been impossible to resolve clear patterns in the seismicity of individual sequences, which are needed to improve our understanding of the mechanisms behind them. Here we present a nearly automatic workflow that combines well-established seismological analysis techniques and significantly improves the completeness of detected and located earthquakes of a sequence. We start from the manually timed routine catalog of the Swiss Seismological Service (SED), which contains the larger events of a sequence. From these well-analyzed earthquakes we dynamically assemble a template set and perform a matched filter analysis on the station with the best SNR for the sequence and a recording history of at least 10-15 years, our typical analysis period. This usually allows us to detect events several orders of magnitude below the SED catalog detection threshold. The waveform similarity of the events is then further exploited to derive accurate and consistent magnitudes. The enhanced catalog is then analyzed statistically to derive high-resolution time-lines of the a- and b-values and consequently the occurrence probability of larger events. Many of the detected events are strong enough to be located using double-differences. No further manual interaction is needed; we simply time-shift the arrival-time pattern of the detecting template to the associated detection. Waveform similarity assures a good approximation of the expected arrival-times, which we use to calculate event-pair arrival-time differences by cross correlation. After an SNR and cycle-skipping quality check these are directly fed into hypoDD. Using this procedure we usually improve the number of well-relocated events by a factor of 2-5. We demonstrate the successful application of the workflow using the example of natural sequences in Switzerland and present first results of the advanced analysis that was possible with the enhanced catalogs.
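
    The cross-correlation step described above, estimating the time shift between two similar waveforms, can be sketched in pure Python as follows. This is an illustrative toy under stated assumptions: the function name, the integer-sample lag search and the synthetic waveforms are all hypothetical, and the sketch does not reproduce the authors' workflow or hypoDD's input format.

```python
import math


def best_lag(template, trace, max_lag):
    """Return (lag, cc): the integer sample shift of trace relative to
    template that maximises the normalised cross-correlation cc,
    searched over -max_lag..max_lag."""
    best = (None, -math.inf)
    for lag in range(-max_lag, max_lag + 1):
        # Pair up samples that overlap once trace is shifted by lag.
        pairs = [(template[i], trace[i + lag])
                 for i in range(len(template))
                 if 0 <= i + lag < len(trace)]
        if len(pairs) < 2:
            continue
        num = sum(x * y for x, y in pairs)
        den = math.sqrt(sum(x * x for x, _ in pairs) *
                        sum(y * y for _, y in pairs))
        if den == 0:
            continue
        cc = num / den
        if cc > best[1]:
            best = (lag, cc)
    return best


# Synthetic example: the same pulse, arriving two samples later.
template = [0, 0, 1, 2, 1, 0, 0, 0]
trace = [0, 0, 0, 0, 1, 2, 1, 0]
lag, cc = best_lag(template, trace, max_lag=3)
print(lag, cc)
```

    Multiplying the lag by the sampling interval gives the differential arrival time; a threshold on the correlation coefficient is one simple way to discard unreliable pairs.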

  4. Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity.

    PubMed

    Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S

    2015-09-01

    The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods. 
    © 2015 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.

  5. Comparison of topological clustering within protein networks using edge metrics that evaluate full sequence, full structure, and active site microenvironment similarity

    PubMed Central

    Leuthaeuser, Janelle B; Knutson, Stacy T; Kumar, Kiran; Babbitt, Patricia C; Fetrow, Jacquelyn S

    2015-01-01

    The development of accurate protein function annotation methods has emerged as a major unsolved biological problem. Protein similarity networks, one approach to function annotation via annotation transfer, group proteins into similarity-based clusters. An underlying assumption is that the edge metric used to identify such clusters correlates with functional information. In this contribution, this assumption is evaluated by observing topologies in similarity networks using three different edge metrics: sequence (BLAST), structure (TM-Align), and active site similarity (active site profiling, implemented in DASP). Network topologies for four well-studied protein superfamilies (enolase, peroxiredoxin (Prx), glutathione transferase (GST), and crotonase) were compared with curated functional hierarchies and structure. As expected, network topology differs, depending on edge metric; comparison of topologies provides valuable information on structure/function relationships. Subnetworks based on active site similarity correlate with known functional hierarchies at a single edge threshold more often than sequence- or structure-based networks. Sequence- and structure-based networks are useful for identifying sequence and domain similarities and differences; therefore, it is important to consider the clustering goal before deciding appropriate edge metric. Further, conserved active site residues identified in enolase and GST active site subnetworks correspond with published functionally important residues. Extension of this analysis yields predictions of functionally determinant residues for GST subgroups. These results support the hypothesis that active site similarity-based networks reveal clusters that share functional details and lay the foundation for capturing functionally relevant hierarchies using an approach that is both automatable and can deliver greater precision in function annotation than current similarity-based methods. PMID:26073648

  6. A systematic evaluation of three different cardiac T2-mapping sequences at 1.5 and 3T in healthy volunteers.

    PubMed

    Baeßler, Bettina; Schaarschmidt, Frank; Stehning, Christian; Schnackenburg, Bernhard; Maintz, David; Bunck, Alexander C

    2015-11-01

    Previous studies showed that myocardial T2 relaxation times measured by cardiac T2-mapping vary significantly depending on sequence and field strength. Therefore, a systematic comparison of different T2-mapping sequences and the establishment of dedicated T2 reference values is mandatory for diagnostic decision-making. Phantom experiments using gel probes with a range of different T1 and T2 times were performed on a clinical 1.5T and 3T scanner. In addition, 30 healthy volunteers were examined at 1.5 and 3T in immediate succession. In each examination, three different T2-mapping sequences were performed at three short-axis slices: Multi Echo Spin Echo (MESE), T2-prepared balanced SSFP (T2prep), and Gradient Spin Echo with and without fat saturation (GraSEFS/GraSE). Segmented T2-Maps were generated according to the AHA 16-segment model and statistical analysis was performed. Significant intra-individual differences between mean T2 times were observed for all sequences. In general, T2prep resulted in lowest and GraSE in highest T2 times. A significant variation with field strength was observed for mean T2 in phantom as well as in vivo, with higher T2 values at 1.5T compared to 3T, regardless of the sequence used. Segmental T2 values for each sequence at 1.5 and 3T are presented. Despite a careful selection of sequence parameters and volunteers, significant variations of the measured T2 values were observed between field strengths, MR sequences and myocardial segments. Therefore, we present segmental T2 values for each sequence at 1.5 and 3T with the inherent potential to serve as reference values for future studies. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  7. A path-based measurement for human miRNA functional similarities using miRNA-disease associations

    NASA Astrophysics Data System (ADS)

    Ding, Pingjian; Luo, Jiawei; Xiao, Qiu; Chen, Xiangtao

    2016-09-01

    Compared with sequence and expression similarity, miRNA functional similarity is especially important for biological research and for many applications such as miRNA clustering, miRNA function prediction, miRNA synergism identification and disease miRNA prioritization. However, existing methods have generally relied on predicted miRNA targets, which have high false-positive and false-negative rates, to calculate miRNA functional similarity. Meanwhile, it is difficult to achieve high reliability of miRNA functional similarity with miRNA-disease associations. Therefore, improved measurement of miRNA functional similarity is increasingly needed. In this study, we develop a novel path-based method for calculating miRNA functional similarity based on miRNA-disease associations, called MFSP. Compared with other methods, our method obtains higher average functional similarity for intra-family and intra-cluster selected groups and lower average functional similarity for inter-family and inter-cluster miRNA pairs. In addition, smaller p-values are achieved when applying the Wilcoxon rank-sum test and the Kruskal-Wallis test to different miRNA groups. The relationship between miRNA functional similarity and other information sources is also shown. Furthermore, the miRNA functional network constructed with MFSP is a scale-free and small-world network. Moreover, the higher AUC for miRNA-disease prediction indicates the ability of MFSP to uncover miRNA functional similarity.

  8. Fixed recurrence and slip models better predict earthquake behavior than the time- and slip-predictable models 1: repeating earthquakes

    USGS Publications Warehouse

    Rubinstein, Justin L.; Ellsworth, William L.; Chen, Kate Huihsuan; Uchida, Naoki

    2012-01-01

    The behavior of individual events in repeating earthquake sequences in California, Taiwan and Japan is better predicted by a model with fixed inter-event time or fixed slip than it is by the time- and slip-predictable models for earthquake occurrence. Given that repeating earthquakes are highly regular in both inter-event time and seismic moment, the time- and slip-predictable models seem ideally suited to explain their behavior. Taken together with evidence from the companion manuscript that shows similar results for laboratory experiments we conclude that the short-term predictions of the time- and slip-predictable models should be rejected in favor of earthquake models that assume either fixed slip or fixed recurrence interval. This implies that the elastic rebound model underlying the time- and slip-predictable models offers no additional value in describing earthquake behavior in an event-to-event sense, but its value in a long-term sense cannot be determined. These models likely fail because they rely on assumptions that oversimplify the earthquake cycle. We note that the time and slip of these events is predicted quite well by fixed slip and fixed recurrence models, so in some sense they are time- and slip-predictable. While fixed recurrence and slip models better predict repeating earthquake behavior than the time- and slip-predictable models, we observe a correlation between slip and the preceding recurrence time for many repeating earthquake sequences in Parkfield, California. This correlation is not found in other regions, and the sequences with the correlative slip-predictable behavior are not distinguishable from nearby earthquake sequences that do not exhibit this behavior.

  9. Modestobacter caceresii sp. nov., novel actinobacteria with an insight into their adaptive mechanisms for survival in extreme hyper-arid Atacama Desert soils.

    PubMed

    Busarakam, Kanungnid; Bull, Alan T; Trujillo, Martha E; Riesco, Raul; Sangal, Vartul; van Wezel, Gilles P; Goodfellow, Michael

    2016-06-01

    A polyphasic study was designed to determine the taxonomic provenance of three Modestobacter strains isolated from an extreme hyper-arid Atacama Desert soil. The strains, isolates KNN 45-1a, KNN 45-2b(T) and KNN 45-3b, were shown to have chemotaxonomic and morphological properties in line with their classification in the genus Modestobacter. The isolates had identical 16S rRNA gene sequences and formed a branch in the Modestobacter gene tree that was most closely related to the type strain of Modestobacter marinus (99.6% similarity). All three isolates were distinguished readily from Modestobacter type strains by a broad range of phenotypic properties, by qualitative and quantitative differences in fatty acid profiles and by BOX fingerprint patterns. The whole genome sequence of isolate KNN 45-2b(T) showed 89.3% average nucleotide identity, 90.1% (SD: 10.97%) average amino acid identity and a digital DNA-DNA hybridization value of 42.4±3.1 against the genome sequence of M. marinus DSM 45201(T), values consistent with its assignment to a separate species. On the basis of all of these data, it is proposed that the isolates be assigned to the genus Modestobacter as Modestobacter caceresii sp. nov. with isolate KNN 45-2b(T) (CECT 9023(T)=DSM 101691(T)) as the type strain. Analysis of the whole-genome sequence of M. caceresii KNN 45-2b(T), with 4683 open reading frames and a genome size of ~4.96 Mb, revealed the presence of genes and gene clusters that encode properties relevant to its adaptability to the harsh environmental conditions prevalent in extreme hyper-arid Atacama Desert soils. Copyright © 2016. Published by Elsevier GmbH.

  10. The Construction of Impossibility: A Logic-Based Analysis of Conjuring Tricks

    PubMed Central

    Smith, Wally; Dignum, Frank; Sonenberg, Liz

    2016-01-01

    Psychologists and cognitive scientists have long drawn insights and evidence from stage magic about human perceptual and attentional errors. We present a complementary analysis of conjuring tricks that seeks to understand the experience of impossibility that they produce. Our account is first motivated by insights about the constructional aspects of conjuring drawn from magicians' instructional texts. A view is then presented of the logical nature of impossibility as an unresolvable contradiction between a perception-supported belief about a situation and a memory-supported expectation. We argue that this condition of impossibility is constructed not simply through misperceptions and misattentions, but rather it is an outcome of a trick's whole structure of events. This structure is conceptualized as two parallel event sequences: an effect sequence that the spectator is intended to believe; and a method sequence that the magician understands as happening. We illustrate the value of this approach through an analysis of a simple close-up trick, Martin Gardner's Turnabout. A formalism called propositional dynamic logic is used to describe some of its logical aspects. This elucidates the nature and importance of the relationship between a trick's effect sequence and its method sequence, characterized by the careful arrangement of four evidence relationships: similarity, perceptual equivalence, structural equivalence, and congruence. The analysis further identifies two characteristics of magical apparatus that enable the construction of apparent impossibility: substitutable elements and stable occlusion. PMID:27378959

  11. Gene discovery using next-generation pyrosequencing to develop ESTs for Phalaenopsis orchids

    PubMed Central

    2011-01-01

    Background Orchids are one of the most diversified angiosperms, but few genomic resources are available for these non-model plants. In addition to the ecological significance, Phalaenopsis has been considered as an economically important floriculture industry worldwide. We aimed to use massively parallel 454 pyrosequencing for a global characterization of the Phalaenopsis transcriptome. Results To maximize sequence diversity, we pooled RNA from 10 samples of different tissues, various developmental stages, and biotic- or abiotic-stressed plants. We obtained 206,960 expressed sequence tags (ESTs) with an average read length of 228 bp. These reads were assembled into 8,233 contigs and 34,630 singletons. The unigenes were searched against the NCBI non-redundant (NR) protein database. Based on sequence similarity with known proteins, these analyses identified 22,234 different genes (E-value cutoff, e-7). Assembled sequences were annotated with Gene Ontology, Gene Family and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Among these annotations, over 780 unigenes encoding putative transcription factors were identified. Conclusion Pyrosequencing was effective in identifying a large set of unigenes from Phalaenopsis. The informative EST dataset we developed constitutes a much-needed resource for discovery of genes involved in various biological processes in Phalaenopsis and other orchid species. These transcribed sequences will narrow the gap between study of model organisms with many genomic resources and species that are important for ecological and evolutionary studies. PMID:21749684

  12. Dose-Response Analysis of RNA-Seq Profiles in Archival ...

    EPA Pesticide Factsheets

    Use of archival resources has been limited to date by inconsistent methods for genomic profiling of degraded RNA from formalin-fixed paraffin-embedded (FFPE) samples. RNA-sequencing offers a promising way to address this problem. Here we evaluated transcriptomic dose responses using RNA-sequencing in paired FFPE and frozen (FROZ) samples from two archival studies in mice, one 20 years old. Experimental treatments included 3 different doses of di(2-ethylhexyl)phthalate or dichloroacetic acid for the recently archived and older studies, respectively. Total RNA was ribo-depleted and sequenced using the Illumina HiSeq platform. In the recently archived study, FFPE samples had 35% lower total counts compared to FROZ samples but high concordance in fold-change values of differentially expressed genes (DEGs) (r2 = 0.99), highly enriched pathways (90% overlap with FROZ), and benchmark dose estimates for preselected target genes (2% difference vs FROZ). In contrast, older FFPE samples had markedly lower total counts (3% of FROZ) and poor concordance in global DEGs and pathways. However, counts from FFPE and FROZ samples still positively correlated (r2 = 0.84 across all transcripts) and showed comparable dose responses for more highly expressed target genes. These findings highlight potential applications and issues in using RNA-sequencing data from FFPE samples. Recently archived FFPE samples were highly similar to FROZ samples in sequencing q

  13. Temporal variation of aftershocks by means of multifractal characterization of their inter-event time and cluster analysis

    NASA Astrophysics Data System (ADS)

    Figueroa-Soto, A.; Zuñiga, R.; Marquez-Ramirez, V.; Monterrubio-Velasco, M.

    2017-12-01

    The inter-event time characteristics of seismic aftershock sequences can provide important information to discern stages in the aftershock generation process. In order to investigate whether separate dynamic stages can be identified, (1) aftershock series after selected earthquake mainshocks, which took place at similar tectonic regimes, were analyzed. To this end we selected two well-defined aftershock sequences from New Zealand and one aftershock sequence from Mexico; we (2) analyzed the fractal behavior of the logarithm of inter-event times (also called waiting times) of aftershocks by means of Hölder's exponent, and (3) their magnitude and spatial location based on a methodology proposed by Zaliapin and Ben-Zion [2011] which accounts for the clustering properties of the sequence. In general, more than two coherent process stages can be identified following the main rupture, evidencing a type of "cascade" process which precludes implying a single generalized power law even though the temporal rate and average fractal character appear to be unique (as in a single Omori's p value). We found that aftershock processes indeed show multi-fractal characteristics, which may be related to different stages in the process of diffusion, as seen in the temporal-spatial distribution of aftershocks. Our method provides a way of defining the onset of the return to seismic background activity and the end of the main aftershock sequence.

  14. Predicted stem-loop structures and variation in nucleotide sequence of 3' noncoding regions among animal calicivirus genomes.

    PubMed

    Seal, B S; Neill, J D; Ridpath, J F

    1994-07-01

    Caliciviruses are nonenveloped with a polyadenylated genome of approximately 7.6 kb and a single capsid protein. The "RNA Fold" computer program was used to analyze 3'-terminal noncoding sequences of five feline calicivirus (FCV), rabbit hemorrhagic disease virus (RHDV), and two San Miguel sea lion virus (SMSV) isolates. The FCV 3'-terminal sequences are 40-46 nucleotides in length and 72-91% similar. The FCV sequences were predicted to contain two possible duplex structures and one stem-loop structure with free energies of -2.1 to -18.2 kcal/mole. The RHDV genomic 3'-terminal RNA sequences are 54 nucleotides in length and share 49% sequence similarity to homologous regions of the FCV genome. The RHDV sequence was predicted to form two duplex structures in the 3'-terminal noncoding region with a single stem-loop structure, resembling that of FCV. In contrast, the SMSV 1 and 4 genomic 3'-terminal noncoding sequences were 185 and 182 nucleotides in length, respectively. Ten possible duplex structures were predicted with an average structural free energy of -35 kcal/mole. Sequence similarity between the two SMSV isolates was 75%. Furthermore, extensive cloverleaflike structures are predicted in the 3' noncoding region of the SMSV genome, in contrast to the predicted single stem-loop structures of FCV or RHDV.

  15. SeqAPASS (Sequence Alignment to Predict Across Species Susceptibility) software and documentation

    EPA Science Inventory

    SeqAPASS is a software application that facilitates rapid and streamlined, yet transparent, comparisons of the similarity of toxicologically-significant molecular targets across species. The present application facilitates analysis of primary amino acid sequence similarity (including ...

  16. GWFASTA: server for FASTA search in eukaryotic and microbial genomes.

    PubMed

    Issac, Biju; Raghava, G P S

    2002-09-01

    Similarity searches are a powerful method for solving important biological problems such as database scanning, evolutionary studies, gene prediction, and protein structure prediction. FASTA is a widely used sequence comparison tool for rapid database scanning. Here we describe the GWFASTA server that was developed to assist the FASTA user in similarity searches against partially and/or completely sequenced genomes. GWFASTA consists of more than 60 microbial genomes, eight eukaryote genomes, and proteomes of annotated genomes. In fact, it provides the maximum number of databases for similarity searching from a single platform. GWFASTA allows the submission of more than one sequence as a single query for a FASTA search. It also provides integrated post-processing of FASTA output, including compositional analysis of proteins, multiple sequence alignment, and phylogenetic analysis. Furthermore, it summarizes the search results organism-wise for prokaryotes and chromosome-wise for eukaryotes. Thus, the integration of different tools for sequence analyses makes GWFASTA a powerful tool for biologists.

  17. Multiple alignment-free sequence comparison

    PubMed Central

    Ren, Jie; Song, Kai; Sun, Fengzhu; Deng, Minghua; Reinert, Gesine

    2013-01-01

    Motivation: Recently, a range of new statistics have become available for the alignment-free comparison of two sequences based on k-tuple word content. Here, we extend these statistics to the simultaneous comparison of more than two sequences. Our suite of statistics contains, first, extensions of statistics for pairwise comparison of the joint k-tuple content of all the sequences, and second, averages of sums of pairwise comparison statistics. The two tasks we consider are, first, to identify sequences that are similar to a set of target sequences, and, second, to measure the similarity within a set of sequences. Results: Our investigation uses both simulated data as well as cis-regulatory module data where the task is to identify cis-regulatory modules with similar transcription factor binding sites. We find that although for real data, all of our statistics show a similar performance, on simulated data the Shepp-type statistics are in some instances outperformed by star-type statistics. The multiple alignment-free statistics are more sensitive to contamination in the data than the pairwise average statistics. Availability: Our implementation of the five statistics is available as an R package named ‘multiAlignFree’ at http://www-rcf.usc.edu/∼fsun/Programs/multiAlignFree/multiAlignFreemain.html. Contact: reinert@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23990418
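
    As a rough illustration of what a k-tuple (word-count) comparison looks like, the sketch below computes the classic D2 statistic, the inner product of two sequences' k-tuple counts, and a plain average of D2 over all pairs in a set. It is a minimal sketch under stated assumptions: the function names are hypothetical, and D2 is used here only as the simplest alignment-free statistic; it is not the suite of statistics described in the abstract, nor the multiAlignFree implementation.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count overlapping k-tuples (words of length k) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Classic D2 statistic: sum over shared words of the product of their counts."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())


def mean_pairwise_d2(seqs, k):
    """Average D2 over all unordered pairs (an illustrative stand-in for a
    'pairwise average' statistic; hypothetical helper, not from the paper)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(d2(a, b, k) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print(d2("ACGT", "ACGT", 2))
    print(mean_pairwise_d2(["ACGT", "ACGT", "TTTT"], 2))
```

    Raw D2 grows with sequence length and is sensitive to background composition, which is one reason normalised variants exist; the sketch makes no attempt at such corrections.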

  18. A look at the effect of sequence complexity on pressure destabilisation of DNA polymers.

    PubMed

    Rayan, Gamal; Macgregor, Robert B

    2015-04-01

    Our previous studies on the helix-coil transition of double-stranded DNA polymers have demonstrated that molar volume change (ΔV) accompanying the thermally-induced transition can be positive or negative depending on the experimental conditions, that the pressure-induced transition is more cooperative than the heat-induced transition [Rayan and Macgregor, J Phys Chem B 2005, 109, 15558-15565], and that the pressure-induced transition does not occur in the absence of water [Rayan and Macgregor, Biophys Chem, 2009, 144, 62-66]. Additionally, we have shown that ΔV values obtained by pressure-dependent techniques differ from those obtained by ambient pressure techniques such as PPC [Rayan et al. J Phys Chem B 2009, 113, 1738-1742], thus shedding light on the effects of pressure on DNA polymers. Herein, we examine the effect of sequence complexity, and hence cooperativity, on pressure destabilisation of DNA polymers. Working with Clostridium perfringens DNA under conditions such that the estimated ΔV of the helix-coil transition corresponds to -1.78 mL/mol (base pair) at atmospheric pressure, we do not observe the pressure-induced helix-coil transition of this DNA polymer, whereas synthetic copolymers poly[d(A-T)] and poly[d(I-C)] undergo cooperative pressure-induced transitions at similar ΔV values. We hypothesise that the reason for the lack of pressure-induced helix-coil transition of C. perfringens DNA under these experimental conditions lies in its sequence complexity. Copyright © 2015 Elsevier B.V. All rights reserved.

  19. Detection of tuberculosis drug resistance: a comparison by Mycobacterium tuberculosis MLPA assay versus Genotype®MTBDRplus.

    PubMed

    Santos, Paula Fernanda Gonçalves Dos; Costa, Elis Regina Dalla; Ramalho, Daniela M; Rossetti, Maria Lucia; Barcellos, Regina Bones; Nunes, Luciana de Souza; Esteves, Leonardo Souza; Rodenbusch, Rodrigo; Anthony, Richard M; Bergval, Indra; Sengstake, Sarah; Viveiros, Miguel; Kritski, Afrânio; Oliveira, Martha M

    2017-06-01

    To cope with the emergence of multidrug-resistant tuberculosis (MDR-TB), new molecular methods that can routinely be used to screen for a wide range of drug resistance related genetic markers in the Mycobacterium tuberculosis genome are urgently needed. To evaluate the performance of multiplex ligation-dependent probe amplification (MLPA) against Genotype®MTBDRplus to detect resistance to isoniazid (INHr) and rifampicin (RIFr). 96 culture isolates characterised for identification, drug susceptibility testing (DST) and sequencing of rpoB, katG, and inhA genes were evaluated by the MLPA and Genotype®MTBDRplus assays. With sequencing as a reference standard, sensitivity (SE) to detect INHr was 92.8% and 85.7%, and specificity (SP) was 100% and 97.5%, for MLPA and Genotype®MTBDRplus, respectively. In relation to RIFr, SE was 87.5% and 100%, and SP was 100% and 98.8%, respectively. Kappa value was identical between Genotype®MTBDRplus and MLPA compared with the standard DST and sequencing for detection of INHr [0.83 (0.75-0.91)] and RIFr [0.93 (0.88-0.98)]. Compared to Genotype®MTBDRplus, MLPA showed similar sensitivity to detect INH and RIF resistance. The results obtained by the MLPA and Genotype®MTBDRplus assays indicate that both molecular tests can be used for the rapid detection of drug-resistant TB with high accuracy. MLPA has the added value of providing information on the circulating M. tuberculosis lineages.
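
    For readers unfamiliar with the agreement measures quoted above, the following minimal sketch shows how sensitivity, specificity and Cohen's kappa are conventionally derived from a 2x2 table of assay results against a reference standard. The function name and the counts in the usage example are hypothetical and purely illustrative; they are not the study's data, and the abstract does not state how its confidence intervals were obtained.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, kappa) from 2x2 counts.

    tp/fn: reference-positive samples called positive/negative by the assay.
    fp/tn: reference-negative samples called positive/negative by the assay.
    """
    n = tp + fp + fn + tn
    if n == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one reference-positive and one reference-negative sample")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Observed agreement between assay and reference.
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


if __name__ == "__main__":
    # Illustrative counts only (hypothetical, not taken from the abstract).
    se, sp, kappa = diagnostic_metrics(tp=40, fp=5, fn=10, tn=45)
    print(f"SE={se:.3f} SP={sp:.3f} kappa={kappa:.3f}")
```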

  20. HUBBLE SPACE TELESCOPE OBSERVATIONS OF THE NUCLEUS OF COMET C/2012 S1 (ISON)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lamy, Philippe L.; Toth, Imre; Weaver, Harold A., E-mail: philippe.lamy@lam.fr

    2014-10-10

    We report on the analysis of several sequences of broadband visible images of comet C/2012 S1 (ISON) taken with the Wide Field Camera 3 of the Hubble Space Telescope on 2013 April 10, May 8, October 9, and November 1 in an attempt to detect and characterize its nucleus. Whereas the overwhelming coma precluded the detection of the nucleus in the first two sequences, the contrast was sufficient in early October to unambiguously retrieve the signal from the nucleus. Two images taken within a few minutes led to similar V magnitudes for the nucleus of 21.97 and 22.0 with a 1σ uncertainty of 0.065. Assuming a standard value for the geometric albedo (0.04) and a linear phase function with a coefficient of 0.04 mag deg⁻¹, these V values imply that the nucleus radius is 0.68 ± 0.02 km. Although this result does depend on these two assumptions, we argue that the radius most likely lies in the range 0.6-0.9 km. This result is consistent with the constraints derived from the water production rates reported by Combi et al. The last sequence of images in 2013 November revealed temporal variation of the innermost coma. If attributed to a single rotating jet, this coma brightness variation suggests the rotational period of the nucleus may be close to ∼10.4 hr.

  1. Microbial community structure in three deep-sea carbonate crusts.

    PubMed

    Heijs, S K; Aloisi, G; Bouloubassi, I; Pancost, R D; Pierre, C; Sinninghe Damsté, J S; Gottschal, J C; van Elsas, J D; Forney, L J

    2006-10-01

    Carbonate crusts in marine environments can act as sinks for carbon dioxide. Therefore, understanding carbonate crust formation could be important for understanding global warming. In the present study, the microbial communities of three carbonate crust samples from deep-sea mud volcanoes in the eastern Mediterranean were characterized by sequencing 16S ribosomal RNA (rRNA) genes amplified from DNA directly retrieved from the samples. In combination with the mineralogical composition of the crusts and lipid analyses, sequence data were used to assess the possible role of prokaryotes in crust formation. Collectively, the obtained data showed the presence of highly diverse communities, which were distinct in each of the carbonate crusts studied. Bacterial 16S rRNA gene sequences were found in all crusts and the majority was classified as alpha-, gamma-, and delta- Proteobacteria. Interestingly, sequences of Proteobacteria related to Halomonas and Halovibrio sp., which can play an active role in carbonate mineral formation, were present in all crusts. Archaeal 16S rRNA gene sequences were retrieved from two of the crusts studied. Several of those were closely related to archaeal sequences of organisms that have previously been linked to the anaerobic oxidation of methane (AOM). However, the majority of archaeal sequences were not related to sequences of organisms known to be involved in AOM. In combination with the strongly negative delta 13C values of archaeal lipids, these results open the possibility that organisms with a role in AOM may be more diverse within the Archaea than previously suggested. Different communities found in the crusts could carry out similar processes that might play a role in carbonate crust formation.

  2. Rapid and accurate taxonomic classification of insect (class Insecta) cytochrome c oxidase subunit 1 (COI) DNA barcode sequences using a naïve Bayesian classifier

    PubMed Central

    Porter, Teresita M; Gibson, Joel F; Shokralla, Shadi; Baird, Donald J; Golding, G Brian; Hajibabaei, Mehrdad

    2014-01-01

    Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut-offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignments for large batches of insect COI sequences such as data obtained from high-throughput environmental sequencing. This method provides rank-flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the blast-based methods commonly used in environmental sequence surveys. We have developed and rigorously tested the performance of three different training sets using leave-one-out cross-validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data system. We found that type I error rates, incorrect taxonomic assignments with a high bootstrap support, were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut-offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help to reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.

  3. A discrete artificial bee colony algorithm for detecting transcription factor binding sites in DNA sequences.

    PubMed

    Karaboga, D; Aslan, S

    2016-04-27

    The great majority of biological sequences share significant similarity with other sequences as a result of evolutionary processes, and identifying these sequence similarities is one of the most challenging problems in bioinformatics. In this paper, we present a discrete artificial bee colony (ABC) algorithm, which is inspired by the intelligent foraging behavior of real honey bees, for the detection of highly conserved residue patterns or motifs within sequences. Experimental studies on three different data sets showed that the proposed discrete model, by adhering to the fundamental scheme of the ABC algorithm, produced competitive or better results than other metaheuristic motif discovery techniques.

  4. Stochastic precision analysis of 2D cardiac strain estimation in vivo

    NASA Astrophysics Data System (ADS)

    Bunting, E. A.; Provost, J.; Konofagou, E. E.

    2014-11-01

    Ultrasonic strain imaging has been applied to echocardiography and carries great potential to be used as a tool in the clinical setting. Two-dimensional (2D) strain estimation may be useful when studying the heart due to the complex, 3D deformation of the cardiac tissue. Increasing the framerate used for motion estimation, i.e. motion estimation rate (MER), has been shown to improve the precision of the strain estimation, although maintaining the spatial resolution necessary to view the entire heart structure in a single heartbeat remains challenging at high MERs. Two previously developed methods, the temporally unequispaced acquisition sequence (TUAS) and the diverging beam sequence (DBS), have been used in the past to successfully estimate in vivo axial strain at high MERs without compromising spatial resolution. In this study, a stochastic assessment of 2D strain estimation precision is performed in vivo for both sequences at varying MERs (65, 272, 544, 815 Hz for TUAS; 250, 500, 1000, 2000 Hz for DBS). 2D incremental strains were estimated during left ventricular contraction in five healthy volunteers using a normalized cross-correlation function and a least-squares strain estimator. Both sequences were shown capable of estimating 2D incremental strains in vivo. The conditional expected value of the elastographic signal-to-noise ratio (E(SNRe|ɛ)) was used to compare strain estimation precision of both sequences at multiple MERs over a wide range of clinical strain values. The results here indicate that axial strain estimation precision is much more dependent on MER than lateral strain estimation, while lateral estimation is more affected by strain magnitude. MER should be increased at least above 544 Hz to avoid suboptimal axial strain estimation. Radial and circumferential strain estimations were influenced by the axial and lateral strain in different ways. Furthermore, the TUAS and DBS were found to be of comparable precision at similar MERs.

  5. Genes tagging and molecular diversity of red rot susceptible/tolerant sugarcane hybrids using c-DNA and unigene derived markers.

    PubMed

    Singh, R K; Singh, R B; Singh, S P; Sharma, M L

    2012-04-01

    Sugarcane is an important international commodity and a valuable agricultural crop, especially in tropical and subtropical countries. Two bulked DNA samples were used to screen polymorphic primers from commercial hybrids (varieties) that are moderately resistant or highly susceptible to red rot disease. Among 145 simple sequence repeat and unigene primers screened, 37 (25%) were found to be highly robust and polymorphic, with Polymorphism Information Content values ranging from 0.50 to 1.00 and a mean value of 0.82. Among these microsatellites, twenty-one were used in the study of genetic relationships and marker identification in sugarcane varieties for red rot resistance. A total of 105 polymorphic DNA bands were identified, with fragment sizes ranging from 54 to 1,280 bp. Jaccard's similarity coefficient between closely related hybrids was 0.986, while the lowest coefficient, 0.341, was detected between distantly related hybrids. The average similarity coefficient among these hybrids was 0.663. Cluster analysis resulted in a dendrogram with two major clusters separating the moderately resistant varieties from the highly susceptible varieties. Three group-specific fragments were amplified by unigene Saccharum microsatellite primers: two markers, UGSM316(850) and UGSM316(60), were closely associated with moderately resistant varieties, producing bands that were absent in highly susceptible varieties. Similarly, the UGSM316(400) marker was tightly linked with highly susceptible varieties, amplifying uniformly in sugarcane varieties showing a highly susceptible reaction to red rot but absent in the moderately resistant varietal group. Validation of red rot resistance/susceptibility associated markers on a group of different mapping populations for red rot resistant/susceptible traits is in progress.
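
    The Jaccard coefficient reported above is typically computed from presence/absence band scores. The sketch below is a minimal illustration, assuming each variety is scored as a 0/1 vector over the same set of marker bands; the function name and the example profiles are hypothetical and are not data from the study.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence (1/0) band profiles.

    Shared bands divided by bands present in at least one profile;
    positions where both profiles lack a band are ignored.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same set of bands")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    present = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    if present == 0:
        raise ValueError("no bands present in either profile")
    return shared / present


if __name__ == "__main__":
    # Hypothetical band scores for two varieties over four marker bands.
    variety_1 = [1, 1, 0, 1]
    variety_2 = [1, 0, 0, 1]
    print(round(jaccard(variety_1, variety_2), 3))
```

    A matrix of such pairwise coefficients is the usual input to the kind of cluster analysis the abstract mentions, although the clustering method used there is not stated.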

  6. The tapeworm Atractolytocestus tenuicollis (Cestoda: Caryophyllidea)--a sister species or ancestor of an invasive A. huronensis?

    PubMed

    Králová-Hromadová, Ivica; Štefka, Jan; Bazsalovicsová, Eva; Bokorová, Silvia; Oros, Mikuláš

    2013-10-01

    Atractolytocestus tenuicollis (Li, 1964) Xi, Wang, Wu, Gao et Nie, 2009 is a monozoic, non-segmented tapeworm of the order Caryophyllidea, parasitizing exclusively common carp (Cyprinus carpio L.). In the current work, the first molecular data, in particular complete ribosomal internal transcribed spacer 2 (ITS2) and partial mitochondrial cytochrome c oxidase subunit I (cox1) on A. tenuicollis from Niushan Lake, Wuhan, China, are provided. In order to evaluate molecular interrelationships within Atractolytocestus, the data on A. tenuicollis were compared with relevant data on two other congeners, Atractolytocestus huronensis and Atractolytocestus sagittatus. Divergent intragenomic copies (ITS2 paralogues) were detected in the ITS2 ribosomal spacer of A. tenuicollis; the same phenomenon has previously been observed also in two other congeners. ITS2 structure of A. tenuicollis was very similar to that of A. huronensis from Slovakia, USA and UK; overall pairwise sequence identity was 91.7-95.2%. On the other hand, values of sequence identity between A. tenuicollis and A. sagittatus were lower, 69.7-70.9%. Cox1 sequences, analysed in five A. tenuicollis individuals, were 100% identical and no intraspecific variation was observed. Comparison of A. tenuicollis cox1 with respective sequences of two other Atractolytocestus species showed that the mitochondrial haplotype found in Chinese A. tenuicollis is structurally specific (haplotype 4; Ha4) and differs from all so far determined Atractolytocestus haplotypes (Ha1 and Ha2 for A. huronensis; Ha3 for A. sagittatus). Pairwise sequence identity between A. tenuicollis cox1 haplotype and remaining three haplotypes followed the same pattern as in ITS2. The nucleotide and amino acid (aa) sequence comparison with A. huronensis Ha1 and Ha2 revealed higher sequence identity, 90.3-90.8% (96.9% in aa), while lower values were achieved between A. tenuicollis haplotype and Ha3 of Japanese A. sagittatus: 75.2% (81.9% in aa). The phylogenetic analyses using cox1, ITS2 and combined cox1 + ITS2 sequences revealed close genetic interrelationship between A. tenuicollis and A. huronensis. Independently of the type of analysis and DNA region used, the topology of the obtained trees was always identical; A. tenuicollis formed a separate clade, with A. huronensis forming a closely related sister group.

  7. Shot sequencing based on biological equivalent dose considerations for multiple isocenter Gamma Knife radiosurgery.

    PubMed

    Ma, Lijun; Lee, Letitia; Barani, Igor; Hwang, Andrew; Fogh, Shannon; Nakamura, Jean; McDermott, Michael; Sneed, Penny; Larson, David A; Sahgal, Arjun

    2011-11-21

    Rapid delivery of multiple shots or isocenters is one of the hallmarks of Gamma Knife radiosurgery. In this study, we investigated whether the temporal order of shots delivered with Gamma Knife Perfexion would significantly influence the biological equivalent dose for complex multi-isocenter treatments. Twenty single-target cases were selected for analysis. For each case, 3D dose matrices of individual shots were extracted and single-fraction equivalent uniform dose (sEUD) values were determined for all possible shot delivery sequences, corresponding to different patterns of temporal dose delivery within the target. We found significant variations in the sEUD values among these sequences exceeding 15% for certain cases. However, the sequences for the actual treatment delivery were found to agree (<3%) and to correlate (R² = 0.98) excellently with the sequences yielding the maximum sEUD values for all studied cases. This result is applicable for both fast and slow growing tumors with α/β values of 2 to 20 according to the linear-quadratic model. In conclusion, despite large potential variations in different shot sequences for multi-isocenter Gamma Knife treatments, current clinical delivery sequences exhibited consistent biological target dosing that approached that maximally achievable for all studied cases.

  8. Contrast-enhanced 3-dimensional SPACE versus MP-RAGE for the detection of brain metastases: considerations with a 32-channel head coil.

    PubMed

    Reichert, Miriam; Morelli, John N; Runge, Val M; Tao, Ai; von Ritschl, Ruediger; von Ritschl, Andreas; Padua, Abraham; Dix, James E; Marra, Michael J; Schoenberg, Stefan O; Attenberger, Ulrike I

    2013-01-01

    The aim of this study was to compare the detection of brain metastases at 3 T using a 32-channel head coil with 2 different 3-dimensional (3D) contrast-enhanced sequences, a T1-weighted fast spin-echo-based (SPACE; sampling perfection with application-optimized contrasts using different flip angle evolutions) sequence and a conventional magnetization-prepared rapid gradient-echo (MP-RAGE) sequence. Seventeen patients with 161 brain metastases were examined prospectively using both SPACE and MP-RAGE sequences on a 3-T magnetic resonance system. Eight healthy volunteers were similarly examined for determination of signal-to-noise ratio (SNR) values. Parameters were adjusted to equalize acquisition times between the sequences (3 minutes and 30 seconds). The order in which sequences were performed was randomized. Two blinded board-certified neuroradiologists evaluated the number of detectable metastatic lesions with each sequence relative to a criterion standard reading conducted at the Gamma Knife facility by a neuroradiologist with access to all clinical and imaging data. In the volunteer assessment with SPACE and MP-RAGE, SNR (10.3 ± 0.8 vs 7.7 ± 0.7) and contrast-to-noise ratio (0.8 ± 0.2 vs 0.5 ± 0.1) were statistically significantly greater with the SPACE sequence (P < 0.05). Overall, lesion detection was markedly improved with the SPACE sequence (99.1% of lesions for reader 1 and 96.3% of lesions for reader 2) compared with the MP-RAGE sequence (73.6% of lesions for reader 1 and 68.5% of lesions for reader 2; P < 0.01). A 3D T1-weighted fast spin echo sequence (SPACE) improves detection of metastatic lesions relative to 3D T1-weighted gradient-echo-based scan (MP-RAGE) imaging when implemented with a 32-channel head coil at identical scan acquisition times (3 minutes and 30 seconds).

  9. Metabolic network prediction through pairwise rational kernels.

    PubMed

    Roche-Lima, Abiel; Domaratzki, Michael; Fristensky, Brian

    2014-09-26

    Metabolic networks are represented by the set of metabolic pathways. Metabolic pathways are a series of biochemical reactions, in which the product (output) from one reaction serves as the substrate (input) to another reaction. Many pathways remain incompletely characterized. One of the major challenges of computational biology is to obtain better models of metabolic pathways. Existing models are dependent on the annotation of the genes. This propagates error accumulation when the pathways are predicted by incorrectly annotated genes. Pairwise classification methods are supervised learning methods used to classify new pairs of entities. Some of these classification methods, e.g., Pairwise Support Vector Machines (SVMs), use pairwise kernels. Pairwise kernels describe similarity measures between two pairs of entities. Using pairwise kernels to handle sequence data requires long processing times and large storage. Rational kernels are kernels based on weighted finite-state transducers that represent similarity measures between sequences or automata. They have been effectively used in problems that handle large amounts of sequence information such as protein essentiality, natural language processing and machine translations. We create a new family of pairwise kernels using weighted finite-state transducers (called Pairwise Rational Kernel (PRK)) to predict metabolic pathways from a variety of biological data. PRKs take advantage of the simpler representations and faster algorithms of transducers. Because raw sequence data can be used, the predictor model avoids the errors introduced by incorrect gene annotations. We then developed several experiments with PRKs and Pairwise SVM to validate our methods using the metabolic network of Saccharomyces cerevisiae. As a result, when PRKs are used, our method executes faster in comparison with other pairwise kernels. 
Also, when we use PRKs combined with other simple kernels that include evolutionary information, the accuracy values have been improved, while maintaining lower construction and execution times. The power of using kernels is that almost any sort of data can be represented using kernels. Therefore, completely disparate types of data can be combined to add power to kernel-based machine learning methods. When we compared our proposal using PRKs with other similar kernels, the execution times were decreased, with no compromise of accuracy. We also proved that by combining PRKs with other kernels that include evolutionary information, the accuracy can also be improved. As our proposal can use any type of sequence data, genes do not need to be properly annotated, avoiding accumulation errors because of incorrect previous annotations.
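
    The idea of a kernel that compares two pairs of sequences can be sketched as follows. This is only an illustration under stated assumptions: the base kernel here is a plain n-gram count kernel standing in for a rational kernel, and the tensor-product combination is one common way to build a pairwise kernel, not necessarily the construction used by the authors. The weighted finite-state transducer machinery is not reproduced, and the function names are hypothetical.

```python
from collections import Counter


def ngram_counts(seq, n=3):
    """Count overlapping n-grams in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def ngram_kernel(a, b, n=3):
    """Base kernel (assumed stand-in): dot product of n-gram count vectors."""
    ca, cb = ngram_counts(a, n), ngram_counts(b, n)
    return sum(count * cb[gram] for gram, count in ca.items())


def pairwise_kernel(pair1, pair2, n=3):
    """Tensor-product pairwise kernel between two pairs of sequences.

    Summing both pairings makes the value independent of the order
    of the two sequences inside each pair.
    """
    (a, b), (c, d) = pair1, pair2
    return (ngram_kernel(a, c, n) * ngram_kernel(b, d, n)
            + ngram_kernel(a, d, n) * ngram_kernel(b, c, n))
```

    Each pairwise evaluation costs four base-kernel evaluations, which is why a cheap base kernel matters when the number of pairs is large.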

  10. Systemic Lupus Erythematosus: Molecular Mimicry between Anti-dsDNA CDR3 Idiotype, Microbial and Self Peptides-As Antigens for Th Cells.

    PubMed

    Aas-Hanssen, Kristin; Thompson, Keith M; Bogen, Bjarne; Munthe, Ludvig A

    2015-01-01

    Systemic lupus erythematosus (SLE) is marked by a T helper (Th) cell-dependent B cell hyperresponsiveness, with frequent germinal center reactions, and gammaglobulinemia. A feature of SLE is the finding of IgG autoantibodies specific for dsDNA. The specificity of the Th cells that drive the expansion of anti-dsDNA B cells is unresolved. However, anti-microbial, anti-histone, and anti-idiotype Th cell responses have been hypothesized to play a role. It has been entirely unclear if these seemingly disparate Th cell responses and hypotheses could be related or unified. Here, we describe that H chain CDR3 idiotypes from IgG(+) B cells of lupus mice have sequence similarities with both microbial and self peptides. Matched sequences were more frequent within the mutated CDR3 repertoire and when sequences were derived from lupus mice with expanded anti-dsDNA B cells. Analyses of histone sequences showed that particular histone peptides were similar to VDJ junctions. Moreover, lupus mice had Th cell responses toward histone peptides similar to anti-dsDNA CDR3 sequences. The results suggest that Th cells in lupus may have multiple cross-reactive specificities linked to the IgVH CDR3 Id-peptide sequences as well as similar DNA-associated protein motifs.

  11. Dali server update.

    PubMed

    Holm, Liisa; Laakso, Laura M

    2016-07-08

    The Dali server (http://ekhidna2.biocenter.helsinki.fi/dali) is a network service for comparing protein structures in 3D. In favourable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences. The Dali server has been running in various places for over 20 years and is used routinely by crystallographers on newly solved structures. The latest update of the server provides enhanced analytics for the study of sequence and structure conservation. The server performs three types of structure comparisons: (i) Protein Data Bank (PDB) search compares one query structure against those in the PDB and returns a list of similar structures; (ii) pairwise comparison compares one query structure against a list of structures specified by the user; and (iii) all against all structure comparison returns a structural similarity matrix, a dendrogram and a multidimensional scaling projection of a set of structures specified by the user. Structural superimpositions are visualized using the Java-free WebGL viewer PV. The structural alignment view is enhanced by sequence similarity searches against Uniprot. The combined structure-sequence alignment information is compressed to a stack of aligned sequence logos. In the stack, each structure is structurally aligned to the query protein and represented by a sequence logo. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. A plasma membrane sucrose-binding protein that mediates sucrose uptake shares structural and sequence similarity with seed storage proteins but remains functionally distinct.

    PubMed

    Overvoorde, P J; Chao, W S; Grimes, H D

    1997-06-20

    Photoaffinity labeling of a soybean cotyledon membrane fraction identified a sucrose-binding protein (SBP). Subsequent studies have shown that the SBP is a unique plasma membrane protein that mediates the linear uptake of sucrose in the presence of up to 30 mM external sucrose when ectopically expressed in yeast. Analysis of the SBP-deduced amino acid sequence indicates it lacks sequence similarity with other known transport proteins. Data presented here, however, indicate that the SBP shares significant sequence and structural homology with the vicilin-like seed storage proteins that organize into homotrimers. These similarities include a repeated sequence that forms the basis of the reiterated domain structure characteristic of the vicilin-like protein family. In addition, analytical ultracentrifugation and nonreducing SDS-polyacrylamide gel electrophoresis demonstrate that the SBP appears to be organized into oligomeric complexes with a Mr indicative of the existence of SBP homotrimers and homodimers. The structural similarity shared by the SBP and vicilin-like proteins provides a novel framework to explore the mechanistic basis of SBP-mediated sucrose uptake. Expression of the maize Glb protein (a vicilin-like protein closely related to the SBP) in yeast demonstrates that a closely related vicilin-like protein is unable to mediate sucrose uptake. Thus, despite sequence and structural similarities shared by the SBP and the vicilin-like protein family, the SBP is functionally divergent from other members of this group.

  13. Identification of Protective Brucella Antigens and their Expressions in Vaccinia Virus to Prevent Disease in Animals and Humans.

    DTIC Science & Technology

    1996-05-01

    see figure appendix; B. abortus sequence in similarity arrangement with secD of E. coli, H. influenzae, M. leprae and S. coelicolor). Highly related...low similarity (E. coli and M. leprae approx 13.5% similarity). The Brucella and E. coli sequences were 25% similar (see figures appendix). In E...1987. Mycobacterial growth inhibition by interferon-g activated bone marrow macrophages and differential susceptibility among strains of Mycobacterium

  14. Whole genome sequencing of Chinese clearhead icefish, Protosalanx hyalocranius.

    PubMed

    Liu, Kai; Xu, Dongpo; Li, Jia; Bian, Chao; Duan, Jinrong; Zhou, Yanfeng; Zhang, Minying; You, Xinxin; You, Yang; Chen, Jieming; Yu, Hui; Xu, Gangchun; Fang, Di-An; Qiang, Jun; Jiang, Shulun; He, Jie; Xu, Junmin; Shi, Qiong; Zhang, Zhiyong; Xu, Pao

    2017-04-01

    Chinese clearhead icefish, Protosalanx hyalocranius, is a representative icefish species with economic importance and special appearance. Due to its great economic value in China, the fish was introduced into Lake Dianchi and several other lakes from the Lake Taihu half a century ago. Similar to the Sinocyclocheilus cavefish, the clearhead icefish has certain cavefish-like traits, such as transparent body and nearly scaleless skin. Here, we provide the whole genome sequence of this surface-dwelling fish and generate a draft genome assembly, aiming at exploring molecular mechanisms for the biological interests. A total of 252.1 Gb of raw reads were sequenced. Subsequently, a novel draft genome assembly was generated, with the scaffold N50 reaching 1.163 Mb. The genome completeness was estimated to be 98.39 % by using the CEGMA evaluation. Finally, we annotated 19 884 protein-coding genes and observed that repeat sequences account for 24.43 % of the genome assembly. We report the first draft genome of the Chinese clearhead icefish. The genome assembly will provide a solid foundation for further molecular breeding and germplasm resource protection in Chinese clearhead icefish, as well as other icefishes. It is also a valuable genetic resource for revealing the molecular mechanisms for the cavefish-like characters. © The Authors 2017. Published by Oxford University Press.
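
    For readers unfamiliar with the scaffold N50 statistic quoted above, a minimal sketch of the usual definition follows: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name and the toy lengths are illustrative and are not taken from the icefish assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First scaffold at which the cumulative length covers half the assembly
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")
```

    With toy scaffolds of lengths 2, 2, 2, 3, 3, 4, 8 and 8 (total 32), the two longest scaffolds already cover 16, so the N50 is 8.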

  15. Implementation of fast macromolecular proton fraction mapping on 1.5 and 3 Tesla clinical MRI scanners: preliminary experience

    NASA Astrophysics Data System (ADS)

    Yarnykh, V.; Korostyshevskaya, A.

    2017-08-01

    Macromolecular proton fraction (MPF) is a biophysical parameter describing the amount of macromolecular protons involved in magnetization exchange with water protons in tissues. MPF is of significant interest as a magnetic resonance imaging (MRI) biomarker of myelin for clinical applications. A recent fast MPF mapping method enabled clinical translation of MPF measurements due to time-efficient acquisition based on the single-point constrained fit algorithm. However, previous MPF mapping applications utilized only 3 Tesla MRI scanners and modified pulse sequences, which are not commonly available. This study aimed to test the feasibility of MPF mapping implementation on a 1.5 Tesla clinical scanner using standard manufacturer’s sequences and compare the performance of this method between 1.5 and 3 Tesla scanners. MPF mapping was implemented on 1.5 and 3 Tesla MRI units of one manufacturer with either optimized custom-written or standard product pulse sequences. Whole-brain three-dimensional MPF maps obtained from a single volunteer were compared between field strengths and implementation options. MPF maps demonstrated similar quality at both field strengths. MPF values in segmented brain tissues and specific anatomic regions appeared in close agreement. This experiment demonstrates the feasibility of fast MPF mapping using standard sequences on 1.5 T and 3 T clinical scanners.

  16. A comparison of multi-echo spin-echo and triple-echo steady-state T2 mapping for in vivo evaluation of articular cartilage.

    PubMed

    Juras, Vladimir; Bohndorf, Klaus; Heule, Rahel; Kronnerwetter, Claudia; Szomolanyi, Pavol; Hager, Benedikt; Bieri, Oliver; Zbyn, Stefan; Trattnig, Siegfried

    2016-06-01

    To assess the clinical relevance of T2 relaxation times, measured by 3D triple-echo steady-state (3D-TESS), in knee articular cartilage compared to conventional multi-echo spin-echo T2-mapping. Thirteen volunteers and ten patients with focal cartilage lesions were included in this prospective study. All subjects underwent 3-Tesla MRI consisting of a multi-echo multi-slice spin-echo sequence (CPMG) as a reference method for T2 mapping, and 3D TESS with the same geometry settings, but variable acquisition times: standard (TESSs 4:35min) and quick (TESSq 2:05min). T2 values were compared in six different regions in the femoral and tibial cartilage using a Wilcoxon signed ranks test and the Pearson correlation coefficient (r). The local ethics committee approved this study, and all participants gave written informed consent. The mean quantitative T2 values measured by CPMG (mean: 46±9ms) in volunteers were significantly higher compared to those measured with TESS (mean: 31±5ms) in all regions. Both methods performed similarly in patients, but CPMG provided a slightly higher difference between lesions and native cartilage (CPMG: 90ms→61ms [31%], p=0.0125; TESS: 32ms→24ms [24%], p=0.0839). 3D-TESS provides results similar to those of a conventional multi-echo spin-echo sequence with many benefits, such as shortening of total acquisition time and insensitivity to B1 and B0 changes. • 3D-TESS T2 mapping provides clinically comparable results to CPMG in shorter scan-time. • Clinical and investigational studies may benefit from high temporal resolution of 3D-TESS. • 3D-TESS T2 values are able to differentiate between healthy and damaged cartilage.

  17. Correlating low-similarity peptide sequences and allergenic epitopes.

    PubMed

    Kanduc, D

    2008-01-01

    Although a high number of allergenic peptide epitopes has been experimentally identified and defined, the molecular basis and the precise mechanisms underlying peptide allergenicity are unknown. This issue was analyzed by exploring the relationship between peptide allergenicity and sequence similarity to the human proteome. The structured analysis of data reported in the literature showed that most IgE-binding epitopes are (or harbor) pentapeptide unit(s) with no or low similarity to the human proteome, suggesting that no or low sequence similarity to the host proteome might represent a minimum common denominator identifying allergenic peptides. The present literature analysis might be of relevance in devising and designing short amino acid modules to be used for blocking pathogenic IgE.

  18. Biosequence Similarity Search on the Mercury System

    PubMed Central

    Krishnamurthy, Praveen; Buhler, Jeremy; Chamberlain, Roger; Franklin, Mark; Gyang, Kwame; Jacob, Arpith; Lancaster, Joseph

    2007-01-01

    Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose extensional similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high-volume, high-throughput data movement off a data store and into reconfigurable hardware. An important component of application deployment on the Mercury system is the functional decomposition of the application onto both the reconfigurable hardware and the traditional processor. Both the Mercury BLASTN application design and its performance analysis are described. PMID:18846267

  19. EEG theta power and coherence to octave illusion in first-episode paranoid schizophrenia with auditory hallucinations.

    PubMed

    Zheng, Leilei; Chai, Hao; Yu, Shaohua; Xu, You; Chen, Wanzhen; Wang, Wei

    2015-01-01

    The exact mechanism behind auditory hallucinations in schizophrenia remains unknown. A corollary discharge dysfunction hypothesis has been put forward, but it requires further confirmation. Electroencephalography (EEG) of the Deutsch octave illusion might offer more insight, by demonstrating an abnormal cerebral activation similar to that under auditory hallucinations in schizophrenic patients. We invited 23 first-episode schizophrenic patients with auditory hallucinations and 23 healthy participants to listen to silence and two sound sequences, which consisted of alternating 400- and 800-Hz tones. EEG spectral power and coherence values of different frequency bands, including theta rhythm (3.5-7.5 Hz), were computed using 32 scalp electrodes. Task-related spectral power changes and task-related coherence differences were also calculated. Clinical characteristics of patients were rated using the Positive and Negative Syndrome Scale. After both sequences of octave illusion, the task-related theta power change values of frontal and temporal areas were significantly lower, and the task-related theta coherence difference values of intrahemispheric frontal-temporal areas were significantly higher in schizophrenic patients than in healthy participants. Moreover, the task-related power change values in both hemispheres were negatively correlated and the task-related coherence difference values in the right hemisphere were positively correlated with the hallucination score in schizophrenic patients. We only tested the Deutsch octave illusion in primary schizophrenic patients with acute first episode. Further studies might adopt other illusions or employ other forms of schizophrenia. Our results showed a lower activation but higher connection within frontal and temporal areas in schizophrenic patients under octave illusion. 
This suggests an oversynchronized but weak frontal area to exert an action to the ipsilateral temporal area, which supports the corollary discharge dysfunction hypothesis. © 2014 S. Karger AG, Basel.

  20. Domain similarity based orthology detection.

    PubMed

    Bitard-Feildel, Tristan; Kemena, Carsten; Greenwood, Jenny M; Bornberg-Bauer, Erich

    2015-05-13

    Orthologous protein detection software mostly uses pairwise comparisons of amino-acid sequences to assert whether two proteins are orthologous or not. Accordingly, when the number of sequences for comparison increases, the number of comparisons to compute grows in a quadratic order. A current challenge of bioinformatic research, especially when taking into account the increasing number of sequenced organisms available, is to make this ever-growing number of comparisons computationally feasible in a reasonable amount of time. We propose to speed up the detection of orthologous proteins by using strings of domains to characterize the proteins. We present two new protein similarity measures, a cosine and a maximal weight matching score based on domain content similarity, and new software, named porthoDom. The qualities of the cosine and the maximal weight matching similarity measures are compared against curated datasets. The measures show that domain content similarities are able to correctly group proteins into their families. Accordingly, the cosine similarity measure is used inside porthoDom, the wrapper developed for proteinortho. porthoDom makes use of domain content similarity measures to group proteins together before searching for orthologs. By using domains instead of amino acid sequences, the reduction of the search space decreases the computational complexity of an all-against-all sequence comparison. We demonstrate that representing and comparing proteins as strings of discrete domains, i.e. as a concatenation of their unique identifiers, allows a drastic simplification of search space. porthoDom has the advantage of speeding up orthology detection while maintaining a degree of accuracy similar to proteinortho. The implementation of porthoDom is released using python and C++ languages and is available under the GNU GPL licence 3 at http://www.bornberglab.org/pages/porthoda .
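
    The cosine measure on domain content described above can be sketched as follows. This is a minimal illustration, not the porthoDom implementation: it assumes each protein is given as a list of domain identifiers and that repeated domains are weighted by their counts, and the function name and the identifiers in the usage note are hypothetical.

```python
import math
from collections import Counter


def domain_cosine(domains_a, domains_b):
    """Cosine similarity between two proteins given as lists of domain IDs."""
    ca, cb = Counter(domains_a), Counter(domains_b)
    # Dot product over the domains present in the first protein
    dot = sum(count * cb[dom] for dom, count in ca.items())
    norm_a = math.sqrt(sum(c * c for c in ca.values()))
    norm_b = math.sqrt(sum(c * c for c in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a protein without annotated domains matches nothing
    return dot / (norm_a * norm_b)
```

    For example, `domain_cosine(["D1", "D2"], ["D1"])` gives about 0.707, while two proteins with no shared domains score 0. Such a score is much cheaper than a residue-level alignment, which is the motivation for grouping proteins by domain content before the ortholog search.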

  1. Bacterial and archaeal phylogenetic diversity of a cold sulfur-rich spring on the shoreline of Lake Erie, Michigan

    USGS Publications Warehouse

    Chaudhary, A.; Haack, S.K.; Duris, J.W.; Marsh, T.L.

    2009-01-01

    Studies of sulfidic springs have provided new insights into microbial metabolism, groundwater biogeochemistry, and geologic processes. We investigated Great Sulphur Spring on the western shore of Lake Erie and evaluated the phylogenetic affiliations of 189 bacterial and 77 archaeal 16S rRNA gene sequences from three habitats: the spring origin (11-m depth), bacterial-algal mats on the spring pond surface, and whitish filamentous materials from the spring drain. Water from the spring origin was cold, pH 6.3, and anoxic (H2, 5.4 nM; CH4, 2.70 μM) with concentrations of S2- (0.03 mM), SO42- (14.8 mM), Ca2+ (15.7 mM), and HCO3- (4.1 mM) similar to those in groundwater from the local aquifer. No archaeal and few bacterial sequences were >95% similar to sequences of cultivated organisms. Bacterial sequences were largely affiliated with sulfur-metabolizing or chemolithotrophic taxa in Beta-, Gamma-, Delta-, and Epsilonproteobacteria. Epsilonproteobacteria sequences similar to those obtained from other sulfidic environments and a new clade of Cyanobacteria sequences were particularly abundant (16% and 40%, respectively) in the spring origin clone library. Crenarchaeota sequences associated with archaeal-bacterial consortia in whitish filaments at a German sulfidic spring were detected only in a similar habitat at Great Sulphur Spring. This study expands the geographic distribution of many uncultured Archaea and Bacteria sequences to the Laurentian Great Lakes, indicates possible roles for epsilonproteobacteria in local aquifer chemistry and karst formation, documents new oscillatorioid Cyanobacteria lineages, and shows that uncultured, cold-adapted Crenarchaeota sequences may comprise a significant part of the microbial community of some sulfidic environments. Copyright © 2009, American Society for Microbiology. All Rights Reserved.

  2. Genetic and evolutionary characterization of RABVs from China using the phosphoprotein gene.

    PubMed

    Wang, Lihua; Wu, Hui; Tao, Xiaoyan; Li, Hao; Rayner, Simon; Liang, Guodong; Tang, Qing

    2013-01-07

    While the function of the phosphoprotein (P) gene of the rabies virus (RABV) has been well studied in laboratory adapted RABVs, the genetic diversity and evolution characteristics of the P gene of street RABVs remain unclear. The objective of the present study was to investigate the mutation and evolution of P genes in Chinese street RABVs. The P genes of 77 RABVs from brain samples of dogs and wild animals collected in eight Chinese provinces from 2003 to 2008 were sequenced. The open reading frame (ORF) of the P genes was 894 nucleotides (nt) in length, with 85-99% (80-89%) amino acid (nucleotide) identity compared with the laboratory RABVs and vaccine strains. Phylogenetic analysis based on the P gene revealed that Chinese RABV strains could be divided into two distinct clades, and several RABV variants were found to co-circulate in the same province. Two conserved (CD1, 2) and two variable (VD1, 2) domains were identified by comparing the deduced primary sequences of the encoded P proteins. Two sequence motifs, one believed to confer binding to the cytoplasmic dynein light chain LC8 and a lysine-rich sequence, were conserved throughout the Chinese RABVs. In contrast, the isolates exhibited lower conservation of one phosphate acceptor and one internal translation initiation site identified in the P protein of the rabies challenge virus standard (CVS) strain. Bayesian coalescent analysis showed that the P gene in Chinese RABVs has a substitution rate (3.305 × 10^-4 substitutions per site per year) and evolution history (592 years ago) similar to values for the glycoprotein (G) and nucleoprotein (N) reported previously. Several substitutions were found in the P gene of Chinese RABV strains compared to the laboratory adapted and vaccine strains; whether these variations could affect the biological characteristics of Chinese RABVs needs to be further investigated. 
The substitution rate and evolution history of the P gene are similar to those of the G and N genes. Combined with the finding that the topology of the phylogenetic tree based on the P gene is similar to that of the G and N gene trees, this indicates that the P, G and N genes are equally valid for examining the phylogenetics of RABVs.

  3. Spiribacter curvatus sp. nov., a moderately halophilic bacterium isolated from a saltern.

    PubMed

    León, María José; Rodríguez-Olmos, Angel; Sánchez-Porro, Cristina; López-Pérez, Mario; Rodríguez-Valera, Francisco; Soliveri, Juan; Ventosa, Antonio; Copa-Patiño, José Luis

    2015-12-01

    A novel pink-pigmented bacterial strain, UAH-SP71T, was isolated from a saltern located in Santa Pola, Alicante (Spain) and the complete genome sequence was analysed and compared with that of Spiribacter salinus M19-40T, suggesting that the two strains constituted two separate species, with a 77.3% ANI value. In this paper, strain UAH-SP71T was investigated in a taxonomic study using a polyphasic approach. Strain UAH-SP71T was a Gram-stain-negative, strictly aerobic, non-motile curved rod that grew in media containing 5-20% (w/v) NaCl (optimum 10% NaCl), at 5-40 °C (optimum 37 °C) and at pH 5-10 (optimum pH 8). Phylogenetic analysis based on the comparison of 16S rRNA gene sequences revealed that strain UAH-SP71T is a member of the genus Spiribacter, showing a sequence similarity of 96.5% with Spiribacter salinus M19-40T. Other related species are also members of the family Ectothiorhodospiraceae, including Arhodomonas recens RS91T (95.5% 16S rRNA gene sequence similarity), Arhodomonas aquaeolei ATCC 49307T (95.4 %) and Alkalilimnicola ehrlichii MLHE-1T (94.9 %). DNA-DNA hybridization between strain UAH-SP71T and Spiribacter salinus M19-40T was 39 %. The major cellular fatty acids of strain UAH-SP71T were C18 : 1ω6c and/or C18 : 1ω7c, C16 : 0, C16 : 1ω6c and/or C16 : 1ω7c, C10 : 0 3-OH and C12 : 0, a pattern similar to that of Spiribacter salinus M19-40T. Phylogenetic, phenotypic and genotypic differences between strain UAH-SP71T and Spiribacter salinus M19-40T indicate that strain UAH-SP71T represents a novel species of the genus Spiribacter, for which the name Spiribacter curvatus sp. nov. is proposed. The type strain is UAH-SP71T (=CECT8396T=DSM 28542T).

  4. Diversity of 16S rRNA genes of new Ehrlichia strains isolated from horses with clinical signs of Potomac horse fever.

    PubMed

    Wen, B; Rikihisa, Y; Fuerst, P A; Chaichanasiriwithaya, W

    1995-04-01

    Ehrlichia risticii is the causative agent of Potomac horse fever. Variations among the major antigens of different local E. risticii strains have been detected previously. To further assess genetic variability in this species or species complex, the sequences of the 16S rRNA genes of several isolates obtained from sick horses diagnosed as having Potomac horse fever were determined. The sequences of six isolates obtained from Ohio and three isolates obtained from Kentucky were amplified by PCR. Three groups of sequences were identified. The sequences of five of the Ohio isolates were identical to the sequence of the type strain of E. risticii, the Illinois strain. The sequence of one Ohio isolate, isolate 081, was unique; this sequence differed in 10 nucleotides from the sequence of the type strain (level of similarity, 99.3%). The sequences of the three Kentucky isolates were identical to each other, but differed by five bases from the sequence of the type strain (level of similarity, 99.6%). The levels of sequence similarity of isolate 081, the Kentucky isolates, and the type strain to the next most closely related Ehrlichia sp., Ehrlichia sennetsu, were 99.3, 99.2, and 99.2%, respectively. On the basis of the distinct antigenic profiles and the levels of 16S rRNA sequence divergence, isolate 081 is as divergent from the type strain of E. risticii as E. sennetsu is. Therefore, we suggest that strain 081 and the Kentucky isolates may represent two new distinct Ehrlichia species.
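
    Similarity levels such as those quoted above relate the number of differing positions to the number of positions compared. The sketch below shows the simplest form of that calculation for two already aligned, equal-length sequences. It is an assumption that gaps and ambiguous bases are absent; how the authors treated them is not stated in the abstract, and the function name is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)
```

    On a toy pair, `percent_identity("ACGT", "ACGA")` is 75.0. Because reported values are rounded, the compared length cannot be recovered exactly from a similarity level and a difference count alone.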

  5. Quantifying the Relationships among Drug Classes

    PubMed Central

    Hert, Jérôme; Keiser, Michael J.; Irwin, John J.; Oprea, Tudor I.; Shoichet, Brian K.

    2009-01-01

    The similarity of drug targets is typically measured using sequence or structural information. Here, we consider chemo-centric approaches that measure target similarity on the basis of their ligands, asking how chemoinformatics similarities differ from those derived bioinformatically, how stable the ligand networks are to changes in chemoinformatics metrics, and which network is the most reliable for prediction of pharmacology. We calculated the similarities between hundreds of drug targets and their ligands and mapped the relationship between them in a formal network. Bioinformatics networks were based on the BLAST similarity between sequences, while chemoinformatics networks were based on the ligand-set similarities calculated with either the Similarity Ensemble Approach (SEA) or a method derived from Bayesian statistics. By multiple criteria, bioinformatics and chemoinformatics networks differed substantially, and only occasionally did a high sequence similarity correspond to a high ligand-set similarity. In contrast, the chemoinformatics networks were stable to the method used to calculate the ligand-set similarities and to the chemical representation of the ligands. Also, the chemoinformatics networks were more natural and more organized, by network theory, than their bioinformatics counterparts: ligand-based networks were found to be small-world and broad-scale. PMID:18335977

  6. Added Value of Assessing Adnexal Masses with Advanced MRI Techniques

    PubMed Central

    Thomassin-Naggara, I.; Balvay, D.; Rockall, A.; Carette, M. F.; Ballester, M.; Darai, E.; Bazot, M.

    2015-01-01

    This review will present the added value of perfusion and diffusion MR sequences to characterize adnexal masses. These two functional MR techniques are readily available in routine clinical practice. We will describe the acquisition parameters and a method of analysis to optimize their added value compared with conventional images. We will then propose a model of interpretation that combines the anatomical and morphological information from conventional MRI sequences with the functional information provided by perfusion and diffusion weighted sequences. PMID:26413542

  7. A galaxy of folds.

    PubMed

    Alva, Vikram; Remmert, Michael; Biegert, Andreas; Lupas, Andrei N; Söding, Johannes

    2010-01-01

    Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernable sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution in the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space which attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co-occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.

  8. Transcriptome Analysis of Houttuynia cordata Thunb. by Illumina Paired-End RNA Sequencing and SSR Marker Discovery

    PubMed Central

    Wei, Lin; Li, Shenghua; Liu, Shenggui; He, Anna; Wang, Dan; Wang, Jie; Tang, Yulian; Wu, Xianjin

    2014-01-01

    Background: Houttuynia cordata Thunb. is an important traditional medical herb in China and other Asian countries, with high medicinal and economic value. However, a lack of available genomic information has become a limitation for research on this species. Thus, we carried out high-throughput transcriptomic sequencing of H. cordata to generate an enormous transcriptome sequence dataset for gene discovery and molecular marker development. Principal Findings: Illumina paired-end sequencing technology produced over 56 million sequencing reads from H. cordata mRNA. Subsequent de novo assembly yielded 63,954 unigenes, 39,982 (62.52%) and 26,122 (40.84%) of which had significant similarity to proteins in the NCBI nonredundant protein and Swiss-Prot databases (E-value < 10^-5), respectively. Of these annotated unigenes, 30,131 and 15,363 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. In addition, 24,434 (38.21%) unigenes were mapped onto 128 pathways using the KEGG pathway database and 17,964 (44.93%) unigenes showed homology to Vitis vinifera (Vitaceae) genes in BLASTx analysis. Furthermore, 4,800 cDNA SSRs were identified as potential molecular markers. Fifty primer pairs were randomly selected to detect polymorphism among 30 samples of H. cordata; 43 (86%) produced fragments of expected size, suggesting that the unigenes were suitable for specific primer design and of high quality, and the SSR marker could be widely used in marker-assisted selection and molecular breeding of H. cordata in the future. Conclusions: This is the first application of Illumina paired-end sequencing technology to investigate the whole transcriptome of H. cordata and to assemble RNA-seq reads without a reference genome. These data should help researchers investigating the evolution and biological processes of this species. The SSR markers developed can be used for construction of high-resolution genetic linkage maps and for gene-based association analyses in H. cordata. This work will enable future functional genomic research and research into the distinctive active constituents of this genus. PMID:24392108

  9. Unenhanced breast MRI (STIR, T2-weighted TSE, DWIBS): An accurate and alternative strategy for detecting and differentiating breast lesions.

    PubMed

    Telegrafo, Michele; Rella, Leonarda; Stabile Ianora, Amato Antonio; Angelelli, Giuseppe; Moschetta, Marco

    2015-10-01

    To assess the role of STIR, T2-weighted TSE and DWIBS sequences for detecting and characterizing breast lesions and to compare unenhanced (UE)-MRI results with contrast-enhanced (CE)-MRI and histological findings, having the latter as the reference standard. Two hundred eighty consecutive patients (age range, 27-73 years; mean age±standard deviation (SD), 48.8±9.8 years) underwent MR examination with a diagnostic protocol including STIR, T2-weighted TSE, THRIVE and DWIBS sequences. Two radiologists blinded to both dynamic sequences and histological findings evaluated in consensus STIR, T2-weighted TSE and DWIBS sequences and after two weeks CE-MRI images searching for breast lesions. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and diagnostic accuracy for UE-MRI and CE-MRI were calculated. UE-MRI results were also compared with CE-MRI. UE-MRI sequences obtained sensitivity, specificity, diagnostic accuracy, PPV and NPV values of 94%, 79%, 86%, 79% and 94%, respectively. CE-MRI sequences obtained sensitivity, specificity, diagnostic accuracy, PPV and NPV values of 98%, 83%, 90%, 84% and 98%, respectively. No statistically significant difference between UE-MRI and CE-MRI was found. Breast UE-MRI could represent an accurate diagnostic tool and a valid alternative to CE-MRI for evaluating breast lesions. STIR and DWIBS sequences allow detection of breast lesions, while T2-weighted TSE sequences and ADC values could be useful for lesion characterization. Copyright © 2015 Elsevier Inc. All rights reserved.
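
    The five measures reported above are standard functions of a 2x2 confusion matrix against the reference standard. The following sketch shows those definitions; the function name `diagnostic_metrics` and the counts in the usage line are hypothetical and are not taken from the study, which does not report its raw counts in the abstract.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard diagnostic accuracy measures, returned as fractions in [0, 1].

    tp/fp/tn/fn are true/false positives/negatives relative to the
    reference standard (histology in the abstract above).
    """
    return {
        "sensitivity": tp / (tp + fn),   # detected lesions / all true lesions
        "specificity": tn / (tn + fp),   # correct negatives / all true negatives
        "ppv": tp / (tp + fp),           # true positives / all positive calls
        "npv": tn / (tn + fn),           # true negatives / all negative calls
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
```

    Unlike sensitivity and specificity, PPV and NPV depend on how common lesions are in the examined population, so they would not be expected to transfer unchanged to a cohort with a different prevalence.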

  10. Methylobacterium pseudosasae sp. nov., a pink-pigmented, facultatively methylotrophic bacterium isolated from the bamboo phyllosphere.

    PubMed

    Madhaiyan, Munusamy; Poonguzhali, Selvaraj

    2014-02-01

    A pink-pigmented, Gram-negative, aerobic, facultatively methylotrophic bacterium, strain BL44(T), was isolated from bamboo leaves and identified as a member of the genus Methylobacterium. Phylogenetic analysis based on 16S rRNA gene sequences showed similarity values of 98.7-97.0 % with closely related type strains and showed highest similarity to Methylobacterium zatmanii DSM 5688(T) (98.7 %) and Methylobacterium thiocyanatum DSM 11490(T) (98.7 %). Methylotrophic metabolism in this strain was confirmed by PCR amplification and sequencing of the mxaF gene coding for the α-subunit of methanol dehydrogenase. Strain BL44(T) produced three known quorum sensing signal molecules with similar retention time to C8, C10 and C12-HSLs when characterized by GC-MS. The fatty acid profiles contained major amounts of C18:1 ω7c, iso-3OH C17:0 and summed feature 3 (C16:1 ω7c and/or iso-C15:0 2-OH), which supported the grouping of the isolate in the genus Methylobacterium. The DNA G+C content was 66.9 mol%. DNA relatedness of strain BL44(T) to its most closely related strains ranged from 12 to 43.3 %. On the basis of the phenotypic, phylogenetic and DNA-DNA hybridization data, strain BL44(T) is assigned to a novel species of the genus Methylobacterium for which the name Methylobacterium pseudosasae sp. nov. is proposed (type strain BL44(T) = NBRC 105205(T) = ICMP 17622(T)).

  11. Nullomers and High Order Nullomers in Genomic Sequences

    PubMed Central

    Vergni, Davide; Santoni, Daniele

    2016-01-01

    A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is now debated in the literature. Here, we investigated the nature of nullomers, whether their absence in genomes has just a statistical explanation or it is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied different aspects of them: comparison with nullomers of random sequences, CpG distribution and mean helical rise. In agreement with previous results we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless antithetical results were found when considering a random DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content but it also highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built on eleven species based on both the similarities between the dinucleotide frequencies and the number of nullomers two species share, showing that nullomers are fairly conserved among close species. Finally the study of mean helical rise of nullomer sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have some peculiar structural features. The obtained results show that nullomers are the consequence of the peculiar structure of DNA (also including biased CpG frequency and CpG islands), so that the hypermutability model, also taking into account CpG islands, seems insufficient to explain the nullomer phenomenon. Finally, high order nullomers could emphasize those features that already make simple nullomers useful in several applications. PMID:27906971
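
    The two definitions in this abstract (a nullomer as an absent word, and a high order nullomer as a nullomer whose mutated sequences are still nullomers) can be sketched by brute-force enumeration for small k. The code below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, only the given strand is scanned (no reverse complement), and "mutated" is assumed to mean a single-base substitution.

```python
from itertools import product


def nullomers(seq: str, k: int, alphabet: str = "ACGT") -> list:
    """All length-k words over `alphabet` that never occur in `seq`."""
    seq = seq.upper()
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    all_words = ("".join(p) for p in product(alphabet, repeat=k))
    return sorted(w for w in all_words if w not in present)


def high_order_nullomers(seq: str, k: int, alphabet: str = "ACGT") -> list:
    """Nullomers whose single-substitution variants are all nullomers too.

    Assumption: 'mutated' means exactly one substituted base.
    """
    absent = set(nullomers(seq, k, alphabet))

    def neighbours(word):
        # Every word that differs from `word` at exactly one position.
        for i, base in enumerate(word):
            for other in alphabet:
                if other != base:
                    yield word[:i] + other + word[i + 1:]

    return sorted(w for w in absent if all(n in absent for n in neighbours(w)))


# Toy usage on a homopolymer: only "AA" is present among the 16 dinucleotides.
absent_2mers = nullomers("AAAA", 2)
robust_2mers = high_order_nullomers("AAAA", 2)
```

    Enumerating all 4^k words is only practical for short k; for genome-scale work a counting structure over observed k-mers would be the more realistic choice.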

  12. Identification of Mycobacterium spp. of veterinary importance using rpoB gene sequencing

    PubMed Central

    2011-01-01

    Background: Studies conducted on Mycobacterium spp. isolated from human patients indicate that sequencing of a 711 bp portion of the rpoB gene can be useful in assigning a species identity, particularly for members of the Mycobacterium avium complex (MAC). Given that MAC are important pathogens in livestock, companion animals, and zoo/exotic animals, we were interested in evaluating the use of rpoB sequencing for identification of Mycobacterium isolates of veterinary origin. Results: A total of 386 isolates, collected over 2008 - June 2011 from 378 animals (amphibians, reptiles, birds, and mammals) underwent PCR and sequencing of a ~711 bp portion of the rpoB gene; 310 isolates (80%) were identified to the species level based on similarity at ≥ 98% with a reference sequence. The remaining 76 isolates (20%) displayed < 98% similarity with reference sequences and were assigned to a clade based on their location in a neighbor-joining tree containing reference sequences. For a subset of 236 isolates that received both 16S rRNA and rpoB sequencing, 167 (70%) displayed a similar species/clade assignation for both sequencing methods. For the remaining 69 isolates, species/clade identities were different with each sequencing method. Mycobacterium avium subsp. hominissuis was the species most frequently isolated from specimens from pigs, cervids, companion animals, cattle, and exotic/zoo animals. Conclusions: rpoB sequencing proved useful in identifying Mycobacterium isolates of veterinary origin to clade, species, or subspecies levels, particularly for assemblages (such as the MAC) where 16S rRNA sequencing alone is not adequate to demarcate these taxa. rpoB sequencing can represent a cost-effective identification tool suitable for routine use in the veterinary diagnostic laboratory. PMID:22118247

  13. Statistical Features of the 2010 Beni-Ilmane, Algeria, Aftershock Sequence

    NASA Astrophysics Data System (ADS)

    Hamdache, M.; Peláez, J. A.; Gospodinov, D.; Henares, J.

    2018-03-01

    The aftershock sequence of the 2010 Beni-Ilmane (M_W 5.5) earthquake is studied in depth to analyze the spatial and temporal variability of seismicity parameters of the relationships modeling the sequence. The b value of the frequency-magnitude distribution is examined rigorously. A threshold magnitude of completeness equal to 2.1, using the maximum curvature procedure or the changing point algorithm, and a b value equal to 0.96 ± 0.03 have been obtained for the entire sequence. Two clusters have been identified and characterized by their faulting type, exhibiting b values equal to 0.99 ± 0.05 and 1.04 ± 0.05. Additionally, the temporal decay of the aftershock sequence was examined using a stochastic point process. The analysis was done through the restricted epidemic-type aftershock sequence (RETAS) stochastic model, which allows the possibility to recognize the prevailing clustering pattern of the relaxation process in the examined area. The analysis selected the epidemic-type aftershock sequence (ETAS) model to offer the most appropriate description of the temporal distribution, which presumes that all events in the sequence can cause secondary aftershocks. Finally, the fractal dimensions are estimated using the integral correlation. The obtained D_2 values are 2.15 ± 0.01, 2.23 ± 0.01 and 2.17 ± 0.02 for the entire sequence, and for the first and second cluster, respectively. An analysis of the temporal evolution of the fractal dimensions D_-2, D_0, D_2 and the spectral slope has been also performed to derive and characterize the different clusters included in the sequence.
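
    A b value with an uncertainty, as quoted above, is commonly obtained from the magnitudes at or above the completeness threshold with the Aki-Utsu maximum-likelihood estimator. The abstract does not say which estimator the authors used, so the sketch below is an assumption for illustration; the function name `b_value_mle`, the 0.1 magnitude bin width, and the toy catalogue are all hypothetical.

```python
import math


def b_value_mle(magnitudes, mc, bin_width=0.1):
    """Aki-Utsu maximum-likelihood b value and Aki's first-order uncertainty.

    b     = log10(e) / (mean(M) - (mc - bin_width / 2)), using only M >= mc
    sigma = b / sqrt(N)
    """
    complete = [m for m in magnitudes if m >= mc]
    if len(complete) < 2:
        raise ValueError("need at least two events at or above mc")
    mean_mag = sum(complete) / len(complete)
    excess = mean_mag - (mc - bin_width / 2.0)
    if excess <= 0:
        raise ValueError("mean magnitude must exceed the corrected threshold")
    b = math.log10(math.e) / excess
    return b, b / math.sqrt(len(complete))


# Toy catalogue (hypothetical magnitudes); the 1.8 event falls below mc and is ignored.
b, sigma = b_value_mle([2.1, 2.3, 2.2, 2.8, 3.4, 2.5, 1.8], mc=2.1)
```

    The half-bin correction matters when magnitudes are rounded to 0.1 units; with unbinned magnitudes `bin_width=0` recovers Aki's original form.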

  14. On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation

    PubMed Central

    2014-01-01

    Background: Protein sequence similarities to any types of non-globular segments (coiled coils, low complexity regions, transmembrane regions, long loops, etc. where either positional sequence conservation is the result of a very simple, physically induced pattern or rather integral sequence properties are critical) are pertinent sources for mistaken homologies. Regretfully, these considerations regularly escape attention in large-scale annotation studies since, often, there is no substitute to manual handling of these cases. Quantitative criteria are required to suppress events of function annotation transfer as a result of false homology assignments. Results: The sequence homology concept is based on the similarity comparison between the structural elements, the basic building blocks for conferring the overall fold of a protein. We propose to dissect the total similarity score into fold-critical and other, remaining contributions and suggest that, for a valid homology statement, the fold-relevant score contribution should at least be significant on its own. As part of the article, we provide the DissectHMMER software program for dissecting HMMER2/3 scores into segment-specific contributions. We show that DissectHMMER reproduces HMMER2/3 scores with sufficient accuracy and that it is useful in automated decisions about homology for instructive sequence examples. To generalize the dissection concept for cases without 3D structural information, we find that a dissection based on alignment quality is an appropriate surrogate. The approach was applied to a large-scale study of SMART and PFAM domains in the space of seed sequences and in the space of UniProt/SwissProt. Conclusions: Sequence similarity core dissection with regard to fold-critical and other contributions systematically suppresses false hits and, additionally, recovers previously obscured homology relationships such as the one between aquaporins and formate/nitrite transporters that, so far, was only supported by structure comparison. PMID:24890864

  15. Characterization of minimal sequences associated with self-similar interval exchange maps

    NASA Astrophysics Data System (ADS)

    Cobo, Milton; Gutiérrez-Romo, Rodolfo; Maass, Alejandro

    2018-04-01

    The construction of affine interval exchange maps (IEMs) with wandering intervals that are semi-conjugate to a given self-similar IEM is strongly related to the existence of the so-called minimal sequences associated with local potentials, which are certain elements of the substitution subshift arising from the given IEM. In this article, under the condition called unique representation property, we characterize such minimal sequences for potentials coming from non-real eigenvalues of the substitution matrix. We also give conditions on the slopes of the affine extensions of a self-similar IEM that determine whether it exhibits a wandering interval or not.

  16. Shot sequencing based on biological equivalent dose considerations for multiple isocenter Gamma Knife radiosurgery

    NASA Astrophysics Data System (ADS)

    Ma, Lijun; Lee, Letitia; Barani, Igor; Hwang, Andrew; Fogh, Shannon; Nakamura, Jean; McDermott, Michael; Sneed, Penny; Larson, David A.; Sahgal, Arjun

    2011-11-01

    Rapid delivery of multiple shots or isocenters is one of the hallmarks of Gamma Knife radiosurgery. In this study, we investigated whether the temporal order of shots delivered with Gamma Knife Perfexion would significantly influence the biological equivalent dose for complex multi-isocenter treatments. Twenty single-target cases were selected for analysis. For each case, 3D dose matrices of individual shots were extracted and single-fraction equivalent uniform dose (sEUD) values were determined for all possible shot delivery sequences, corresponding to different patterns of temporal dose delivery within the target. We found significant variations in the sEUD values among these sequences exceeding 15% for certain cases. However, the sequences for the actual treatment delivery were found to agree (<3%) and to correlate (R^2 = 0.98) excellently with the sequences yielding the maximum sEUD values for all studied cases. This result is applicable for both fast and slow growing tumors with α/β values of 2 to 20 according to the linear-quadratic model. In conclusion, despite large potential variations in different shot sequences for multi-isocenter Gamma Knife treatments, current clinical delivery sequences exhibited consistent biological target dosing that approached that maximally achievable for all studied cases.

  17. Thermodynamic characterization of tandem mismatches found in naturally occurring RNA

    PubMed Central

    Christiansen, Martha E.; Znosko, Brent M.

    2009-01-01

    Although all sequence symmetric tandem mismatches and some sequence asymmetric tandem mismatches have been thermodynamically characterized and a model has been proposed to predict the stability of previously unmeasured sequence asymmetric tandem mismatches [Christiansen,M.E. and Znosko,B.M. (2008) Biochemistry, 47, 4329–4336], experimental thermodynamic data for frequently occurring tandem mismatches is lacking. Since experimental data is preferred over a predictive model, the thermodynamic parameters for 25 frequently occurring tandem mismatches were determined. These new experimental values, on average, are 1.0 kcal/mol different from the values predicted for these mismatches using the previous model. The data for the sequence asymmetric tandem mismatches reported here were then combined with the data for 72 sequence asymmetric tandem mismatches that were published previously, and the parameters used to predict the thermodynamics of previously unmeasured sequence asymmetric tandem mismatches were updated. The average absolute difference between the measured values and the values predicted using these updated parameters is 0.5 kcal/mol. This updated model improves the prediction for tandem mismatches that were predicted rather poorly by the previous model. This new experimental data and updated predictive model allow for more accurate calculations of the free energy of RNA duplexes containing tandem mismatches, and, furthermore, should allow for improved prediction of secondary structure from sequence. PMID:19509311

  18. Genome Sequences of Akhmeta Virus, an Early Divergent Old World Orthopoxvirus.

    PubMed

    Gao, Jinxin; Gigante, Crystal; Khmaladze, Ekaterine; Liu, Pengbo; Tang, Shiyuyun; Wilkins, Kimberly; Zhao, Kun; Davidson, Whitni; Nakazawa, Yoshinori; Maghlakelidze, Giorgi; Geleishvili, Marika; Kokhreidze, Maka; Carroll, Darin S; Emerson, Ginny; Li, Yu

    2018-05-12

    Annotated whole genome sequences of three isolates of the Akhmeta virus (AKMV), a novel species of orthopoxvirus (OPXV), isolated from the Akhmeta and Vani regions of the country Georgia, are presented and discussed. The AKMV genome is similar in genomic content and structure to that of the cowpox virus (CPXV), but a lower sequence identity was found between AKMV and Old World OPXVs than between other known species of Old World OPXVs. Phylogenetic analysis showed that AKMV diverged prior to other Old World OPXVs. AKMV isolates formed a monophyletic clade in the OPXV phylogeny, yet the sequence variability between AKMV isolates was higher than between the monkeypox virus strains in the Congo basin and West Africa. An AKMV isolate from Vani contained an approximately 6 kb sequence in the left terminal region that shared a higher similarity with CPXV than with other AKMV isolates, whereas the rest of the genome was most similar to AKMV, suggesting recombination between AKMV and CPXV in a region containing several host range and virulence genes.

  19. Phonotactics, Neighborhood Activation, and Lexical Access for Spoken Words

    PubMed Central

    Vitevitch, Michael S.; Luce, Paul A.; Pisoni, David B.; Auer, Edward T.

    2012-01-01

    Probabilistic phonotactics refers to the relative frequencies of segments and sequences of segments in spoken words. Neighborhood density refers to the number of words that are phonologically similar to a given word. Despite a positive correlation between phonotactic probability and neighborhood density, nonsense words with high probability segments and sequences are responded to more quickly than nonsense words with low probability segments and sequences, whereas real words occurring in dense similarity neighborhoods are responded to more slowly than real words occurring in sparse similarity neighborhoods. This contradiction may be resolved by hypothesizing that effects of probabilistic phonotactics have a sublexical focus and that effects of similarity neighborhood density have a lexical focus. The implications of this hypothesis for models of spoken word recognition are discussed. PMID:10433774

  20. Protein-protein interaction network-based detection of functionally similar proteins within species.

    PubMed

    Song, Baoxing; Wang, Fen; Guo, Yang; Sang, Qing; Liu, Min; Li, Dengyun; Fang, Wei; Zhang, Deli

    2012-07-01

    Although functionally similar proteins across species have been widely studied, functionally similar proteins within species showing low sequence similarity have not been examined in detail. Identification of these proteins is of significant importance for understanding biological functions, evolution of protein families, progression of co-evolution, convergent evolution and other phenomena that cannot be studied by detecting functionally similar proteins across species. Here, we explored a method of detecting functionally similar proteins within species based on graph theory. After denoting protein-protein interaction networks using graphs, we split the graphs into subgraphs using the 1-hop method. Proteins with functional similarities in a species were detected using a modified shortest-path method to compare these subgraphs and to find the eligible optimal results. Using seven protein-protein interaction networks and this method, some functionally similar proteins with low sequence similarity that cannot be detected by sequence alignment were identified. By analyzing the results, we found that, sometimes, it is difficult to separate homologous from convergent evolution. Evaluation of the performance of our method by gene ontology term overlap showed that the precision of our method was excellent. Copyright © 2012 Wiley Periodicals, Inc.

  1. Fibonacci chain polynomials: Identities from self-similarity

    NASA Technical Reports Server (NTRS)

    Lang, Wolfdieter

    1995-01-01

    Fibonacci chains are special diatomic, harmonic chains with uniform nearest neighbor interaction and two kinds of atoms (mass-ratio r) arranged according to the self-similar binary Fibonacci sequence ABAABABA..., which is obtained by repeated application of the substitution A → AB, B → A. The implications of the self-similarity of this sequence for the associated orthogonal polynomial systems which govern these Fibonacci chains with fixed mass-ratio r are studied.
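
    The substitution rule described in this abstract (A becomes AB, B becomes A, applied repeatedly) is easy to reproduce. The short sketch below generates the binary Fibonacci sequence that way; the function name `fibonacci_word` and the choice of "A" as the starting word are assumptions made for illustration.

```python
def fibonacci_word(iterations: int, seed: str = "A") -> str:
    """Apply the substitution A -> AB, B -> A to `seed` `iterations` times."""
    rules = {"A": "AB", "B": "A"}
    word = seed
    for _ in range(iterations):
        # Rewrite every symbol simultaneously, as a substitution rule requires.
        word = "".join(rules[symbol] for symbol in word)
    return word


# Successive words: A, AB, ABA, ABAAB, ABAABABA, ...
words = [fibonacci_word(n) for n in range(5)]
```

    Each word is the previous word followed by the one before it, so the word lengths are Fibonacci numbers (1, 2, 3, 5, 8, ...), which is the self-similarity the abstract refers to.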

  2. Characterization of kinetoplast DNA from Phytomonas serpens.

    PubMed

    Sá-Carvalho, D; Perez-Morga, D; Traub-Cseko, Y M

    1993-01-01

    The restriction enzyme digestion of kinetoplast DNA from four Phytomonas serpens isolates shows an overall similar band pattern. One minicircle from isolate 30T was cloned and sequenced, showing low levels of homology but the same general features and organization as described for minicircles of other trypanosomatids. Extensive regions of the minicircle are composed of G and T on the H strand. These regions are very repetitive and similar to regions in a minicircle of Crithidia oncopelti and to telomeric sequences of Saccharomyces cerevisiae. Conserved Sequence Block 3, present in all trypanosomatids, is one nucleotide different from the consensus in P. serpens and provides a basis to differentiate P. serpens from other trypanosomatids. Electron microscopy of kinetoplast DNA revealed a network with organization similar to that of other trypanosomatids, and the measurement of minicircles confirmed the size of about 1.45 kb of the sequenced minicircle.

  3. Color-color diagrams in near infrared: (J-H)/(H-K). I

    NASA Astrophysics Data System (ADS)

    Gyulbudaghian, Armen L.; Baloian, N.; Sanchez, I. A.

    2017-12-01

    This paper presents (J-H)/(H-K) color-color diagrams for all stars with B < 11 for which the known catalogs give J, H and K values together with spectral and luminosity classes. The diagrams are constructed for luminosity classes Ia, Ib, II, III, IV and V. The diagrams for classes Ia and Ib (supergiants) and II (giants) are clearly similar. The diagrams obtained can be used to discover new young stars and to determine the color excesses of the stars under investigation. The largest numbers of stars are found in classes V and III. J-H and H-K tend to increase along the sequence of spectral classes O - M, and this holds for all luminosity classes.

  4. Confirmation of Two Sibling Species among Anopheles fluviatilis Mosquitoes in South and Southeastern Iran by Analysis of Cytochrome Oxidase I Gene.

    PubMed

    Naddaf, Saied Reza; Oshaghi, Mohammad Ali; Vatandoost, Hassan

    2012-12-01

    Anopheles fluviatilis, one of the major malaria vectors in Iran, is assumed to be a complex of sibling species. The aim of this study was to evaluate the cytochrome oxidase I (COI) gene alongside 28S-D3 as a diagnostic tool for identification of An. fluviatilis sibling species in Iran. DNA samples from 24 An. fluviatilis mosquitoes from different geographical areas in south and southeastern Iran were used for amplification of the COI gene followed by sequencing. The 474-475 bp COI sequences obtained in this study were aligned with 59 similar sequences of An. fluviatilis and a sequence of Anopheles minimus, as an outgroup, from the GenBank database. The distances between groups and individual sequences were calculated, and a phylogenetic tree for the obtained sequences was generated by the neighbor-joining method using the Kimura two-parameter (K2P) model. Phylogenetic analysis using the COI gene grouped members of Fars Province (central Iran) in two distinct clades separate from other Iranian members representing Hormozgan, Kerman, and Sistan va Baluchestan Provinces. The mean distance between Iranian and Indian individuals was 1.66%, whereas the value between Fars Province individuals and the group comprising individuals from other areas of Iran was 2.06%. The 2.06% mean distance between individuals from Fars Province and those from other areas of Iran is indicative of at least two sibling species in An. fluviatilis mosquitoes of Iran. This finding confirms earlier results based on RAPD-PCR and 28S-D3 analysis.
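
    The Kimura two-parameter distance named in this abstract treats transitions (A<->G, C<->T) and transversions separately: with P and Q the proportions of transitional and transversional differences, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The sketch below computes that distance for two pre-aligned sequences. It is an illustration only: the function name `k2p_distance` is hypothetical, and skipping columns that contain gaps or ambiguity codes is an assumption rather than something stated in the abstract.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two pre-aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy usage: a single A<->G transition in ten sites.
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

    For closely related sequences the K2P distance is only slightly larger than the raw proportion of differing sites, which is why values such as the 1.66% and 2.06% reported above can be read roughly as percentage divergence.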

  5. Re-examination of population structure and phylogeography of hawksbill turtles in the wider Caribbean using longer mtDNA sequences.

    PubMed

    Leroux, Robin A; Dutton, Peter H; Abreu-Grobois, F Alberto; Lagueux, Cynthia J; Campbell, Cathi L; Delcroix, Eric; Chevalier, Johan; Horrocks, Julia A; Hillis-Starr, Zandy; Troëng, Sebastian; Harrison, Emma; Stapleton, Seth

    2012-01-01

    Management of the critically endangered hawksbill turtle in the Wider Caribbean (WC) has been hampered by knowledge gaps regarding stock structure. We carried out a comprehensive stock structure re-assessment of 11 WC hawksbill rookeries using longer mtDNA sequences, larger sample sizes (N = 647), and additional rookeries compared to previous surveys. Additional variation detected by 740 bp sequences between populations allowed us to differentiate populations such as Barbados-Windward and Guadeloupe (F_ST = 0.683, P < 0.05) that appeared genetically indistinguishable based on shorter 380 bp sequences. POWSIM analysis showed that longer sequences improved power to detect population structure and that when N < 30, increasing the variation detected was as effective in increasing power as increasing sample size. Geographic patterns of genetic variation suggest a model of periodic long-distance colonization coupled with region-wide dispersal and subsequent secondary contact within the WC. Mismatch analysis results for individual clades suggest a general population expansion in the WC following a historic bottleneck about 100 000-300 000 years ago. We estimated an effective female population size (N_ef) of 6000-9000 for the WC, similar to the current estimated numbers of breeding females, highlighting the importance of these regional rookeries to maintaining genetic diversity in hawksbills. Our results provide a basis for standardizing future work to 740 bp sequence reads and establish a more complete baseline for determining stock boundaries in this migratory marine species. Finally, our findings illustrate the value of maintaining an archive of specimens for re-analysis as new markers become available.

  6. A putative carbohydrate-binding domain of the lactose-binding Cytisus sessilifolius anti-H(O) lectin has a similar amino acid sequence to that of the L-fucose-binding Ulex europaeus anti-H(O) lectin.

    PubMed

    Konami, Y; Yamamoto, K; Osawa, T; Irimura, T

    1995-04-01

    The complete amino acid sequence of a lactose-binding Cytisus sessilifolius anti-H(O) lectin II (CSA-II) was determined using a protein sequencer. After digestion of CSA-II with endoproteinase Lys-C or Asp-N, the resulting peptides were purified by reversed-phase high performance liquid chromatography (HPLC) and then subjected to sequence analysis. Comparison of the complete amino acid sequence of CSA-II with the sequences of other leguminous seed lectins revealed regions of extensive homology. The amino acid sequence of a putative carbohydrate-binding domain of CSA-II was found to be similar to those of several anti-H(O) leguminous lectins, especially to that of the L-fucose-binding Ulex europaeus lectin I (UEA-I).

  7. Investigation of SnSPR1, a novel and abundant surface protein of Sarcocystis neurona merozoites.

    PubMed

    Zhang, Deqing; Howe, Daniel K

    2008-04-15

    An expressed sequence tag (EST) sequencing project has produced over 15,000 partial cDNA sequences from the equine pathogen Sarcocystis neurona. While many of the sequences are clear homologues of previously characterized genes, a significant number of the S. neurona ESTs do not exhibit similarity to anything in the extensive sequence databases that have been generated. In an effort to characterize parasite proteins that are novel to S. neurona, a seemingly unique gene was selected for further investigation based on its abundant representation in the collection of ESTs and the predicted presence of a signal peptide and glycolipid anchor addition on the encoded protein. The gene was expressed in E. coli, and monospecific polyclonal antiserum against the recombinant protein was produced by immunization of a rabbit. Characterization of the native protein in S. neurona merozoites and schizonts revealed that it is a low molecular weight surface protein that is expressed throughout intracellular development of the parasite. The protein was designated Surface Protein 1 (SPR1) to reflect its display on the outer surface of merozoites and to distinguish it from the ubiquitous SAG/SRS surface antigens of the heteroxenous Coccidia. Interestingly, infection assays in the presence of the polyclonal antiserum suggested that SnSPR1 plays some role in attachment and/or invasion of host cells by S. neurona merozoites. The work described herein represents a general template for selecting and characterizing the various unidentified gene sequences that are plentiful in the EST databases for S. neurona and other apicomplexans. Furthermore, this study illustrates the value of investigating these novel sequences since it can offer new candidates for diagnostic or vaccine development while also providing greater insight into the biology of these parasites.

  8. The role of heterologous chloroplast sequence elements in transgene integration and expression.

    PubMed

    Ruhlman, Tracey; Verma, Dheeraj; Samson, Nalapalli; Daniell, Henry

    2010-04-01

    Heterologous regulatory elements and flanking sequences have been used in chloroplast transformation of several crop species, but their roles and mechanisms have not yet been investigated. Nucleotide sequence identity in the photosystem II protein D1 (psbA) upstream region is 59% across all taxa; similar variation was consistent across all genes and taxa examined. Secondary structure and predicted Gibbs free energy values of the psbA 5' untranslated region (UTR) among different families reflected this variation. Therefore, chloroplast transformation vectors were made for tobacco (Nicotiana tabacum) and lettuce (Lactuca sativa), with endogenous (Nt-Nt, Ls-Ls) or heterologous (Nt-Ls, Ls-Nt) psbA promoter, 5' UTR and 3' UTR, regulating expression of the anthrax protective antigen (PA) or human proinsulin (Pins) fused with the cholera toxin B-subunit (CTB). Unique lettuce flanking sequences were completely eliminated during homologous recombination in the transplastomic tobacco genomes but not unique tobacco sequences. Nt-Ls or Ls-Nt transplastomic lines showed reduction of 80% PA and 97% CTB-Pins expression when compared with endogenous psbA regulatory elements, which accumulated up to 29.6% total soluble protein PA and 72.0% total leaf protein CTB-Pins, 2-fold higher than Rubisco. Transgene transcripts were reduced by 84% in Ls-Nt-CTB-Pins and by 72% in Nt-Ls-PA lines. Transcripts containing endogenous 5' UTR were stabilized in nonpolysomal fractions. Stromal RNA-binding proteins were preferentially associated with endogenous psbA 5' UTR. A rapid and reproducible regeneration system was developed for lettuce commercial cultivars by optimizing plant growth regulators. These findings underscore the need for sequencing complete crop chloroplast genomes, utilization of endogenous regulatory elements and flanking sequences, as well as optimization of plant growth regulators for efficient chloroplast transformation.

  9. The Role of Heterologous Chloroplast Sequence Elements in Transgene Integration and Expression

    PubMed Central

    Ruhlman, Tracey; Verma, Dheeraj; Samson, Nalapalli; Daniell, Henry

    2010-01-01

    Heterologous regulatory elements and flanking sequences have been used in chloroplast transformation of several crop species, but their roles and mechanisms have not yet been investigated. Nucleotide sequence identity in the photosystem II protein D1 (psbA) upstream region is 59% across all taxa; similar variation was consistent across all genes and taxa examined. Secondary structure and predicted Gibbs free energy values of the psbA 5′ untranslated region (UTR) among different families reflected this variation. Therefore, chloroplast transformation vectors were made for tobacco (Nicotiana tabacum) and lettuce (Lactuca sativa), with endogenous (Nt-Nt, Ls-Ls) or heterologous (Nt-Ls, Ls-Nt) psbA promoter, 5′ UTR and 3′ UTR, regulating expression of the anthrax protective antigen (PA) or human proinsulin (Pins) fused with the cholera toxin B-subunit (CTB). Unique lettuce flanking sequences were completely eliminated during homologous recombination in the transplastomic tobacco genomes but not unique tobacco sequences. Nt-Ls or Ls-Nt transplastomic lines showed reduction of 80% PA and 97% CTB-Pins expression when compared with endogenous psbA regulatory elements, which accumulated up to 29.6% total soluble protein PA and 72.0% total leaf protein CTB-Pins, 2-fold higher than Rubisco. Transgene transcripts were reduced by 84% in Ls-Nt-CTB-Pins and by 72% in Nt-Ls-PA lines. Transcripts containing endogenous 5′ UTR were stabilized in nonpolysomal fractions. Stromal RNA-binding proteins were preferentially associated with endogenous psbA 5′ UTR. A rapid and reproducible regeneration system was developed for lettuce commercial cultivars by optimizing plant growth regulators. These findings underscore the need for sequencing complete crop chloroplast genomes, utilization of endogenous regulatory elements and flanking sequences, as well as optimization of plant growth regulators for efficient chloroplast transformation. PMID:20130101

  10. De novo sequencing analysis of the Rosa roxburghii fruit transcriptome reveals putative ascorbate biosynthetic genes and EST-SSR markers.

    PubMed

    Yan, Xiuqin; Zhang, Xue; Lu, Min; He, Yong; An, Huaming

    2015-04-25

    Rosa roxburghii Tratt. is a well-known ornamental rose species native to China. In addition, the fruits of this species are valued for their nutritional and medicinal characteristics, especially their high ascorbic acid (AsA) levels. Nevertheless, AsA biosynthesis in R. roxburghii fruit has not been explored in detail because of a lack of genomic resources for this species. High-throughput transcriptomic sequencing generating large volumes of transcript sequence data can aid in gene discovery and molecular marker development. In this study, we generated more than 53 million clean reads using Illumina paired-end sequencing technology. De novo assembly yielded 106,590 unigenes, with an average length of 343 bp. On the basis of sequence similarity to known proteins, 9301 and 2393 unigenes were classified into Gene Ontology and Clusters of Orthologous Group categories, respectively. There were 7480 unigenes assigned to 124 pathways in the Kyoto Encyclopedia of Gene and Genome pathway database. BLASTx searches identified 498 unique putative transcripts encoding various transcription factors, some known to regulate fruit development. qRT-PCR validated the expressions of most of the genes encoding the main enzymes involved in ascorbate biosynthesis. In addition, 9131 potential simple sequence repeat (SSR) loci were identified among the unigenes. One hundred and two primer pairs were synthesized and 71 pairs produced an amplification product during initial screening. Among the amplified products, 30 were polymorphic in the 16 R. roxburghii germplasms tested. Our study was the first to produce a large volume of transcriptome data from R. roxburghii. The resulting sequence collection is a valuable resource for gene discovery and marker-assisted selective breeding in this rose species. Copyright © 2015 Elsevier B.V. All rights reserved.

  11. Analysis of HIV-1 intersubtype recombination breakpoints suggests region with high pairing probability may be a more fundamental factor than sequence similarity affecting HIV-1 recombination.

    PubMed

    Jia, Lei; Li, Lin; Gui, Tao; Liu, Siyang; Li, Hanping; Han, Jingwan; Guo, Wei; Liu, Yongjian; Li, Jingyun

    2016-09-21

    With increasing data on HIV-1, a more relevant molecular model describing mechanism details of HIV-1 genetic recombination usually requires upgrades. Currently an incomplete structural understanding of the copy choice mechanism along with several other issues in the field that lack elucidation led us to perform an analysis of the correlation between breakpoint distributions and (1) the probability of base pairing, and (2) intersubtype genetic similarity to further explore structural mechanisms. Near full length sequences of URFs from Asia, Europe, and Africa (one sequence/patient), and representative sequences of worldwide CRFs were retrieved from the Los Alamos HIV database. Their recombination patterns were analyzed by jpHMM in detail. Then the relationships between breakpoint distributions and (1) the probability of base pairing, and (2) intersubtype genetic similarities were investigated. Pearson correlation test showed that all URF groups and the CRF group exhibit the same breakpoint distribution pattern. Additionally, the Wilcoxon two-sample test indicated a significant and inexplicable limitation of recombination in regions with high pairing probability. These regions have been found to be strongly conserved across distinct biological states (i.e., strong intersubtype similarity), and genetic similarity has been determined to be a very important factor promoting recombination. Thus, the results revealed an unexpected disagreement between intersubtype similarity and breakpoint distribution, which were further confirmed by genetic similarity analysis. Our analysis reveals a critical conflict between results from natural HIV-1 isolates and those from HIV-1-based assay vectors in which genetic similarity has been shown to be a very critical factor promoting recombination. These results indicate the region with high-pairing probabilities may be a more fundamental factor affecting HIV-1 recombination than sequence similarity in natural HIV-1 infections. Our findings will be relevant in furthering the understanding of HIV-1 recombination mechanisms.

  12. Comprehensive comparison of three commercial human whole-exome capture platforms.

    PubMed

    Asan; Xu, Yu; Jiang, Hui; Tyler-Smith, Chris; Xue, Yali; Jiang, Tao; Wang, Jiawei; Wu, Mingzhi; Liu, Xiao; Tian, Geng; Wang, Jun; Wang, Jian; Yang, Huangming; Zhang, Xiuqing

    2011-09-28

    Exome sequencing, which allows the global analysis of protein coding sequences in the human genome, has become an effective and affordable approach to detecting causative genetic mutations in diseases. Currently, there are several commercial human exome capture platforms; however, the relative performances of these have not been characterized sufficiently to know which is best for a particular study. We comprehensively compared three platforms: NimbleGen's Sequence Capture Array and SeqCap EZ, and Agilent's SureSelect. We assessed their performance in a variety of ways, including number of genes covered and capture efficacy. Differences that may impact on the choice of platform were that Agilent SureSelect covered approximately 1,100 more genes, while NimbleGen provided better flanking sequence capture. Although all three platforms achieved similar capture specificity of targeted regions, the NimbleGen platforms showed better uniformity of coverage and greater genotype sensitivity at 30- to 100-fold sequencing depth. All three platforms showed similar power in exome SNP calling, including medically relevant SNPs. Compared with genotyping and whole-genome sequencing data, the three platforms achieved a similar accuracy of genotype assignment and SNP detection. Importantly, all three platforms showed similar levels of reproducibility, GC bias and reference allele bias. We demonstrate key differences between the three platforms, particularly advantages of solutions over array capture and the importance of a large gene target set.

  13. Two new miniature inverted-repeat transposable elements in the genome of the clam Donax trunculus.

    PubMed

    Šatović, Eva; Plohl, Miroslav

    2017-10-01

    Repetitive sequences are important components of eukaryotic genomes that drive their evolution. Among them are different types of mobile elements that share the ability to spread throughout the genome and form interspersed repeats. To broaden the generally scarce knowledge on bivalves at the genome level, in the clam Donax trunculus we described two new non-autonomous DNA transposons, miniature inverted-repeat transposable elements (MITEs), named DTC M1 and DTC M2. Like other MITEs, they are characterized by their small size, their A + T richness, and the presence of terminal inverted repeats (TIRs). DTC M1 and DTC M2 are 261 and 286 bp long, respectively, and in addition to TIRs, both of them contain a long imperfect palindrome sequence in their central parts. These elements are present in complete and truncated versions within the genome of the clam D. trunculus. The two new MITEs share only structural similarity, but lack any nucleotide sequence similarity to each other. In a search for related elements in databases, blast search revealed within the Crassostrea gigas genome a larger element sharing sequence similarity only to DTC M1 in its TIR sequences. The lack of sequence similarity with any previously published mobile elements indicates that DTC M1 and DTC M2 elements may be unique to D. trunculus.

  14. Diversity of indoor fungi as revealed by DNA metabarcoding.

    PubMed

    Korpelainen, Helena; Pietiläinen, Maria

    2017-01-01

    In the present study, we conducted DNA metabarcoding (the nuclear ITS2 region) for indoor fungal samples originating from two nursery schools with a suspected mould problem (sampling before and after renovation), from two university buildings, and from an old farmhouse. Good-quality sequences were obtained, and the results showed that DNA metabarcoding provides high resolution in fungal identification. The pooled proportions of sequences representing filamentous ascomycetes, filamentous basidiomycetes, yeasts, and other fungi equalled 62.3%, 8.0%, 28.3%, and 1.4%, respectively, and the total number of fungal genera found during the study was 585. When comparing fungal diversities and taxonomic composition between different types of buildings, no obvious pattern was detected. The average pairwise values of Sørensen Chao indices that were used to compare similarities for taxon composition between samples among the samples from the two university buildings, two nurseries, and farmhouse equaled 0.693, 0.736, 0.852, 0.928, and 0.981, respectively, while the mean similarity index for all samples was 0.864. We discovered that making explicit conclusions on the relationship between the indoor air quality and mycoflora is complicated by the lack of appropriate indicators for air quality and by the occurrence of wide spatial and temporal changes in diversity and compositions among samples.
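    The similarity indices reported above compare taxon composition between pairs of samples. As a rough illustration only, the sketch below computes the classic incidence-based Sørensen index, 2|A ∩ B| / (|A| + |B|), for two sets of genera. This is an assumption made for simplicity: the study used Sørensen Chao indices, which are abundance-based and are not reproduced here. The function name and the genus lists are hypothetical and are not data from the study.

```python
def sorensen_index(taxa_a, taxa_b):
    """Classic (presence/absence) Sørensen similarity between two taxon lists.

    Returns a value in [0, 1]; 1 means identical composition.
    Note: this is NOT the abundance-based Sørensen Chao estimator.
    """
    a, b = set(taxa_a), set(taxa_b)
    if not a and not b:
        return 1.0  # two empty samples are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    # Hypothetical genus lists for two samples (illustrative only).
    sample_1 = ["Aspergillus", "Penicillium", "Cladosporium"]
    sample_2 = ["Penicillium", "Cladosporium", "Alternaria", "Candida"]
    print(round(sorensen_index(sample_1, sample_2), 3))
```

    Averaging such an index over all sample pairs within a building type would give a per-group mean comparable in spirit, though not in value, to the figures quoted in the abstract.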

  15. Structures of a bi-functional Kunitz-type STI family inhibitor of serine and aspartic proteases: Could the aspartic protease inhibition have evolved from a canonical serine protease-binding loop?

    PubMed

    Guerra, Yasel; Valiente, Pedro A; Pons, Tirso; Berry, Colin; Rudiño-Piñera, Enrique

    2016-08-01

    Bi-functional inhibitors from the Kunitz-type soybean trypsin inhibitor (STI) family are glycosylated proteins able to inhibit serine and aspartic proteases. Here we report six crystal structures of the wild-type and a non-glycosylated mutant of the bifunctional inhibitor E3Ad obtained at different pH values and space groups. The crystal structures show that E3Ad adopts the typical β-trefoil fold of the STI family exhibiting some conformational changes due to pH variations and crystal packing. Despite the high sequence identity with a recently reported potato cathepsin D inhibitor (PDI), three-dimensional structures obtained in this work show a significant conformational change in the protease-binding loop proposed for aspartic protease inhibition. The E3Ad binding loop for serine protease inhibition is also proposed, based on structural similarity with a novel non-canonical conformation described for the double-headed inhibitor API-A from the Kunitz-type STI family. In addition, structural and sequence analyses suggest that bifunctional inhibitors of serine and aspartic proteases from the Kunitz-type STI family are more similar to double-headed inhibitor API-A than other inhibitors with a canonical protease-binding loop. Copyright © 2016. Published by Elsevier Inc.

  16. Allergen cross reactions: a problem greater than ever thought?

    PubMed

    Pfiffner, P; Truffer, R; Matsson, P; Rasi, C; Mari, A; Stadler, B M

    2010-12-01

    Cross reactions are an often observed phenomenon in patients with allergy. Sensitization against some allergens may cause reactions against other seemingly unrelated allergens. Today, cross reactions are being investigated on a per-case basis, analyzing blood serum specific IgE (sIgE) levels and clinical features of patients suffering from cross reactions. In this study, we evaluated the level of sIgE compared to patients' total IgE assuming epitope specificity is a consequence of sequence similarity. Our objective was to evaluate our recently published model of molecular sequence similarities underlying cross reactivity using serum-derived data from IgE determinations of standard laboratory tests. We calculated the probabilities of protein cross reactivity based on conserved sequence motifs and compared these in silico predictions to a database consisting of 5362 sera with sIgE determinations. Cumulating sIgE values of a patient resulted in a median of 25-30% total IgE. Comparing motif cross reactivity predictions to sIgE levels showed that on average three times fewer motifs than extracts were recognized in a given serum (correlation coefficient: 0.967). Extracts belonging to the same motif group co-reacted in a high percentage of sera (up to 80% for some motifs). Cumulated sIgE levels are exaggerated because of a high level of observed cross reactions. Thus, not only bioinformatic prediction of allergenic motifs, but also serological routine testing of allergic patients implies that the immune system may recognize only a small number of allergenic structures. © 2010 John Wiley & Sons A/S.

  17. Bacillus horneckiae sp. nov., isolated from a spacecraft-assembly clean room.

    PubMed

    Vaishampayan, Parag; Probst, Alexander; Krishnamurthi, Srinivasan; Ghosh, Sudeshna; Osman, Shariff; McDowall, Alasdair; Ruckmani, Arunachalam; Mayilraj, Shanmugam; Venkateswaran, Kasthuri

    2010-05-01

    Five Gram-stain-positive, motile, aerobic strains were isolated from a clean room of the Kennedy Space Center where the Phoenix spacecraft was assembled. All strains are rod-shaped, spore-forming bacteria, whose spores were resistant to UV radiation up to 1000 J m(-2). The spores were subterminally positioned and produced an external layer. A polyphasic taxonomic study including traditional biochemical tests, fatty acid analysis, cell-wall typing, lipid analyses, 16S rRNA gene sequencing and DNA-DNA hybridization studies was performed to characterize these novel strains. 16S rRNA gene sequencing and lipid analyses convincingly grouped these novel strains within the genus Bacillus as a cluster separate from already described species. The similarity of 16S rRNA gene sequences among the novel strains was >99 %, but the similarity was only about 97 % with their nearest neighbours Bacillus pocheonensis, Bacillus firmus and Bacillus bataviensis. DNA-DNA hybridization dissociation values were <24 % to the closest related type strains. The novel strains had a G+C content 35.6+/-0.5 mol% and could liquefy gelatin but did not utilize or produce acids from any of the carbon substrates tested. The major fatty acids were iso-C(15 : 0) and anteiso-C(15 : 0) and the cell-wall diamino acid was meso-diaminopimelic acid. Based on phylogenetic and phenotypic results, it is concluded that these strains represent a novel species of the genus Bacillus, for which the name Bacillus horneckiae sp. nov. is proposed. The type strain is 1P01SC(T) (=NRRL B-59162(T) =MTCC 9535(T)).

  18. Trichococcus patagoniensis sp. nov., a facultative anaerobe that grows at -5 degrees C, isolated from penguin guano in Chilean Patagonia.

    PubMed

    Pikuta, Elena V; Hoover, Richard B; Bej, Asim K; Marsic, Damien; Whitman, William B; Krader, Paul E; Tang, Jane

    2006-09-01

    A novel, extremely psychrotolerant, facultative anaerobe, strain PmagG1(T), was isolated from guano of Magellanic penguins (Spheniscus magellanicus) collected in Chilean Patagonia. Gram-variable, motile cocci with a diameter of 1.3-2.0 µm were observed singularly or in pairs, short chains and irregular conglomerates. Growth occurred within the pH range 6.0-10.0, with optimum growth at pH 8.5. The temperature range for growth of the novel isolate was from -5 to 35 degrees C, with optimum growth at 28-30 degrees C. Strain PmagG1(T) did not require NaCl, as growth was observed in the presence of 0-6.5 % NaCl with optimum growth at 0.5 % (w/v). Strain PmagG1(T) was a catalase-negative chemo-organoheterotroph that used sugars and some organic acids as substrates. The metabolic end products were lactate, formate, acetate, ethanol and CO(2). Strain PmagG1(T) was sensitive to ampicillin, tetracycline, chloramphenicol, rifampicin, kanamycin and gentamicin. The G+C content of its genomic DNA was 45.8 mol%. 16S rRNA gene sequence analysis showed 100 % similarity of strain PmagG1(T) with Trichococcus collinsii ATCC BAA-296(T), but DNA-DNA hybridization between them demonstrated relatedness values of <45+/-1 %. Another phylogenetically closely related species, Trichococcus pasteurii, showed 99.85 % similarity by 16S rRNA sequencing and DNA-DNA hybridization showed relatedness values of 47+/-1.5 %. Based on genotypic and phenotypic characteristics, the novel species Trichococcus patagoniensis sp. nov. is proposed, with strain PmagG1(T) (=ATCC BAA-756(T)=JCM 12176(T)=CIP 108035(T)) as the type strain.

  19. Trichococcus Patagoniensis sp. nov., a Facultative Anaerobe that grows at -5 C, Isolated from Penguin Guano in Chilean Patagonia

    NASA Technical Reports Server (NTRS)

    Pikuta, Elena V.; Hoover, Richard B.; Bej, Asim K.; Marsic, Damien; Whitman, William B.; Krader, Paul E.; Tang, Jane

    2006-01-01

    A novel, extremely psychrotolerant, facultative anaerobe, strain PmagG1(sup T), was isolated from guano of Magellanic penguins (Spheniscus magellanicus) collected in Chilean Patagonia. Gram-variable, motile cocci with a diameter of 1.3-2.0 micrometers were observed singularly or in pairs, short chains and irregular conglomerates. Growth occurred within the pH range 6.0-10.0, with optimum growth at pH 8.5. The temperature range for growth of the novel isolate was from -5 to 35 C, with optimum growth at 28-30 C. Strain PmagG1(sup T) did not require NaCl, as growth was observed in the presence of 0-6.5% NaCl with optimum growth at 0.5% (w/v). Strain PmagG1(sup T) was a catalase-negative chemo-organoheterotroph that used sugars and some organic acids as substrates. The metabolic end products were lactate, formate, acetate, ethanol and CO2. Strain PmagG1(sup T) was sensitive to ampicillin, tetracycline, chloramphenicol, rifampicin, kanamycin and gentamicin. The G+C content of its genomic DNA was 45.8 mol%. 16S rRNA gene sequence analysis showed 100 % similarity of strain PmagG1(sup T) with Trichococcus collinsii ATCC BAA-296(sup T), but DNA-DNA hybridization between them demonstrated relatedness values of less than 45 plus or minus 1%. Another phylogenetically closely related species, Trichococcus pasteurii, showed 99.85 % similarity by 16S rRNA sequencing and DNA-DNA hybridization showed relatedness values of 47 plus or minus 1.5%. Based on genotypic and phenotypic characteristics, the novel species Trichococcus patagoniensis sp. nov. is proposed, with strain PmagG1(sup T) (=ATCC BAA-756(sup T)=JCM 12176(sup T)=CIP 108035(sup T)) as the type strain.

  20. The Dynamics of Democracy, Development and Cultural Values

    PubMed Central

    Spaiser, Viktoria; Ranganathan, Shyam; Mann, Richard P.; Sumpter, David J. T.

    2014-01-01

    Over the past decades many countries have experienced rapid changes in their economies, their democratic institutions and the values of their citizens. Comprehensive data measuring these changes across very different countries has recently become openly available. Between country similarities suggest common underlying dynamics in how countries develop in terms of economy, democracy and cultural values. We apply a novel Bayesian dynamical systems approach to identify the model which best captures the complex, mainly non-linear dynamics that underlie these changes. We show that the level of Human Development Index (HDI) in a country drives first democracy and then higher emancipation of citizens. This change occurs once the countries pass a certain threshold in HDI. The data also suggests that there is a limit to the growth of wealth, set by higher emancipation. Having reached a high level of democracy and emancipation, societies tend towards equilibrium that does not support further economic growth. Our findings give strong empirical evidence against a popular political science theory, known as the Human Development Sequence. Contrary to this theory, we find that implementation of human-rights and democratisation precede increases in emancipative values. PMID:24905920

  1. The dynamics of democracy, development and cultural values.

    PubMed

    Spaiser, Viktoria; Ranganathan, Shyam; Mann, Richard P; Sumpter, David J T

    2014-01-01

    Over the past decades many countries have experienced rapid changes in their economies, their democratic institutions and the values of their citizens. Comprehensive data measuring these changes across very different countries has recently become openly available. Between country similarities suggest common underlying dynamics in how countries develop in terms of economy, democracy and cultural values. We apply a novel Bayesian dynamical systems approach to identify the model which best captures the complex, mainly non-linear dynamics that underlie these changes. We show that the level of Human Development Index (HDI) in a country drives first democracy and then higher emancipation of citizens. This change occurs once the countries pass a certain threshold in HDI. The data also suggests that there is a limit to the growth of wealth, set by higher emancipation. Having reached a high level of democracy and emancipation, societies tend towards equilibrium that does not support further economic growth. Our findings give strong empirical evidence against a popular political science theory, known as the Human Development Sequence. Contrary to this theory, we find that implementation of human-rights and democratisation precede increases in emancipative values.

  2. Analysis of sequence repeats of proteins in the PDB.

    PubMed

    Mary Rajathei, David; Selvaraj, Samuel

    2013-12-01

    Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that the identity of the sequence repeats falls in the range of 20-40% and that the superimposed structures of most of the sequence repeats maintain a similar overall fold. Analysis of sequence repeats at the functional level reveals that most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single- and two-domain proteins often contained conserved sequence motifs for the function of the domain. Copyright © 2013 Elsevier Ltd. All rights reserved.

  3. Efficient bootstrap estimates for tail statistics

    NASA Astrophysics Data System (ADS)

    Breivik, Øyvind; Aarnes, Ole Johan

    2017-03-01

    Bootstrap resamples can be used to investigate the tail of empirical distributions as well as return value estimates from the extremal behaviour of the sample. Specifically, the confidence intervals on return value estimates or bounds on in-sample tail statistics can be obtained using bootstrap techniques. However, non-parametric bootstrapping from the entire sample is expensive. It is shown here that it suffices to bootstrap from a small subset consisting of the highest entries in the sequence to make estimates that are essentially identical to bootstraps from the entire sample. Similarly, bootstrap estimates of confidence intervals of threshold return estimates are found to be well approximated by using a subset consisting of the highest entries. This has practical consequences in fields such as meteorology, oceanography and hydrology where return values are calculated from very large gridded model integrations spanning decades at high temporal resolution or from large ensembles of independent and identically distributed model fields. In such cases the computational savings are substantial.
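    The idea of bootstrapping only from the highest entries can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the authors' procedure: the statistic is taken to be the r-th largest value, and the number of resampled points that land in the top-k subset is drawn from a Binomial(n, k/n) distribution, which is what a full resample of size n would produce. The function names and the parameters r, k and n_boot are hypothetical.

```python
import numpy as np


def bootstrap_rth_largest_full(sample, r, n_boot, rng):
    """Bootstrap the r-th largest value by resampling the entire sample."""
    n = sample.size
    out = np.empty(n_boot)
    for b in range(n_boot):
        res = rng.choice(sample, size=n, replace=True)
        # Element at sorted position n - r is the r-th largest.
        out[b] = np.partition(res, n - r)[n - r]
    return out


def bootstrap_rth_largest_top(sample, r, k, n_boot, rng):
    """Bootstrap the r-th largest value using only the k highest entries.

    Assumption: the count of draws falling in the top-k subset follows
    Binomial(n, k/n), as it would in a full resample of size n.
    """
    n = sample.size
    top = np.sort(sample)[-k:]
    out = np.full(n_boot, np.nan)
    for b in range(n_boot):
        m = rng.binomial(n, k / n)
        if m < r:
            # The r-th largest would fall outside the subset; leave NaN.
            continue
        res = rng.choice(top, size=m, replace=True)
        out[b] = np.partition(res, m - r)[m - r]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    data = rng.standard_normal(20_000)  # synthetic stand-in for a long series
    full = bootstrap_rth_largest_full(data, r=10, n_boot=300, rng=rng)
    top = bootstrap_rth_largest_top(data, r=10, k=200, n_boot=300, rng=rng)
    print("full sample 2.5/97.5 percentiles:", np.percentile(full, [2.5, 97.5]))
    print("top subset  2.5/97.5 percentiles:", np.nanpercentile(top, [2.5, 97.5]))
```

    Each replicate of the subset version draws on the order of k values rather than n, which is where the computational saving described in the abstract would come from when n is very large.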

  4. Rejection of reclassification of Lactobacillus kimchii and Lactobacillus bobalius as later subjective synonyms of Lactobacillus paralimentarius using comparative genomics.

    PubMed

    Yang, Seung-Jo; Kim, Byung-Yong; Chun, Jongsik

    2017-11-01

    Lactobacillus bobalius, Lactobacillus kimchii and Lactobacillus paralimentarius belong to the genus Lactobacillus and show close phylogenetic relationships. In a previous study, L. bobalius and L. kimchii were proposed to be reclassified as later heterotypic synonyms of L. paralimentarius using high 16S rRNA gene sequence similarities (≥99.5 %) and DNA-DNA hybridization values (≥82 %). We determined high quality whole genome assemblies of the type strains of L. bobalius and L. kimchii, which were then compared with that of L. paralimentarius. Average nucleotide identity values among the three genomes ranged from 91.4 to 92.3 %, which is clearly below 95-96 %, the generally recognized cutoff value for bacterial species boundaries. On the basis of comparative genomic evidence, L. bobalius, L. kimchii, and L. paralimentarius should stand as separate species in the genus Lactobacillus. We therefore suggest rejecting the previous proposal to combine these three species into a single species.

  5. What is a melody? On the relationship between pitch and brightness of timbre.

    PubMed

    Cousineau, Marion; Carcagno, Samuele; Demany, Laurent; Pressnitzer, Daniel

    2013-01-01

    Previous studies showed that the perceptual processing of sound sequences is more efficient when the sounds vary in pitch than when they vary in loudness. We show here that sequences of sounds varying in brightness of timbre are processed with the same efficiency as pitch sequences. The sounds used consisted of two simultaneous pure tones one octave apart, and the listeners' task was to make same/different judgments on pairs of sequences varying in length (one, two, or four sounds). In one condition, brightness of timbre was varied within the sequences by changing the relative level of the two pure tones. In other conditions, pitch was varied by changing fundamental frequency, or loudness was varied by changing the overall level. In all conditions, only two possible sounds could be used in a given sequence, and these two sounds were equally discriminable. When sequence length increased from one to four, discrimination performance decreased substantially for loudness sequences, but to a smaller extent for brightness sequences and pitch sequences. In the latter two conditions, sequence length had a similar effect on performance. These results suggest that the processes dedicated to pitch and brightness analysis, when probed with a sequence-discrimination task, share unexpected similarities.

  6. A chain-retrieval model for voluntary task switching.

    PubMed

    Vandierendonck, André; Demanet, Jelle; Liefooghe, Baptist; Verbruggen, Frederick

    2012-09-01

    To account for the findings obtained in voluntary task switching, this article describes and tests the chain-retrieval model. This model postulates that voluntary task selection involves retrieval of task information from long-term memory, which is then used to guide task selection and task execution. The model assumes that the retrieved information consists of acquired sequences (or chains) of tasks, that selection may be biased towards chains containing more task repetitions and that bottom-up triggered repetitions may overrule the intended task. To test this model, four experiments are reported. In Studies 1 and 2, sequences of task choices and the corresponding transition sequences (task repetitions or switches) were analyzed with the help of dependency statistics. The free parameters of the chain-retrieval model were estimated on the observed task sequences and these estimates were used to predict autocorrelations of tasks and transitions. In Studies 3 and 4, sequences of hand choices and their transitions were analyzed similarly. In all studies, the chain-retrieval model yielded better fits and predictions than statistical models of event choice. In applications to voluntary task switching (Studies 1 and 2), all three parameters of the model were needed to account for the data. When no task switching was required (Studies 3 and 4), the chain-retrieval model could account for the data with one or two parameters clamped to a neutral value. Implications for our understanding of voluntary task selection and broader theoretical implications are discussed. Copyright © 2012 Elsevier Inc. All rights reserved.

  7. Gene Discovery through Genomic Sequencing of Brucella abortus

    PubMed Central

    Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

    2001-01-01

    Brucella abortus is the etiological agent of brucellosis, a disease that affects bovines and humans. We generated random DNA sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10^-5) to sequences deposited in the GenBank databases. Among them, 925 represent putative novel genes for the Brucella genus. Out of 925 nonredundant GSSs, 470 were classified in 15 categories based on cellular function. Seven hundred GSSs showed no significant database matches and remain available for further studies in order to identify their function. A high number of GSSs with homology to Agrobacterium tumefaciens and Rhizobium meliloti proteins were observed, thus confirming their close phylogenetic relationship. Among them, several GSSs showed high similarity with genes related to nodule nitrogen fixation, synthesis of nod factors, nodulation protein symbiotic plasmid, and nodule bacteroid differentiation. We have also identified several B. abortus homologs of virulence and pathogenesis genes from other pathogens, including a homolog to both the Shda gene from Salmonella enterica serovar Typhimurium and the AidA-1 gene from Escherichia coli. Other GSSs displayed significant homologies to genes encoding components of the type III and type IV secretion machineries, suggesting that Brucella might also have an active type III secretion machinery. PMID:11159979

  8. Purification, characterization and molecular cloning of chymotrypsin inhibitor peptides from the venom of Burmese Daboia russelii siamensis.

    PubMed

    Guo, Chun-Teng; McClean, Stephen; Shaw, Chris; Rao, Ping-Fan; Ye, Ming-Yu; Bjourson, Anthony J

    2013-05-01

    One novel Kunitz BPTI-like peptide, designated BBPTI-1, with chymotrypsin inhibitory activity was identified from the venom of Burmese Daboia russelii siamensis. It was purified by three steps of chromatography including gel filtration, cation exchange and reversed phase. A partial N-terminal sequence of BBPTI-1, HDRPKFCYLPADPGECLAHMRSF, was obtained by automated Edman degradation and a Ki value of 4.77 nM was determined. Cloning of BBPTI-1 including the open reading frame and 3' untranslated region was achieved from cDNA libraries derived from lyophilized venom using a 3' RACE strategy. In addition, a cDNA sequence, designated BBPTI-5, was also obtained. Alignment of cDNA sequences showed that BBPTI-5 exhibited an identical sequence to BBPTI-1 cDNA except for an eight nucleotide deletion in the open reading frame. Gene variations that represented deletions in the BBPTI-5 cDNA resulted in a novel protease inhibitor analog. Amino acid sequence alignment revealed that deduced peptides derived from cloning of their respective precursor cDNAs from libraries showed high similarity and homology with other Kunitz BPTI proteinase inhibitors. BBPTI-1 and BBPTI-5 consist of 60 and 66 amino acid residues respectively, including six conserved cysteine residues. As these peptides have been reported to have influence on the processes of coagulation, fibrinolysis and inflammation, their potential application in biomedical contexts warrants further investigation. Copyright © 2013 Elsevier Inc. All rights reserved.

  9. Intramural activation and repolarization sequences in canine ventricles. Experimental and simulation studies.

    PubMed

    Taccardi, Bruno; Punske, Bonnie B; Sachse, Frank; Tricoche, Xavier; Colli-Franzone, Piero; Pavarino, Luca F; Zabawa, Christine

    2005-10-01

    There are no published data showing the three-dimensional sequence of repolarization and the associated potential fields in the ventricles. Knowledge of the sequence of repolarization has medical relevance because high spatial dispersion of recovery times and action potential durations favors cardiac arrhythmias. In this study we describe measured and simulated 3-D excitation and recovery sequences and activation-recovery intervals (ARIs) (measured) or action potential durations (APDs) (simulated) in the ventricular walls. We recorded from 600 to 1400 unipolar electrograms from canine ventricular walls during atrial and ventricular pacing at 350-450 ms cycle length. Measured excitation and recovery times and ARIs were displayed as 2-D maps in transmural planes or 3-D maps in the volume explored, using specially developed software. Excitation and recovery sequences and APD distributions were also simulated in parallelepipedal slabs using anisotropic monodomain or bidomain models based on the Luo-Rudy version 1 model with homogeneous membrane properties. Simulations showed that in the presence of homogeneous membrane properties, the sequence of repolarization was similar but not identical to the excitation sequence. In a transmural plane perpendicular to epicardial fiber direction, both activation and recovery pathways starting from an epicardial pacing site returned toward the epicardium at a few cm distance from the pacing site. However, APDs were not constant, but had a dispersion of approximately 14 ms in the simulated domain. The maximum APD value was near the pacing site and two minima appeared along a line perpendicular to fiber directions, passing through the pacing site. Electrical measurements in dog ventricles showed that, for short cycle lengths, both excitation and recovery pathways, starting from an epicardial pacing site, returned toward the epicardium. For slower pacing rates, pathways of recovery departed from the pathway of excitation. Highest ARI values were observed near the pacing site in part of the experiments. In addition, maps of activation-recovery intervals showed mid-myocardial clusters with activation-recovery intervals that were slightly longer than ARIs closer to the epi- or endocardium, suggesting the presence of M cells in those areas. Transmural dispersion of measured ARIs was on the order of 20-25 ms. Potential distributions during recovery were less affected by myocardial anisotropy than were excitation potentials.

  10. Comparing diffusion weighted imaging with clinical and blood parameters, and with short tau inversion recovery sequence in detecting spinal and sacroiliac joint inflammation in axial spondyloarthritis.

    PubMed

    Chung, Ho Yin; Xu, Xiaopei; Lau, Vince Wing Hang; Ho, Grace; Lee, Ka Lai; Li, Philip Hei; Tsang, Helen Hoi Lun; Kwok, Suet Kei; Lau, Chak Sing; Wong, Chun Sing

    2017-01-01

    To investigate the usefulness of diffusion weighted imaging (DWI) by comparing with clinical features, blood parameters and traditional short tau inversion recovery (STIR) sequence in detecting spinal and sacroiliac (SI) joint inflammation in axial spondyloarthritis (axSpA) patients. One hundred and ten axSpA patients were recruited. Clinical, radiological and blood parameters were recorded. DWI and STIR MRI were performed simultaneously and results were scored according to the Spondyloarthritis Research Consortium of Canada (SPARCC) for comparison. Apparent diffusion coefficient (ADC) values were also calculated. DWI did not correlate with clinical parameters or blood parameters. It also had lower sensitivity. It correlated well with the STIR sequence at the SI joint level (CC 0.76, p<0.001), but weakly at the spinal level (CC 0.23, p=0.02). At the SI joint level, the presence of inflammation on both STIR sequence and DWI was associated with an increase in maximum (B=0.24, p=0.02 in STIR; B=0.37, p<0.001 in DWI) and mean ADC values (B=0.17, p=0.003 in STIR; B=0.15, p=0.01 in DWI). Maximum (B=0.19, p=0.04) and mean spinal ADC values (B=0.18, p=0.01) were also positively associated with DWI detected spinal inflammation. Presence of Modic lesions showed positive correlation with STIR sequence (B=7.12, p=0.01) but not spinal ADC values. Although DWI correlates with the STIR sequence, it has lower sensitivity. However, ADC values appear to be independent of Modic lesions and may supplement STIR sequence to differentiate degeneration.

  11. PipeOnline 2.0: automated EST processing and functional data sorting.

    PubMed

    Ayoubi, Patricia; Jin, Xiaojing; Leite, Saul; Liu, Xianghui; Martajaja, Jeson; Abduraham, Abdurashid; Wan, Qiaolan; Yan, Wei; Misawa, Eduardo; Prade, Rolf A

    2002-11-01

    Expressed sequence tags (ESTs) are generated and deposited in the public domain, as redundant, unannotated, single-pass reactions, with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA-sequence data from chromatograms or FASTA files by calling the quality of bases, screening and removing vector sequences, assembling and rewriting consensus sequences of redundant input files into a unigene EST data set and finally through translation, amino acid sequence similarity searches, annotation of public databases and functional data. PipeOnline generates an annotated database, retaining the processed unigene sequence, clone/file history, alignments with similar sequences, and proposed functional classification, if available. Functional annotation is automatic and based on a novel method that relies on homology of amino acid sequence multiplicity within GenBank records. Records are examined through a function ordered browser or keyword queries with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and alert service. PipeOnline is available at http://stress-genomics.org.

  12. galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.

    PubMed

    Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M

    2004-06-12

    The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes the reference database to be of extensive sequence sampling. This is often not the case, restraining the correctness of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and consequently sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are found at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se

  13. An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids

    PubMed Central

    Li, Yushuang; Yang, Jiasheng; Zhang, Yi

    2016-01-01

    In this paper, we have proposed a novel alignment-free method for comparing the similarity of protein sequences. We first encode a protein sequence into a 440 dimensional feature vector consisting of a 400 dimensional Pseudo-Markov transition probability vector among the 20 amino acids, a 20 dimensional content ratio vector, and a 20 dimensional position ratio vector of the amino acids in the sequence. By evaluating the Euclidean distances among the representing vectors, we compare the similarity of protein sequences. We then apply this method into the ND5 dataset consisting of the ND5 protein sequences of 9 species, and the F10 and G11 datasets representing two of the xylanases containing glycoside hydrolase families, i.e., families 10 and 11. As a result, our method achieves a correlation coefficient of 0.962 with the canonical protein sequence aligner ClustalW in the ND5 dataset, much higher than those of other 5 popular alignment-free methods. In addition, we successfully separate the xylanases sequences in the F10 family and the G11 family and illustrate that the F10 family is more heat stable than the G11 family, consistent with a few previous studies. Moreover, we prove mathematically an identity equation involving the Pseudo-Markov transition probability vector and the amino acids content ratio vector. PMID:27918587
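    A rough Python sketch of the encode-then-compare workflow described above. The abstract does not define the three components, so every definition here is an assumption made for illustration and not the authors' formula: transition probabilities are estimated from adjacent-residue counts, the content ratio is the residue frequency, and the position ratio is each residue's summed positions divided by the sum of all positions. The names encode and euclidean are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode(seq):
    """Return a 440-dimensional feature vector (400 + 20 + 20) for a protein.

    All three component definitions are illustrative assumptions.
    """
    residues = [aa for aa in seq.upper() if aa in INDEX]  # skip non-standard symbols
    n = len(residues)
    counts = [0] * 20
    position_sums = [0.0] * 20
    pair_counts = [[0] * 20 for _ in range(20)]
    for pos, aa in enumerate(residues, start=1):
        counts[INDEX[aa]] += 1
        position_sums[INDEX[aa]] += pos
    for a, b in zip(residues, residues[1:]):
        pair_counts[INDEX[a]][INDEX[b]] += 1

    # 400 entries: row-normalised counts of adjacent residue pairs.
    transition = []
    for row in pair_counts:
        total = sum(row)
        transition.extend(c / total if total else 0.0 for c in row)
    # 20 entries: fraction of the sequence made up of each residue.
    content = [c / n if n else 0.0 for c in counts]
    # 20 entries: summed positions of each residue over the sum of all positions.
    all_positions = n * (n + 1) / 2
    position = [s / all_positions if n else 0.0 for s in position_sums]
    return transition + content + position


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


vec_a = encode("MKVLAAGIVGL")  # toy sequences, not real proteins
vec_b = encode("MKVLSAGIVGL")
distance = euclidean(vec_a, vec_b)
```

    Smaller distances are read as higher similarity, so a full comparison would compute this distance for every pair of sequences in a dataset.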

  14. Paenibacillus nebraskensis sp. nov., isolated from the root surface of field-grown maize.

    PubMed

    Kämpfer, Peter; Busse, Hans-Jürgen; McInroy, John A; Hu, Chia-Hui; Kloepper, Joseph W; Glaeser, Stefanie P

    2017-12-01

    A Gram-positive-staining, aerobic, non-endospore-forming bacterial strain (JJ-59 T ), isolated from a field-grown maize plant in Dunbar, Nebraska in 2014 was studied by a polyphasic approach. Based on 16S rRNA gene sequence similarity comparisons, strain JJ-59 T was shown to be a member of the genus Paenibacillus, most closely related to the type strains of Paenibacillus aceris (98.6 % 16S rRNA gene sequence similarity) and Paenibacillus chondroitinus (97.8 %). For all other type strains of species of the genus Paenibacillus lower 16S rRNA gene sequence similarities were obtained. DNA-DNA hybridization values of strain JJ-59 T to the type strains of P. aceris and P. chondroitinus were 26 % (reciprocal, 59 %) and 52 % (reciprocal, 59 %), respectively. Chemotaxonomic characteristics such as the presence of meso-diaminopimelic acid in the peptidoglycan, the major quinone MK-7 and spermidine as the major polyamine were in agreement with the characteristics of the genus Paenibacillus. Strain JJ-59 T shared with its next related species P. aceris the major lipids diphosphatidylglycerol, phosphatidylglycerol, phosphatidylethanolamine and an unidentified aminophospholipid, but the presence/absence of certain lipids was clearly distinguishable. Major fatty acids of strain JJ-59 T were anteiso-C15 : 0, iso-C15 : 0 and iso-C16 : 0, and the genomic G+C content is 47.2 mol%. Physiological and biochemical characteristics of strain JJ-59 T were clearly different from the most closely related species of the genus Paenibacillus. Thus, strain JJ-59 T represents a novel species of the genus Paenibacillus, for which the name Paenibacillus nebraskensis sp. nov. is proposed, with JJ-59 T (=DSM 103623 T =CIP 111179 T =LMG 29764 T ) as the type strain.

  15. The multidimensional perturbation value: a single metric to measure similarity and activity of treatments in high-throughput multidimensional screens.

    PubMed

    Hutz, Janna E; Nelson, Thomas; Wu, Hua; McAllister, Gregory; Moutsatsos, Ioannis; Jaeger, Savina A; Bandyopadhyay, Somnath; Nigsch, Florian; Cornett, Ben; Jenkins, Jeremy L; Selinger, Douglas W

    2013-04-01

    Screens using high-throughput, information-rich technologies such as microarrays, high-content screening (HCS), and next-generation sequencing (NGS) have become increasingly widespread. Compared with single-readout assays, these methods produce a more comprehensive picture of the effects of screened treatments. However, interpreting such multidimensional readouts is challenging. Univariate statistics such as t-tests and Z-factors cannot easily be applied to multidimensional profiles, leaving no obvious way to answer common screening questions such as "Is treatment X active in this assay?" and "Is treatment X different from (or equivalent to) treatment Y?" We have developed a simple, straightforward metric, the multidimensional perturbation value (mp-value), which can be used to answer these questions. Here, we demonstrate application of the mp-value to three data sets: a multiplexed gene expression screen of compounds and genomic reagents, a microarray-based gene expression screen of compounds, and an HCS compound screen. In all data sets, active treatments were successfully identified using the mp-value, and simulations and follow-up analyses supported the mp-value's statistical and biological validity. We believe the mp-value represents a promising way to simplify the analysis of multidimensional data while taking full advantage of its richness.

  16. Sequence Alignment to Predict Across Species Susceptibility ...

    EPA Pesticide Factsheets

    Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to simplify, streamline, and quantitatively assess protein sequence/structural similarity across taxonomic groups as a means to predict relative intrinsic susceptibility. The intent of the tool is to allow for evaluation of any potential protein target, so it is amenable to variable degrees of protein characterization, depending on available information about the chemical/protein interaction and the molecular target itself. To allow for flexibility in the analysis, a layered strategy was adopted for the tool. The first level of the SeqAPASS analysis compares primary amino acid sequences to a query sequence, calculating a metric for sequence similarity (including detection of candidate orthologs), the second level evaluates sequence similarity within selected domains (e.g., ligand-binding domain, DNA binding domain), and the third level of analysis compares individual amino acid residue positions identified as being of importance for protein conformation and/or ligand binding upon chemical perturbation. Each level of the SeqAPASS analysis provides increasing evidence to apply toward rapid, screening-level assessments of probable cross species susceptibility. Such analyses can support prioritization of chemicals for further ev

  17. Molecular studies on larvae of Pseudoterranova parasite of Trichiurus lepturus Linnaeus, 1758 and Pomatomus saltatrix (Linnaeus, 1766) off Brazilian waters.

    PubMed

    Borges, Juliana N; Cunha, Luiz F G; Miranda, Daniele F; Monteiro-Neto, Cassiano; Santos, Cláudia P

    2015-12-01

    Pseudoterranova larvae parasitizing cutlassfish Trichiurus lepturus and bluefish Pomatomus saltatrix from the Southwest Atlantic coast of Brazil were studied in this work by morphological, ultrastructural and molecular approaches. The genetic analyses were performed for the ITS2 intergenic region specific for Pseudoterranova decipiens, the partial 28S (LSU) of ribosomal DNA and the mtDNA cox-1 region. We obtained results for the 28S region and mtDNA cox-1, which were amplified using the polymerase chain reaction and sequenced to evaluate the phylogenetic relationships between sequences of this study and sequences from GenBank. The morphological profile indicated that all nine specimens collected from both fish were L3 larvae of Pseudoterranova sp. The genetic profile confirmed the generic level, but due to the absence of similar sequences for adult parasites in GenBank for the regions amplified, it was not possible to identify them to the species level. The sequences obtained presented 89% similarity with Pseudoterranova decipiens (28S sequences) and Contracaecum osculatum B (mtDNA cox-1). The low similarity, together with the fact that amplification with the specific primer for P. decipiens did not occur, led us to conclude that our sequences do not belong to the P. decipiens complex.

  18. Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions.

    PubMed

    Mai, Te-Lun; Hu, Geng-Ming; Chen, Chi-Ming

    2016-07-01

    Research in the recent decade has demonstrated the usefulness of protein network knowledge in furthering the study of molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks, and investigate possible causes for inconsistency in the protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationship among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second level MSC clustering was found to be highly similar to existing enzyme classifications. The clustering of these enzymes based on sequence, structure, and function information is consistent with each other. For proteases, the Jaccard's similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. From our clustering results, we discussed possible examples of divergent evolution and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize the relation between related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences.
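    One common way to compare two classifications of the same items is a pair-counting Jaccard coefficient, sketched below in Python. The abstract does not state how its coefficients were computed, so this formulation, the function name pair_jaccard, and the toy labels are assumptions for illustration only.

```python
from itertools import combinations


def pair_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard coefficient between two labelings of the same items.

    Counts item pairs grouped together in both labelings, divided by pairs
    grouped together in at least one of them.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
    denom = both + only_a + only_b
    # If no pair is ever grouped together, the labelings agree trivially.
    return both / denom if denom else 1.0


# Hypothetical cluster labels for six enzymes under two classification schemes.
by_sequence = ["s1", "s1", "s1", "s2", "s2", "s3"]
by_function = ["f1", "f1", "f2", "f2", "f2", "f3"]
score = pair_jaccard(by_sequence, by_function)
```

    A value of 1.0 means the two schemes group the items identically; values near 0 mean they rarely place the same pairs together.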

  19. Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver.

    PubMed

    Wymant, Chris; Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ong, Swee Hoe; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Berkhout, Ben; Cornelissen, Marion; Kellam, Paul; Reiss, Peter; Fraser, Christophe

    2018-01-01

    Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user's choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver's constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver.

  20. Multilocus Sequence Typing Compared to Pulsed-Field Gel Electrophoresis for Molecular Typing of Pseudomonas aeruginosa

    PubMed Central

    Johnson, Jennifer K.; Arduino, Sonia M.; Stine, O. Colin; Johnson, Judith A.; Harris, Anthony D.

    2007-01-01

    For hospital epidemiologists, determining a system of typing that is discriminatory is essential for measuring the effectiveness of infection control measures. In situations in which the incidence of resistant Pseudomonas aeruginosa is increasing, the ability to discern whether it is due to patient-to-patient transmission versus an increase in patient endogenous strains is often made on the basis of molecular typing. The present study compared the discriminatory abilities of pulsed-field gel electrophoresis (PFGE) and multilocus sequence typing (MLST) for 90 P. aeruginosa isolates obtained from cultures of perirectal surveillance swabs from patients in an intensive care unit. PFGE identified 85 distinct types and 76 distinct groups when similarity cutoffs of 100% and 87%, respectively, were used. By comparison, MLST identified 60 sequence types that could be clustered into 11 clonal complexes and 32 singletons. By using the Simpson index of diversity (D), PFGE had a greater discriminatory ability than MLST for P. aeruginosa isolates (D values, 0.999 versus 0.975, respectively). Thus, while MLST was better for detecting genetic relatedness, we determined that PFGE was more discriminatory than MLST for determining genetic differences in P. aeruginosa. PMID:17881548
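    A short Python sketch of Simpson's index of diversity in the form commonly used for typing methods, D = 1 - sum(n_j (n_j - 1)) / (N (N - 1)), where n_j is the number of isolates of type j and N is the total number of isolates. Whether the study used exactly this form is an assumption, and the split of isolates across types below is hypothetical because the abstract does not report it.

```python
from collections import Counter


def simpson_diversity(type_labels):
    """Simpson's index of diversity for a list of per-isolate type labels."""
    counts = list(Counter(type_labels).values())
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two isolates")
    return 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


# Hypothetical split: 90 isolates in 85 types (80 singletons plus 5 types seen twice).
labels = [f"single{i}" for i in range(80)] + [f"pair{i}" for i in range(5) for _ in range(2)]
d_value = simpson_diversity(labels)
```

    With this hypothetical split the index is 1 - 10/8010, about 0.9988. The value depends on how isolates are spread across types, not only on the number of types.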

  1. Mass fingerprinting of the venom and transcriptome of venom gland of scorpion Centruroides tecomanus.

    PubMed

    Valdez-Velázquez, Laura L; Quintero-Hernández, Verónica; Romero-Gutiérrez, Maria Teresa; Coronas, Fredy I V; Possani, Lourival D

    2013-01-01

    Centruroides tecomanus is a Mexican scorpion endemic to the State of Colima that causes human fatalities. This communication describes a proteome analysis obtained from milked venom and a transcriptome analysis from a cDNA library constructed from two pairs of venom glands of this scorpion. High performance liquid chromatography separation of soluble venom produced 80 fractions, from which at least 104 individual components were identified by mass spectrometry analysis, with molecular masses ranging from 259 to 44,392 Da. Most of these components are within the expected molecular masses for Na(+)- and K(+)-channel specific toxic peptides, supporting the clinical findings of intoxication when humans are stung by this scorpion. From the cDNA library 162 clones were randomly chosen, from which 130 sequences of good quality were identified and were clustered in 28 contigs containing, each, two or more expressed sequence tags (EST) and 49 singlets with only one EST. Deduced amino acid sequence analysis from 53% of the total ESTs showed that 81% (24 sequences) are similar to known toxic peptides that affect Na(+)-channel activity, and 19% (7 unique sequences) are similar to K(+)-channel specific toxins. Out of the 31 sequences, at least 8 peptides were confirmed by direct Edman degradation, using components isolated directly from the venom. The remaining 19%, 4%, 4%, 15% and 5% of the ESTs correspond respectively to proteins involved in cellular processes, antimicrobial peptides, venom components, proteins without defined function and sequences without similarity in databases. Among the cloned genes are those similar to metalloproteinases.

  2. Anaerobic Oxidation of o-Xylene, m-Xylene, and Homologous Alkylbenzenes by New Types of Sulfate-Reducing Bacteria

    PubMed Central

    Harms, Gerda; Zengler, Karsten; Rabus, Ralf; Aeckersberg, Frank; Minz, Dror; Rosselló-Mora, Ramon; Widdel, Friedrich

    1999-01-01

    Various alkylbenzenes were depleted during growth of an anaerobic, sulfate-reducing enrichment culture with crude oil as the only source of organic substrates. From this culture, two new types of mesophilic, rod-shaped sulfate-reducing bacteria, strains oXyS1 and mXyS1, were isolated with o-xylene and m-xylene, respectively, as organic substrates. Sequence analyses of 16S rRNA genes revealed that the isolates affiliated with known completely oxidizing sulfate-reducing bacteria of the δ subclass of the class Proteobacteria. Strain oXyS1 showed the highest similarities to Desulfobacterium cetonicum and Desulfosarcina variabilis (similarity values, 98.4 and 98.7%, respectively). Strain mXyS1 was less closely related to known species, the closest relative being Desulfococcus multivorans (similarity value, 86.9%). Complete mineralization of o-xylene and m-xylene was demonstrated in quantitative growth experiments. Strain oXyS1 was able to utilize toluene, o-ethyltoluene, benzoate, and o-methylbenzoate in addition to o-xylene. Strain mXyS1 oxidized toluene, m-ethyltoluene, m-isopropyltoluene, benzoate, and m-methylbenzoate in addition to m-xylene. Strain oXyS1 did not utilize m-alkyltoluenes, whereas strain mXyS1 did not utilize o-alkyltoluenes. Like the enrichment culture, both isolates grew anaerobically on crude oil with concomitant reduction of sulfate to sulfide. PMID:10049854

  3. Lactobacillus paralimentarius sp. nov., isolated from sourdough.

    PubMed

    Cai, Y; Okada, H; Mori, H; Benno, Y; Nakase, T

    1999-10-01

    Six strains of lactic acid bacteria isolated from sourdough were characterized taxonomically. They were Gram-positive, catalase-negative, facultatively anaerobic rods that did not produce gas from glucose. Morphological and physiological data indicated that the strains belong to the genus Lactobacillus and they were similar to Lactobacillus alimentarius in phenotypic characteristics. These strains shared the same phenotypic characteristics and exhibited intragroup DNA homology values of over 89.8%, indicating that they comprised a single species. The G + C content of the DNA for the strains was 37.2-38.0 mol%. The 16S rRNA sequence of representative strain TB 1T was determined and aligned with that of other Lactobacillus species. This strain was placed in the genus Lactobacillus on the basis of phylogenetic analysis. L. alimentarius was the most closely related species in the phylogenetic tree and this species also showed the highest sequence homology value (96%) with strain TB 1T. DNA-DNA hybridization indicated that strain TB 1T did not belong to L. alimentarius. It is proposed that these strains are placed in the genus Lactobacillus as a new species, Lactobacillus paralimentarius sp. nov. The type strain of L. paralimentarius is TB 1T, which has been deposited in the Japan Collection of Microorganisms (JCM) as strain JCM 10415T.

  4. Using Zipf-Mandelbrot law and graph theory to evaluate animal welfare

    NASA Astrophysics Data System (ADS)

    de Oliveira, Caprice G. L.; Miranda, José G. V.; Japyassú, Hilton F.; El-Hani, Charbel N.

    2018-02-01

    This work deals with the construction and testing of metrics of welfare based on behavioral complexity, using assumptions derived from Zipf-Mandelbrot law and graph theory. To test these metrics we compared yellow-breasted capuchins (Sapajus xanthosternos) (Wied-Neuwied, 1826) (PRIMATES CEBIDAE) found in two institutions, subjected to different captive conditions: a Zoobotanical Garden (hereafter, ZOO; n = 14), in good welfare condition, and a Wildlife Rescue Center (hereafter, WRC; n = 8), in poor welfare condition. In the Zipf-Mandelbrot-based analysis, the power law exponent was calculated using behavior frequency values versus behavior rank value. These values allow us to evaluate variations in individual behavioral complexity. For each individual we also constructed a graph using the sequence of behavioral units displayed in each recording (average recording time per individual: 4 h 26 min in the ZOO, 4 h 30 min in the WRC). Then, we calculated the values of the main graph attributes, which allowed us to analyze the complexity of the connectivity of the behaviors displayed in the individuals' behavioral sequences. We found significant differences between the two groups for the slope values in the Zipf-Mandelbrot analysis. The slope values for the ZOO individuals approached -1, with graphs representing a power law, while the values for the WRC individuals diverged from -1, differing from a power law pattern. Likewise, we found significant differences for the graph attributes average degree, weighted average degree, and clustering coefficient when comparing the ZOO and WRC individual graphs. However, no significant difference was found for the attributes modularity and average path length. Both analyses were effective in detecting differences between the patterns of behavioral complexity in the two groups. The slope values for the ZOO individuals indicated a higher behavioral complexity when compared to the WRC individuals. 
Similarly, graph construction and the calculation of its attributes values allowed us to show that the complexity of the connectivity among the behaviors was higher in the ZOO than in the WRC individual graphs. These results show that the two measuring approaches introduced and tested in this paper were capable of capturing the differences in welfare levels between the two conditions, as shown by differences in behavioral complexity.
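
    The rank-frequency slope described in the abstract above can be illustrated with a minimal sketch. It assumes a plain Zipf fit (an ordinary least-squares line through log10 frequency against log10 rank, with no Mandelbrot offset term); the function name `zipf_slope` and the toy behavior labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def zipf_slope(behaviors):
    """Least-squares slope of log10(frequency) against log10(rank).

    Assumption: plain Zipf law (no Mandelbrot shift parameter).
    """
    # Frequencies sorted from most to least common give ranks 1, 2, 3, ...
    counts = sorted(Counter(behaviors).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct behaviors")
    xs = [math.log10(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log10(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Toy sequence whose frequencies follow 12 / rank, so the slope is close to -1.
toy = ["rest"] * 12 + ["forage"] * 6 + ["groom"] * 4 + ["play"] * 3
slope = zipf_slope(toy)
```

    A slope near -1 corresponds to the power-law pattern the abstract reports for the ZOO group; how far a slope must diverge from -1 to be meaningful is a statistical question this sketch does not address.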

  5. An EST dataset for Metasequoia glyptostroboides buds: the first EST resource for molecular genomics studies in Metasequoia.

    PubMed

    Zhao, Ying; Thammannagowda, Shivegowda; Staton, Margaret; Tang, Sha; Xia, Xinli; Yin, Weilun; Liang, Haiying

    2013-03-01

    The "living fossil" Metasequoia glyptostroboides Hu et Cheng, commonly known as dawn redwood or Chinese redwood, is the only living species in the genus and is valued for its essential oil and crude extracts that have great potential for anti-fungal activity. Despite its paleontological significance and economical value as a rare relict species, genomic resources of Metasequoia are very limited. In order to gain insight into the molecular mechanisms behind the formation of reproductive buds and the transition from vegetative phase to reproductive phase in Metasequoia, we performed sequencing of expressed sequence tags from Metasequoia vegetative buds and female buds. By using the 454 pyrosequencing technology, a total of 1,571,764 high-quality reads were generated, among which 733,128 were from vegetative buds and 775,636 were from female buds. These EST reads were clustered and assembled into 114,124 putative unique transcripts (PUTs) with an average length of 536 bp. The 97,565 PUTs that were at least 100 bp in length were functionally annotated by a similarity search against public databases and assigned with Gene Ontology (GO) terms. A total of 59 known floral gene families and 190 isotigs involved in hormone regulation were captured in the dataset. Furthermore, a set of PUTs differentially expressed in vegetative and reproductive buds, as well as SSR motifs and high confidence SNPs, were identified. This is the first large-scale expressed sequence tags ever generated in Metasequoia and the first evidence for floral genes in this critically endangered deciduous conifer species.

  6. Purification and characterization of the restriction endonuclease RsrI, an isoschizomer of EcoRI.

    PubMed

    Greene, P J; Ballard, B T; Stephenson, F; Kohr, W J; Rodriguez, H; Rosenberg, J M; Boyer, H W

    1988-08-15

    Rhodobacter sphaeroides strain 630 produces restriction enzyme RsrI which is an isoschizomer of EcoRI. We have purified this enzyme and initiated a comparison with the EcoRI endonuclease. The properties of RsrI are consistent with a reaction mechanism similar to that of EcoRI: the position of cleavage within the -GAATTC-site is identical, the MgCl2 optimum for the cleavage is identical, and the pH profile is similar. Methylation of the substrate sequence by the EcoRI methylase protects the site from cleavage by the RsrI endonuclease. RsrI cross-reacts strongly with anti-EcoRI serum indicating three-dimensional structural similarities. We have determined the sequence of 34 N terminal amino acids for RsrI and this sequence possesses significant similarity to the EcoRI N terminus.

  7. Three 3D graphical representations of DNA primary sequences based on the classifications of DNA bases and their applications.

    PubMed

    Xie, Guosen; Mo, Zhongxi

    2011-01-21

    In this article, we introduce three 3D graphical representations of DNA primary sequences, which we call RY-curve, MK-curve and SW-curve, based on three classifications of the DNA bases. The advantages of our representations are that (i) these 3D curves are strictly non-degenerate and there is no loss of information when transferring a DNA sequence to its mathematical representation and (ii) the coordinates of every node on these 3D curves have clear biological implication. Two applications of these 3D curves are presented: (a) a simple formula is derived to calculate the content of the four bases (A, G, C and T) from the coordinates of nodes on the curves; and (b) a 12-component characteristic vector is constructed to compare similarity among DNA sequences from different species based on the geometrical centers of the 3D curves. As examples, we examine similarity among the coding sequences of the first exon of beta-globin gene from eleven species and validate similarity of cDNA sequences of beta-globin gene from eight species. Copyright © 2010 Elsevier Ltd. All rights reserved.

  8. Molecular characterization of a novel Nucleorhabdovirus from black currant identified by high-throughput sequencing

    USDA-ARS's Scientific Manuscript database

    Contigs with sequence similarities to several nucleorhabdoviruses were identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genomic sequence of this new nucleorhabdovirus is 14,432 nucleotides. Its genomic organization is typical of nucleorh...

  9. Natural frequency changes due to damage in composite beams

    NASA Astrophysics Data System (ADS)

    Negru, I.; Gillich, G. R.; Praisach, Z. I.; Tufoi, M.; Gillich, N.

    2015-07-01

    Transverse cracks in structures affect their stiffness as well as the natural frequency values. This paper presents research on how the frequencies of sandwich beams change when damage occurs. The influence of the locally stored energy, for ten transverse vibration modes, on the frequency shifts is derived from a study regarding the effect of stiffness decrease, realized by means of the finite element analysis. The relation between the local value of the bending moment and the frequency drop is exemplified by a concrete case. It is demonstrated that a reference curve representing the damage severity exists, from which any frequency shift can be derived with respect to damage depth and location. This curve is obtained, for isotropic and multi-layer beams as well, from the stored energy (i.e. stiffness decrease), and is similar to that attained using the stress intensity factor in fracture mechanics. Also, it is proved that, for a given crack, irrespective of its depth, the frequency drop ratio of any two transverse modes is similar. This permitted separating the effect of damage location from that of its severity and defining a Damage Location Indicator as a sequence of the squared normalized mode shape curvatures.

  10. AlignMe—a membrane protein sequence alignment web server

    PubMed Central

    Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.

    2014-01-01

    We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425
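
    As a rough illustration of the family-averaged hydropathy profile used in HP mode, the sketch below averages a hydrophobicity value over each column of a multiple sequence alignment. It assumes the Kyte-Doolittle scale, skips gap characters, and applies no window smoothing; these choices and the name `family_averaged_profile` are assumptions made for illustration and do not describe AlignMe's actual implementation.

```python
# Kyte-Doolittle hydropathy values (one common hydrophobicity scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def family_averaged_profile(alignment):
    """Average hydropathy per alignment column, ignoring gaps.

    `alignment` is a list of equal-length aligned sequences.
    Columns without any scored residue yield None.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    profile = []
    for column in zip(*alignment):
        values = [KYTE_DOOLITTLE[aa] for aa in column if aa in KYTE_DOOLITTLE]
        profile.append(sum(values) / len(values) if values else None)
    return profile


# Two aligned homologs; the last column is all gaps.
profile = family_averaged_profile(["AI-", "AL-"])
```

    Two such profiles, one per input alignment, would then be aligned against each other; that alignment step is not shown here.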

  11. Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.

    PubMed

    Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai

    2015-12-01

    The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance to identify similarities of protein structures and infer their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method on protein comparison is more efficient than commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant feature for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. Computationally predicted IgE epitopes of walnut allergens contribute to cross-reactivity with peanuts

    USDA-ARS's Scientific Manuscript database

    Cross reactivity between peanuts and tree nuts implies that similar IgE epitopes are present in their proteins. To determine whether walnut sequences similar to known peanut IgE binding sequences, according to the property distance (PD) scale implemented in the Structural Database of Allergenic Prot...

  13. Sequenced Integration and the Identification of a Problem-Solving Approach through a Learning Process

    ERIC Educational Resources Information Center

    Cormas, Peter C.

    2016-01-01

    Preservice teachers (N = 27) in two sections of a sequenced, methodological and process integrated mathematics/science course solved a levers problem with three similar learning processes and a problem-solving approach, and identified a problem-solving approach through one different learning process. Similar learning processes used included:…

  14. Characterizing spatial heterogeneity based on the b-value and fractal analyses of the 2015 Nepal earthquake sequence

    NASA Astrophysics Data System (ADS)

    Nampally, Subhadra; Padhy, Simanchal; Dimri, Vijay P.

    2018-01-01

    The nature of spatial distribution of heterogeneities in the source area of the 2015 Nepal earthquake is characterized based on the seismic b-value and fractal analysis of its aftershocks. The earthquake size distribution of aftershocks gives a b-value of 1.11 ± 0.08, possibly representing the highly heterogeneous and low stress state of the region. The aftershocks exhibit a fractal structure characterized by a spectrum of generalized dimensions, Dq varying from D2 = 1.66 to D22 = 0.11. The existence of a fractal structure suggests that the spatial distribution of aftershocks is not a random phenomenon, but it self-organizes into a critical state, exhibiting a scale-independent structure governed by a power-law scaling, where a small perturbation in stress is sufficient enough to trigger aftershocks. In order to obtain the bias in fractal dimensions resulting from finite data size, we compared the multifractal spectrum for the real data and random simulations. On comparison, we found that the lower limit of bias in D2 is 0.44. The similarity in their multifractal spectra suggests the lack of long-range correlation in the data, with an only weakly multifractal or a monofractal with a single correlation dimension D2 characterizing the data. The minimum number of events required for a multifractal process with an acceptable error is discussed. We also tested for a possible correlation between changes in D2 and energy released during the earthquakes. The values of D2 rise during the two largest earthquakes (M > 7.0) in the sequence. The b- and D2 values are related by D2 = 1.45 b that corresponds to the intermediate to large earthquakes. Our results provide useful constraints on the spatial distribution of b- and D2-values, which are useful for seismic hazard assessment in the aftershock area of a large earthquake.
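
    For readers unfamiliar with the b-value, the sketch below shows one standard way to estimate it: the Aki maximum-likelihood estimator with the usual half-bin correction. The abstract does not state which estimator the authors used, so this choice is an assumption, as are the function names and the toy magnitudes. The second function simply encodes the D2 = 1.45 b relation quoted in the abstract.

```python
import math


def aki_b_value(magnitudes, completeness_magnitude, bin_width=0.0):
    """Aki maximum-likelihood b-value and its first-order uncertainty.

    Only events at or above the completeness magnitude are used.
    `bin_width` applies the usual half-bin correction for binned catalogs.
    """
    selected = [m for m in magnitudes if m >= completeness_magnitude]
    if not selected:
        raise ValueError("no events at or above the completeness magnitude")
    mean_magnitude = sum(selected) / len(selected)
    denominator = mean_magnitude - (completeness_magnitude - bin_width / 2.0)
    if denominator <= 0:
        raise ValueError("mean magnitude must exceed the corrected threshold")
    b = math.log10(math.e) / denominator
    return b, b / math.sqrt(len(selected))


def correlation_dimension_from_b(b):
    """Empirical relation reported in the abstract: D2 = 1.45 b."""
    return 1.45 * b


# Toy catalog (hypothetical magnitudes, not data from the study).
b, sigma_b = aki_b_value([4.0, 4.5, 5.0], completeness_magnitude=4.0)
```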

  15. XRD and FTIR crystallinity indices in sound human tooth enamel and synthetic hydroxyapatite.

    PubMed

    Reyes-Gasga, José; Martínez-Piñeiro, Esmeralda L; Rodríguez-Álvarez, Galois; Tiznado-Orozco, Gaby E; García-García, Ramiro; Brès, Etienne F

    2013-12-01

    The crystallinity index (CI) is a measure of the percentage of crystalline material in a given sample and it is also correlated to the degree of order within the crystals. In the literature two ways are reported to measure the CI: X-ray diffraction and infrared spectroscopy. Although the CI determined by these techniques has been adopted in the field of archeology as a structural order measure in the bone with the idea that it can help e.g. in the sequencing of the bones in chronological and/or stratigraphic order, some debate remains about the reliability of the CI values. To investigate similarities and differences between the two techniques, the CI of sound human tooth enamel and synthetic hydroxyapatite (HAP) was measured in this work by X-ray diffraction (XRD) and Fourier Transform Infrared spectroscopy (FTIR), at room temperature and after heat treatment. Although the (CI)XRD index is related to the crystal structure of the samples and the (CI)FTIR index is related to the vibration modes of the molecular bonds, both indices showed similar qualitative behavior for heat-treated samples. At room temperature, the (CI)XRD value indicated that enamel is more crystalline than synthetic HAP, while (CI)FTIR indicated the opposite. Scanning (SEM) and transmission (TEM) images were also used to corroborate the measured CI values. © 2013.

  16. Lead in the Getchell-Turquoise ridge Carlin-type gold deposits from the perspective of potential igneous and sedimentary rock sources in Northern Nevada: Implications for fluid and metal sources

    USGS Publications Warehouse

    Tosdal, R.M.; Cline, J.S.; Fanning, C.M.; Wooden, J.L.

    2003-01-01

    Lead isotope compositions of bulk mineral samples (fluorite, orpiment, and realgar) determined using conventional techniques and of ore-stage arsenian pyrite using the Sensitive High Resolution Ion-Microprobe (SHRIMP) in the Getchell and Turquoise Ridge Carlin-type gold deposits (Osgood Mountains) require contribution from two different Pb sources. One Pb source dominates the ore stage. It has a limited Pb isotope range characterized by 208Pb/206Pb values of 2.000 to 2.005 and 207Pb/206Pb values of 0.8031 to 0.8075, as recorded by 10-μm-diameter spot SHRIMP analyses of ore-stage arsenian pyrite. These values approximately correspond to 206Pb/204Pb of 19.3 to 19.6, 207Pb/204Pb of 15.65 to 15.75, and 208Pb/204Pb of 39.2 to 39.5. This Pb source is isotopically similar to that in average Neoproterozoic and Cambrian clastic rocks but not to any potential magmatic sources. Whether those clastic rocks provided Pb to the ore fluid cannot be unequivocally proven because their Pb isotope compositions over the same range as in ore-stage arsenian pyrite are similar to those of Ordovician to Devonian siliciclastic and calcareous rocks. The Pb source in the calcareous rocks most likely is largely detrital minerals, since that detritus was derived from the same sources as the detritus in the Neoproterozoic and Cambrian clastic rocks. The second Pb source is characterized by a large range of 206Pb/204Pb values (18-34) with a limited range of 208Pb/204Pb values (38.1-39.5), indicating low but variable Th/U and high and variable U/Pb values. The second Pb source dominates late and postore-stage minerals but is also found in preore sulfide minerals. These Pb isotope characteristics typify Ordovician to Devonian siliciclastic and calcareous rocks around the Carlin trend in northeast Nevada. Petrologically similar rocks host the Getchell and Turquoise Ridge deposits. 
Lead from the second source was either contributed from the host sedimentary rock sequences or brought into the hydrothermal system by oxidized ground water as the system collapsed. Late ore- and postore-stage sulfide minerals (pyrite, orpiment, and stibnite) from the Betze-Post and Meikle deposits in the Carlin trend and from the Jerritt Canyon mining district have Pb isotope characteristics similar to those determined in Getchell and Turquoise Ridge. This observation suggests that the Pb isotope compositions of their ore fluids may be similar to those at Getchell and Turquoise Ridge. Two models can explain the Pb isotope compositions of the ore-stage arsenian pyrite versus the late ore or postore sulfide minerals. In either model, Pb from the Ordovician to Devonian siliciclastic and calcareous rock source enters the hydrothermal system late in the ore stage but not to any extent during the main stage of ore deposition. In one model, ore-stage Pb was derived from a source with Pb isotope compositions similar to those of the Neoproterozoic and Cambrian clastic sequence, transported as part of the ore fluid and then deposited in the ore-stage arsenian pyrite and fluorite. The second model is based on the observation that the Pb isotope characteristics of the ore-stage minerals also are found in some Ordovician to Devonian calcareous and siliciclastic rocks. Hence, ore-stage Pb could have been derived locally and simply concentrated during the ore stage. Critical to the second model is the removal of all high 206Pb/204Pb (>20) material during alteration. It also requires the retention of only the low 206Pb/204Pb component of the Ordovician to Devonian sedimentary rocks. This critical step is possible only if the high 206Pb/204Pb values are contained in readily dissolvable mineral phases, whereas the low 206Pb/204Pb values are found only in refractory minerals that released Pb during a final alteration stage just prior to deposition of auriferous arsenian pyrite. 
Distinguishing between Pb transported with the ore fluid or inherited from the site of mineral deposition is not straightforward

  17. Sirius PSB: a generic system for analysis of biological sequences.

    PubMed

    Koh, Chuan Hock; Lin, Sharene; Jedd, Gregory; Wong, Limsoon

    2009-12-01

    Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) Building a classifier, (2) Deploying a classifier, (3) Search for proteins similar to query proteins, (4) Preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple and interactive graphical user interface. Besides being a convenient tool, Sirius PSB has also introduced two novelties in sequence analysis. Firstly, genetic algorithm is used to identify interesting features in the feature space. Secondly, instead of the conventional method of searching for similar proteins via sequence similarity, we introduced searching via features' similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models - one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive against current state-of-the-art models based on evaluation of public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at: http://compbio.ddns.comp.nus.edu.sg/~sirius.

  18. RNA sequencing confirms similarities between PPI-responsive oesophageal eosinophilia and eosinophilic oesophagitis.

    PubMed

    Peterson, K A; Yoshigi, M; Hazel, M W; Delker, D A; Lin, E; Krishnamurthy, C; Consiglio, N; Robson, J; Yandell, M; Clayton, F

    2018-06-04

    Although current American guidelines distinguish proton pump inhibitor-responsive oesophageal eosinophilia (PPI-REE) from eosinophilic oesophagitis (EoE), these entities are broadly similar. While two microarray studies showed that they have similar transcriptomes, more extensive RNA sequencing studies have not been done previously. To determine whether RNA sequencing identifies genetic markers distinguishing PPI-REE from EoE. We retrospectively examined 13 PPI-REE and 14 EoE biopsies, matched for tissue eosinophil content, and 14 normal controls. Patients and controls were not PPI-treated at the time of biopsy. We did RNA sequencing on formalin-fixed, paraffin-embedded tissue, with differential expression confirmation by quantitative polymerase chain reaction (PCR). We validated the use of formalin-fixed, paraffin-embedded vs RNAlater-preserved tissue, and compared our formalin-fixed, paraffin-embedded EoE results to a prior EoE study. By RNA sequencing, no genes were differentially expressed between the EoE and PPI-REE groups at the false discovery rate (FDR) ≤0.01 level. Compared to normal controls, 1996 genes were differentially expressed in the PPI-REE group and 1306 genes in the EoE group. By less stringent criteria, only MAPK8IP2 was differentially expressed between PPI-REE and EoE (FDR = 0.029, 2.2-fold less in EoE than in PPI-REE), with similar results by PCR. KCNJ2, which was differentially expressed in a prior study, was similar in the EoE and PPI-REE groups by both RNA sequencing and real-time PCR. Eosinophilic oesophagitis and PPI-REE have comparable transcriptomes, confirming that they are part of the same disease continuum. © 2018 John Wiley & Sons Ltd.
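
    The FDR thresholds quoted in the abstract above can be made concrete with a short sketch of the Benjamini-Hochberg adjustment. The abstract does not say which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the function name and toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Toy p-values for four genes (hypothetical, not from the study).
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q <= 0.029 for q in adjusted]
```

    Under this procedure a gene is called differentially expressed when its adjusted value falls at or below the chosen FDR level, such as the 0.01 level used in the study.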

  19. Tectonic Implications of Paleoproterozoic Deo Khe Granitoids in Northwestern Vietnam

    NASA Astrophysics Data System (ADS)

    Hoang, T. H. A.; Yu, Y.; Pham, T. H.; Choi, S. H.; Tu, V. L.; Son, L. M.

    2015-12-01

    An integrated study of petrographic description, zircon U/Pb geochronology, and Hf isotopic analysis was carried out on the medium-grained two-mica Deo Khe Granitoids (DKG) in northwestern Vietnam. U/Pb zircon ages were 1855-1873 Ma, interpreted as the time of magma crystallization. On the basis of Hf isotopic compositions, single-stage Hf model ages were estimated as 3.3-2.8 Ga. Values of Hf isotopes ɛHf (t) range from -23.6 to -17.5, suggesting that the DKG are products of reworked Archean crustal rocks. A similar sequence of tectonic events, including the presence of 2.8-2.9 Ga tonalite-trondhjemite-granodiorite (TTG) gneiss, metamorphic development of TTG gneiss at 1.9-2.0 Ga, and 1.85 Ga magmatic activity, was recognized both in the Yangtze block and in northwestern Vietnam. Therefore we propose that basement rocks in northern Vietnam are similar to those found along southern China.

  20. Geologic map showing springs rich in carbon dioxide or chloride in California

    USGS Publications Warehouse

    Barnes, Ivan; Irwin, William P.; Gibson, H.A.

    1975-01-01

    Carbon dioxide- and chloride-rich springs occur in all geologic provinces in California, but are most abundant in the Coast Ranges and the Great Valley. The carbon-dioxide-rich springs issue mainly from Franciscan terrane; they also are rich in boron and are of the metamorphic type (White, 1957). Based on isotopic data, either the carbon dioxide or the water, or both, may be of metamorphic origin. Because of high magnesium values, the water of many of the carbon-dioxide-rich springs is thought to have passed through serpentinite. The chloride-rich waters are most common in rocks of the Great Valley sequence. Nearly all are more dilute than present-day sea water. The similarity in isotopic compositions of the metamorphic carbon-dioxide-rich water and the chloride-rich water may indicate a similar extent of water-rock interaction.

  1. Predicting residue-wise contact orders in proteins by support vector regression.

    PubMed

    Song, Jiangning; Burrage, Kevin

    2006-10-03

    The residue-wise contact order (RWCO) describes the sequence separations between the residue of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable information for reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. 
Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
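
    To make the quantities in this abstract concrete, the sketch below computes a residue-wise contact order from per-residue coordinates, together with the two evaluation metrics quoted (Pearson CC and RMSE). The hard distance cutoff, the minimum sequence separation, and the normalization by chain length are assumptions chosen for illustration, as are all function names; the paper's exact definition may differ.

```python
import math


def residue_wise_contact_order(coords, cutoff=8.0, min_separation=3):
    """Sum of sequence separations to contacting residues, divided by length.

    `coords` holds one (x, y, z) tuple per residue. The cutoff, the
    minimum separation and the normalization are illustrative assumptions.
    """
    n = len(coords)
    rwco = []
    for i in range(n):
        total = 0
        for j in range(n):
            separation = abs(i - j)
            in_contact = math.dist(coords[i], coords[j]) < cutoff
            if separation >= min_separation and in_contact:
                total += separation
        rwco.append(total / n)
    return rwco


def pearson_cc(predicted, observed):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_p = sum(predicted) / len(predicted)
    mean_o = sum(observed) / len(observed)
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def rmse(predicted, observed):
    """Root mean square error between two equal-length lists."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Four residues on a line, 5 units apart; only residues 0 and 3 are in contact.
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
toy_rwco = residue_wise_contact_order(toy_coords, cutoff=16.0)
```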

  2. miRvestigator: web application to identify miRNAs responsible for co-regulated gene expression patterns discovered through transcriptome profiling.

    PubMed

    Plaisier, Christopher L; Bare, J Christopher; Baliga, Nitin S

    2011-07-01

    Transcriptome profiling studies have produced staggering numbers of gene co-expression signatures for a variety of biological systems. A significant fraction of these signatures will be partially or fully explained by miRNA-mediated targeted transcript degradation. miRvestigator takes as input lists of co-expressed genes from Caenorhabditis elegans, Drosophila melanogaster, G. gallus, Homo sapiens, Mus musculus or Rattus norvegicus and identifies the specific miRNAs that are likely to bind to 3' un-translated region (UTR) sequences to mediate the observed co-regulation. The novelty of our approach is the miRvestigator hidden Markov model (HMM) algorithm which systematically computes a similarity P-value for each unique miRNA seed sequence from the miRNA database miRBase to an overrepresented sequence motif identified within the 3'-UTR of the query genes. We have made this miRNA discovery tool accessible to the community by integrating our HMM algorithm with a proven algorithm for de novo discovery of miRNA seed sequences and wrapping these algorithms into a user-friendly interface. Additionally, the miRvestigator web server also produces a list of putative miRNA binding sites within 3'-UTRs of the query transcripts to facilitate the design of validation experiments. The miRvestigator is freely available at http://mirvestigator.systemsbiology.net.

  3. Veillonella infantium sp. nov., an anaerobic, Gram-stain-negative coccus isolated from tongue biofilm of a Thai child.

    PubMed

    Mashima, Izumi; Liao, Yu-Chieh; Miyakawa, Hiroshi; Theodorea, Citra F; Thawboon, Boonyanit; Thaweboon, Sroisiri; Scannapieco, Frank A; Nakazawa, Futoshi

    2018-04-01

    A strain of a novel anaerobic, Gram-stain-negative coccus was isolated from the tongue biofilm of a Thai child. This strain was shown, at the phenotypic level and based on 16S rRNA gene sequencing, to be a member of the genus Veillonella. Comparative analysis of the 16S rRNA, dnaK and rpoB gene sequences indicated that phylogenetically the strain comprised a distinct novel branch within the genus Veillonella. The novel strain showed 99.8, 95.1 and 95.9 % similarity to partial 16S rRNA, dnaK and rpoB gene sequences, respectively, to the type strains of the two most closely related species, Veillonella dispar ATCC 17748 T and Veillonella tobetsuensis ATCC BAA-2400 T. The novel strain could be discriminated from previously reported species of the genus Veillonella based on partial dnaK and rpoB gene sequencing and average nucleotide identity values. The major acid end-product produced by this strain was acetic acid under anaerobic conditions in trypticase-yeast extract-haemin with 1 % (w/v) glucose or fructose medium. Lactate was fermented to acetic acid and propionic acid. Based on these observations, this strain represents a novel species, for which the name Veillonella infantium sp. nov. is proposed. The type strain is T11011-4 T (=JCM 31738 T =TSD-88 T ).

  4. Streptococcus pharyngis sp. nov., a novel streptococcal species isolated from the respiratory tract of wild rabbits.

    PubMed

    Vela, Ana I; Casas-Díaz, Encarna; Lavín, Santiago; Domínguez, Lucas; Fernández-Garayzábal, Jose F

    2015-09-01

    Four isolates of an unknown Gram-stain-positive, catalase-negative coccus-shaped organism, isolated from the pharynx of four wild rabbits, were characterized by phenotypic and molecular genetic methods. The micro-organisms were tentatively assigned to the genus Streptococcus based on cellular morphological and biochemical criteria, although the organisms did not appear to correspond to any species with a validly published name. Comparative 16S rRNA gene sequencing confirmed their identification as members of the genus Streptococcus, being most closely related phylogenetically to Streptococcus porcorum 682-03(T) (96.9% 16S rRNA gene sequence similarity). Analysis of rpoB and sodA gene sequences showed divergence values between the novel species and S. porcorum 682-03(T) (the closest phylogenetic relative determined from 16S rRNA gene sequences) of 18.1 and 23.9%, respectively. The novel bacterial isolate could be distinguished from the type strain of S. porcorum by several biochemical characteristics, such as the production of glycyl-tryptophan arylamidase and α-chymotrypsin, and the non-acidification of different sugars. Based on both phenotypic and phylogenetic findings, it is proposed that the unknown bacterium be assigned to a novel species of the genus Streptococcus, and named Streptococcus pharyngis sp. nov. The type strain is DICM10-00796B(T) ( = CECT 8754(T) = CCUG 66496(T)).

  5. Complete genome sequence of a coxsackievirus B3 recombinant isolated from an aseptic meningitis outbreak in eastern China.

    PubMed

    Zhang, Wenqiang; Lin, Xiaojuan; Jiang, Ping; Tao, Zexin; Liu, Xiaolin; Ji, Feng; Wang, Tongzhan; Wang, Suting; Lv, Hui; Xu, Aiqiang; Wang, Haiyan

    2016-08-01

    Coxsackievirus B3 (CV-B3) has frequently been associated with aseptic meningitis outbreaks in China. To identify sequence motifs related to aseptic meningitis and to construct an infectious clone, the genome sequence of 08TC170, a representative strain isolated from cerebrospinal fluid (CSF) samples from an outbreak in Shandong in 2008, was determined, and the coding regions for P1-P3 and VP1 were aligned. The first 21 and last 20 residues were "TTAAAACAGCCTGTGGGTTGT" and "ATTCTCCGCATTCGGTGCGG", respectively. The whole genome consisted of 7401 nucleotides, sharing 80.8 % identity with the prototype strain Nancy and low sequence similarity with members of clusters A-C. In contrast, 08TC170 showed high sequence similarity to members of cluster D. An especially high level of sequence identity (≥97.7 %) was found within a branch constituted by 08TC170 and four Chinese strains that clustered together in all of the P1-P3 phylogenic trees. In addition, 08TC170 also possessed a close relationship to the Hong Kong strain 26362/08 in VP1. Similarity plot analysis showed that 08TC170 was most similar to the Chinese CV-B3 strain SSM in P1 and the partial P2 coding region but to the CV-B5 or E-6 strain in 2C and following regions. A T277A mutation was found in 08TC170 and other strains isolated in 2008-2010, but not in strains isolated before 2008, which had high sequence similarity and formed the cluster A277. The results suggested that 08TC170 was the product of both intertypic recombination and point mutation, whose effects on viral neurovirulence will be investigated in a further study. The high homology between 08TC170 and other strains revealed their co-circulation in mainland China and Hong Kong and indicates that further surveillance is needed.
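
    Percent identity figures such as the 80.8 % and ≥97.7 % values above are derived from pairwise alignments. The sketch below shows one common way to compute such a value from two already-aligned sequences. It is an assumption-laden illustration: the function name is hypothetical, gapped columns are excluded from the denominator (other conventions exist), and the abstract does not state which definition the authors used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (one of several conventions)
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# The first 21 residues reported for 08TC170, compared with a made-up
# variant that differs only at the final position (illustrative, not real data).
reference = "TTAAAACAGCCTGTGGGTTGT"
variant = "TTAAAACAGCCTGTGGGTTGA"
identity = percent_identity(reference, variant)  # 20 of 21 positions match
```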

  6. Comparative analysis of the feline immunoglobulin repertoire.

    PubMed

    Steiniger, Sebastian C J; Glanville, Jacob; Harris, Douglas W; Wilson, Thomas L; Ippolito, Gregory C; Dunham, Steven A

    2017-03-01

    Next-Generation Sequencing combined with bioinformatics is a powerful tool for analyzing the large number of DNA sequences present in the expressed antibody repertoire and these data sets can be used to advance a number of research areas including antibody discovery and engineering. The accurate measurement of the immune repertoire sequence composition, diversity and abundance is important for understanding the repertoire response in infections, vaccinations and cancer immunology and could also be useful for elucidating novel molecular targets. In this study 4 individual domestic cats (Felis catus) were subjected to antibody repertoire sequencing, generating 1079863 sequences for IgG VH, 1050824 for IgM VH, 569518 for VK and 450195 for VL. Our analysis suggests that similar VDJ expression patterns exist across all cats. Similar to the canine repertoire, the feline repertoire is dominated by a single subgroup, namely VH3. The antibody paratope of felines showed similar amino acid variation when compared to human, mouse and canine counterparts. All animals show a similarly skewed VH CDR-H3 profile and, when compared to canine, human and mouse, distinct differences are observed. Our study represents the first attempt to characterize sequence diversity in the expressed feline antibody repertoire and this demonstrates the utility of using NGS to elucidate entire antibody repertoires from individual animals. These data provide significant insight into understanding the feline immune system function. Copyright © 2017 International Alliance for Biological Standardization. Published by Elsevier Ltd. All rights reserved.

  7. Sequence similarities and evolutionary relationships of microbial, plant and animal alpha-amylases.

    PubMed

    Janecek, S

    1994-09-01

    Amino acid sequence comparison of 37 alpha-amylases from microbial, plant and animal sources was performed to identify their mutual sequence similarities in addition to the five already described conserved regions. These sequence regions were examined from structure/function and evolutionary perspectives. An unrooted evolutionary tree of alpha-amylases was constructed on a subset of 55 residues from the alignment of sequence similarities along with conserved regions. The most important new information extracted from the tree was as follows: (a) the close evolutionary relationship of Alteromonas haloplanctis alpha-amylase (thermolabile enzyme from an antarctic psychrotroph) with the already known group of homologous alpha-amylases from streptomycetes, Thermomonospora curvata, insects and mammals, and (b) the remarkable 40.1% identity between starch-saccharifying Bacillus subtilis alpha-amylase and the enzyme from the ruminal bacterium Butyrivibrio fibrisolvens, an alpha-amylase with an unusually large polypeptide chain (943 residues in the mature enzyme). Due to a very high degree of similarity, the whole amino acid sequences of three groups of alpha-amylases, namely (a) fungi and yeasts, (b) plants, and (c) A. haloplanctis, streptomycetes, T. curvata, insects and mammals, were aligned independently and their unrooted distance trees were calculated using these alignments. Possible rooting of the trees was also discussed. Based on the knowledge of the location of the five disulfide bonds in the structure of pig pancreatic alpha-amylase, the possible disulfide bridges were established for each of these groups of homologous alpha-amylases.

  8. Complete nucleotide sequence of the freshwater unicellular cyanobacterium Synechococcus elongatus PCC 6301 chromosome: gene content and organization.

    PubMed

    Sugita, Chieko; Ogata, Koretsugu; Shikata, Masamitsu; Jikuya, Hiroyuki; Takano, Jun; Furumichi, Miho; Kanehisa, Minoru; Omata, Tatsuo; Sugiura, Masahiro; Sugita, Mamoru

    2007-01-01

    The entire genome of the unicellular cyanobacterium Synechococcus elongatus PCC 6301 (formerly Anacystis nidulans Berkeley strain 6301) was sequenced. The genome consisted of a circular chromosome 2,696,255 bp long. A total of 2,525 potential protein-coding genes, two sets of rRNA genes, 45 tRNA genes representing 42 tRNA species, and several genes for small stable RNAs were assigned to the chromosome by similarity searches and computer predictions. The translated products of 56% of the potential protein-coding genes showed sequence similarities to experimentally identified and predicted proteins of known function, and the products of 35% of the genes showed sequence similarities to the translated products of hypothetical genes. The remaining 9% of genes lacked significant similarities to genes for predicted proteins in the public DNA databases. Some 139 genes coding for photosynthesis-related components were identified. Thirty-seven genes for two-component signal transduction systems were also identified. This is the smallest number of such genes identified in cyanobacteria, except for marine cyanobacteria, suggesting that only simple signal transduction systems are found in this strain. The gene arrangement and nucleotide sequence of Synechococcus elongatus PCC 6301 were nearly identical to those of a closely related strain Synechococcus elongatus PCC 7942, except for the presence of a 188.6 kb inversion. The sequences as well as the gene information shown in this paper are available in the Web database, CYORF (http://www.cyano.genome.jp/).

  9. Gene and translation initiation site prediction in metagenomic sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John

    2012-01-01

    Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.

  10. Querying Event Sequences by Exact Match or Similarity Search: Design and Empirical Evaluation

    PubMed Central

    Wongsuphasawat, Krist; Plaisant, Catherine; Taieb-Maimon, Meirav; Shneiderman, Ben

    2012-01-01

    Specifying event sequence queries is challenging even for skilled computer professionals familiar with SQL. Most graphical user interfaces for database search use an exact match approach, which is often effective, but near misses may also be of interest. We describe a new similarity search interface, in which users specify a query by simply placing events on a blank timeline and retrieve a similarity-ranked list of results. Behind this user interface is a new similarity measure for event sequences which the users can customize by four decision criteria, enabling them to adjust the impact of missing, extra, or swapped events or the impact of time shifts. We describe a use case with Electronic Health Records based on our ongoing collaboration with hospital physicians. A controlled experiment with 18 participants compared exact match and similarity search interfaces. We report on the advantages and disadvantages of each interface and suggest a hybrid interface combining the best of both. PMID:22379286

  11. Automated two-point dixon screening for the evaluation of hepatic steatosis and siderosis: comparison with R2-relaxometry and chemical shift-based sequences.

    PubMed

    Henninger, B; Zoller, H; Rauch, S; Schocke, M; Kannengiesser, S; Zhong, X; Reiter, G; Jaschke, W; Kremser, C

    2015-05-01

    To evaluate the automated two-point Dixon screening sequence for the detection and estimated quantification of hepatic iron and fat compared with standard sequences as a reference. One hundred and two patients with suspected diffuse liver disease were included in this prospective study. The following MRI protocol was used: 3D-T1-weighted opposed- and in-phase gradient echo with two-point Dixon reconstruction and dual-ratio signal discrimination algorithm ("screening" sequence); fat-saturated, multi-gradient-echo sequence with 12 echoes; gradient-echo T1 FLASH opposed- and in-phase. Bland-Altman plots were generated and correlation coefficients were calculated to compare the sequences. The screening sequence diagnosed fat in 33, iron in 35 and a combination of both in 4 patients. Correlation between R2* values of the screening sequence and the standard relaxometry was excellent (r = 0.988). A slightly lower correlation (r = 0.978) was found between the fat fraction of the screening sequence and the standard sequence. Bland-Altman revealed systematically lower R2* values obtained from the screening sequence and higher fat fraction values obtained with the standard sequence with a rather high variability in agreement. The screening sequence is a promising method with fast diagnosis of the predominant liver disease. It is capable of estimating the amount of hepatic fat and iron comparable to standard methods. • MRI plays a major role in the clarification of diffuse liver disease. • The screening sequence was introduced for the assessment of diffuse liver disease. • It is a fast and automated algorithm for the evaluation of hepatic iron and fat. • It is capable of estimating the amount of hepatic fat and iron.
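
    The Bland-Altman comparison mentioned in this abstract summarises agreement between two measurement methods by the mean of their paired differences (the bias) and limits of agreement around it. The following is a minimal sketch, assuming the common convention of bias ± 1.96 sample standard deviations; the abstract does not state which convention was applied, and the function name and example numbers are hypothetical.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of (a - b); the limits of agreement are
    bias -/+ 1.96 * sample standard deviation of the differences.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two paired sequences of equal length >= 2")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    spread = 1.96 * statistics.stdev(diffs)
    return bias, bias - spread, bias + spread


# Made-up paired readings: a negative bias means method A reads
# systematically lower than method B.
screening = [10.0, 20.0, 30.0, 40.0]
standard = [12.0, 21.0, 33.0, 42.0]
bias, lower, upper = bland_altman(screening, standard)
```

    A high correlation coefficient alone does not rule out a systematic offset, which is why the two measures are reported side by side in the abstract.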

  12. Complete mitochondrial genome sequence of Indian medium carp, Labeo gonius (Hamilton, 1822) and its comparison with other related carp species.

    PubMed

    Behera, Bijay Kumar; Kumari, Kavita; Baisvar, Vishwamitra Singh; Rout, Ajaya Kumar; Pakrashi, Sudip; Paria, Prasenjet; Jena, J K

    2017-01-01

    In the present study, the complete mitochondrial genome sequence of Labeo gonius, determined using a PGM sequencer (Ion Torrent), is reported. The complete mitogenome of L. gonius, 16 614 bp in length, was obtained by de novo assembly of genomic reads using the Torrent Mapping Alignment Program (TMAP). The mitogenome of L. gonius comprises 13 protein-coding genes, 22 tRNAs, 2 rRNA genes, and a D-loop as control region, with gene order and organization similar to those of most other fish mitogenomes in the NCBI databases. The mitogenome in the present study has 99% similarity to the complete mitogenome sequence of Labeo fimbriatus, as reported earlier. The phylogenetic analysis of Cypriniformes depicted that their mitogenomes are closely related to each other. The complete mitogenome sequence of L. gonius would be helpful in understanding the population genetics, phylogenetics, and evolution of Indian Carps.

  13. Molecular evidence for piroplasms in wild Reeves' muntjac (Muntiacus reevesi) in China.

    PubMed

    Yang, Ji-fei; Li, You-quan; Liu, Zhi-jie; Liu, Jun-long; Guan, Gui-quan; Chen, Ze; Luo, Jian-xun; Wang, Xiao-long; Yin, Hong

    2014-10-01

    DNA from liver samples of 17 free-ranging wild Reeves' muntjac (Muntiacus reevesi) was used for PCR amplification of the piroplasm 18S rRNA gene. Of 17 samples, 14 (82.4%) showed a specific PCR product, and these products were cloned and sequenced. BLAST analysis of the sequences obtained showed similarities to Babesia sp., Theileria capreoli, Theileria uilenbergi and Theileria sp. BO302-SE. Phylogenetic analysis showed that the Babesia sp. detected in the present study was distantly separated from known Babesia species of wild and domestic animals. Six sequences showed 100% similarity to T. capreoli while five sequences were separated from all known Theileria species and constituted an independent clade with Theileria sp. BO302-SE derived from roe deer in Italy; two sequences were close to T. uilenbergi with 97% similarity. This is the first description of hemoparasite infection in free-ranging wild Reeves' muntjac in China. Our results indicate that wild Reeves' muntjac may play an important reservoir role for hemoparasites. Crown Copyright © 2014. Published by Elsevier Ireland Ltd. All rights reserved.

  14. The Carrancas Formation, Bambuí Group: A record of pre-Marinoan sedimentation on the southern São Francisco craton, Brazil

    NASA Astrophysics Data System (ADS)

    Uhlein, Gabriel J.; Uhlein, Alexandre; Halverson, Galen P.; Stevenson, Ross; Caxito, Fabrício A.; Cox, Grant M.; Carvalho, Jorge F. M. G.

    2016-11-01

    The Carrancas Formation outcrops in east-central Brazil on the southern margin of the São Francisco craton where it comprises the base of the late Neoproterozoic Bambuí Group. It is overlain by the basal Ediacaran cap carbonate Sete Lagoas Formation and was for a long time considered to be glacially influenced and correlative with the glaciogenic Jequitaí Formation. New stratigraphic, isotopic and geochronologic data imply that the Carrancas Formation was instead formed by the shedding of debris from basement highs uplifted during an episode of minor continental rifting. Reddish dolostones in the upper Carrancas Formation have δ13C values ranging from +7.1 to +9.6‰, which is a unique C isotopic composition for the lowermost Bambuí Group but similar to values found in the Tijucuçu sequence, a pre-glacial unit in the Araçuaí fold belt on the eastern margin of the São Francisco craton. The stratigraphic position below basal Ediacaran cap carbonates and the highly positive δ13C values together indicate a Cryogenian interglacial age for the Carrancas Formation, with the high δ13C values representing the so-called Keele peak, which precedes the pre-Marinoan Trezona negative δ13C excursion in other well characterized Cryogenian sequences. Hence, the Carrancas Formation pre-dates the Marinoan Jequitaí Formation and represents an interval of Cryogenian stratigraphy not previously known to occur on the southern margin of the São Francisco craton. Documentation of Cryogenian interglacial strata on the São Francisco craton reinforces recent revisions to the age of Bambuí Group strata and has implications for the development of the Bambuí basin.

  15. First results of the SONS survey: submillimetre detections of debris discs

    NASA Astrophysics Data System (ADS)

    Panić, O.; Holland, W. S.; Wyatt, M. C.; Kennedy, G. M.; Matthews, B. C.; Lestrade, J. F.; Sibthorpe, B.; Greaves, J. S.; Marshall, J. P.; Phillips, N. M.; Tottle, J.

    2013-10-01

    New detections of debris discs at submillimetre wavelengths present highly valuable complementary information to prior observations of these sources at shorter wavelengths. Characterization of discs through spectral energy distribution modelling including the submillimetre fluxes is essential for our basic understanding of disc mass and temperature, and presents a starting point for further studies using millimetre interferometric observations. In the framework of the ongoing SCUBA-2 Observations of Nearby Stars, the instrument SCUBA-2 on the James Clerk Maxwell Telescope was used to provide measurements of 450 and 850 μm fluxes towards a large sample of nearby main-sequence stars with debris discs detected previously at shorter wavelengths. We present the first results from the ongoing survey, concerning 850 μm detections and 450 μm upper limits towards 10 stars, the majority of which are detected at submillimetre wavelengths for the first time. One, or possibly two, of these new detections is likely a background source. We fit the spectral energy distributions of the star+disc systems with a blackbody emission approach and derive characteristic disc temperatures. We use these temperatures to convert the observed fluxes to disc masses. We obtain a range of disc masses from 0.001 to 0.1 M⊕, values similar to the prior dust mass measurements towards debris discs. There is no evidence for evolution in dust mass with age on the main sequence, and indeed the upper envelope remains relatively flat at ≈0.5 M⊕ at all ages. The inferred disc masses are lower than those from disc detections around pre-main-sequence stars, which may indicate a depletion of solid mass. This may also be due to a change in disc opacity, though limited sensitivity means that it is not yet known what fraction of pre-main-sequence stars have discs with dust masses similar to debris disc levels. New, high-sensitivity detections are a path towards investigating the trends in dust mass evolution.
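
    The conversion from an observed submillimetre flux to a disc mass described in this abstract is commonly done with the optically thin relation M = F d^2 / (kappa B_nu(T)), where F is the flux density, d the distance, kappa the dust opacity and B_nu(T) the Planck function at the characteristic temperature. The sketch below illustrates that relation only. It assumes this standard formula and an opacity of 1.7 cm^2 g^-1 at 850 μm, neither of which is stated in the abstract; the function names and the example inputs are hypothetical.

```python
import math

# Physical constants in CGS units.
H_PLANCK = 6.62607015e-27  # erg s
K_BOLTZ = 1.380649e-16     # erg / K
C_LIGHT = 2.99792458e10    # cm / s
PARSEC_CM = 3.0857e18      # cm per parsec
EARTH_MASS_G = 5.972e27    # g
JANSKY_CGS = 1.0e-23       # erg s^-1 cm^-2 Hz^-1 per Jy


def planck_nu(freq_hz, temp_k):
    """Planck function B_nu(T) in erg s^-1 cm^-2 Hz^-1 sr^-1."""
    x = H_PLANCK * freq_hz / (K_BOLTZ * temp_k)
    return 2.0 * H_PLANCK * freq_hz ** 3 / C_LIGHT ** 2 / math.expm1(x)


def dust_mass_earth(flux_mjy, distance_pc, temp_k,
                    wavelength_um=850.0, kappa_cm2_g=1.7):
    """Optically thin dust mass in Earth masses (kappa is an assumed value)."""
    freq_hz = C_LIGHT / (wavelength_um * 1.0e-4)  # micrometres -> cm
    flux_cgs = flux_mjy * 1.0e-3 * JANSKY_CGS
    distance_cm = distance_pc * PARSEC_CM
    mass_g = flux_cgs * distance_cm ** 2 / (kappa_cm2_g * planck_nu(freq_hz, temp_k))
    return mass_g / EARTH_MASS_G


# Hypothetical disc: 10 mJy at 850 micrometres, 10 pc away, 50 K dust.
example_mass = dust_mass_earth(flux_mjy=10.0, distance_pc=10.0, temp_k=50.0)
```

    Under these assumptions the mass scales linearly with flux and with the square of the distance, so the derived values are only as reliable as the adopted opacity and temperature.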

  16. Muricauda antarctica sp. nov., a marine member of the Flavobacteriaceae isolated from Antarctic seawater.

    PubMed

    Wu, Yue-Hong; Yu, Pei-Song; Zhou, Ya-Dong; Xu, Lin; Wang, Chun-Sheng; Wu, Min; Oren, Aharon; Xu, Xue-Wei

    2013-09-01

    A Gram-stain-negative, rod-shaped bacterium with appendages, designated Ar-22(T), was isolated from a seawater sample collected from the western part of Prydz Bay, near Cape Darnley, Antarctica. Strain Ar-22(T) grew optimally at 35 °C, at pH 7.5 and in the presence of 1-3% (w/v) NaCl. The isolate was positive for casein, gelatin and Tween 20 decomposition and negative for H2S production and indole formation. Chemotaxonomic analysis showed that MK-6 was the major isoprenoid quinone and phosphatidylethanolamine was the major polar lipid. The major fatty acids were iso-C(17:0) 3-OH, iso-C(15:1) G, iso-C(15:0) and C(16:1)ω7c/iso-C(15:0) 2OH. The genomic DNA G+C content was 44.8 mol%. Comparative 16S rRNA gene sequence analysis revealed that strain Ar-22(T) is closely related to members of the genus Muricauda, sharing 94.2-97.3% sequence similarity with the type strains of species of the genus Muricauda and being most closely related to Muricauda aquimarina. Phylogenetic analysis based on the 16S rRNA gene sequence comparison confirmed that strain Ar-22(T) formed a deep lineage with Muricauda flavescens. Sequence similarity between strain Ar-22(T) and Muricauda ruestringensis DSM 13258(T), the type species of the genus Muricauda, was 96.9%. Strain Ar-22(T) exhibited mean DNA-DNA relatedness values of 40.1%, 49.4% and 25.7% to M. aquimarina JCM 11811(T), M. flavescens JCM 11812(T) and Muricauda lutimaris KCTC 22173(T), respectively. On the basis of phenotypic and genotypic data, strain Ar-22(T) represents a novel species of the genus Muricauda, for which the name Muricauda antarctica sp. nov. (type strain Ar-22(T) =CGMCC 1.12174(T) = JCM 18450(T)) is proposed.

  17. Paenibacillus phoenicis sp. nov., isolated from the Phoenix Lander assembly facility and a subsurface molybdenum mine.

    PubMed

    Benardini, James N; Vaishampayan, Parag A; Schwendner, Petra; Swanner, Elizabeth; Fukui, Youhei; Osman, Sharif; Satomi, Masakata; Venkateswaran, Kasthuri

    2011-06-01

    A novel Gram-positive, motile, endospore-forming, aerobic bacterium was isolated from the NASA Phoenix Lander assembly clean room that exhibits 100 % 16S rRNA gene sequence similarity to two strains isolated from a deep subsurface environment. All strains are rod-shaped, endospore-forming bacteria, whose endospores are resistant to UV radiation up to 500 J m(-2). A polyphasic taxonomic study including traditional phenotypic tests, fatty acid analysis, 16S rRNA gene sequencing and DNA-DNA hybridization analysis was performed to characterize these novel strains. The 16S rRNA gene sequencing convincingly grouped these novel strains within the genus Paenibacillus as a separate cluster from previously described species. The 16S rRNA gene sequences of the novel strains were identical to one another but showed only 98.1 to 98.5 % similarity to those of their nearest neighbours Paenibacillus barengoltzii ATCC BAA-1209(T) and Paenibacillus timonensis CIP 108005(T). The menaquinone MK-7 was dominant in these novel strains as shown in other species of the genus Paenibacillus. The DNA-DNA hybridization dissociation value was <45 % with the closest related species. The novel strains had DNA G+C contents of 51.9 to 52.8 mol%. Phenotypically, the novel strains can be readily differentiated from closely related species by the absence of urease and gelatinase and the production of acids from a variety of sugars including l-arabinose. The major fatty acid was anteiso-C(15 : 0) as seen in P. barengoltzii and P. timonensis whereas the proportion of C(16 : 0) was significantly different from the closely related species. Based on phylogenetic and phenotypic results, it was concluded that these strains represent a novel species of the genus Paenibacillus, for which the name Paenibacillus phoenicis sp. nov. is proposed. The type strain is 3PO2SA(T) ( = NRRL B-59348(T) = NBRC 106274(T)).

  18. Free-breathing echo-planar imaging based diffusion-weighted magnetic resonance imaging of the liver with prospective acquisition correction.

    PubMed

    Asbach, Patrick; Hein, Patrick A; Stemmer, Alto; Wagner, Moritz; Huppertz, Alexander; Hamm, Bernd; Taupitz, Matthias; Klessen, Christian

    2008-01-01

    To evaluate soft tissue contrast and image quality of a respiratory-triggered echo-planar imaging based diffusion-weighted sequence (EPI-DWI) with different b values for magnetic resonance imaging (MRI) of the liver. Forty patients were examined. Quantitative and qualitative evaluation of contrast was performed. Severity of artifacts and overall image quality in comparison with a T2w turbo spin-echo (T2-TSE) sequence were scored. The liver-spleen contrast was significantly higher (P < 0.05) for the EPI-DWI compared with the T2-TSE sequence (0.47 +/- 0.11 (b50); 0.48 +/- 0.13 (b300); 0.47 +/- 0.13 (b600) vs 0.38 +/- 0.11). Liver-lesion contrast depended strongly on the b value of the DWI sequence and decreased with higher b values (b50, 0.47 +/- 0.19; b300, 0.40 +/- 0.20; b600, 0.28 +/- 0.23). Severity of artifacts and overall image quality were comparable to the T2-TSE sequence when using a low b value (P > 0.05); artifacts increased and image quality decreased with higher b values (P < 0.05). Respiratory-triggered EPI-DWI of the liver is feasible because good image quality and favorable soft tissue contrast can be achieved.

  19. What is a melody? On the relationship between pitch and brightness of timbre

    PubMed Central

    Cousineau, Marion; Carcagno, Samuele; Demany, Laurent; Pressnitzer, Daniel

    2014-01-01

    Previous studies showed that the perceptual processing of sound sequences is more efficient when the sounds vary in pitch than when they vary in loudness. We show here that sequences of sounds varying in brightness of timbre are processed with the same efficiency as pitch sequences. The sounds used consisted of two simultaneous pure tones one octave apart, and the listeners’ task was to make same/different judgments on pairs of sequences varying in length (one, two, or four sounds). In one condition, brightness of timbre was varied within the sequences by changing the relative level of the two pure tones. In other conditions, pitch was varied by changing fundamental frequency, or loudness was varied by changing the overall level. In all conditions, only two possible sounds could be used in a given sequence, and these two sounds were equally discriminable. When sequence length increased from one to four, discrimination performance decreased substantially for loudness sequences, but to a smaller extent for brightness sequences and pitch sequences. In the latter two conditions, sequence length had a similar effect on performance. These results suggest that the processes dedicated to pitch and brightness analysis, when probed with a sequence-discrimination task, share unexpected similarities. PMID:24478638

  20. Species clarification of the culinary Bachu mushroom in western China.

    PubMed

    Zhao, Qi; Sulayman, Mamtimin; Zhu, Xue-Tai; Zhao, Yong-Chang; Yang, Zhu-Liang; Hyde, Kevin D

    2016-01-01

    The Bachu mushroom, previously identified as Helvella leucopus, is characterized by a saddle-shaped, to irregularly lobed pileus, with a gray, brown to blackish hymenium and a whitish to pale receptacle surface and white, terete stipe with enlarged basal grooves. It has high economic value, mostly as a dietary supplement in western China, and its medicinal functions have raised broad interest. In the present paper species of the Bachu mushroom in Xinjiang Autonomous Region, western China were investigated with morphology and DNA sequence data. Phylogenetic analyses inferred from ITS, 28S and TEF1 sequence data strongly supported lineages corresponding to morphological features. The Bachu mushroom, which differs from the European Helvella leucopus, comprises two distinct new species, namely Helvella bachu and Helvella subspadicea. In this paper we introduce the new species with descriptions and figures and compare them with similar taxa. The European Helvella spadicea is also re-examined, described and illustrated. © 2016 by The Mycological Society of America.
