factors sequence analyses: Topics by Science.gov

Sample records for factors sequence analyses

Factor structure of paediatric timed motor examination and its relationship with IQ

PubMed Central

MARTIN, REBECCA; TIGERA, CASSIE; DENCKLA, MARTHA B; MAHONE, E MARK

2012-01-01

AIM Brain systems supporting higher cognitive and motor control develop in a parallel manner, dependent on functional integrity and maturation of related regions, suggesting neighbouring neural circuitry. Concurrent examination of motor and cognitive control can provide a window into neurological development. However, identification of performance-based measures that do not correlate with IQ has been a challenge. METHOD Timed motor performance from the Physical and Neurological Examination of Subtle Signs and IQ were analysed in 136 children aged 6 to 16 (mean age 10y 2.6mo, SD 2y 6.4mo; 98 female, 38 male) attending an outpatient neuropsychology clinic and 136 right-handed comparison individuals aged 6 to 16 (mean age 10y 3.1mo, SD 2y 6.1mo; 98 female, 38 male). Timed activities – three repetitive movements (toe tapping, hand patting, finger tapping) and three sequenced movements (heel–toe tap, hand pronate/supinate, finger sequencing) each performed on the right and left – were included in exploratory factor analyses. RESULTS Among comparison individuals, factor analysis yielded two factors – repetitive and sequenced movements – with the sequenced factor significantly predictive of Verbal IQ (VIQ) (ΔR²=0.018, p=0.019), but not the repetitive factor (ΔR²=0.004, p=0.39). Factor analysis within the clinical group yielded two similar factors (repetitive and sequenced), both significantly predictive of VIQ (ΔR²=0.028, p=0.015; ΔR²=0.046, p=0.002 respectively). INTERPRETATION Among typical children, repetitive timed tasks may be independent of IQ; however, sequenced tasks share more variance, implying shared neural substrates. Among neurologically vulnerable populations, however, both sequenced and repetitive movements covary with IQ, suggesting that repetitive speed is more indicative of underlying neurological integrity. PMID:20412260
Insights into the phylogeny of Northern Hemisphere Armillaria: Neighbor-net and Bayesian analyses of translation elongation factor 1-α gene sequences

Treesearch

Ned B. Klopfenstein; Jane E. Stewart; Yuko Ota; John W. Hanna; Bryce A. Richardson; Amy L. Ross-Davis; Ruben D. Elias-Roman; Kari Korhonen; Nenad Keca; Eugenia Iturritxa; Dionicio Alvarado-Rosales; Halvor Solheim; Nicholas J. Brazee; Piotr Lakomy; Michelle R. Cleary; Eri Hasegawa; Taisei Kikuchi; Fortunato Garza-Ocanas; Panaghiotis Tsopelas; Daniel Rigling; Simone Prospero; Tetyana Tsykun; Jean A. Berube; Franck O. P. Stefani; Saeideh Jafarpour; Vladimir Antonin; Michal Tomsovsky; Geral I. McDonald; Stephen Woodward; Mee-Sook Kim

2017-01-01

Armillaria possesses several intriguing characteristics that have inspired wide interest in understanding phylogenetic relationships within and among species of this genus. Nuclear ribosomal DNA sequence-based analyses of Armillaria provide only limited information for phylogenetic studies among widely divergent taxa. More recent studies have shown that translation...
The phylogenetic position of an Armillaria species from Amami-Oshima, a subtropical island of Japan, based on elongation factor and ITS sequences

Treesearch

Yuko Ota; Mee-Sook Kim; Hitoshi Neda; Ned B. Klopfenstein; Eri Hasegawa

2011-01-01

An undetermined Armillaria species was collected on Amami-Oshima, a subtropical island of Japan. The phylogenetic position of the Armillaria sp. was determined using sequences of the elongation factor-1a (EF-1a) gene and the internal transcribed spacer (ITS) region (ITS1-5.8S-ITS2) of ribosomal DNA (rDNA). The phylogenetic analyses based on EF-1a and ITS sequences...
Chloroplast Phylogenomics Indicates that Ginkgo biloba Is Sister to Cycads

PubMed Central

Wu, Chung-Shien; Chaw, Shu-Miaw; Huang, Ya-Yi

2013-01-01

Molecular phylogenetic studies have not yet reached a consensus on the placement of Ginkgoales, which is represented by the only living species, Ginkgo biloba (common name: ginkgo). At least six discrepant placements of ginkgo have been proposed. This study aimed to use the chloroplast phylogenomic approach to examine possible factors that lead to such disagreeing placements. We found the sequence types used in the analyses as the most critical factor in the conflicting placements of ginkgo. In addition, the placement of ginkgo varied in the trees inferred from nucleotide (NU) sequences, which notably depended on breadth of taxon sampling, tree-building methods, codon positions, positions of Gnetopsida (common name: gnetophytes), and including or excluding gnetophytes in data sets. In contrast, the trees inferred from amino acid (AA) sequences congruently supported the monophyly of a ginkgo and Cycadales (common name: cycads) clade, regardless of which factors were examined. Our site-stripping analysis further revealed that the high substitution saturation of NU sequences mainly derived from the third codon positions and contributed to the variable placements of ginkgo. In summary, the factors we surveyed did not affect results inferred from analyses of AA sequences. Congruent topologies in our AA trees give more confidence in supporting the ginkgo–cycad sister-group hypothesis. PMID:23315384
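As a rough illustration of why codon position matters in nucleotide-based analyses such as those summarized in this record, the following Python sketch splits an in-frame coding alignment by codon position and compares the proportion of differing sites in each partition. It is not the authors' site-stripping procedure; the function names and the two toy sequences are hypothetical and chosen only so that all differences fall on third positions.

```python
def codon_positions(seq, positions):
    """Keep only the bases at the given codon positions (1, 2 or 3).

    Assumes seq is in frame, i.e. its first base is codon position 1.
    """
    return "".join(base for i, base in enumerate(seq) if i % 3 + 1 in positions)


def p_distance(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical toy alignment: the two sequences differ only at third positions.
taxon_a = "ATGGCCTAA"
taxon_b = "ATGGCATAG"

first_second = p_distance(codon_positions(taxon_a, {1, 2}),
                          codon_positions(taxon_b, {1, 2}))
third = p_distance(codon_positions(taxon_a, {3}),
                   codon_positions(taxon_b, {3}))
print(f"positions 1+2: {first_second:.2f}, position 3: {third:.2f}")
```

In a real data set the same partitioning would be applied to every taxon in the alignment before tree building, so that trees from the two partitions can be compared.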
Genome Sequences for Five Strains of the Emerging Pathogen Haemophilus haemolyticus

PubMed Central

Jordan, I. King; Conley, Andrew B.; Antonov, Ivan V.; Arthur, Robert A.; Cook, Erin D.; Cooper, Guy P.; Jones, Bernard L.; Knipe, Kristen M.; Lee, Kevin J.; Liu, Xing; Mitchell, Gabriel J.; Pande, Pushkar R.; Petit, Robert A.; Qin, Shaopu; Rajan, Vani N.; Sarda, Shruti; Sebastian, Aswathy; Tang, Shiyuyun; Thapliyal, Racchit; Varghese, Neha J.; Ye, Tianjun; Katz, Lee S.; Wang, Xin; Rowe, Lori; Frace, Michael; Mayer, Leonard W.

2011-01-01

We report the first whole-genome sequences for five strains, two carried and three pathogenic, of the emerging pathogen Haemophilus haemolyticus. Preliminary analyses indicate that these genome sequences encode markers that distinguish H. haemolyticus from its closest Haemophilus relatives and provide clues to the identity of its virulence factors. PMID:21952546
Pooling across cells to normalize single-cell RNA sequencing data with many zero counts.

PubMed

Lun, Aaron T L; Bach, Karsten; Marioni, John C

2016-04-27

Normalization of single-cell RNA sequencing data is necessary to eliminate cell-specific biases prior to downstream analyses. However, this is not straightforward for noisy single-cell data where many counts are zero. We present a novel approach where expression values are summed across pools of cells, and the summed values are used for normalization. Pool-based size factors are then deconvolved to yield cell-based factors. Our deconvolution approach outperforms existing methods for accurate normalization of cell-specific biases in simulated data. Similar behavior is observed in real data, where deconvolution improves the relevance of results of downstream analyses.
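The pooling-and-deconvolution idea in this record can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the ring arrangement of cells, the sliding-window pools, the median-ratio estimate against an average pseudo-cell, and the least-squares solve are all assumptions made for illustration, and `pooled_size_factors` is a hypothetical name.

```python
import numpy as np


def pooled_size_factors(counts, pool_sizes=(3, 4, 5)):
    """Estimate one size factor per cell from pooled counts.

    counts: genes x cells array. Assumes at least max(pool_sizes) cells.
    Returns cell-level factors scaled to have mean 1.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[1]
    reference = counts.mean(axis=1)        # average pseudo-cell
    keep = reference > 0                   # drop genes that are zero everywhere
    rows, pool_factors = [], []
    for size in pool_sizes:
        for start in range(n_cells):
            # Sliding window of cells on a ring (an assumption of this sketch).
            members = (start + np.arange(size)) % n_cells
            pooled = counts[keep][:, members].sum(axis=1)
            # Pool-based size factor: median ratio of pooled to reference counts.
            pool_factors.append(np.median(pooled / reference[keep]))
            row = np.zeros(n_cells)
            row[members] = 1.0             # which cells contributed to this pool
            rows.append(row)
    # Deconvolve: each pool factor is modelled as the sum of its cells' factors.
    cell_factors, *_ = np.linalg.lstsq(np.vstack(rows), np.array(pool_factors),
                                       rcond=None)
    return cell_factors / cell_factors.mean()
```

Summing across a pool reduces the number of zeros entering each median, which is the motivation given in the abstract; several pool sizes are used here so that the linear system is well determined.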
Cis-acting elements in the promoter region of the human aldolase C gene.

PubMed

Buono, P; de Conciliis, L; Olivetta, E; Izzo, P; Salvatore, F

1993-08-16

We investigated the cis-acting sequences involved in the expression of the human aldolase C gene by transient transfections into human neuroblastoma cells (SKNBE). We demonstrate that 420 bp of the 5'-flanking DNA direct at high efficiency the transcription of the CAT reporter gene. A deletion between -420 bp and -164 bp causes a 60% decrease of CAT activity. Gel shift and DNase I footprinting analyses revealed four protected elements: A, B, C and D. Competition analyses indicate that Sp1 or factors sharing a similar sequence specificity bind to elements A and B, but not to elements C and D. Sequence analysis shows a half palindromic ERE motif (GGTCA), in elements B and D. Region D binds a transactivating factor which appears also essential to stabilize the initiation complex.
Analysis of functional redundancies within the Arabidopsis TCP transcription factor family.

PubMed

Danisman, Selahattin; van Dijk, Aalt D J; Bimbo, Andrea; van der Wal, Froukje; Hennig, Lars; de Folter, Stefan; Angenent, Gerco C; Immink, Richard G H

2013-12-01

Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein-protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any lacking gene expression and protein-protein interaction data experimentally and then performed a comprehensive prediction of potential functional redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family.
Analysis of functional redundancies within the Arabidopsis TCP transcription factor family

PubMed Central

Danisman, Selahattin; de Folter, Stefan; Immink, Richard G. H.

2013-01-01

Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein–protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any lacking gene expression and protein–protein interaction data experimentally and then performed a comprehensive prediction of potential functional redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family. PMID:24129704
A Polyglot Approach to Bioinformatics Data Integration: A Phylogenetic Analysis of HIV-1

PubMed Central

Reisman, Steven; Hatzopoulos, Thomas; Läufer, Konstantin; Thiruvathukal, George K.; Putonti, Catherine

2016-01-01

As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 sequences. Phylogenetic analyses were conducted for >6,000 HIV-1 sequences revealing spatial and temporal factors influence the evolution of the individual genes uniquely. Nevertheless, signatures of origin can be extrapolated even despite increased globalization. The approach developed here can easily be customized for any species of interest. PMID:26819543
Residual Stresses and Critical Initial Flaw Size Analyses of Welds

NASA Technical Reports Server (NTRS)

Brust, Frederick W.; Raju, Ivatury S.; Dawocke, David S.; Cheston, Derrick

2009-01-01

An independent assessment was conducted to determine the critical initial flaw size (CIFS) for the flange-to-skin weld in the Ares I-X Upper Stage Simulator (USS). A series of weld analyses are performed to determine the residual stresses in a critical region of the USS. Weld residual stresses both increase constraint and mean stress thereby having an important effect on the fatigue life. The purpose of the weld analyses was to model the weld process using a variety of sequences to determine the 'best' sequence in terms of weld residual stresses and distortions. The many factors examined in this study include weld design (single-V, double-V groove), weld sequence, boundary conditions, and material properties, among others. The results of this weld analysis are included with service loads to perform a fatigue and critical initial flaw size evaluation.
Differentiated evolutionary relationships among chordates from comparative alignments of multiple sequences of MyoD and MyoG myogenic regulatory factors.

PubMed

Oliani, L C; Lidani, K C F; Gabriel, J E

2015-10-16

MyoD and MyoG are transcription factors that have essential roles in myogenic lineage determination and muscle differentiation. The purpose of this study was to compare multiple amino acid sequences of myogenic regulatory proteins to infer evolutionary relationships among chordates. Protein sequences from Mus musculus (P10085 and P12979), human Homo sapiens (P15172 and P15173), bovine Bos taurus (Q7YS82 and Q7YS81), wild pig Sus scrofa (P49811 and P49812), quail Coturnix coturnix (P21572 and P34060), chicken Gallus gallus (P16075 and P17920), rat Rattus norvegicus (Q02346 and P20428), domestic water buffalo Bubalus bubalis (D2SP11 and A7L034), and sheep Ovis aries (Q90477 and D3YKV7) were searched from a non-redundant protein sequence database UniProtKB/Swiss-Prot, and subsequently analyzed using the Mega6.0 software. MyoD evolutionary analyses revealed the presence of three main clusters with all mammals branched in one cluster, members of the order Rodentia (mouse and rat) in a second branch linked to the first, and birds of the order Galliformes (chicken and quail) remaining isolated in a third. MyoG evolutionary analyses aligned sequences in two main clusters, all mammalian specimens grouped in different sub-branches, and birds clustered in a second branch. These analyses suggest that the evolution of MyoD and MyoG was driven by different pathways.
Design criteria monograph for valve assemblies

NASA Technical Reports Server (NTRS)

1974-01-01

Monograph is limited to valve selection factors for trade-off studies, configuration analyses, actuator selection, and integration of components. Material is organized along lines of valve design sequence.
Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity.

PubMed

He, Yan; Caporaso, J Gregory; Jiang, Xiao-Tao; Sheng, Hua-Fang; Huse, Susan M; Rideout, Jai Ram; Edgar, Robert C; Kopylova, Evguenia; Walters, William A; Knight, Rob; Zhou, Hong-Wei

2015-01-01

The operational taxonomic unit (OTU) is widely used in microbial ecology. Reproducibility in microbial ecology research depends on the reliability of OTU-based 16S ribosomal subunit RNA (rRNA) analyses. Here, we report that many hierarchical and greedy clustering methods produce unstable OTUs, with membership that depends on the number of sequences clustered. If OTUs are regenerated with additional sequences or samples, sequences originally assigned to a given OTU can be split into different OTUs. Alternatively, sequences assigned to different OTUs can be merged into a single OTU. This OTU instability affects alpha-diversity analyses such as rarefaction curves, beta-diversity analyses such as distance-based ordination (for example, Principal Coordinate Analysis (PCoA)), and the identification of differentially represented OTUs. Our results show that the proportion of unstable OTUs varies for different clustering methods. We found that the closed-reference method is the only one that produces completely stable OTUs, with the caveat that sequences that do not match a pre-existing reference sequence collection are discarded. As a compromise to the factors listed above, we propose using an open-reference method to enhance OTU stability. This type of method clusters sequences against a database and includes unmatched sequences by clustering them via a relatively stable de novo clustering method. OTU stability is an important consideration when analyzing microbial diversity and is a feature that should be taken into account during the development of novel OTU clustering methods.
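The kind of instability described in this record can be reproduced with a toy greedy clustering in Python. This is a minimal sketch, assuming abundance-sorted greedy centroid clustering with a Hamming-distance threshold; it is not any specific tool evaluated in the paper, and the sequences, abundances, and function names are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def greedy_otus(abundance, max_mismatches=1):
    """Greedy centroid clustering.

    Sequences are processed from most to least abundant; each joins the
    first centroid within max_mismatches, otherwise it founds a new OTU.
    Returns a dict mapping each sequence to its centroid.
    """
    centroids, assignment = [], {}
    for seq in sorted(abundance, key=lambda s: (-abundance[s], s)):
        for centroid in centroids:
            if hamming(seq, centroid) <= max_mismatches:
                assignment[seq] = centroid
                break
        else:
            centroids.append(seq)
            assignment[seq] = seq
    return assignment


# Hypothetical abundances before and after more reads of AAAT are added.
before = greedy_otus({"AAAA": 10, "AATT": 5, "AAAT": 1})
after = greedy_otus({"AAAA": 10, "AATT": 5, "AAAT": 20})
print(before)
print(after)
```

In the first run AAAA and AATT fall into different OTUs; once AAAT becomes the most abundant sequence it is chosen as the centroid and both are merged into its OTU, even though neither sequence itself changed.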
A modified operational sequence methodology for zoo exhibit design and renovation: conceptualizing animals, staff, and visitors as interdependent coworkers.

PubMed

Kelling, Nicholas J; Gaalema, Diann E; Kelling, Angela S

2014-01-01

Human factors analyses have been used to improve efficiency and safety in various work environments. Although generally limited to humans, the universality of these analyses allows for their formal application to a much broader domain. This paper outlines a model for the use of human factors to enhance zoo exhibits and optimize spaces for all user groups: zoo animals, zoo visitors, and zoo staff members. Zoo exhibits are multi-faceted and each user group has a distinct set of requirements that can clash or complement each other. Careful analysis and a reframing of the three groups as interdependent coworkers can enhance safety, efficiency, and experience for all user groups. This paper details a general creation and specific examples of the use of the modified human factors tools of function allocation, operational sequence diagram and needs assessment. These tools allow for adaptability and ease of understanding in the design or renovation of exhibits. © 2014 Wiley Periodicals, Inc.
Why Choose This One? Factors in Scientists' Selection of Bioinformatics Tools

ERIC Educational Resources Information Center

Bartlett, Joan C.; Ishimura, Yusuke; Kloda, Lorie A.

2011-01-01

Purpose: The objective was to identify and understand the factors involved in scientists' selection of preferred bioinformatics tools, such as databases of gene or protein sequence information (e.g., GenBank) or programs that manipulate and analyse biological data (e.g., BLAST). Methods: Eight scientists maintained research diaries for a two-week…
Sequence and functional characterization of hypoxia-inducible factors, HIF1α, HIF2αa, and HIF3α, from the estuarine fish, Fundulus heteroclitus

PubMed Central

Townley, Ian K.; Karchner, Sibel I.; Skripnikova, Elena; Wiese, Thomas E.; Hahn, Mark E.

2017-01-01

The hypoxia-inducible factor (HIF) family of transcription factors plays central roles in the development, physiology, pathology, and environmental adaptation of animals. Because many aquatic habitats are characterized by episodes of low dissolved oxygen, fish represent ideal models to study the roles of HIF in the response to aquatic hypoxia. The estuarine fish Fundulus heteroclitus is found in habitats prone to hypoxia. It responds to low oxygen via behavioral, physiological, and molecular changes, and one member of the HIF family, HIF2α, has been previously described. Herein, cDNA sequencing, phylogenetic analyses, and genomic approaches were used to determine other members of the HIFα family from F. heteroclitus and their relationships to HIFα subunits from other vertebrates. In vitro and cellular approaches demonstrated that full-length forms of HIF1α, HIF2α, and HIF3α independently formed complexes with the β-subunit, aryl hydrocarbon receptor nuclear translocator, to bind to hypoxia response elements and activate reporter gene expression. Quantitative PCR showed that HIFα mRNA abundance varied among organs of normoxic fish in an isoform-specific fashion. Analysis of the F. heteroclitus genome revealed a locus encoding a second HIF2α—HIF2αb—a predicted protein lacking oxygen sensing and transactivation domains. Finally, sequence analyses demonstrated polymorphism in the coding sequence of each F. heteroclitus HIFα subunit, suggesting that genetic variation in these transcription factors may play a role in the variation in hypoxia responses among individuals or populations. PMID:28039194
Sequence and functional characterization of hypoxia-inducible factors, HIF1α, HIF2αa, and HIF3α, from the estuarine fish, Fundulus heteroclitus.

PubMed

Townley, Ian K; Karchner, Sibel I; Skripnikova, Elena; Wiese, Thomas E; Hahn, Mark E; Rees, Bernard B

2017-03-01

The hypoxia-inducible factor (HIF) family of transcription factors plays central roles in the development, physiology, pathology, and environmental adaptation of animals. Because many aquatic habitats are characterized by episodes of low dissolved oxygen, fish represent ideal models to study the roles of HIF in the response to aquatic hypoxia. The estuarine fish Fundulus heteroclitus is found in habitats prone to hypoxia. It responds to low oxygen via behavioral, physiological, and molecular changes, and one member of the HIF family, HIF2α, has been previously described. Herein, cDNA sequencing, phylogenetic analyses, and genomic approaches were used to determine other members of the HIFα family from F. heteroclitus and their relationships to HIFα subunits from other vertebrates. In vitro and cellular approaches demonstrated that full-length forms of HIF1α, HIF2α, and HIF3α independently formed complexes with the β-subunit, aryl hydrocarbon receptor nuclear translocator, to bind to hypoxia response elements and activate reporter gene expression. Quantitative PCR showed that HIFα mRNA abundance varied among organs of normoxic fish in an isoform-specific fashion. Analysis of the F. heteroclitus genome revealed a locus encoding a second HIF2α (HIF2αb), a predicted protein lacking oxygen sensing and transactivation domains. Finally, sequence analyses demonstrated polymorphism in the coding sequence of each F. heteroclitus HIFα subunit, suggesting that genetic variation in these transcription factors may play a role in the variation in hypoxia responses among individuals or populations. Copyright © 2017 the American Physiological Society.
Comparative analyses of putative toxin gene homologs from an Old World viper, Daboia russelii

PubMed Central

Krishnan, Neeraja M.

2017-01-01

Availability of snake genome sequences has opened up exciting areas of research on comparative genomics and gene diversity. One of the challenges in studying snake genomes is the acquisition of biological material from live animals, especially from the venomous ones, making the process cumbersome and time-consuming. Here, we report comparative sequence analyses of putative toxin gene homologs from Russell’s viper (Daboia russelii) using whole-genome sequencing data obtained from shed skin. When compared with the major venom proteins in Russell’s viper studied previously, we found 45–100% sequence similarity between the venom proteins and their putative homologs in the skin. Additionally, comparative analyses of 20 putative toxin gene family homologs provided evidence of unique sequence motifs in nerve growth factor (NGF), platelet derived growth factor (PDGF), Kunitz/Bovine pancreatic trypsin inhibitor (Kunitz BPTI), cysteine-rich secretory proteins, antigen 5, and pathogenesis-related 1 proteins (CAP) and cysteine-rich secretory protein (CRISP). In those derived proteins, we identified V11 and T35 in the NGF domain; F23 and A29 in the PDGF domain; N69, K2 and A5 in the CAP domain; and Q17 in the CRISP domain to be responsible for differences in the largest pockets across the protein domain structures in crotalines, viperines and elapids from the in silico structure-based analysis. Similarly, residues F10, Y11 and E20 appear to play an important role in the protein structures across the kunitz protein domain of viperids and elapids. Our study highlights the usefulness of shed skin in obtaining good quality high-molecular weight DNA for comparative genomic studies, and provides evidence towards the unique features and evolution of putative venom gene homologs in vipers. PMID:29230357
Complete cpDNA genome sequence of Smilax china and phylogenetic placement of Liliales--influences of gene partitions and taxon sampling.

PubMed

Liu, Juan; Qi, Zhe-Chen; Zhao, Yun-Peng; Fu, Cheng-Xin; Jenny Xiang, Qiu-Yun

2012-09-01

The complete nucleotide sequence of the chloroplast genome (cpDNA) of Smilax china L. (Smilacaceae) is reported. It is the first complete cp genome sequence in Liliales. Genomic analyses were conducted to examine the rate and pattern of cpDNA genome evolution in Smilax relative to other major lineages of monocots. The cpDNA genomic sequences were combined with those available for Lilium to evaluate the phylogenetic position of Liliales and to investigate the influence of taxon sampling, gene sampling, gene function, natural selection, and substitution rate on phylogenetic inference in monocots. Phylogenetic analyses using sequence data of gene groups partitioned according to gene function, selection force, and total substitution rate demonstrated evident impacts of these factors on phylogenetic inference of monocots and the placement of Liliales, suggesting potential evolutionary convergence or adaptation of some cpDNA genes in monocots. Our study also demonstrated that reduced taxon sampling reduced the bootstrap support for the placement of Liliales in the cpDNA phylogenomic analysis. Analyses of sequences of 77 protein genes with some missing data and sequences of 81 genes (all protein genes plus the rRNA genes) support a sister relationship of Liliales to the commelinids-Asparagales clade, consistent with the APG III system. Analyses of 63 cpDNA protein genes for 32 taxa with few missing data, however, support a sister relationship of Liliales (represented by Smilax and Lilium) to Dioscoreales-Pandanales. Topology tests indicated that these two alignments do not significantly differ given any of these three cpDNA genomic sequence data sets. Furthermore, we found no saturation effect of the data, suggesting that the cpDNA genomic sequence data used in the study are appropriate for monocot phylogenetic study and that long-branch attraction is unlikely to explain the two well-supported, conflicting placements of Liliales. Further analyses using sufficient nuclear data remain necessary to evaluate these two phylogenetic hypotheses regarding the position of Liliales and to address the causes of signal conflict among genes and partitions. Copyright © 2012 Elsevier Inc. All rights reserved.

Two distinct DNA sequences recognized by transcription factors represent enthalpy and entropy optima

PubMed Central

Yin, Yimeng; Das, Pratyush K; Jolma, Arttu; Zhu, Fangjie; Popov, Alexander; Xu, You; Nilsson, Lennart

2018-01-01

Most transcription factors (TFs) can bind to a population of sequences closely related to a single optimal site. However, some TFs can bind to two distinct sequences that represent two local optima in the Gibbs free energy of binding (ΔG). To determine the molecular mechanism behind this effect, we solved the structures of human HOXB13 and CDX2 bound to their two optimal DNA sequences, CAATAAA and TCGTAAA. Thermodynamic analyses by isothermal titration calorimetry revealed that both sites were bound with similar ΔG. However, the interaction with the CAA sequence was driven by change in enthalpy (ΔH), whereas the TCG site was bound with similar affinity due to smaller loss of entropy (ΔS). This thermodynamic mechanism that leads to at least two local optima likely affects many macromolecular interactions, as ΔG depends on two partially independent variables ΔH and ΔS according to the central equation of thermodynamics, ΔG = ΔH - TΔS. PMID:29638214
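The relation ΔG = ΔH - TΔS quoted in this record can be made concrete with a short Python calculation. The numbers below are hypothetical and are not measurements from the study; they are chosen only to show how an enthalpy-driven site and a site with a smaller entropy loss can end up with similar ΔG.

```python
def gibbs_free_energy(delta_h, delta_s, temperature):
    """Return ΔG = ΔH - T·ΔS.

    delta_h in kJ/mol, delta_s in kJ/(mol·K), temperature in K.
    """
    return delta_h - temperature * delta_s


T = 298.15  # K, assumed temperature for the illustration

# Hypothetical values: large favourable ΔH paid for by a large entropy loss...
enthalpy_driven = gibbs_free_energy(delta_h=-60.0, delta_s=-0.100, temperature=T)
# ...versus a smaller ΔH with a smaller entropy loss.
entropy_favoured = gibbs_free_energy(delta_h=-45.0, delta_s=-0.050, temperature=T)

print(f"enthalpy-driven site:  ΔG = {enthalpy_driven:.2f} kJ/mol")
print(f"entropy-favoured site: ΔG = {entropy_favoured:.2f} kJ/mol")
```

Because ΔH and ΔS can trade off against each other in this way, two quite different binding modes can be separate local optima of the same free-energy landscape.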
New encoded single-indicator sequences based on physico-chemical parameters for efficient exon identification.

PubMed

Meher, J K; Meher, P K; Dash, G N; Raval, M K

2012-01-01

The first step in gene identification problem based on genomic signal processing is to convert character strings into numerical sequences. These numerical sequences are then analysed spectrally or using digital filtering techniques for the period-3 peaks, which are present in exons (coding areas) and absent in introns (non-coding areas). In this paper, we have shown that single-indicator sequences can be generated by encoding schemes based on physico-chemical properties. Two new methods are proposed for generating single-indicator sequences based on hydration energy and dipole moments. The proposed methods produce high peak at exon locations and effectively suppress false exons (intron regions having greater peak than exon regions) resulting in high discriminating factor, sensitivity and specificity.
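The general pipeline this record describes (encode the bases as one numerical sequence, then look for a spectral peak at period 3) can be sketched in Python with NumPy. The per-base values in `HYPOTHETICAL_CODE` are placeholders and are not the hydration-energy or dipole-moment encodings proposed in the paper; `period3_strength` is likewise a hypothetical helper.

```python
import numpy as np

# Hypothetical per-base values for illustration only; they are NOT the
# hydration-energy or dipole-moment values used in the paper.
HYPOTHETICAL_CODE = {"A": 0.1, "C": 0.4, "G": 0.7, "T": 1.0}


def period3_strength(seq, code=HYPOTHETICAL_CODE):
    """Ratio of DFT power at the period-3 bin to the mean spectral power.

    The sequence is trimmed to a multiple of 3 so that bin N/3 exists.
    """
    n = len(seq) - len(seq) % 3
    if n == 0:
        raise ValueError("sequence must contain at least three bases")
    x = np.array([code[base] for base in seq[:n].upper()], dtype=float)
    x -= x.mean()                           # remove the zero-frequency component
    power = np.abs(np.fft.fft(x)) ** 2
    background = power[1:n // 2 + 1].mean()  # mean over positive frequencies
    if background == 0:
        return 0.0                          # constant signal has no peak
    return float(power[n // 3] / background)


print(period3_strength("ATG" * 30))  # strongly period-3 toy sequence
print(period3_strength("AC" * 45))   # period-2 toy sequence, no period-3 peak
```

In practice the same statistic would be computed in a sliding window along a genomic sequence, with windows showing a high ratio flagged as candidate exons.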
Sequences of emotional distress expressed by clients and acknowledged by therapists: are they associated more with some therapists than others?

PubMed

Viney, L L

1994-11-01

When clients come to psychotherapy they are distressed, this distress usually being expressed in the form of anxiety, hostility, depression and helplessness. This study explored the sequences of emotional distress expressed by clients and acknowledged by therapists, and examined their associations with other factors. The transcripts of five therapists (two single sessions each) were content-analysed: they used personal construct, client centered, rational-emotive, Gestalt and transactional analysis therapy. Log-linear analyses of appropriate contingency table cell frequencies were conducted to test associations between identified sequences and the two variables of therapist and timing of completion of the sequence. Therapist-client sequences of Anxiety-Anxiety, Anxiety-Hostility and Helplessness-Hostility were found to be associated more with the personal construct and client centred therapists than with the rational-emotive therapist. Client-therapist sequences of Anxiety-Anxiety, Helplessness-Anxiety and Helplessness-Helplessness were more often found with the client centred therapist than the other therapists. For most of these sequences timing had an effect, yet timing rarely interacted with the therapist variable. The findings are discussed in terms of their relevance to the theoretical positions represented, the shortcomings of the research and the value of this methodology in studies linking therapy process with outcome.
Genetic analyses of bone morphogenetic protein 2, 4 and 7 in congenital combined pituitary hormone deficiency.

PubMed

Breitfeld, Jana; Martens, Susanne; Klammt, Jürgen; Schlicke, Marina; Pfäffle, Roland; Krause, Kerstin; Weidle, Kerstin; Schleinitz, Dorit; Stumvoll, Michael; Führer, Dagmar; Kovacs, Peter; Tönjes, Anke

2013-12-01

The complex process of development of the pituitary gland is regulated by a number of signalling molecules and transcription factors. Mutations in these factors have been identified in rare cases of congenital hypopituitarism but for most subjects with combined pituitary hormone deficiency (CPHD) genetic causes are unknown. Bone morphogenetic proteins (BMPs) affect induction and growth of the pituitary primordium and thus represent plausible candidates for mutational screening of patients with CPHD. We sequenced BMP2, 4 and 7 in 19 subjects with CPHD. For validation purposes, novel genetic variants were genotyped in 1046 healthy subjects. Additionally, potential functional relevance for most promising variants has been assessed by phylogenetic analyses and prediction of effects on protein structure. Sequencing revealed two novel variants and confirmed 30 previously known polymorphisms and mutations in BMP2, 4 and 7. Although phylogenetic analyses indicated that these variants map within strongly conserved gene regions, there was no direct support for their impact on protein structure when applying predictive bioinformatics tools. A mutation in the BMP4 coding region resulting in an amino acid exchange (p.Arg300Pro) appeared most interesting among the identified variants. Further functional analyses are required to ultimately map the relevance of these novel variants in CPHD.
Genetic analyses of bone morphogenetic protein 2, 4 and 7 in congenital combined pituitary hormone deficiency

PubMed Central

2013-01-01

Background The complex process of development of the pituitary gland is regulated by a number of signalling molecules and transcription factors. Mutations in these factors have been identified in rare cases of congenital hypopituitarism but for most subjects with combined pituitary hormone deficiency (CPHD) genetic causes are unknown. Bone morphogenetic proteins (BMPs) affect induction and growth of the pituitary primordium and thus represent plausible candidates for mutational screening of patients with CPHD. Methods We sequenced BMP2, 4 and 7 in 19 subjects with CPHD. For validation purposes, novel genetic variants were genotyped in 1046 healthy subjects. Additionally, potential functional relevance for most promising variants has been assessed by phylogenetic analyses and prediction of effects on protein structure. Results Sequencing revealed two novel variants and confirmed 30 previously known polymorphisms and mutations in BMP2, 4 and 7. Although phylogenetic analyses indicated that these variants map within strongly conserved gene regions, there was no direct support for their impact on protein structure when applying predictive bioinformatics tools. Conclusions A mutation in the BMP4 coding region resulting in an amino acid exchange (p.Arg300Pro) appeared most interesting among the identified variants. Further functional analyses are required to ultimately map the relevance of these novel variants in CPHD. PMID:24289245
Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq

PubMed Central

Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

2015-01-01

Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of an HIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
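The 1% frequency threshold mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' pipeline (which involves de novo assembly, iterative mapping and error correction); it only shows the final counting step, assuming reads that are already aligned and of equal length. The function name `minority_variants` and the toy reads are hypothetical.

```python
from collections import Counter


def minority_variants(aligned_reads, min_freq=0.01):
    """Report non-consensus bases whose frequency exceeds min_freq.

    Assumes every read is already aligned and has the same length
    (an illustrative simplification, not the published method).
    Returns {position: {base: frequency}} with 0-based positions.
    """
    if not aligned_reads:
        return {}
    result = {}
    for pos in range(len(aligned_reads[0])):
        # Count only unambiguous bases at this column.
        counts = Counter(r[pos] for r in aligned_reads if r[pos] in "ACGT")
        depth = sum(counts.values())
        if depth == 0:
            continue
        consensus = counts.most_common(1)[0][0]
        minor = {
            base: n / depth
            for base, n in counts.items()
            if base != consensus and n / depth > min_freq
        }
        if minor:
            result[pos] = minor
    return result


# Toy data: 97 reads carry G at position 2, 3 reads carry T (3%).
reads = ["ACGT"] * 97 + ["ACTT"] * 3
variants = minority_variants(reads)
```

In this toy input only position 2 is reported, because the T variant is present in 3 of 100 reads; a variant present in 1 of 200 reads would fall below the threshold and be ignored.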
Evolutionary genomics and HIV restriction factors.

PubMed

Pyndiah, Nitisha; Telenti, Amalio; Rausell, Antonio

2015-03-01

To provide updated insights into innate antiviral immunity and highlight prototypical evolutionary features of well characterized HIV restriction factors. Recently, a new HIV restriction factor, Myxovirus resistance 2, has been discovered and the region/residue responsible for its activity identified using an evolutionary approach. Furthermore, IFI16, an innate immunity protein known to sense several viruses, has been shown to contribute to the defense to HIV-1 by causing cell death upon sensing HIV-1 DNA. Restriction factors against HIV show characteristic signatures of positive selection. Different patterns of accelerated sequence evolution can distinguish antiviral strategies--offense or defence--as well as the level of specificity of the antiviral properties. Sequence analysis of primate orthologs of restriction factors serves to localize functional domains and sites responsible for antiviral action. We use recent discoveries to illustrate how evolutionary genomic analyses help identify new antiviral genes and their mechanisms of action.
Comparative Analyses of Single-Nucleotide Polymorphisms in the TNF Promoter Region Provide Further Validation for the Vervet Monkey Model of Obesity

PubMed Central

Gray, Stanton B; Howard, Timothy D; Langefeld, Carl D; Hawkins, Gregory A; Diallo, Abdoulaye F; Wagner, Janice D

2009-01-01

Tumor necrosis factor (TNF) is a cytokine that plays critical roles in inflammation, the innate immune response, and a variety of other physiologic and pathophysiologic processes. In addition, TNF has recently been shown to mediate an intersection of chronic, low-grade inflammation and concurrent metabolic dysregulation associated with obesity and its comorbidities. As part of an ongoing initiative to further characterize vervet monkeys originating from St Kitts as an animal model of obesity and inflammation, we sequenced and genotyped the vervet ortholog of the human TNF gene and approximately 1 kb of the flanking 3′ and 5′ regions from 265 monkeys in a closed, pedigreed colony. This process revealed a total of 11 single-nucleotide polymorphisms (SNPs) and a single 4-bp insertion–deletion, with minor allele frequencies of 0.08 to 0.39. Many of these polymorphisms were in strong or complete linkage disequilibrium with each other, and all but 1 were contained within a single haplotype block, comprising 5 haplotypes with frequencies of 0.075 to 0.298. Using sequences from humans, chimpanzees, vervets, baboons, and rhesus macaques, phylogenetic shadowing of the TNF promoter region revealed that vervet SNPs, like the SNPs in related species, were clustered nonrandomly and nonuniformly around conserved transcription factor binding sites. These data, combined with previously defined heritable phenotypes, permit future association analyses in this nonhuman primate model and have great potential to help dissect the genetic and nongenetic contributions to complex diseases like obesity. More broadly, the sequence data and comparative analyses reported herein facilitate study of the evolution of regulatory sequences of inflammatory and immune-related genes. PMID:20034434
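The minor allele frequencies reported in the abstract above are a simple count-based statistic. The sketch below shows one way to compute it for a single biallelic marker from diploid genotypes. The function name and the genotype data are hypothetical, and plain allele counting ignores the relatedness within a pedigreed colony, so this is an illustration of the quantity rather than the study's procedure.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Minor allele frequency for one biallelic marker.

    `genotypes` is an iterable of diploid genotypes, each a pair of
    allele labels, e.g. ("A", "G"). Assumes at most two distinct
    alleles; relatedness between individuals is not accounted for.
    """
    counts = Counter(allele for pair in genotypes for allele in pair)
    if len(counts) > 2:
        raise ValueError("expected a biallelic marker")
    if len(counts) < 2:
        return 0.0  # monomorphic (or empty) input has no minor allele
    return min(counts.values()) / sum(counts.values())


# Hypothetical genotypes for 10 animals: 15 A alleles and 5 G alleles.
example = [("A", "A")] * 6 + [("A", "G")] * 3 + [("G", "G")] * 1
maf = minor_allele_frequency(example)
```

With the made-up genotypes shown, G is the minor allele with 5 of 20 allele copies.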
Influence of Layup Sequence on the Surface Accuracy of Carbon Fiber Composite Space Mirrors

NASA Astrophysics Data System (ADS)

Yang, Zhiyong; Liu, Qingnian; Zhang, Boming; Xu, Liang; Tang, Zhanwen; Xie, Yongjie

2018-04-01

Layup sequence is directly related to the stiffness and deformation resistance of a composite space mirror, and errors caused by the layup sequence can markedly affect the surface precision of composite mirrors. Varying the layup sequence while keeping the total thickness of the mirror constant changes the surface form of the mirror, which is the focus of our study. The influence of varied quasi-isotropic stacking sequences and random angular deviation on the surface accuracy of composite space mirrors was investigated through finite element analyses (FEA). We established a simulation model for the studied concave mirror, 500 mm in diameter, and discussed the essential factors of layup sequences and random angular deviations on different plies. Five guiding findings are described in this study. Increasing the total number of plies, optimizing the stacking sequence and keeping ply alignment consistent during ply placement are effective ways to improve the surface accuracy of the composite mirror.
MannDB: A microbial annotation database for protein characterization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhou, C; Lam, M; Smith, J

2006-05-19

MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data. MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-source tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins) are downloaded and parsed from Genbank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the Genbank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO.
MannDB comprises a large number of genomes and comprehensive protein sequence analyses representing organisms listed as high-priority agents on the websites of several governmental organizations concerned with bio-terrorism. MannDB provides the user with a BLAST interface for comparison of native and non-native sequences and a query tool for conveniently selecting proteins of interest. In addition, the user has access to a web-based browser that compiles comprehensive and extensive reports.
Identification and characterization of cell-specific enhancer elements for the mouse ETF/Tead2 gene.

PubMed

Tanoue, Y; Yasunami, M; Suzuki, K; Ohkubo, H

2001-12-21

We have identified and characterized by transient transfection assays the cell-specific 117-bp enhancer sequence in the first intron of the mouse ETF (Embryonic TEA domain-containing factor)/Tead2 gene required for transcriptional activation in ETF/Tead2 gene-expressing cells, such as P19 cells. The 117-bp enhancer contains one GC-rich sequence (5'-GGGGCGGGG-3'), termed the GC box, and two tandemly repeated GA-rich sequences (5'-GGGGGAGGGG-3'), termed the proximal and distal GA elements. Further analyses, including transfection studies and electrophoretic mobility shift assays using a series of deletion and mutation constructs, indicated that Sp1, a putative activator, may be required to predominate over its competition with another unknown putative repressor, termed the GA element-binding factor, for binding to both the GC box, which overlapped with the proximal GA element, and the distal GA element in the 117-bp sequence in order to achieve a full enhancer activity. We also discuss a possible mechanism underlying the cell-specific enhancer activity of the 117-bp sequence.
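The two motifs quoted in the abstract above (the GC box, 5'-GGGGCGGGG-3', and the GA element, 5'-GGGGGAGGGG-3') can overlap, as the abstract notes for the GC box and the proximal GA element. A plain substring scan can miss overlapping hits, so the sketch below uses a zero-width lookahead. The test sequence is a made-up toy string, not the real 117-bp enhancer, and only the forward strand is searched.

```python
import re

GC_BOX = "GGGGCGGGG"       # motif as quoted in the abstract
GA_ELEMENT = "GGGGGAGGGG"  # motif as quoted in the abstract


def find_motif(sequence, motif):
    """Return 0-based start positions of every (possibly overlapping)
    occurrence of `motif` on the forward strand of `sequence`."""
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence in which the two motifs share four bases.
toy = "TTGGGGCGGGGGAGGGGTT"
gc_hits = find_motif(toy, GC_BOX)
ga_hits = find_motif(toy, GA_ELEMENT)
```

In the toy string the GC box starts at index 2 and the GA element at index 7, so the last four bases of the former are the first four of the latter.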
The complete DNA sequence of lymphocystis disease virus.

PubMed

Tidona, C A; Darai, G

1997-04-14

Lymphocystis disease virus (LCDV) is the causative agent of lymphocystis disease, which has been reported to occur in over 100 different fish species worldwide. LCDV is a member of the family Iridoviridae and the type species of the genus Lymphocystivirus. The virions contain a single linear double-stranded DNA molecule, which is circularly permuted, terminally redundant, and heavily methylated at cytosines in CpG sequences. The complete nucleotide sequence of LCDV-1 (flounder isolate) was determined by automated cycle sequencing and primer walking. The genome of LCDV-1 is 102,653 bp in length and contains 195 open reading frames with coding capacities ranging from 40 to 1199 amino acids. Computer-assisted analyses of the deduced amino acid sequences led to the identification of several putative gene products with significant homologies to entries in protein data banks, such as the two major subunits of the viral DNA-dependent RNA polymerase, DNA polymerase, several protein kinases, two subunits of the ribonucleoside diphosphate reductase, DNA methyltransferase, the viral major capsid protein, insulin-like growth factor, and tumor necrosis factor receptor homolog.
A reverse genetics approach identifies novel mutants in light responses and anthocyanin metabolism in petunia.

PubMed

Berenschot, Amanda S; Quecini, Vera

2014-01-01

Flower color and plant architecture are commercially valuable features of ornamental petunias (Petunia x hybrida Vilm.). Photoperception and light signaling are the major environmental factors controlling anthocyanin and chlorophyll biosynthesis and shade-avoidance responses in higher plants. The genetic regulators of these processes were investigated in petunia by in silico analyses and the sequence information was used to devise a reverse genetics approach to probe mutant populations. Petunia orthologs of photoreceptor, light-signaling components and anthocyanin metabolism genes were identified and investigated for functional conservation by phylogenetic and protein motif analyses. The expression profiles of photoreceptor gene families and of transcription factors regulating anthocyanin biosynthesis were obtained by bioinformatic tools. Two mutant populations, generated by an alkylating agent and by gamma irradiation, were screened using a phenotype-independent, sequence-based method by a high-throughput PCR-based assay. The strategy allowed the identification of novel mutant alleles for anthocyanin biosynthesis (CHALCONE SYNTHASE) and regulation (PH4), and for light signaling (CONSTANS) genes.
Conifer R2R3-MYB transcription factors: sequence analyses and gene expression in wood-forming tissues of white spruce (Picea glauca)

PubMed Central

Bedon, Frank; Grima-Pettenati, Jacqueline; Mackay, John

2007-01-01

Background Several members of the R2R3-MYB family of transcription factors act as regulators of lignin and phenylpropanoid metabolism during wood formation in angiosperm and gymnosperm plants. The angiosperm Arabidopsis has over one hundred R2R3-MYB genes; however, only a few members of this family have been discovered in gymnosperms. Results We isolated and characterised full-length cDNAs encoding R2R3-MYB genes from the gymnosperms white spruce, Picea glauca (13 sequences), and loblolly pine, Pinus taeda L. (five sequences). Sequence similarities and phylogenetic analyses placed the spruce and pine sequences in diverse subgroups of the large R2R3-MYB family, although several of the sequences clustered closely together. We searched the highly variable C-terminal region of diverse plant MYBs for conserved amino acid sequences and identified 20 motifs in the spruce MYBs, nine of which have not previously been reported and three of which are specific to conifers. The number and length of the introns in spruce MYB genes varied significantly, but their positions were well conserved relative to angiosperm MYB genes. Quantitative RT-PCR of MYB gene transcript abundance in root and stem tissues revealed diverse expression patterns; three MYB genes were preferentially expressed in secondary xylem, whereas others were preferentially expressed in phloem or were ubiquitous. The MYB genes expressed in xylem, and three others, were up-regulated in the compression wood of leaning trees within 76 hours of induction. Conclusion Our survey of 18 conifer R2R3-MYB genes clearly showed a gene family structure similar to that of Arabidopsis. Three of the sequences are likely to play a role in lignin metabolism and/or wood formation in gymnosperm trees, including a close homolog of the loblolly pine PtMYB4, shown to regulate lignin biosynthesis in transgenic tobacco. PMID:17397551
Next-Generation Sequencing of Aquatic Oligochaetes: Comparison of Experimental Communities

PubMed Central

Vivien, Régis; Lejzerowicz, Franck; Pawlowski, Jan

2016-01-01

Aquatic oligochaetes are a common group of freshwater benthic invertebrates known to be very sensitive to environmental changes and currently used as bioindicators in some countries. However, more extensive application of oligochaetes for assessing the ecological quality of sediments in watercourses and lakes would require overcoming the difficulties related to morphology-based identification of oligochaete species. This study tested the Next-Generation Sequencing (NGS) of a standard cytochrome c oxidase I (COI) barcode as a tool for the rapid assessment of oligochaete diversity in environmental samples, based on mixed specimen samples. To determine the composition of each sample, we Sanger-sequenced every specimen present in these samples. Our study showed that a large majority of operational taxonomic units (OTUs) could be detected by NGS analyses. We also observed congruence between the NGS and specimen abundance data for several but not all OTUs. Because the differences in sequence abundance data were consistent across samples, we exploited these variations to empirically design correction factors. We showed that such factors increased the congruence between the values of oligochaete-based indices inferred from the NGS and the Sanger-sequenced specimen data. The validation of these correction factors by further experimental studies will be needed for the adaptation and use of NGS technology in biomonitoring studies based on oligochaete communities. PMID:26866802
Natural selection of the major histocompatibility complex (Mhc) in Hawaiian honeycreepers (Drepanidinae)

USGS Publications Warehouse

Jarvi, S.I.; Tarr, C.L.; Mcintosh, C.E.; Atkinson, C.T.; Fleischer, R.C.

2004-01-01

The native Hawaiian honeycreepers represent a classic example of adaptive radiation and speciation, but currently face one of the highest extinction rates in the world. Although multiple factors have likely influenced the fate of Hawaiian birds, the relatively recent introduction of avian malaria is thought to be a major factor limiting honeycreeper distribution and abundance. We have initiated genetic analyses of class II β chain Mhc genes in four species of honeycreepers using methods that eliminate the possibility of sequencing mosaic variants formed by cloning heteroduplexed polymerase chain reaction products. Phylogenetic analyses group the honeycreeper Mhc sequences into two distinct clusters. Variation within one cluster is high, with dN > dS and levels of diversity similar to other studies of Mhc (B system) genes in birds. The second cluster is nearly invariant and includes sequences from honeycreepers (Fringillidae), a sparrow (Emberizidae) and a blackbird (Emberizidae). This highly conserved cluster appears reminiscent of the independently segregating Rfp-Y system of genes defined in chickens. The notion that balancing selection operates at the Mhc in the honeycreepers is supported by trans-species polymorphism and strikingly high dN/dS ratios at codons putatively involved in peptide interaction. Mitochondrial DNA control region sequences were invariant in the i'iwi, but were highly variable in the 'amakihi. By contrast, levels of variability of class II β chain Mhc sequence codons that are hypothesized to be directly involved in peptide interactions appear comparable between i'iwi and 'amakihi. In the i'iwi, natural selection may have maintained variation within the Mhc, even in the face of what appears to be a genetic bottleneck.
Transcription factor IID in the Archaea: sequences in the Thermococcus celer genome would encode a product closely related to the TATA-binding protein of eukaryotes

NASA Technical Reports Server (NTRS)

Marsh, T. L.; Reich, C. I.; Whitelock, R. B.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

1994-01-01

The first step in transcription initiation in eukaryotes is mediated by the TATA-binding protein, a subunit of the transcription factor IID complex. We have cloned and sequenced the gene for a presumptive homolog of this eukaryotic protein from Thermococcus celer, a member of the Archaea (formerly archaebacteria). The protein encoded by the archaeal gene is a tandem repeat of a conserved domain, corresponding to the repeated domain in its eukaryotic counterparts. Molecular phylogenetic analyses of the two halves of the repeat are consistent with the duplication occurring before the divergence of the archaeal and eukaryotic domains. In conjunction with previous observations of similarity in RNA polymerase subunit composition and sequences and the finding of a transcription factor IIB-like sequence in Pyrococcus woesei (a relative of T. celer), it appears that major features of the eukaryotic transcription apparatus were well-established before the origin of eukaryotic cellular organization. The divergence between the two halves of the archaeal protein is less than that between the halves of the individual eukaryotic sequences, indicating that the average rate of sequence change in the archaeal protein has been less than in its eukaryotic counterparts. To the extent that this lower rate applies to the genome as a whole, a clearer picture of the early genes (and gene families) that gave rise to present-day genomes is more apt to emerge from the study of sequences from the Archaea than from the corresponding sequences from eukaryotes.
Development of Overarm Throwing Technique Reflects Throwing Ability during Childhood

PubMed Central

KASUYAMA, Tatsuya; MUTOU, Ikuo; SASAMOTO, Hitoshi

2016-01-01

Background: It is important to acquire fundamental movement skills during childhood. Throwing is a representative manipulative skill that depends on various intrinsic factors. However, the relationship between intrinsic factors and throwing ability in childhood is unclear. The purpose of this study was to investigate intrinsic factors related to the ball throwing distance of Japanese elementary school children. Methods: Japanese elementary school children from grades 1-6 (aged 6-12 years; n=112) participated in this study. The main outcome was throwing ability, which was measured as the ball throwing distance. We measured five general anthropometric parameters, seven physical fitness parameters, and Roberton's developmental sequence for all subjects. The relationships between the throwing ability and the 13 parameters were analysed. Results: Roberton's developmental sequence was the best predictor of ball throwing distance (r=0.80, p≤0.01). The best multiple regression model, which included sex, handgrip strength, shuttle run test, and Roberton's developmental sequence, accounted for 81% of the total variance. Conclusions: The development of correct throwing technique reflects throwing abilities in childhood. In addition to the throwing sequence, enhancement of grip strength and aerobic capacity is also required for children's throwing ability. PMID:28289578
'Candidatus Phytoplasma phoenicium' associated with almond witches'-broom disease: from draft genome to genetic diversity among strain populations.

PubMed

Quaglino, Fabio; Kube, Michael; Jawhari, Maan; Abou-Jawdah, Yusuf; Siewert, Christin; Choueiri, Elia; Sobh, Hana; Casati, Paola; Tedeschi, Rosemarie; Lova, Marina Molino; Alma, Alberto; Bianco, Piero Attilio

2015-07-30

Almond witches'-broom (AlmWB), a devastating disease of almond, peach and nectarine in Lebanon, is associated with 'Candidatus Phytoplasma phoenicium'. In the present study, we generated a draft genome sequence of 'Ca. P. phoenicium' strain SA213, representative of phytoplasma strain populations from different host plants, and determined the genetic diversity among phytoplasma strain populations by phylogenetic analyses of 16S rRNA, groEL, tufB and inmp gene sequences. Sequence-based typing and phylogenetic analysis of the gene inmp, coding for an integral membrane protein, distinguished AlmWB-associated phytoplasma strains originating from diverse host plants, whereas their 16S rRNA, tufB and groEL genes shared 100 % sequence identity. Moreover, dN/dS analysis indicated positive selection acting on the inmp gene. Additionally, the analysis of the 'Ca. P. phoenicium' draft genome revealed the presence of integral membrane proteins and effector-like proteins and potential candidates for interaction with hosts. One of the integral membrane proteins was predicted as BI-1, an inhibitor of apoptosis-promoting Bax factor. Bioinformatics analyses revealed the presence of putative BI-1 in draft and complete genomes of other 'Ca. Phytoplasma' species. The genetic diversity within 'Ca. P. phoenicium' strain populations in Lebanon suggested that AlmWB disease could be associated with phytoplasma strains derived from the adaptation of an original strain to diverse hosts. Moreover, the identification of a putative inhibitor of apoptosis-promoting Bax factor (BI-1) in the 'Ca. P. phoenicium' draft genome and within genomes of other 'Ca. Phytoplasma' species suggested its potential role as a phytoplasma fitness-increasing factor by modification of the host-defense response.
Sequence and Expression Analyses of Ethylene Response Factors Highly Expressed in Latex Cells from Hevea brasiliensis

PubMed Central

Piyatrakul, Piyanuch; Yang, Meng; Putranto, Riza-Arief; Pirrello, Julien; Dessailly, Florence; Hu, Songnian; Summo, Marilyne; Theeravatanasuk, Kannikar; Leclercq, Julie; Kuswanhadi; Montoro, Pascal

2014-01-01

The AP2/ERF superfamily encodes transcription factors that play a key role in plant development and responses to abiotic and biotic stress. In Hevea brasiliensis, ERF genes have been identified by RNA sequencing. This study set out to validate the number of HbERF genes, and identify ERF genes involved in the regulation of latex cell metabolism. A comprehensive Hevea transcriptome was improved using additional RNA reads from reproductive tissues. Newly assembled contigs were annotated in the Gene Ontology database and were assigned to 3 main categories. The AP2/ERF superfamily is the third most represented compared with other transcription factor families. A comparison with genomic scaffolds led to an estimation of 114 AP2/ERF genes and 1 soloist in Hevea brasiliensis. Based on a phylogenetic analysis, functions were predicted for 26 HbERF genes. A relative transcript abundance analysis was performed by real-time RT-PCR in various tissues. Transcripts of ERFs from group I and VIII were very abundant in all tissues while those of group VII were highly accumulated in latex cells. Seven of the thirty-five ERF expression marker genes were highly expressed in latex. Subcellular localization and transactivation analyses suggested that HbERF-VII candidate genes encoded functional transcription factors. PMID:24971876

Sequence and expression analyses of ethylene response factors highly expressed in latex cells from Hevea brasiliensis.

PubMed

Piyatrakul, Piyanuch; Yang, Meng; Putranto, Riza-Arief; Pirrello, Julien; Dessailly, Florence; Hu, Songnian; Summo, Marilyne; Theeravatanasuk, Kannikar; Leclercq, Julie; Kuswanhadi; Montoro, Pascal

2014-01-01

The AP2/ERF superfamily encodes transcription factors that play a key role in plant development and responses to abiotic and biotic stress. In Hevea brasiliensis, ERF genes have been identified by RNA sequencing. This study set out to validate the number of HbERF genes, and identify ERF genes involved in the regulation of latex cell metabolism. A comprehensive Hevea transcriptome was improved using additional RNA reads from reproductive tissues. Newly assembled contigs were annotated in the Gene Ontology database and were assigned to 3 main categories. The AP2/ERF superfamily is the third most represented compared with other transcription factor families. A comparison with genomic scaffolds led to an estimation of 114 AP2/ERF genes and 1 soloist in Hevea brasiliensis. Based on a phylogenetic analysis, functions were predicted for 26 HbERF genes. A relative transcript abundance analysis was performed by real-time RT-PCR in various tissues. Transcripts of ERFs from group I and VIII were very abundant in all tissues while those of group VII were highly accumulated in latex cells. Seven of the thirty-five ERF expression marker genes were highly expressed in latex. Subcellular localization and transactivation analyses suggested that HbERF-VII candidate genes encoded functional transcription factors.
Using Behavior Sequence Analysis to Map Serial Killers' Life Histories.

PubMed

Keatley, David A; Golightly, Hayley; Shephard, Rebecca; Yaksic, Enzo; Reid, Sasha

2018-03-01

The aim of the current research was to provide a novel method for mapping the developmental sequences of serial killers' life histories. An in-depth biographical account of serial killers' lives, from birth through to conviction, was gained and analyzed using Behavior Sequence Analysis. The analyses highlight similarities in behavioral events across the serial killers' lives, indicating not only which risk factors occur, but the temporal order of these factors. Results focused on early childhood environment, indicating the role of parental abuse; behaviors and events surrounding criminal histories of serial killers, showing that many had previous convictions and were known to police for other crimes; behaviors surrounding their murders, highlighting differences in victim choice and modus operandi; and, finally, trial pleas and convictions. The present research, therefore, provides a novel approach to synthesizing large volumes of data on criminals and presenting results in accessible, understandable outcomes.
The genome sequence of the emerging common midwife toad virus identifies an evolutionary intermediate within ranaviruses.

PubMed

Mavian, Carla; López-Bueno, Alberto; Balseiro, Ana; Casais, Rosa; Alcamí, Antonio; Alejo, Alí

2012-04-01

Worldwide amphibian population declines have been ascribed to global warming, increasing pollution levels, and other factors directly related to human activities. These factors may additionally be favoring the emergence of novel pathogens. In this report, we have determined the complete genome sequence of the emerging common midwife toad ranavirus (CMTV), which has caused fatal disease in several amphibian species across Europe. Phylogenetic and gene content analyses of the first complete genomic sequence from a ranavirus isolated in Europe show that CMTV is an amphibian-like ranavirus (ALRV). However, the CMTV genome structure is novel and represents an intermediate evolutionary stage between the two previously described ALRV groups. We find that CMTV clusters with several other ranaviruses isolated from different hosts and locations which might also be included in this novel ranavirus group. This work sheds light on the phylogenetic relationships within this complex group of emerging, disease-causing viruses.
Structural organization and chromosomal assignment of the mouse embryonic TEA domain-containing factor (ETF) gene

DOE Office of Scientific and Technical Information (OSTI.GOV)

Suzuki, Kazuo; Yasunami, Michio; Matsuda, Yoichi

1996-09-01

Embryonic TEA domain-containing factor (ETF) belongs to the family of proteins structurally related to transcriptional enhancer factor-1 (TEF-1) and is implicated in neural development. Isolation and characterization of the cosmid clones encoding the mouse ETF gene (Etdf) revealed that Etdf spans approximately 17.9 kb and consists of 12 exons. The exon-intron structure of Etdf closely resembles that of the Drosophila scalloped gene, indicating that these genes may have evolved from a common ancestor. The multiple transcription initiation sites revealed by S1 protection and primer extension analyses are consistent with the absence of the canonical TATA and CAAT boxes in the 5'-flanking region, which contains many potential regulatory sequences, such as the E-box, N-box, Sp1 element, GATA-1 element, TAATGARAT element, and B2 short interspersed element (SINE) as well as several direct and inverted repeat sequences. The Etdf locus was assigned to the proximal region of mouse chromosome 7 using fluorescence in situ hybridization and linkage mapping analyses. These results provide the molecular basis for studying the regulation, in vivo function, and evolution of Etdf. 29 refs., 5 figs., 1 tab.
Structural organization and chromosomal assignment of the mouse embryonic TEA domain-containing factor (ETF) gene.

PubMed

Suzuki, K; Yasunami, M; Matsuda, Y; Maeda, T; Kobayashi, H; Terasaki, H; Ohkubo, H

1996-09-01

Embryonic TEA domain-containing factor (ETF) belongs to the family of proteins structurally related to transcriptional enhancer factor-1 (TEF-1) and is implicated in neural development. Isolation and characterization of the cosmid clones encoding the mouse ETF gene (Etdf) revealed that Etdf spans approximately 17.9 kb and consists of 12 exons. The exon-intron structure of Etdf closely resembles that of the Drosophila scalloped gene, indicating that these genes may have evolved from a common ancestor. The multiple transcription initiation sites revealed by S1 protection and primer extension analyses are consistent with the absence of the canonical TATA and CAAT boxes in the 5'-flanking region, which contains many potential regulatory sequences, such as the E-box, N-box, Sp1 element, GATA-1 element, TAATGARAT element, and B2 short interspersed element (SINE) as well as several direct and inverted repeat sequences. The Etdf locus was assigned to the proximal region of mouse chromosome 7 using fluorescence in situ hybridization and linkage mapping analyses. These results provide the molecular basis for studying the regulation, in vivo function, and evolution of Etdf.
Comparative genomic analysis and characterization of incompatibility group FIB plasmid encoded virulence factors of Salmonella enterica isolated from food sources.

PubMed

Khajanchi, Bijay K; Hasan, Nur A; Choi, Seon Young; Han, Jing; Zhao, Shaohua; Colwell, Rita R; Cerniglia, Carl E; Foley, Steven L

2017-08-02

The degree to which the chromosomal mediated iron acquisition system contributes to virulence of many bacterial pathogens is well defined. However, the functional roles of plasmid encoded iron acquisition systems, specifically Sit and aerobactin, have yet to be determined for Salmonella spp. In a recent study, Salmonella enterica strains isolated from different food sources were sequenced on the Illumina MiSeq platform and found to harbor the incompatibility group (Inc) FIB plasmid. In this study, we examined sequence diversity and the contribution of factors encoded on the IncFIB plasmid to the virulence of S. enterica. Whole genome sequences of seven S. enterica isolates were compared to genomes of serovars of S. enterica isolated from food, animal, and human sources. SeqSero analysis predicted that six strains were serovar Typhimurium and one was Heidelberg. Among the S. Typhimurium strains, single nucleotide polymorphism (SNP)-based phylogenetic analyses revealed that five of the isolates clustered as a single monophyletic S. Typhimurium subclade, while one of the other strains branched with S. Typhimurium from a bovine source. DNA sequence based phylogenetic diversity analyses showed that the IncFIB plasmid-encoded Sit and aerobactin iron acquisition systems are conserved among bacterial species including S. enterica. The IncFIB plasmid was transferred to an IncFIB plasmid deficient strain of S. enterica by conjugation. The transconjugant SE819::IncFIB persisted in human intestinal epithelial (Caco-2) cells at a higher rate than the recipient SE819. Genes of the Sit and aerobactin operons in the IncFIB plasmid were differentially expressed in iron-rich and iron-depleted growth media. Minimal sequence diversity was detected in the Sit and aerobactin operons in the IncFIB plasmids present among different bacterial species, including foodborne Salmonella strains. IncFIB plasmid encoded factors play a role during infection under low-iron conditions in host cells.
Comparative transcriptome analysis of Aspergillus flavus isolates under different oxidative stresses and culture media

USDA-ARS's Scientific Manuscript database

Aspergillus flavus and aflatoxin contamination in the field are known to be influenced by numerous stress factors, particularly drought and heat stress. However, the purpose of aflatoxin production is unknown. Here, we report transcriptome analyses comprised of 282.6 Gb of sequencing data describing...
Deciphering the molecular mechanisms underlying the binding of the TWIST1/E12 complex to regulatory E-box sequences

PubMed Central

Bouard, Charlotte; Terreux, Raphael; Honorat, Mylène; Manship, Brigitte; Ansieau, Stéphane; Vigneron, Arnaud M.; Puisieux, Alain; Payen, Léa

2016-01-01

The TWIST1 bHLH transcription factor controls embryonic development and cancer processes. Although molecular and genetic analyses have provided a wealth of data on the role of bHLH transcription factors, very little is known about the molecular mechanisms underlying their binding affinity to the E-box sequence of the promoter. Here, we used an in silico model of the TWIST1/E12 (TE) heterocomplex and performed molecular dynamics (MD) simulations of its binding to specific (TE-box) and modified E-box sequences. We focused on (i) active and inactive E-box sequences, (ii) modified active E-box sequences, and (iii) two box sequences with modified adjacent bases, the AT- and TA-boxes. Our in silico models were supported by functional in vitro binding assays. This exploration highlighted the predominant role of protein side-chain residues, close to the heart of the complex, in anchoring the dimer to DNA sequences, and unveiled a shift towards adjacent ((-1) and (-1*)) bases and conserved bases of modified E-box sequences. In conclusion, our study provides proof of the predictive value of these MD simulations, which may contribute to the characterization of specific inhibitors by docking approaches, and their use in pharmacological therapies by blocking the tumoral TWIST1/E12 function in cancers. PMID:27151200
Bioinformatic Analysis of Strawberry GSTF12 Gene

NASA Astrophysics Data System (ADS)

Wang, Xiran; Jiang, Leiyu; Tang, Haoru

2018-01-01

GSTF12 has long been known as a key factor in proanthocyanin accumulation in the plant testa. Bioinformatic analysis of the nucleotide and encoded protein sequences of GSTF12 benefits the study of genes related to the anthocyanin biosynthesis and accumulation pathway. We therefore chose the GSTF12 genes of 11 species as the research object, downloaded their nucleotide and protein sequences from NCBI, identified the strawberry GSTF12 gene by bioinformatic analysis, and constructed a phylogenetic tree. We also analysed the physical and chemical properties of the strawberry GSTF12 gene and the structure of its protein. The phylogenetic tree showed that strawberry and petunia were the closest relatives. Protein prediction indicated that the protein has one signal peptide and no obvious transmembrane regions.
Contrasting effects of geographical separation on the genetic population structure of sympatric species of mites in avocado orchards.

PubMed

Guzman-Valencia, S; Santillán-Galicia, M T; Guzmán-Franco, A W; González-Hernández, H; Carrillo-Benítez, M G; Suárez-Espinoza, J

2014-10-01

Oligonychus punicae and Oligonychus perseae (Acari: Tetranychidae) are the most important mite species affecting avocado orchards in Mexico. Here we used nucleotide sequence data from segments of the nuclear ribosomal internal transcribed spacers (ITS1 and ITS2) and mitochondrial cytochrome oxidase subunit I (COI) genes to assess the phylogenetic relationships between both sympatric mite species and, using only ITS sequence data, examine genetic variation and population structure in both species, to test the hypothesis that, although both species co-occur, their genetic population structures are different in both Michoacan state (main producer) and Mexico state. Phylogenetic analysis showed a clear separation between both species using ITS and COI sequence information. Haplotype network analysis done on 24 samples of O. punicae revealed low genetic diversity with only three haplotypes found but a significant geographical population structure confirmed by analysis of molecular variance (AMOVA) and Kimura-2-parameter (K2P) analyses. In addition, a Mantel test revealed that geographical isolation was a factor responsible for the genetic differentiation. In contrast, analyses of 22 samples of O. perseae revealed high genetic diversity with 15 haplotypes found but no geographical structure confirmed by the AMOVA, K2P and Mantel test analyses. We have suggested that geographical separation is one of the most important factors driving genetic variation, but that it affected each species differently. The role of the ecology of these species on our results, and the importance of our findings in the development of monitoring and control strategies are discussed.
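
The Kimura-2-parameter (K2P) distance named in the abstract above can be illustrated with a short Python sketch. This is not the authors' pipeline: the function name `k2p_distance` and the decision to skip gaps and ambiguous bases are assumptions made for illustration. The formula is the standard K2P estimator, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitional and transversional differences between two aligned sequences.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper: sites with gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # Raises ValueError (math domain error) if the sequences are too divergent.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For example, `k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")` compares ten sites with a single transition, so P = 0.1 and Q = 0, and the distance is -0.5 ln(0.8).
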
Position specific variation in the rate of evolution in transcription factor binding sites

PubMed Central

Moses, Alan M; Chiang, Derek Y; Kellis, Manolis; Lander, Eric S; Eisen, Michael B

2003-01-01

Background The binding sites of sequence specific transcription factors are an important and relatively well-understood class of functional non-coding DNAs. Although a wide variety of experimental and computational methods have been developed to characterize transcription factor binding sites, they remain difficult to identify. Comparison of non-coding DNA from related species has shown considerable promise in identifying these functional non-coding sequences, even though relatively little is known about their evolution. Results Here we analyse the genome sequences of the budding yeasts Saccharomyces cerevisiae, S. bayanus, S. paradoxus and S. mikatae to study the evolution of transcription factor binding sites. As expected, we find that both experimentally characterized and computationally predicted binding sites evolve slower than surrounding sequence, consistent with the hypothesis that they are under purifying selection. We also observe position-specific variation in the rate of evolution within binding sites. We find that the position-specific rate of evolution is positively correlated with degeneracy among binding sites within S. cerevisiae. We test theoretical predictions for the rate of evolution at positions where the base frequencies deviate from background due to purifying selection and find reasonable agreement with the observed rates of evolution. Finally, we show how the evolutionary characteristics of real binding motifs can be used to distinguish them from artefacts of computational motif finding algorithms. Conclusion As has been observed for protein sequences, the rate of evolution in transcription factor binding sites varies with position, suggesting that some regions are under stronger functional constraint than others. This variation likely reflects the varying importance of different positions in the formation of the protein-DNA complex. The characterization of the pattern of evolution in known binding sites will likely contribute to the effective use of comparative sequence data in the identification of transcription factor binding sites and is an important step toward understanding the evolution of functional non-coding DNA. PMID:12946282
WRKY transcription factor genes in wild rice Oryza nivara

PubMed Central

Xu, Hengjian; Watanabe, Kenneth A.; Zhang, Liyuan; Shen, Qingxi J.

2016-01-01

The WRKY transcription factor family is one of the largest gene families involved in plant development and stress response. Although many WRKY genes have been studied in cultivated rice (Oryza sativa), the WRKY genes in the wild rice species Oryza nivara, the direct progenitor of O. sativa, have not been studied. O. nivara shows abundant genetic diversity and elite drought and disease resistance features. Herein, a total of 97 O. nivara WRKY (OnWRKY) genes were identified. RNA-sequencing demonstrates that OnWRKY genes were generally expressed at higher levels in the roots of 30-day-old plants. Bioinformatic analyses suggest that most of OnWRKY genes could be induced by salicylic acid, abscisic acid, and drought. Abundant potential MAPK phosphorylation sites in OnWRKYs suggest that activities of most OnWRKYs can be regulated by phosphorylation. Phylogenetic analyses of OnWRKYs support a novel hypothesis that ancient group IIc OnWRKYs were the original ancestors of only some group IIc and group III WRKYs. The analyses also offer strong support that group IIc OnWRKYs containing the HVE sequence in their zinc finger motifs were derived from group Ia WRKYs. This study provides a solid foundation for the study of the evolution and functions of WRKY genes in O. nivara. PMID:27345721
Estimation of pairwise sequence similarity of mammalian enhancers with word neighbourhood counts.

PubMed

Göke, Jonathan; Schulz, Marcel H; Lasserre, Julia; Vingron, Martin

2012-03-01

The identity of cells and tissues is to a large degree governed by transcriptional regulation. A major part is accomplished by the combinatorial binding of transcription factors at regulatory sequences, such as enhancers. Even though binding of transcription factors is sequence-specific, estimating the sequence similarity of two functionally similar enhancers is very difficult. However, a similarity measure for regulatory sequences is crucial to detect and understand functional similarities between two enhancers and will facilitate large-scale analyses like clustering, prediction and classification of genome-wide datasets. We present the standardized alignment-free sequence similarity measure N2, a flexible framework that is defined for word neighbourhoods. We explore the usefulness of adding reverse complement words as well as words including mismatches into the neighbourhood. On simulated enhancer sequences as well as functional enhancers in mouse development, N2 is shown to outperform previous alignment-free measures. N2 is flexible, faster than competing methods and less susceptible to single sequence noise and the occurrence of repetitive sequences. Experiments on the mouse enhancers reveal that enhancers active in different tissues can be separated by pairwise comparison using N2. N2 represents an improvement over previous alignment-free similarity measures without compromising speed, which makes it a good candidate for large-scale sequence comparison of regulatory sequences. The software is part of the open-source C++ library SeqAn (www.seqan.de) and a compiled version can be downloaded at http://www.seqan.de/projects/alf.html. Supplementary data are available at Bioinformatics online.
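
The idea of comparing sequences by word counts, with reverse complement words added to the neighbourhood, can be sketched in a few lines of Python. This is explicitly not the N2 measure and not the SeqAn implementation: the standardization that defines N2 is not given in the abstract, so plain cosine similarity is swapped in, and the names `word_counts` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Reverse complement of an upper-case DNA word."""
    return word.translate(COMPLEMENT)[::-1]


def word_counts(seq, k=3, both_strands=True):
    """Count k-mers in seq; optionally credit each word's reverse complement too."""
    counts = Counter()
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if not set(word) <= set("ACGT"):
            continue  # skip words containing N or other ambiguity codes
        counts[word] += 1
        if both_strands:
            counts[reverse_complement(word)] += 1
    return counts


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity of two word-count vectors (a stand-in, not N2)."""
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With `both_strands=True`, the sequences `"AAC"` and `"GTT"` yield identical count vectors, because each is the reverse complement of the other. With `both_strands=False` they share no words and the similarity is zero, which shows why strand-aware neighbourhoods matter for regulatory sequences that can act in either orientation.
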
Predicting the binding preference of transcription factors to individual DNA k-mers.

PubMed

Alleyne, Trevis M; Peña-Castillo, Lourdes; Badis, Gwenael; Talukder, Shaheynoor; Berger, Michael F; Gehrke, Andrew R; Philippakis, Anthony A; Bulyk, Martha L; Morris, Quaid D; Hughes, Timothy R

2009-04-15

Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty associated with determining their specificity experimentally, and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA-protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members. We employed a new dataset consisting of the relative preferences of mouse homeodomains for all eight-base DNA sequences in order to ask how well we can predict the binding profiles of homeodomains when only their protein sequences are given. We evaluated a panel of standard statistical inference techniques, as well as variations of the protein features considered. Nearest neighbour among functionally important residues emerged among the most effective methods. Our results underscore the complexity of TF-DNA recognition, and suggest a rational approach for future analyses of TF families.
A comparative genomics perspective on the genetic content of the alkaliphilic haloarchaeon Natrialba magadii ATCC 43099T

PubMed Central

2012-01-01

Background Natrialba magadii is an aerobic chemoorganotrophic member of the Euryarchaeota and is a dual extremophile requiring alkaline conditions and hypersalinity for optimal growth. The genome sequence of Nab. magadii type strain ATCC 43099 was deciphered to obtain a comprehensive insight into the genetic content of this haloarchaeon and to understand the basis of some of the cellular functions necessary for its survival. Results The genome of Nab. magadii consists of four replicons with a total sequence of 4,443,643 bp and encodes 4,212 putative proteins, some of which contain peptide repeats of various lengths. Comparative genome analyses facilitated the identification of genes encoding putative proteins involved in adaptation to hypersalinity, stress response, glycosylation, and polysaccharide biosynthesis. A proton-driven ATP synthase and a variety of putative cytochromes and other proteins supporting aerobic respiration and electron transfer were encoded by one or more of Nab. magadii replicons. The genome encodes a number of putative proteases/peptidases as well as protein secretion functions. Genes encoding putative transcriptional regulators, basal transcription factors, signal perception/transduction proteins, and chemotaxis/phototaxis proteins were abundant in the genome. Pathways for the biosynthesis of thiamine, riboflavin, heme, cobalamin, coenzyme F420 and other essential co-factors were deduced by in depth sequence analyses. However, approximately 36% of Nab. magadii protein coding genes could not be assigned a function based on Blast analysis and have been annotated as encoding hypothetical or conserved hypothetical proteins. Furthermore, despite extensive comparative genomic analyses, genes necessary for survival in alkaline conditions could not be identified in Nab. magadii. Conclusions Based on genomic analyses, Nab. magadii is predicted to be metabolically versatile and it could use different carbon and energy sources to sustain growth. Nab. magadii has the genetic potential to adapt to its milieu by intracellular accumulation of inorganic cations and/or neutral organic compounds. The identification of Nab. magadii genes involved in coenzyme biosynthesis is a necessary step toward further reconstruction of the metabolic pathways in halophilic archaea and other extremophiles. The knowledge gained from the genome sequence of this haloalkaliphilic archaeon is highly valuable in advancing the applications of extremophiles and their enzymes. PMID:22559199
Transmission clustering among newly diagnosed HIV patients in Chicago, 2008 to 2011: using phylogenetics to expand knowledge of regional HIV transmission patterns

PubMed Central

Lubelchek, Ronald J.; Hoehnen, Sarah C.; Hotton, Anna L.; Kincaid, Stacey L.; Barker, David E.; French, Audrey L.

2014-01-01

Introduction HIV transmission cluster analyses can inform HIV prevention efforts. We describe the first such assessment for transmission clustering among HIV patients in Chicago. Methods We performed transmission cluster analyses using HIV pol sequences from newly diagnosed patients presenting to Chicago’s largest HIV clinic between 2008 and 2011. We compared sequences via progressive pairwise alignment, using neighbor joining to construct an un-rooted phylogenetic tree. We defined clusters as >2 sequences among which each sequence had at least one partner within a genetic distance of ≤ 1.5%. We used multivariable regression to examine factors associated with clustering and used geospatial analysis to assess geographic proximity of phylogenetically clustered patients. Results We compared sequences from 920 patients; median age 35 years; 75% male; 67% Black, 23% Hispanic; 8% had a Rapid Plasma Reagin (RPR) titer ≥ 1:16 concurrent with their HIV diagnosis. We had HIV transmission risk data for 54%; 43% identified as men who have sex with men (MSM). Phylogenetic analysis demonstrated 123 patients (13%) grouped into 26 clusters, the largest having 20 members. In multivariable regression, age < 25, Black race, MSM status, male gender, higher HIV viral load, and RPR ≥ 1:16 associated with clustering. We did not observe geographic grouping of genetically clustered patients. Discussion Our results demonstrate high rates of HIV transmission clustering, without local geographic foci, among young Black MSM in Chicago. Applied prospectively, phylogenetic analyses could guide prevention efforts and help break the cycle of transmission. PMID:25321182
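
The cluster definition in the abstract above (more than two sequences, each with at least one partner within a genetic distance of 1.5%) can be read as connected components of a threshold graph. The Python sketch below follows that reading, which is an assumption: the abstract does not state how the authors computed distances or assembled clusters. The function name `transmission_clusters` and the square distance-matrix input are hypothetical.

```python
def transmission_clusters(dist, threshold=0.015, min_size=3):
    """Group sequences into clusters linked by pairwise distance <= threshold.

    dist is a symmetric square matrix (list of lists) of pairwise genetic
    distances; only groups of at least min_size sequences are returned.
    """
    n = len(dist)
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search over the graph whose edges are pairs within threshold.
        component = []
        stack = [start]
        seen.add(start)
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and dist[i][j] <= threshold:
                    seen.add(j)
                    stack.append(j)
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return clusters


# Toy example: sequences 0-1-2 form a chain of close pairs; 3 and 4 are only a pair.
example = [
    [0.000, 0.010, 0.030, 0.100, 0.100],
    [0.010, 0.000, 0.012, 0.100, 0.100],
    [0.030, 0.012, 0.000, 0.100, 0.100],
    [0.100, 0.100, 0.100, 0.000, 0.005],
    [0.100, 0.100, 0.100, 0.005, 0.000],
]
print(transmission_clusters(example))
```

In the toy matrix, sequences 0 and 2 are 3% apart but still fall in one cluster because each has a partner (sequence 1) within 1.5%. The pair 3 and 4 is excluded because a cluster needs more than two members.
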
Multilevel Factor Analyses of Family Data from the Hawai'i Family Study of Cognition

ERIC Educational Resources Information Center

McArdle, John J.; Hamagami, Fumiaki; Bautista, Randy; Onoye, Jane; Hishinuma, Earl S.; Prescott, Carol A.; Takeshita, Junji; Zonderman, Alan B.; Johnson, Ronald C.

2014-01-01

In this study, we reanalyzed the classic Hawai'i Family Study of Cognition (HFSC) data using contemporary multilevel modeling techniques. We used the HFSC baseline data ("N" = 6,579) and reexamined the factorial structure of 16 cognitive variables using confirmatory (restricted) measurement models in an explicit sequence. These models…
Exome sequencing of hepatocellular carcinomas identifies new mutational signatures and potential therapeutic targets

DOE PAGES

Schulze, Kornelius; Imbeaud, Sandrine; Letouzé, Eric; ...

2015-03-30

Our genomic analyses promise to improve tumor characterization to optimize personalized treatment for patients with hepatocellular carcinoma (HCC). Exome sequencing analysis of 243 liver tumors identified mutational signatures associated with specific risk factors, mainly combined alcohol and tobacco consumption and exposure to aflatoxin B1. We identified 161 putative driver genes associated with 11 recurrently altered pathways. Associations of mutations defined 3 groups of genes related to risk factors and centered on CTNNB1 (alcohol), TP53 (hepatitis B virus, HBV) and AXIN1. These analyses according to tumor stage progression identified TERT promoter mutation as an early event, whereas FGF3, FGF4, FGF19 or CCND1 amplification and TP53 and CDKN2A alterations appeared at more advanced stages in aggressive tumors. In 28% of the tumors, we identified genetic alterations potentially targetable by US Food and Drug Administration (FDA)–approved drugs. Finally, we identified risk factor–specific mutational signatures and defined the extensive landscape of altered genes and pathways in HCC, which will be useful to design clinical trials for targeted therapy.
Exome sequencing of hepatocellular carcinomas identifies new mutational signatures and potential therapeutic targets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schulze, Kornelius; Imbeaud, Sandrine; Letouzé, Eric

Our genomic analyses promise to improve tumor characterization to optimize personalized treatment for patients with hepatocellular carcinoma (HCC). Exome sequencing analysis of 243 liver tumors identified mutational signatures associated with specific risk factors, mainly combined alcohol and tobacco consumption and exposure to aflatoxin B1. We identified 161 putative driver genes associated with 11 recurrently altered pathways. Associations of mutations defined 3 groups of genes related to risk factors and centered on CTNNB1 (alcohol), TP53 (hepatitis B virus, HBV) and AXIN1. These analyses according to tumor stage progression identified TERT promoter mutation as an early event, whereas FGF3, FGF4, FGF19 or CCND1 amplification and TP53 and CDKN2A alterations appeared at more advanced stages in aggressive tumors. In 28% of the tumors, we identified genetic alterations potentially targetable by US Food and Drug Administration (FDA)–approved drugs. Finally, we identified risk factor–specific mutational signatures and defined the extensive landscape of altered genes and pathways in HCC, which will be useful to design clinical trials for targeted therapy.
An interactive environment for agile analysis and visualization of ChIP-sequencing data.

PubMed

Lerdrup, Mads; Johansen, Jens Vilstrup; Agrawal-Singh, Shuchi; Hansen, Klaus

2016-04-01

To empower experimentalists with a means for fast and comprehensive chromatin immunoprecipitation sequencing (ChIP-seq) data analyses, we introduce an integrated computational environment, EaSeq. The software combines the exploratory power of genome browsers with an extensive set of interactive and user-friendly tools for genome-wide abstraction and visualization. It enables experimentalists to easily extract information and generate hypotheses from their own data and public genome-wide datasets. For demonstration purposes, we performed meta-analyses of public Polycomb ChIP-seq data and established a new screening approach to analyze more than 900 datasets from mouse embryonic stem cells for factors potentially associated with Polycomb recruitment. EaSeq, which is freely available and works on a standard personal computer, can substantially increase the throughput of many analysis workflows, facilitate transparency and reproducibility by automatically documenting and organizing analyses, and enable a broader group of scientists to gain insights from ChIP-seq data.

TFBSshape: a motif database for DNA shape features of transcription factor binding sites.

PubMed

Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W; Gordân, Raluca; Rohs, Remo

2014-01-01

Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein-DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone.
TFBSshape: a motif database for DNA shape features of transcription factor binding sites

PubMed Central

Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W.; Gordân, Raluca; Rohs, Remo

2014-01-01

Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein–DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone. PMID:24214955
Insights into the phylogeny of Northern Hemisphere Armillaria: Neighbor-net and Bayesian analyses of translation elongation factor 1-α gene sequences.

PubMed

Klopfenstein, Ned B; Stewart, Jane E; Ota, Yuko; Hanna, John W; Richardson, Bryce A; Ross-Davis, Amy L; Elías-Román, Rubén D; Korhonen, Kari; Keča, Nenad; Iturritxa, Eugenia; Alvarado-Rosales, Dionicio; Solheim, Halvor; Brazee, Nicholas J; Łakomy, Piotr; Cleary, Michelle R; Hasegawa, Eri; Kikuchi, Taisei; Garza-Ocañas, Fortunato; Tsopelas, Panaghiotis; Rigling, Daniel; Prospero, Simone; Tsykun, Tetyana; Bérubé, Jean A; Stefani, Franck O P; Jafarpour, Saeideh; Antonín, Vladimír; Tomšovský, Michal; McDonald, Geral I; Woodward, Stephen; Kim, Mee-Sook

2017-01-01

Armillaria possesses several intriguing characteristics that have inspired wide interest in understanding phylogenetic relationships within and among species of this genus. Nuclear ribosomal DNA sequence-based analyses of Armillaria provide only limited information for phylogenetic studies among widely divergent taxa. More recent studies have shown that translation elongation factor 1-α (tef1) sequences are highly informative for phylogenetic analysis of Armillaria species within diverse global regions. This study used Neighbor-net and coalescence-based Bayesian analyses to examine phylogenetic relationships of newly determined and existing tef1 sequences derived from diverse Armillaria species from across the Northern Hemisphere, with Southern Hemisphere Armillaria species included for reference. Based on the Bayesian analysis of tef1 sequences, Armillaria species from the Northern Hemisphere are generally contained within the following four superclades, which are named according to the specific epithet of the most frequently cited species within the superclade: (i) Socialis/Tabescens (exannulate) superclade including Eurasian A. ectypa, North American A. socialis (A. tabescens), and Eurasian A. socialis (A. tabescens) clades; (ii) Mellea superclade including undescribed annulate North American Armillaria sp. (Mexico) and four separate clades of A. mellea (Europe and Iran, eastern Asia, and two groups from North America); (iii) Gallica superclade including Armillaria Nag E (Japan), multiple clades of A. gallica (Asia and Europe), A. calvescens (eastern North America), A. cepistipes (North America), A. altimontana (western USA), A. nabsnona (North America and Japan), and at least two A. gallica clades (North America); and (iv) Solidipes/Ostoyae superclade including two A. solidipes/ostoyae clades (North America), A. gemina (eastern USA), A. solidipes/ostoyae (Eurasia), A. cepistipes (Europe and Japan), A. sinapina (North America and Japan), and A. borealis (Eurasia) clade 2. Of note is that A. borealis (Eurasia) clade 1 appears basal to the Solidipes/Ostoyae and Gallica superclades. The Neighbor-net analysis showed similar phylogenetic relationships. This study further demonstrates the utility of tef1 for global phylogenetic studies of Armillaria species and provides critical insights into multiple taxonomic issues that warrant further study.
Clonal Relatedness of Enterotoxigenic Escherichia coli (ETEC) Strains Expressing LT and CS17 Isolated from Children with Diarrhoea in La Paz, Bolivia

PubMed Central

Rodas, Claudia; Klena, John D.; Nicklasson, Matilda; Iniguez, Volga; Sjöling, Åsa

2011-01-01

Background: Enterotoxigenic Escherichia coli (ETEC) is a major cause of traveller's and infantile diarrhoea in the developing world. ETEC produces two toxins, a heat-stable toxin (known as ST) and a heat-labile toxin (LT), and colonization factors that help the bacteria to attach to epithelial cells. Methodology/Principal Findings: In this study, we characterized a subset of ETEC clinical isolates recovered from Bolivian children under 5 years of age using a combination of multilocus sequence typing (MLST) analysis, virulence typing, serotyping and antimicrobial resistance test patterns in order to determine the genetic background of ETEC strains circulating in Bolivia. We found that strains expressing the heat-labile (LT) enterotoxin and colonization factor CS17 were common and belonged to several MLST sequence types but mainly to sequence type-423 and sequence type-443 (Achtman scheme). To further study the LT/CS17 strains we analysed the nucleotide sequence of the CS17 operon and compared the structure to LT/CS17 ETEC isolates from Bangladesh. Sequence analysis confirmed that all sequence type-423 strains from Bolivia had a single nucleotide polymorphism (SNPbol) in the CS17 operon that was also found in some other MLST sequence types from Bolivia but not in strains recovered from Bangladeshi children. The dominant ETEC clone in Bolivia (sequence type-423/SNPbol) was found to persist over multiple years and was associated with severe diarrhoea, but these strains were variable with respect to antimicrobial resistance patterns. Conclusion/Significance: The results showed that although the LT/CS17 phenotype is common among ETEC strains in Bolivia, multiple clones, as determined by unique MLST sequence types, populate this phenotype. Our data also appear to suggest that acquisition and loss of antimicrobial resistance in LT-expressing CS17 ETEC clones is more dynamic than acquisition or loss of virulence factors. PMID:22140423
Clonal relatedness of enterotoxigenic Escherichia coli (ETEC) strains expressing LT and CS17 isolated from children with diarrhoea in La Paz, Bolivia.

PubMed

Rodas, Claudia; Klena, John D; Nicklasson, Matilda; Iniguez, Volga; Sjöling, Asa

2011-01-01

Enterotoxigenic Escherichia coli (ETEC) is a major cause of traveller's and infantile diarrhoea in the developing world. ETEC produces two toxins, a heat-stable toxin (known as ST) and a heat-labile toxin (LT), and colonization factors that help the bacteria to attach to epithelial cells. In this study, we characterized a subset of ETEC clinical isolates recovered from Bolivian children under 5 years of age using a combination of multilocus sequence typing (MLST) analysis, virulence typing, serotyping and antimicrobial resistance test patterns in order to determine the genetic background of ETEC strains circulating in Bolivia. We found that strains expressing the heat-labile (LT) enterotoxin and colonization factor CS17 were common and belonged to several MLST sequence types but mainly to sequence type-423 and sequence type-443 (Achtman scheme). To further study the LT/CS17 strains we analysed the nucleotide sequence of the CS17 operon and compared the structure to LT/CS17 ETEC isolates from Bangladesh. Sequence analysis confirmed that all sequence type-423 strains from Bolivia had a single nucleotide polymorphism (SNP(bol)) in the CS17 operon that was also found in some other MLST sequence types from Bolivia but not in strains recovered from Bangladeshi children. The dominant ETEC clone in Bolivia (sequence type-423/SNP(bol)) was found to persist over multiple years and was associated with severe diarrhoea, but these strains were variable with respect to antimicrobial resistance patterns. The results showed that although the LT/CS17 phenotype is common among ETEC strains in Bolivia, multiple clones, as determined by unique MLST sequence types, populate this phenotype. Our data also appear to suggest that acquisition and loss of antimicrobial resistance in LT-expressing CS17 ETEC clones is more dynamic than acquisition or loss of virulence factors.
Short reads from honey bee (Apis sp.) sequencing projects reflect microbial associate diversity

PubMed Central

Gerth, Michael; Hurst, Gregory D.D.

2017-01-01

High throughput (or ‘next generation’) sequencing has transformed most areas of biological research and is now a standard method that underpins empirical study of organismal biology, and (through comparison of genomes), reveals patterns of evolution. For projects focused on animals, these sequencing methods do not discriminate between the primary target of sequencing (the animal genome) and ‘contaminating’ material, such as associated microbes. A common first step is to filter out these contaminants to allow better assembly of the animal genome or transcriptome. Here, we aimed to assess if these ‘contaminations’ provide information with regard to biologically important microorganisms associated with the individual. To achieve this, we examined whether the short read data from Apis retrieved elements of its well established microbiome. To this end, we screened almost 1,000 short read libraries of honey bee (Apis sp.) DNA sequencing project for the presence of microbial sequences, and find sequences from known honey bee microbial associates in at least 11% of them. Further to this, we screened ∼500 Apis RNA sequencing libraries for evidence of viral infections, which were found to be present in about half of them. We then used the data to reconstruct draft genomes of three Apis associated bacteria, as well as several viral strains de novo. We conclude that ‘contamination’ in short read sequencing libraries can provide useful genomic information on microbial taxa known to be associated with the target organisms, and may even lead to the discovery of novel associations. Finally, we demonstrate that RNAseq samples from experiments commonly carry uneven viral loads across libraries. We note variation in viral presence and load may be a confounding feature of differential gene expression analyses, and as such it should be incorporated as a random factor in analyses. PMID:28717593
Short reads from honey bee (Apis sp.) sequencing projects reflect microbial associate diversity.

PubMed

Gerth, Michael; Hurst, Gregory D D

2017-01-01

High throughput (or 'next generation') sequencing has transformed most areas of biological research and is now a standard method that underpins empirical study of organismal biology, and (through comparison of genomes), reveals patterns of evolution. For projects focused on animals, these sequencing methods do not discriminate between the primary target of sequencing (the animal genome) and 'contaminating' material, such as associated microbes. A common first step is to filter out these contaminants to allow better assembly of the animal genome or transcriptome. Here, we aimed to assess if these 'contaminations' provide information with regard to biologically important microorganisms associated with the individual. To achieve this, we examined whether the short read data from Apis retrieved elements of its well established microbiome. To this end, we screened almost 1,000 short read libraries of honey bee (Apis sp.) DNA sequencing project for the presence of microbial sequences, and find sequences from known honey bee microbial associates in at least 11% of them. Further to this, we screened ∼500 Apis RNA sequencing libraries for evidence of viral infections, which were found to be present in about half of them. We then used the data to reconstruct draft genomes of three Apis associated bacteria, as well as several viral strains de novo. We conclude that 'contamination' in short read sequencing libraries can provide useful genomic information on microbial taxa known to be associated with the target organisms, and may even lead to the discovery of novel associations. Finally, we demonstrate that RNAseq samples from experiments commonly carry uneven viral loads across libraries. We note variation in viral presence and load may be a confounding feature of differential gene expression analyses, and as such it should be incorporated as a random factor in analyses.
Genetic diversity of Haemonchus contortus isolated from sympatric wild blue sheep (Pseudois nayaur) and sheep in Helan Mountains, China.

PubMed

Shen, Dong-Dong; Wang, Ji-Fei; Zhang, Dan-Yu; Peng, Zhi-Wei; Yang, Tian-Yun; Wang, Zhao-Ding; Bowman, Dwight D; Hou, Zhi-Jun; Liu, Zhen-Sheng

2017-09-19

Haemonchus contortus is known among parasitic nematodes as one of the major veterinary pathogens of small ruminants and results in great economic losses worldwide. Human activities, such as the sympatric grazing of wild with domestic animals, may place susceptible wildlife hosts at risk of increased prevalence and infection intensity with this common small ruminant parasite. Studies on phylogenetic factors of H. contortus should assist in defining the impact of anthropogenic factors on the extent of sharing of agents such as this nematode between domestic animals and wildlife. H. contortus specimens (n = 57) were isolated from wild blue sheep (Pseudois nayaur) inhabiting Helan Mountains (HM), China and additional H. contortus specimens (n = 20) were isolated from domestic sheep that were grazed near the natural habitat of the blue sheep. Complete ITS2 (second internal transcribed spacer) sequences and partial sequences of the nad4 (nicotinamide dehydrogenase subunit 4) gene were amplified to determine the sequence variations and population genetic diversities between these two populations. Also, 142 nad4 haplotype sequences of H. contortus from seven other geographical regions of China were retrieved from a database to further examine the H. contortus population structure. Sequence analysis revealed 10 genotypes (ITS2) and 73 haplotypes (nad4) among the 77 specimens, with nucleotide diversities of 0.007 and 0.021, respectively, similar to previous studies in other countries, such as Pakistan, Malaysia and Yemen. Phylogenetic analyses (BI, MP, NJ) of nad4 sequences showed that there were no noticeable boundaries among H. contortus populations of different geographical origins, and population genetic analyses revealed that most of the variation (94.21%) occurred within H. contortus populations. All phylogenetic analyses indicated that there was little genetic differentiation but a high degree of gene flow among the H. contortus populations of wild blue sheep and domestic ruminants in China. The current work is the first genetic characterization of H. contortus isolated from wild blue sheep in the Helan Mountains region. The results revealed a low genetic differentiation and high degree of gene flow between the H. contortus populations from sympatric wild blue sheep and domestic sheep, indicating regular cross-infection between the sympatrically reared ruminants.
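The two summary statistics quoted in the abstract above, the number of distinct haplotypes and the nucleotide diversity, can be illustrated with a minimal sketch. It assumes pre-aligned sequences of equal length and the common definition of nucleotide diversity as the average number of pairwise differences per site. The function names and the toy alignment are hypothetical; they are not data or code from the study.

```python
from itertools import combinations


def haplotype_count(seqs):
    """Number of distinct sequences (haplotypes) in the sample."""
    return len(set(seqs))


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site for aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    # Count mismatching positions for every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment (not data from the study).
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGTTCGA", "ACGTACGT"]
n_haplotypes = haplotype_count(toy_alignment)
pi = nucleotide_diversity(toy_alignment)
print(n_haplotypes, round(pi, 4))
```

Real analyses usually also handle gaps and ambiguous bases, which this sketch ignores.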
Molecular diversity of arbuscular mycorrhizal fungi in relation to soil chemical properties and heavy metal contamination.

PubMed

Zarei, Mehdi; Hempel, Stefan; Wubet, Tesfaye; Schäfer, Tina; Savaghebi, Gholamreza; Jouzani, Gholamreza Salehi; Nekouei, Mojtaba Khayam; Buscot, François

2010-08-01

Abundance and diversity of arbuscular mycorrhizal fungi (AMF) associated with dominant plant species were studied along a transect from highly lead (Pb) and zinc (Zn) polluted to non-polluted soil at the Anguran open pit mine in Iran. Using an established primer set for AMF in the internal transcribed spacer (ITS) region of rDNA, nine different AMF sequence types were distinguished after phylogenetic analyses, showing remarkable differences in their distribution patterns along the transect. With decreasing Pb and Zn concentration, the number of AMF sequence types increased; however, one sequence type was found only in the highly contaminated area. Multivariate statistical analysis revealed that factors other than heavy metal (HM) soil concentration affect the AMF community at contaminated sites. Specifically, the soils' calcium carbonate equivalent and available P proved to be of importance, which illustrates that field studies on AMF distribution should also consider important environmental factors and their possible interactions. Copyright 2010 Elsevier Ltd. All rights reserved.
H-2RIIBP, a member of the nuclear hormone receptor superfamily that binds to both the regulatory element of major histocompatibility class I genes and the estrogen response element.

PubMed

Hamada, K; Gleason, S L; Levi, B Z; Hirschfeld, S; Appella, E; Ozato, K

1989-11-01

Transcription of major histocompatibility complex (MHC) class I genes is regulated by the conserved MHC class I regulatory element (CRE). The CRE has two factor-binding sites, region I and region II, both of which elicit enhancer function. By screening a mouse lambda gt 11 library with the CRE as a probe, we isolated a cDNA clone that encodes a protein capable of binding to region II of the CRE. This protein, H-2RIIBP (H-2 region II binding protein), bound to the native region II sequence, but not to other MHC cis-acting sequences or to mutant region II sequences, similar to the naturally occurring region II factor in mouse cells. The deduced amino acid sequence of H-2RIIBP revealed two putative zinc fingers homologous to the DNA-binding domain of steroid/thyroid hormone receptors. Although sequence similarity in other regions was minimal, H-2RIIBP has apparent modular domains characteristic of the nuclear hormone receptors. Further analyses showed that both H-2RIIBP and the natural region II factor bind to the estrogen response element (ERE) of the vitellogenin A2 gene. The ERE is composed of a palindrome, and half of this palindrome resembles the region II binding site of the MHC CRE. These results indicate that H-2RIIBP (i) is a member of the superfamily of nuclear hormone receptors and (ii) may regulate not only MHC class I genes but also genes containing the ERE and related sequences. Sequences homologous to the H-2RIIBP gene are widely conserved in the animal kingdom. H-2RIIBP mRNA is expressed in many mouse tissues, in agreement with the distribution of the natural region II factor.
Problems and Alternatives in the Combat Rescue of Navy Aircrewmen

DTIC Science & Technology

1980-11-01

Objectives for an Improved Combat Rescue System ... Decrease Injury Rate ... altitudes as low as 50 feet above ground level. Extremity restraint systems, constantly being improved, are reducing the number of injuries resulting ... These analyses established causal factors associated with the injuries and problems occurring during the ejection-through-survival sequence. Later
Multigene assessment of the species boundaries and sexual status of the basidiomycetous yeasts Cryptococcus flavescens and C. terrestris (Tremellales).

PubMed

Yurkov, Andrey; Guerreiro, Marco A; Sharma, Lav; Carvalho, Cláudia; Fonseca, Álvaro

2015-01-01

Cryptococcus flavescens and C. terrestris are phenotypically indistinguishable sister species that belong to the order Tremellales (Tremellomycetes, Basidiomycota) and which may be mistaken for C. laurentii based on phenotype. Phylogenetic separation between C. flavescens and C. terrestris was based on rDNA sequence analyses, but very little is known on their intraspecific genetic variability or propensity for sexual reproduction. We studied 59 strains from different substrates and geographic locations, and used a multilocus sequencing (MLS) approach complemented with the sequencing of mating type (MAT) genes to assess genetic variation and reexamine the boundaries of the two species, as well as their sexual status. The following five loci were chosen for MLS: the rDNA ITS-LSU region, the rDNA IGS1 spacer, and fragments of the genes encoding the largest subunit of RNA polymerase II (RPB1), the translation elongation factor 1 alpha (TEF1) and the p21-activated protein kinase (STE20). Phylogenetic network analyses confirmed the genetic separation of the two species and revealed two additional cryptic species, for which the names Cryptococcus baii and C. ruineniae are proposed. Further analyses of the data revealed a high degree of genetic heterogeneity within C. flavescens as well as evidence for recombination between lineages detected for this species. Strains of C. terrestris displayed higher levels of similarity in all analysed genes and appear to make up a single recombining group. The two MAT genes (STE3 and SXI1/SXI2) sequenced for C. flavescens strains confirmed the potential for sexual reproduction and suggest the presence of a tetrapolar mating system with a biallelic pheromone/receptor locus and a multiallelic HD locus. In C. terrestris we could only sequence STE3, which revealed a biallelic P/R locus. In spite of the strong evidence for sexual recombination in the two species, attempts at mating compatible strains of both species on culture media were unsuccessful.
Multigene Assessment of the Species Boundaries and Sexual Status of the Basidiomycetous Yeasts Cryptococcus flavescens and C. terrestris (Tremellales)

PubMed Central

Yurkov, Andrey; Guerreiro, Marco A.; Sharma, Lav; Carvalho, Cláudia; Fonseca, Álvaro

2015-01-01

Cryptococcus flavescens and C. terrestris are phenotypically indistinguishable sister species that belong to the order Tremellales (Tremellomycetes, Basidiomycota) and which may be mistaken for C. laurentii based on phenotype. Phylogenetic separation between C. flavescens and C. terrestris was based on rDNA sequence analyses, but very little is known on their intraspecific genetic variability or propensity for sexual reproduction. We studied 59 strains from different substrates and geographic locations, and used a multilocus sequencing (MLS) approach complemented with the sequencing of mating type (MAT) genes to assess genetic variation and reexamine the boundaries of the two species, as well as their sexual status. The following five loci were chosen for MLS: the rDNA ITS-LSU region, the rDNA IGS1 spacer, and fragments of the genes encoding the largest subunit of RNA polymerase II (RPB1), the translation elongation factor 1 alpha (TEF1) and the p21-activated protein kinase (STE20). Phylogenetic network analyses confirmed the genetic separation of the two species and revealed two additional cryptic species, for which the names Cryptococcus baii and C. ruineniae are proposed. Further analyses of the data revealed a high degree of genetic heterogeneity within C. flavescens as well as evidence for recombination between lineages detected for this species. Strains of C. terrestris displayed higher levels of similarity in all analysed genes and appear to make up a single recombining group. The two MAT genes (STE3 and SXI1/SXI2) sequenced for C. flavescens strains confirmed the potential for sexual reproduction and suggest the presence of a tetrapolar mating system with a biallelic pheromone/receptor locus and a multiallelic HD locus. In C. terrestris we could only sequence STE3, which revealed a biallelic P/R locus. In spite of the strong evidence for sexual recombination in the two species, attempts at mating compatible strains of both species on culture media were unsuccessful. PMID:25811603
Genome-Wide Profiling of Small RNAs and Degradome Revealed Conserved Regulations of miRNAs on Auxin-Responsive Genes during Fruit Enlargement in Peaches

PubMed Central

Shi, Mengya; Hu, Xiao; Wei, Yu; Hou, Xu; Yuan, Xue; Liu, Jun; Liu, Yueping

2017-01-01

Auxin has long been known as a critical phytohormone that regulates fruit development in plants. However, due to the lack of an enlarged ovary wall in the model plants Arabidopsis and rice, the molecular regulatory mechanisms of fruit division and enlargement remain unclear. In this study, we performed small RNA sequencing and degradome sequencing analyses to systematically explore post-transcriptional regulation in the mesocarp at the hard core stage following treatment of the peach (Prunus persica L.) fruit with the synthetic auxin α-naphthylacetic acid (NAA). Our analyses identified 24 evolutionarily conserved miRNA genes as well as 16 predicted genes. Experimental verification showed that the expression levels of miR398 and miR408b were significantly upregulated after NAA treatment, whereas those of miR156, miR160, miR166, miR167, miR390, miR393, miR482, miR535 and miR2118 were significantly downregulated. Degradome sequencing coupled with miRNA target prediction analyses detected 119 significant cleavage sites on several mRNA targets, including SQUAMOSA promoter binding protein–like (SPL), auxin response factor (ARF), NAC (NAM, ATAF1/2 and CUC2), Arabidopsis thaliana homeobox protein (ATHB), the homeodomain-leucine zipper transcription factor revoluta (REV), TCP (teosinte-like1, cycloidea and proliferating cell factor1) and auxin signaling F-box protein (AFB) family genes. Our systematic profiling of miRNAs and the degradome in peach fruit suggests the existence of a post-transcriptional regulation network of miRNAs that target auxin pathway genes in fruit development. PMID:29236054
IRTs of the ABCs: Children's Letter Name Acquisition

PubMed Central

Piasta, Shayne B.; Anthony, Jason L.; Lonigan, Christopher J.; Francis, David J.

2015-01-01

We examined the developmental sequence of letter name knowledge acquisition by children from 2 to 5 years of age. Data from two samples representing diverse regions, ethnicity, and socioeconomic backgrounds (ns = 1074 & 500) were analyzed using item response theory (IRT) and differential item functioning techniques. Results from factor analyses indicated that letter name knowledge represented a unidimensional skill; IRT results yielded significant differences between letters in both difficulty and discrimination. Results also indicated an approximate developmental sequence in letter name learning for the simplest and most challenging to learn letters, but with no clear sequence between these extremes. Findings also suggested that children were most likely to first learn their first initial. We discuss implications for assessment and instruction. PMID:22710016
[Methods, challenges and opportunities for big data analyses of microbiome].

PubMed

Sheng, Hua-Fang; Zhou, Hong-Wei

2015-07-01

Microbiome research is a novel field related to a variety of chronic inflammatory diseases. Technically, there are two major approaches to microbiome analysis: the metataxonome, obtained by sequencing the 16S rRNA variable tags, and the metagenome, obtained by shotgun sequencing of the total microbial (mainly bacterial) genome mixture. The 16S rRNA sequencing analysis pipeline includes sequence quality control, diversity analyses, taxonomy and statistics; metagenome analysis further includes gene annotation and functional analyses. With the development of sequencing techniques, the cost of sequencing will decrease, and big data analyses will become the central task. Data standardization, accumulation, modeling and disease prediction are crucial for future exploitation of these data. Meanwhile, the information properties of these data, and functional verification with culture-dependent and culture-independent experiments, remain the focus of future research. Studies of the human microbiome will bring a better understanding of the relations between the human body and the microbiome, especially in the context of disease diagnosis and therapy, which promises rich research opportunities.
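As a small illustration of the "diversity analyses" step in the 16S rRNA pipeline described above, the following sketch computes two widely used alpha-diversity measures, observed richness and the Shannon index, from per-read taxon assignments. The taxon labels, sample contents and function names are hypothetical and are not taken from the review; a real pipeline would perform quality control and taxonomic assignment first.

```python
import math
from collections import Counter


def observed_richness(counts):
    """Number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with p_i > 0."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical per-read taxon assignments for two samples.
even_sample = ["Bacteroides"] * 50 + ["Prevotella"] * 50
skewed_sample = ["Bacteroides"] * 90 + ["Prevotella"] * 10

even_counts = list(Counter(even_sample).values())
skewed_counts = list(Counter(skewed_sample).values())

h_even = shannon_index(even_counts)
h_skewed = shannon_index(skewed_counts)
print(observed_richness(even_counts), h_even, h_skewed)
```

Both samples have the same richness, but the evenly split sample has the higher Shannon index, which is why the two measures are usually reported together.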
Porcine Epidemic Diarrhea in Europe: In-Detail Analyses of Disease Dynamics and Molecular Epidemiology.

PubMed

Hanke, Dennis; Pohlmann, Anne; Sauter-Louis, Carola; Höper, Dirk; Stadler, Julia; Ritzmann, Mathias; Steinrigl, Adi; Schwarz, Bernd-Andreas; Akimkin, Valerij; Fux, Robert; Blome, Sandra; Beer, Martin

2017-07-06

Porcine epidemic diarrhea (PED) is an acute and highly contagious enteric disease of swine caused by the eponymous virus (PEDV), which belongs to the genus Alphacoronavirus within the Coronaviridae virus family. Following the disastrous outbreaks in Asia and the United States, PEDV has also been detected in Europe. In order to better understand the overall situation, the molecular epidemiology, and factors that might influence the highly variable disease impact, 40 samples from swine feces were collected from different PED outbreaks in Germany and other European countries and sequenced by shotgun next-generation sequencing. A total of 38 new PEDV complete coding sequences were generated. When compared on a global scale, all investigated sequences from Central and South-Eastern Europe formed a rather homogeneous PEDV S INDEL cluster, suggesting a recent re-introduction. However, in-detail analyses revealed two new clusters and putative ancestor strains. Based on the available background data, correlations between clusters and location, farm type or clinical presentation could not be established. Additionally, the impact of secondary infections was explored using the metagenomic data sets. While several coinfections were observed, no correlation was found with disease courses. However, in addition to the PEDV genomes, ten complete viral coding sequences from nine different data sets were reconstructed, each representing new virus strains. In detail, three pasivirus A strains, two astroviruses, a porcine sapelovirus, a kobuvirus, a porcine torovirus, a posavirus, and an enterobacteria phage were almost fully sequenced.
Controllability of Deterministic Networks with the Identical Degree Sequence

PubMed Central

Ma, Xiujuan; Zhao, Haixing; Wang, Binghong

2015-01-01

Controlling complex networks is an essential problem in network science and engineering. Recent advances indicate that the controllability of a complex network depends on the network's topology. Liu, Barabási, et al. speculated that the degree distribution was one of the most important factors affecting controllability for arbitrary complex directed networks with random link weights. In this paper, we analysed the effect of degree distribution on the controllability of unweighted, undirected deterministic networks. We introduce a class of deterministic networks with identical degree sequence, called (x,y)-flowers. We analysed the controllability of two deterministic networks ((1, 3)-flower and (2, 2)-flower) in detail using exact controllability theory and give accurate results for the minimum number of driver nodes of the two networks. In simulation, we compare the controllability of (x,y)-flower networks. Our results show that the family of (x,y)-flower networks have the same degree sequence, but their controllability is totally different. So the degree distribution itself is not sufficient to characterize the controllability of unweighted, undirected deterministic networks. PMID:26020920
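The exact controllability calculation mentioned above can be sketched as follows. The sketch assumes the usual formulation of exact controllability theory, in which the minimum number of driver nodes equals the largest geometric multiplicity of any eigenvalue of the adjacency matrix; for an undirected network the adjacency matrix is symmetric, so this coincides with the largest algebraic multiplicity. To stay exact, the code builds the characteristic polynomial with rational arithmetic and obtains the largest root multiplicity from repeated gcds with the derivative. All function names and the three small example graphs are hypothetical; they are not the (x,y)-flower networks analysed in the paper.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients of det(xI - A), lowest degree first (Faddeev-LeVerrier)."""
    n = len(A)
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        M = matmul(A, M)
        for i in range(n):
            M[i][i] += coeffs[n - k + 1]
        AM = matmul(A, M)
        coeffs[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return coeffs


def derivative(p):
    return [i * c for i, c in enumerate(p)][1:]


def poly_mod(a, b):
    """Remainder of a divided by b; b must have a nonzero leading coefficient."""
    a = list(a)
    while len(a) >= len(b):
        factor = a[-1] / b[-1]
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] -= factor * c
        a.pop()  # the leading coefficient is now exactly zero
    while a and a[-1] == 0:
        a.pop()
    return a


def poly_gcd(a, b):
    while b:
        a, b = b, poly_mod(a, b)
    return a


def max_root_multiplicity(p):
    """Largest multiplicity of any root of p, via repeated gcd(p, p')."""
    m = 0
    while len(p) > 1:
        p = poly_gcd(p, derivative(p))
        m += 1
    return m


def min_driver_nodes(A):
    """Assumed rule for undirected graphs: max eigenvalue multiplicity of A."""
    return max_root_multiplicity(char_poly(A))


# Hypothetical undirected example graphs (symmetric 0/1 adjacency matrices).
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
star4 = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
complete4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(min_driver_nodes(path3), min_driver_nodes(star4), min_driver_nodes(complete4))
```

Exact rational arithmetic avoids the tolerance choices needed when eigenvalue multiplicities are counted numerically, at the cost of scaling poorly to large networks.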
Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Po-E; Lo, Chien -Chi; Anderson, Joseph J.

Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. As a result, this bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research.
Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform

PubMed Central

Li, Po-E; Lo, Chien-Chi; Anderson, Joseph J.; Davenport, Karen W.; Bishop-Lilly, Kimberly A.; Xu, Yan; Ahmed, Sanaa; Feng, Shihai; Mokashi, Vishwesh P.; Chain, Patrick S.G.

2017-01-01

Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. This bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research. PMID:27899609

Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform

DOE PAGES

Li, Po-E; Lo, Chien -Chi; Anderson, Joseph J.; ...

2016-11-24

Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. As a result, this bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research.
Molecular characterization of Giardia psittaci by multilocus sequence analysis.

PubMed

Abe, Niichiro; Makino, Ikuko; Kojima, Atsushi

2012-12-01

Multilocus sequence analyses targeting small subunit ribosomal DNA (SSU rDNA), elongation factor 1 alpha (ef1α), glutamate dehydrogenase (gdh), and beta giardin (β-giardin) were performed on Giardia psittaci isolates from three Budgerigars (Melopsittacus undulatus) and four Barred parakeets (Bolborhynchus lineola) kept in individual households or imported from overseas. Nucleotide differences and phylogenetic analyses at four loci indicate the distinction of G. psittaci from the other known Giardia species: Giardia muris, Giardia microti, Giardia ardeae, and Giardia duodenalis assemblages. Furthermore, G. psittaci was related more closely to G. duodenalis than to the other known Giardia species, except for G. microti. Conflicting signals regarded as "double peaks" were found at the same nucleotide positions of the ef1α in all isolates. However, the sequences of the other three loci, including gdh and β-giardin, which are known to be highly variable, from all isolates were also mutually identical at every locus. They showed no double peaks. These results suggest that double peaks found in the ef1α sequences are caused not by mixed infection with genetically different G. psittaci isolates but by allelic sequence heterogeneity (ASH), which is observed in diplomonad lineages including G. duodenalis. No sequence difference was found in any G. psittaci isolates at the gdh and β-giardin loci, suggesting that G. psittaci is indeed not more diverse genetically than other Giardia species. This report is the first to provide evidence related to the genetic characteristics of G. psittaci obtained using multilocus sequence analysis. Copyright © 2012 Elsevier B.V. All rights reserved.
Genome-wide profiling of DNA-binding proteins using barcode-based multiplex Solexa sequencing.

PubMed

Raghav, Sunil Kumar; Deplancke, Bart

2012-01-01

Chromatin immunoprecipitation (ChIP) is a commonly used technique to detect the in vivo binding of proteins to DNA. ChIP is now routinely paired to microarray analysis (ChIP-chip) or next-generation sequencing (ChIP-Seq) to profile the DNA occupancy of proteins of interest on a genome-wide level. Because ChIP-chip introduces several biases, most notably due to the use of a fixed number of probes, ChIP-Seq has quickly become the method of choice as, depending on the sequencing depth, it is more sensitive, quantitative, and provides a greater binding site location resolution. With the ever increasing number of reads that can be generated per sequencing run, it has now become possible to analyze several samples simultaneously while maintaining sufficient sequence coverage, thus significantly reducing the cost per ChIP-Seq experiment. In this chapter, we provide a step-by-step guide on how to perform multiplexed ChIP-Seq analyses. As a proof-of-concept, we focus on the genome-wide profiling of RNA Polymerase II as measuring its DNA occupancy at different stages of any biological process can provide insights into the gene regulatory mechanisms involved. However, the protocol can also be used to perform multiplexed ChIP-Seq analyses of other DNA-binding proteins such as chromatin modifiers and transcription factors.
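The multiplexing idea described above, in which several barcoded samples share one sequencing run, implies a demultiplexing step before analysis. The following is a minimal sketch of that step, assuming each read begins with its sample barcode and that an exact prefix match is sufficient; the barcode sequences, sample names, and the `demultiplex` helper are hypothetical illustrations and are not taken from the protocol itself.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Bin reads by their leading barcode and strip the barcode.

    reads    -- iterable of read sequences (strings)
    barcodes -- dict mapping barcode sequence -> sample name (hypothetical)
    Reads whose prefix matches no barcode go to the "unassigned" bin.
    """
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                # Keep only the insert, without the barcode prefix.
                bins[sample].append(read[len(barcode):])
                break
        else:
            bins["unassigned"].append(read)
    return dict(bins)


# Hypothetical 4-nt barcodes for two RNA Polymerase II time points.
barcodes = {"ACGT": "PolII_0h", "TGCA": "PolII_4h"}
reads = ["ACGTGGGATTC", "TGCATTTGACA", "ACGTCCCAAGT", "NNNNGATTACA"]
result = demultiplex(reads, barcodes)
```

A production pipeline would typically also tolerate sequencing errors in the barcode; exact matching is used here only to keep the sketch short.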
Differentiation of Xylella fastidiosa Strains via Multilocus Sequence Analysis of Environmentally Mediated Genes (MLSA-E)

PubMed Central

Parker, Jennifer K.; Havird, Justin C.

2012-01-01

Isolates of the plant pathogen Xylella fastidiosa are genetically very similar, but studies on their biological traits have indicated differences in virulence and infection symptomatology. Taxonomic analyses have identified several subspecies, and phylogenetic analyses of housekeeping genes have shown broad host-based genetic differences; however, results are still inconclusive for genetic differentiation of isolates within subspecies. This study employs multilocus sequence analysis of environmentally mediated genes (MLSA-E; genes influenced by environmental factors) to investigate X. fastidiosa relationships and differentiate isolates with low genetic variability. Potential environmentally mediated genes, including host colonization and survival genes related to infection establishment, were identified a priori. The ratio of the rate of nonsynonymous substitutions to the rate of synonymous substitutions (dN/dS) was calculated to select genes that may be under increased positive selection compared to previously studied housekeeping genes. Nine genes were sequenced from 54 X. fastidiosa isolates infecting different host plants across the United States. Results of maximum likelihood (ML) and Bayesian phylogenetic (BP) analyses are in agreement with known X. fastidiosa subspecies clades but show novel within-subspecies differentiation, including geographic differentiation, and provide additional information regarding host-based isolate variation and specificity. dN/dS ratios of environmentally mediated genes, though <1 due to high sequence similarity, are significantly greater than housekeeping gene dN/dS ratios and correlate with increased sequence variability. MLSA-E can more precisely resolve relationships between closely related bacterial strains with low genetic variability, such as X. fastidiosa isolates. Discovering the genetic relationships between X. fastidiosa isolates will provide new insights into the epidemiology of populations of X. fastidiosa, allowing improved disease management in economically important crops. PMID:22194287
Differentiation of Xylella fastidiosa strains via multilocus sequence analysis of environmentally mediated genes (MLSA-E).

PubMed

Parker, Jennifer K; Havird, Justin C; De La Fuente, Leonardo

2012-03-01

Isolates of the plant pathogen Xylella fastidiosa are genetically very similar, but studies on their biological traits have indicated differences in virulence and infection symptomatology. Taxonomic analyses have identified several subspecies, and phylogenetic analyses of housekeeping genes have shown broad host-based genetic differences; however, results are still inconclusive for genetic differentiation of isolates within subspecies. This study employs multilocus sequence analysis of environmentally mediated genes (MLSA-E; genes influenced by environmental factors) to investigate X. fastidiosa relationships and differentiate isolates with low genetic variability. Potential environmentally mediated genes, including host colonization and survival genes related to infection establishment, were identified a priori. The ratio of the rate of nonsynonymous substitutions to the rate of synonymous substitutions (dN/dS) was calculated to select genes that may be under increased positive selection compared to previously studied housekeeping genes. Nine genes were sequenced from 54 X. fastidiosa isolates infecting different host plants across the United States. Results of maximum likelihood (ML) and Bayesian phylogenetic (BP) analyses are in agreement with known X. fastidiosa subspecies clades but show novel within-subspecies differentiation, including geographic differentiation, and provide additional information regarding host-based isolate variation and specificity. dN/dS ratios of environmentally mediated genes, though <1 due to high sequence similarity, are significantly greater than housekeeping gene dN/dS ratios and correlate with increased sequence variability. MLSA-E can more precisely resolve relationships between closely related bacterial strains with low genetic variability, such as X. fastidiosa isolates. Discovering the genetic relationships between X. fastidiosa isolates will provide new insights into the epidemiology of populations of X. fastidiosa, allowing improved disease management in economically important crops.
The Advanced Glaucoma Intervention Study (AGIS): 12. Baseline risk factors for sustained loss of visual field and visual acuity in patients with advanced glaucoma.

PubMed

2002-10-01

To examine the relationships between baseline risk factors and sustained decrease of visual field (SDVF) and sustained decrease of visual acuity (SDVA). Cohort study of participants in the Advanced Glaucoma Intervention Study (AGIS). This multicenter study enrolled patients between 1988 and 1992 and followed them until 2001; 789 eyes of 591 patients with advanced glaucoma were randomly assigned to one of two surgical sequences, argon laser trabeculoplasty (ALT)-trabeculectomy-trabeculectomy (ATT) or trabeculectomy-ALT-trabeculectomy (TAT). This report is based on data from 747 eyes. Eyes were offered the next intervention in the sequence upon failure of the previous intervention. Failure was based on recurrent intraocular pressure elevation, visual field defect, and disk rim criteria. Study visits occurred every 6 months; potential follow-up ranged from 8 to 13 years. For each intervention sequence, Cox multiple regression analyses were used to examine the baseline characteristics for association with two vision outcomes: SDVF and SDVA. The magnitude of the association is measured by the hazard ratio (HR), where HR for binary variables is the relative change in the hazard (or risk) of the outcome in eyes with the factor divided by the hazard in eyes without the factor, and HR for continuous variables is the relative change in the hazard (or risk) of the outcome in eyes with a unit increase in the factor. Characteristics associated with increased SDVF risk in the ATT sequence are: less baseline visual field defect (hazard ratio [HR] = 0.86, P <.001, 95% CI = 0.82-0.90), male gender (HR = 2.23, P <.001, 1.54-3.23), and worse baseline visual acuity (HR = 0.96, P =.001, 0.94-0.98); in the TAT sequence: less baseline visual field defect (HR = 0.93, P =.001, 0.89-0.97) and diabetes (HR = 1.87, P =.007, 1.18-2.97). 
Characteristics associated with increased SDVA risk in both treatment sequences are better baseline acuity (ATT: HR = 1.05, P <.001, 1.02-1.09; TAT: HR = 1.06, P <.001, 1.03-1.08), older age (ATT: HR = 1.05, P =.001, 1.02-1.08; TAT: HR = 1.04, P =.002, 1.01-1.06), and less formal education (ATT: HR = 1.92, P =.001, 1.29-2.88; TAT: HR = 1.77, P =.002, 1.22-2.54). For SDVF, risk factors were better baseline visual field in both treatment sequences, male gender, and worse baseline visual acuity in the ATT sequence, and diabetes in the TAT sequence. For SDVA, risk factors in both treatment sequences were better baseline visual acuity, older age, and less formal education.
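The verbal definition of the hazard ratio given in this abstract matches the usual Cox proportional hazards formulation, which can be written compactly as follows. The symbols (baseline hazard $h_0(t)$, coefficient $\beta$, covariate $x$) are standard textbook notation introduced here for illustration and do not appear in the abstract.

```latex
h(t \mid x) = h_0(t)\, e^{\beta x},
\qquad
\mathrm{HR} = \frac{h(t \mid x + 1)}{h(t \mid x)} = e^{\beta}
```

For a binary factor ($x \in \{0, 1\}$) this is the hazard in eyes with the factor divided by the hazard in eyes without it; for a continuous factor it is the relative change in hazard per unit increase. Read this way, a reported HR of 0.86 corresponds to a 14% lower hazard per unit increase in the factor, and an HR of 2.23 to a hazard 2.23 times as high.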
WRKY transcription factor genes in wild rice Oryza nivara.

PubMed

Xu, Hengjian; Watanabe, Kenneth A; Zhang, Liyuan; Shen, Qingxi J

2016-08-01

The WRKY transcription factor family is one of the largest gene families involved in plant development and stress response. Although many WRKY genes have been studied in cultivated rice (Oryza sativa), the WRKY genes in the wild rice species Oryza nivara, the direct progenitor of O. sativa, have not been studied. O. nivara shows abundant genetic diversity and elite drought and disease resistance features. Herein, a total of 97 O. nivara WRKY (OnWRKY) genes were identified. RNA-sequencing demonstrates that OnWRKY genes were generally expressed at higher levels in the roots of 30-day-old plants. Bioinformatic analyses suggest that most of OnWRKY genes could be induced by salicylic acid, abscisic acid, and drought. Abundant potential MAPK phosphorylation sites in OnWRKYs suggest that activities of most OnWRKYs can be regulated by phosphorylation. Phylogenetic analyses of OnWRKYs support a novel hypothesis that ancient group IIc OnWRKYs were the original ancestors of only some group IIc and group III WRKYs. The analyses also offer strong support that group IIc OnWRKYs containing the HVE sequence in their zinc finger motifs were derived from group Ia WRKYs. This study provides a solid foundation for the study of the evolution and functions of WRKY genes in O. nivara. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
A novel bioinformatics method for efficient knowledge discovery by BLSOM from big genomic sequence data.

PubMed

Bai, Yu; Iwasaki, Yuki; Kanaya, Shigehiko; Zhao, Yue; Ikemura, Toshimichi

2014-01-01

With the remarkable increase in genomic sequence data from a wide range of species, novel tools are needed for comprehensive analyses of big sequence data. Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data such as oligonucleotide composition on one map. By modifying the conventional SOM, we have previously developed Batch-Learning SOM (BLSOM), which allows classification of sequence fragments according to species, solely depending on the oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM used for characterization of vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes and then the compositions in the human and mouse genomes in order to investigate an efficient method for detecting differences between closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, which is called a "genome signature," and the regions specifically enriched in transcription-factor-binding sequences. Because its classification and visualization power is very high, BLSOM is an efficient and powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data).
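The input to the map described above is the oligonucleotide composition of fixed-length sequence fragments. The sketch below shows one plausible way to build such vectors (pentanucleotide frequencies in 100 kb windows); it covers only the feature extraction, not the BLSOM training itself, and the function names, the non-overlapping windowing, and the handling of ambiguous bases are assumptions made for illustration rather than details given in the abstract.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=5):
    """Return the relative frequency of every A/C/G/T k-mer in seq.

    k-mers containing other symbols (e.g. N) are ignored, so the
    frequencies sum to 1 over the 4**k unambiguous k-mers.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def window_compositions(genome, window=100_000, k=5):
    """Composition vectors for consecutive non-overlapping windows."""
    return [
        kmer_frequencies(genome[start:start + window], k)
        for start in range(0, len(genome) - window + 1, window)
    ]


# Small demonstration with dinucleotides and a toy sequence.
freqs = kmer_frequencies("ACGTACGTAC", k=2)
windows = window_compositions("ACGT" * 50, window=100, k=2)
```

Each window yields a 1,024-dimensional vector when k = 5, which is the kind of high-dimensional input a SOM is designed to cluster and display on a single map.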
DoOPSearch: a web-based tool for finding and analysing common conserved motifs in the promoter regions of different chordate and plant genes

PubMed Central

Sebestyén, Endre; Nagy, Tibor; Suhai, Sándor; Barta, Endre

2009-01-01

Background The comparative genomic analysis of a large number of orthologous promoter regions of the chordate and plant genes from the DoOP databases shows thousands of conserved motifs. Most of these motifs differ from any known transcription factor binding site (TFBS). To identify common conserved motifs, we need a specific tool to be able to search amongst them. Since conserved motifs from the DoOP databases are linked to genes, the result of such a search can give a list of genes that are potentially regulated by the same transcription factor(s). Results We have developed a new tool called DoOPSearch for the analysis of the conserved motifs in the promoter regions of chordate or plant genes. We used the orthologous promoters of the DoOP database to extract thousands of conserved motifs from different taxonomic groups. The advantage of this approach is that different sets of conserved motifs might be found depending on how broad the taxonomic coverage of the underlying orthologous promoter sequence collection is (consider e.g. primates vs. mammals or Brassicaceae vs. Viridiplantae). The DoOPSearch tool allows the users to search these motif collections or the promoter regions of DoOP with user supplied query sequences or any of the conserved motifs from the DoOP database. To find overrepresented gene ontologies, the gene lists obtained can be analysed further using a modified version of the GeneMerge program. Conclusion We present here a comparative genomics based promoter analysis tool. Our system is based on a unique collection of conserved promoter motifs characteristic of different taxonomic groups. We offer both a command line and a web-based tool for searching in these motif collections using user specified queries. These can be either short promoter sequences or consensus sequences of known transcription factor binding sites. 
The GeneMerge analysis of the search results allows the user to identify statistically overrepresented Gene Ontology terms that might provide a clue on the function of the motifs and genes. PMID:19534755
Origin of the Y genome in Elymus and its relationship to other genomes in Triticeae based on evidence from elongation factor G (EF-G) gene sequences.

PubMed

Sun, Genlou; Komatsuda, Takao

2010-08-01

It is well known that Elymus arose through hybridization between representatives of different genera. Cytogenetic analyses show that all its members include the St genome in combination with one or more of four other genomes, the H, Y, P, and W genomes. The origins of the H, P, and W genomes are known, but that of the Y genome is not. We analyzed the single copy nuclear gene coding for elongation factor G (EF-G) from 28 accessions of polyploid Elymus species and 45 accessions of diploid Triticeae species in order to investigate the origin of the Y genome and its relationship to other genomes in the tribe Triticeae. Sequence comparisons among the St, H, Y, P, W, and E genomes detected genome-specific polymorphisms at 66 nucleotide positions. The St and Y genomes are relatively dissimilar. The phylogeny of the Y genome sequences was investigated for the first time. They were most similar to the W genome sequences. The Y genome sequences were placed in two different groups. These two groups were included in an unresolved clade that included the W and E sequences as well as sequences from many annual species. The H genome sequences were in a clade with the F, P, and Ns genome sequences as sister groups. These two clades were more closely related to each other and to the L and Xp genomes than they were to the St genome sequences. These data support the hypothesis that the Y genome evolved in a diploid species and has a different origin from the St genome. Copyright 2010 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Thorell, Kaisa; Hosseini, Shaghayegh; Palacios Gonzales, Reyna Victoria Palacios

Helicobacter pylori (H. pylori) is one of the most common bacterial infections in humans and this infection can lead to gastric ulcers and gastric cancer. H. pylori is one of the most genetically variable human pathogens and the ability of the bacterium to bind to the host epithelium as well as the presence of different virulence factors and genetic variants within these genes have been associated with disease severity. Nicaragua has particularly high gastric cancer incidence and we therefore studied Nicaraguan clinical H. pylori isolates for factors that could contribute to cancer risk. The complete genomes of fifty-two Nicaraguan H. pylori isolates were sequenced and assembled de novo, and phylogenetic and virulence factor analyses were performed. The Nicaraguan isolates showed phylogenetic relationship with West African isolates in whole-genome sequence comparisons and with Western and urban South and Central American isolates using MLSA (Multi-locus sequence analysis). A majority (77%) of the isolates carried the cancer-associated virulence gene cagA and also the s1/i1/m1 vacuolating cytotoxin (vacA) allele combination, which is linked to increased severity of disease. Specifically, we also found that Nicaraguan isolates have a blood group-binding adhesin (BabA) variant highly similar to previously reported BabA sequences from Latin America, including from isolates belonging to other phylogenetic groups. These BabA sequences were found to be under positive selection at several amino acid positions that differed from the global collection of isolates. In conclusion, the discovery of a Latin American BabA variant, independent of overall phylogenetic background, suggests hitherto unknown host or environmental factors within the Latin American population giving H. pylori isolates carrying this adhesin variant a selective advantage, which could affect pathogenesis and risk for sequelae through specific adherence properties.
Sequence variations of the partially dominant DELLA gene Rht-B1c in wheat and their functional impacts

PubMed Central

Ma, Zhengqiang

2013-01-01

Rht-B1c, allelic to the DELLA protein-encoding gene Rht-B1a, is a natural mutation documented in common wheat (Triticum aestivum). It confers variation to a number of traits related to cell and plant morphology, seed dormancy, and photosynthesis. The present study was conducted to examine the sequence variations of Rht-B1c and their functional impacts. The results showed that Rht-B1c was partially dominant or co-dominant for plant height, and exhibited an increased dwarfing effect. At the sequence level, Rht-B1c differed from Rht-B1a by one 2kb Veju retrotransposon insertion, three coding region single nucleotide polymorphisms (SNPs), one 197bp insertion, and four SNPs in the 1kb upstream sequence. Haplotype investigations, association analyses, transient expression assays, and expression profiling showed that the Veju insertion was primarily responsible for the extreme dwarfing effect. It was found that the Veju insertion changed processing of the Rht-B1c transcripts and resulted in DELLA motif primary structure disruption. Expression assays showed that Rht-B1c caused reduction of total Rht-1 transcript levels, and up-regulation of GATA-like transcription factors and genes positively regulated by these factors, suggesting that one way in which Rht-1 proteins affect plant growth and development is through GATA-like transcription factor regulation. PMID:23918966
Analyses of Evolutionary Characteristics of the Hemagglutinin-Esterase Gene of Influenza C Virus during a Period of 68 Years Reveals Evolutionary Patterns Different from Influenza A and B Viruses.

PubMed

Furuse, Yuki; Matsuzaki, Yoko; Nishimura, Hidekazu; Oshitani, Hitoshi

2016-11-26

Infections with the influenza C virus causing respiratory symptoms are common, particularly among children. Since isolation and detection of the virus are rarely performed, compared with influenza A and B viruses, the small number of available sequences of the virus makes it difficult to analyze its evolutionary dynamics. Recently, we reported the full genome sequence of 102 strains of the virus. Here, we exploited the data to elucidate the evolutionary characteristics and phylodynamics of the virus compared with influenza A and B viruses. Along with our data, we obtained public sequence data of the hemagglutinin-esterase gene of the virus; the dataset consists of 218 unique sequences of the virus collected from 14 countries between 1947 and 2014. Informatics analyses revealed that (1) multiple lineages have been circulating globally; (2) there have been weak and infrequent selective bottlenecks; (3) the evolutionary rate is low because of weak positive selection and a low capability to induce mutations; and (4) there is no significant positive selection although a few mutations affecting its antigenicity have been induced. The unique evolutionary dynamics of the influenza C virus must be shaped by multiple factors, including virological, immunological, and epidemiological characteristics.
Analyses of Evolutionary Characteristics of the Hemagglutinin-Esterase Gene of Influenza C Virus during a Period of 68 Years Reveals Evolutionary Patterns Different from Influenza A and B Viruses

PubMed Central

Furuse, Yuki; Matsuzaki, Yoko; Nishimura, Hidekazu; Oshitani, Hitoshi

2016-01-01

Infections with the influenza C virus causing respiratory symptoms are common, particularly among children. Since isolation and detection of the virus are rarely performed, compared with influenza A and B viruses, the small number of available sequences of the virus makes it difficult to analyze its evolutionary dynamics. Recently, we reported the full genome sequence of 102 strains of the virus. Here, we exploited the data to elucidate the evolutionary characteristics and phylodynamics of the virus compared with influenza A and B viruses. Along with our data, we obtained public sequence data of the hemagglutinin-esterase gene of the virus; the dataset consists of 218 unique sequences of the virus collected from 14 countries between 1947 and 2014. Informatics analyses revealed that (1) multiple lineages have been circulating globally; (2) there have been weak and infrequent selective bottlenecks; (3) the evolutionary rate is low because of weak positive selection and a low capability to induce mutations; and (4) there is no significant positive selection although a few mutations affecting its antigenicity have been induced. The unique evolutionary dynamics of the influenza C virus must be shaped by multiple factors, including virological, immunological, and epidemiological characteristics. PMID:27898037
Classification of viral zoonosis through receptor pattern analysis.

PubMed

Bae, Se-Eun; Son, Hyeon Seok

2011-04-13

Viral zoonosis, the transmission of a virus from its primary vertebrate reservoir species to humans, requires ubiquitous cellular proteins known as receptor proteins. Zoonosis can occur not only through direct transmission from vertebrates to humans, but also through intermediate reservoirs or other environmental factors. Viruses can be categorized according to genotype (ssDNA, dsDNA, ssRNA and dsRNA viruses). Among them, the RNA viruses exhibit particularly high mutation rates and are especially problematic for this reason. Most zoonotic viruses are RNA viruses that change their envelope proteins to facilitate binding to various receptors of host species. In this study, we sought to predict zoonotic propensity through the analysis of receptor characteristics. We hypothesized that the major barrier to interspecies virus transmission is that receptor sequences vary among species--in other words, that the specific amino acid sequence of the receptor determines the ability of the viral envelope protein to attach to the cell. We analysed host-cell receptor sequences for their hydrophobicity/hydrophilicity characteristics. We then analysed these properties for similarities among receptors of different species and used a statistical discriminant analysis to predict the likelihood of transmission among species. This study is an attempt to predict zoonosis through simple computational analysis of receptor sequence differences. Our method may be useful in predicting the zoonotic potential of newly discovered viral strains.
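The hydrophobicity/hydrophilicity characterisation of receptor sequences mentioned above can be illustrated with a short sketch. The abstract does not say which hydropathy scale or window size was used, so the Kyte-Doolittle scale and the 9-residue window below are assumptions, as are the function names; the statistical discriminant analysis that the authors apply afterwards is not reproduced here.

```python
# Kyte-Doolittle hydropathy index per amino acid (assumed scale; the
# abstract does not name the one actually used).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(protein):
    """Average hydropathy over the standard residues in a sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    return sum(values) / len(values) if values else 0.0


def hydropathy_profile(protein, window=9):
    """Sliding-window mean hydropathy; unknown residues score 0.0."""
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Toy peptide: three hydrophobic residues followed by two charged ones.
profile = hydropathy_profile("AILKR", window=3)
```

Summary values or profiles of this kind, computed for the same receptor in different host species, are the sort of per-sequence features that could then be compared across species.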
SxtA gene sequence analysis of dinoflagellate Alexandrium minutum

NASA Astrophysics Data System (ADS)

Norshaha, Safida Anira; Latib, Norhidayu Abdul; Usup, Gires; Yusof, Nurul Yuziana Mohd

2015-09-01

The dinoflagellate Alexandrium minutum is typically known for the production of potent neurotoxins such as saxitoxin, which affect the health of human seafood consumers via paralytic shellfish poisoning (PSP). This phenomenon is related to harmful algal blooms (HABs), which are believed to be influenced by environmental and nutritional factors. A previous study revealed that the SxtA gene is a starting gene involved in the saxitoxin production pathway. The aim of this study was to analyse the sequence of the sxtA gene in A. minutum. The dinoflagellate culture was grown at 26°C under a 16:8-hour light:dark photocycle. After the samples were harvested, RNA was extracted, and complementary DNA (cDNA) was synthesised and amplified by polymerase chain reaction (PCR). The PCR products were then purified and cloned before being sequenced. The SxtA sequence obtained was then analyzed in order to identify the presence of the SxtA gene in Alexandrium minutum.
[Convergent origin of repeats in genes coding for globular proteins. An analysis of the factors determining the presence of inverted and symmetrical repeats].

PubMed

Solov'ev, V V; Kel', A E; Kolchanov, N A

1989-01-01

The factors determining the presence of inverted and symmetrical repeats in genes coding for globular proteins have been analysed. The analysis of symmetrical repeats revealed an interesting property of the genetic code: pairs of symmetrical codons correspond to pairs of amino acids with mostly similar physical-chemical parameters. This property may explain why symmetrical repeats and palindromes are present only in genes coding for beta-structural proteins-polypeptides, where amino acids with similar physical-chemical properties occupy symmetrical positions. A stochastic model of the evolution of polynucleotide sequences has been used for the analysis of inverted repeats. The modelling demonstrated that a constraint on the sequences (uneven codon usage frequencies) is alone sufficient for nonrandom inverted repeats to arise in genes.
Analysing the performance of personal computers based on Intel microprocessors for sequence aligning bioinformatics applications.

PubMed

Nair, Pradeep S; John, Eugene B

2007-01-01

Aligning specific sequences against a very large number of other sequences is a central aspect of bioinformatics. With the widespread availability of personal computers in biology laboratories, sequence alignment is now often performed locally. This makes it necessary to analyse the performance of personal computers for sequence aligning bioinformatics benchmarks. In this paper, we analyse the performance of a personal computer for the popular BLAST and FASTA sequence alignment suites. Results indicate that these benchmarks have a large number of recurring operations and use memory operations extensively. It seems that the performance can be improved with a bigger L1-cache.
Targeted sequencing for high-resolution evolutionary analyses following genome duplication in salmonid fish: Proof of concept for key components of the insulin-like growth factor axis.

PubMed

Lappin, Fiona M; Shaw, Rebecca L; Macqueen, Daniel J

2016-12-01

High-throughput sequencing has revolutionised comparative and evolutionary genome biology. It has now become relatively commonplace to generate multiple genomes and/or transcriptomes to characterize the evolution of large taxonomic groups of interest. Nevertheless, such efforts may be unsuited to some research questions or remain beyond the scope of some research groups. Here we show that targeted high-throughput sequencing offers a viable alternative to study genome evolution across a vertebrate family of great scientific interest. Specifically, we exploited sequence capture and Illumina sequencing to characterize the evolution of key components from the insulin-like growth (IGF) signalling axis of salmonid fish at unprecedented phylogenetic resolution. The IGF axis represents a central governor of vertebrate growth and its core components were expanded by whole genome duplication in the salmonid ancestor ~95Ma. Using RNA baits synthesised to genes encoding the complete family of IGF binding proteins (IGFBP) and an IGF hormone (IGF2), we captured, sequenced and assembled orthologous and paralogous exons from species representing all ten salmonid genera. This approach generated 299 novel sequences, most as complete or near-complete protein-coding sequences. Phylogenetic analyses confirmed congruent evolutionary histories for all nineteen recognized salmonid IGFBP family members and identified novel salmonid-specific IGF2 paralogues. Moreover, we reconstructed the evolution of duplicated IGF axis paralogues across a replete salmonid phylogeny, revealing complex historic selection regimes - both ancestral to salmonids and lineage-restricted - that frequently involved asymmetric paralogue divergence under positive and/or relaxed purifying selection. Our findings add to an emerging literature highlighting diverse applications for targeted sequencing in comparative-evolutionary genomics. 
We also set out a viable approach to obtain large sets of nuclear genes for any member of the salmonid family, which should enable insights into the evolutionary role of whole genome duplication before additional nuclear genome sequences become available. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Identification of a Latin American-specific BabA adhesin variant through whole genome sequencing of Helicobacter pylori patient isolates from Nicaragua

DOE PAGES

Thorell, Kaisa; Hosseini, Shaghayegh; Palacios Gonzales, Reyna Victoria Palacios; ...

2016-02-29

Helicobacter pylori (H. pylori) is one of the most common bacterial infections in humans and this infection can lead to gastric ulcers and gastric cancer. H. pylori is one of the most genetically variable human pathogens and the ability of the bacterium to bind to the host epithelium as well as the presence of different virulence factors and genetic variants within these genes have been associated with disease severity. Nicaragua has particularly high gastric cancer incidence and we therefore studied Nicaraguan clinical H. pylori isolates for factors that could contribute to cancer risk. The complete genomes of fifty-two Nicaraguan H. pylori isolates were sequenced and assembled de novo, and phylogenetic and virulence factor analyses were performed. The Nicaraguan isolates showed phylogenetic relationship with West African isolates in whole-genome sequence comparisons and with Western and urban South and Central American isolates using MLSA (Multi-locus sequence analysis). A majority (77%) of the isolates carried the cancer-associated virulence gene cagA and also the s1/i1/m1 vacuolating cytotoxin (vacA) allele combination, which is linked to increased severity of disease. Specifically, we also found that Nicaraguan isolates have a blood group-binding adhesin (BabA) variant highly similar to previously reported BabA sequences from Latin America, including from isolates belonging to other phylogenetic groups. These BabA sequences were found to be under positive selection at several amino acid positions that differed from the global collection of isolates. In conclusion, the discovery of a Latin American BabA variant, independent of overall phylogenetic background, suggests hitherto unknown host or environmental factors within the Latin American population giving H. pylori isolates carrying this adhesin variant a selective advantage, which could affect pathogenesis and risk for sequelae through specific adherence properties.

RhoA Regulation of Cardiomyocyte Differentiation

PubMed Central

Kaarbø, Mari; Crane, Denis I.; Murrell, Wayne G.

2013-01-01

Earlier findings from our laboratory implicated RhoA in heart developmental processes. To investigate factors that potentially regulate RhoA expression, RhoA gene organisation and promoter activity were analysed. Comparative analysis indicated strict conservation of both gene organisation and coding sequence of the chick, mouse, and human RhoA genes. Bioinformatics analysis of the derived promoter region of mouse RhoA identified putative consensus sequence binding sites for several transcription factors involved in heart formation and organogenesis generally. Using luciferase reporter assays, RhoA promoter activity was shown to increase in mouse-derived P19CL6 cells that were induced to differentiate into cardiomyocytes. Overexpression of a dominant negative mutant of mouse RhoA (mRhoAN19) blocked this cardiomyocyte differentiation of P19CL6 cells and led to the accumulation of the cardiac transcription factors SRF and GATA4 and the early cardiac marker cardiac α-actin. Taken together, these findings indicate a fundamental role for RhoA in the differentiation of cardiomyocytes. PMID:23935420
Systematics of Cladophora spp. (Chlorophyta) from North Carolina, USA, based upon morphology and DNA sequence data with a description of Cladophora subtilissima sp. nov.

PubMed

Taylor, Robin L; Bailey, Jeffrey Craig; Freshwater, David Wilson

2017-06-01

Identification of Cladophora species is challenging due to conservation of gross morphology, few discrete autapomorphies, and environmental influences on morphology. Twelve species of marine Cladophora were reported from North Carolina waters. Cladophora specimens were collected from inshore and offshore marine waters for DNA sequence and morphological analyses. The nuclear-encoded rRNA internal transcribed spacer regions (ITS) were sequenced for 105 specimens and used in molecular assisted identification. The ITS1 and ITS2 region was highly variable, and sequences were sorted into ITS Sets of Alignable Sequences (SASs). Sequencing of short hyper-variable ITS1 sections from Cladophora type specimens was used to positively identify species represented by SASs when the types were made available. Secondary structures for the ITS1 locus were also predicted for each specimen and compared to predicted structures from Cladophora sequences available in GenBank. Nine ITS SASs were identified and representative specimens chosen for phylogenetic analyses of 18S and 28S rRNA gene sequences to reveal relationships with other Cladophora species. Phylogenetic analyses indicated that marine Cladophorales were polyphyletic and separated into two clades, the Cladophora clade and the "Siphonocladales" clade. Morphological analyses were performed to assess the consistency of character states within species, and complement the DNA sequence analyses. These analyses revealed intra- and interspecific character state variation, and that combined molecular and morphological analyses were required for the identification of species. One new report, Cladophora dotyana, and one new species Cladophora subtilissima sp. nov., were revealed, and increased the biodiversity of North Carolina marine Cladophora to 14 species. © 2017 Phycological Society of America.
Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

PubMed

Mackey, Aaron J; Pearson, William R

2004-10-01

Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.
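
The idea of selecting a sequence library subset and storing similarity search results in a relational database can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module; the table names, column names, accessions, and E-values below are hypothetical and are not the actual seqdb_demo schema or data described in the unit.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE protein (acc TEXT PRIMARY KEY, taxon TEXT, seq TEXT);
        CREATE TABLE hit (query TEXT, subject TEXT, evalue REAL);
        """
    )
    # Made-up sequences, for illustration only.
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?)",
        [
            ("P1", "E. coli", "MKTAYIAK"),
            ("P2", "E. coli", "MKLVNQ"),
            ("P3", "H. sapiens", "MSTNPKPQ"),
        ],
    )
    # Made-up similarity search results (query, subject, E-value).
    conn.executemany(
        "INSERT INTO hit VALUES (?, ?, ?)",
        [("Q1", "P1", 1e-30), ("Q1", "P3", 0.5), ("Q2", "P2", 1e-8)],
    )
    return conn


def library_subset_fasta(conn, taxon):
    """Return a FASTA-formatted library restricted to one taxon."""
    rows = conn.execute(
        "SELECT acc, seq FROM protein WHERE taxon = ? ORDER BY acc", (taxon,)
    )
    return "".join(f">{acc}\n{seq}\n" for acc, seq in rows)


def significant_hits(conn, max_evalue):
    """Join stored hits to protein annotations, keeping E-values <= max_evalue."""
    return conn.execute(
        """
        SELECT h.query, h.subject, p.taxon
        FROM hit AS h JOIN protein AS p ON p.acc = h.subject
        WHERE h.evalue <= ?
        ORDER BY h.query, h.subject
        """,
        (max_evalue,),
    ).fetchall()
```

Searching a smaller, targeted library reduces the number of comparisons performed, which is one way a subset can improve the statistical significance of true homologs; the join shows how stored hits can then be combined with annotation for downstream comparative analyses.
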
Insights into rubber biosynthesis from transcriptome analysis of Hevea brasiliensis latex.

PubMed

Chow, Keng-See; Wan, Kiew-Lian; Isa, Mohd Noor Mat; Bahari, Azlina; Tan, Siang-Hee; Harikrishna, K; Yeang, Hoong-Yeet

2007-01-01

Hevea brasiliensis is the most widely cultivated species for commercial production of natural rubber (cis-polyisoprene). In this study, 10,040 expressed sequence tags (ESTs) were generated from the latex of the rubber tree, which represents the cytoplasmic content of a single cell type, in order to analyse the latex transcription profile with emphasis on rubber biosynthesis-related genes. A total of 3,441 unique transcripts (UTs) were obtained after quality editing and assembly of EST sequences. Functional classification of UTs according to the Gene Ontology convention showed that 73.8% were related to genes of unknown function. Among highly expressed ESTs, a significant proportion encoded proteins related to rubber biosynthesis and stress or defence responses. Sequences encoding rubber particle membrane proteins (RPMPs) belonging to three protein families accounted for 12% of the ESTs. Characterization of these ESTs revealed nine RPMP variants (7.9-27 kDa) including the 14 kDa REF (rubber elongation factor) and 22 kDa SRPP (small rubber particle protein). The expression of multiple RPMP isoforms in latex was shown using antibodies against REF and SRPP. Both EST and quantitative reverse transcription-PCR (QRT-PCR) analyses demonstrated REF and SRPP to be the most abundant transcripts in latex. Besides rubber biosynthesis, comparative sequence analysis showed that the RPMPs are highly similar to sequences in the plant kingdom having stress-related functions. Implications of the RPMP function in cis-polyisoprene biosynthesis in the context of transcript abundance and differential gene expression are discussed.
Leptospiral Pathogenomics

PubMed Central

Lehmann, Jason S.; Matthias, Michael A.; Vinetz, Joseph M.; Fouts, Derrick E.

2014-01-01

Leptospirosis, caused by pathogenic spirochetes belonging to the genus Leptospira, is a zoonosis with important impacts on human and animal health worldwide. Research on the mechanisms of Leptospira pathogenesis has been hindered due to slow growth of infectious strains, poor transformability, and a paucity of genetic tools. As a result of second generation sequencing technologies, there has been an acceleration of leptospiral genome sequencing efforts in the past decade, which has enabled a concomitant increase in functional genomics analyses of Leptospira pathogenesis. A pathogenomics approach, by coupling of pan-genomic analysis of multiple isolates with sequencing of experimentally attenuated highly pathogenic Leptospira, has resulted in the functional inference of virulence factors. The global Leptospira Genome Project supported by the U.S. National Institute of Allergy and Infectious Diseases to which key scientific contributions have been made from the international leptospirosis research community has provided a new roadmap for comprehensive studies of Leptospira and leptospirosis well into the future. This review describes functional genomics approaches to apply the data generated by the Leptospira Genome Project towards deepening our knowledge of virulence factors of Leptospira using the emerging discipline of pathogenomics. PMID:25437801
Uncovering the Salt Response of Soybean by Unraveling Its Wild and Cultivated Functional Genomes Using Tag Sequencing

PubMed Central

Ali, Zulfiqar; Zhang, Da Yong; Xu, Zhao Long; Xu, Ling; Yi, Jin Xin; He, Xiao Lan; Huang, Yi Hong; Liu, Xiao Qing; Khan, Asif Ali; Trethowan, Richard M.; Ma, Hong Xiang

2012-01-01

Soil salinity has very adverse effects on growth and yield of crop plants. Several salt tolerant wild accessions and cultivars are reported in soybean. Functional genomes of salt tolerant Glycine soja and a salt sensitive genotype of Glycine max were investigated to understand the mechanism of salt tolerance in soybean. For this purpose, four libraries were constructed for Tag sequencing on the Illumina platform. We identified around 490 salt responsive genes, which included a number of transcription factors, signaling proteins, translation factors and structural genes like transporters, multidrug resistance proteins, antiporters, chaperones, aquaporins etc. The gene expression levels and ratio of up/down-regulated genes were greater in tolerant plants. Translation related genes remained stable or showed slightly higher expression in tolerant plants under salinity stress. Further analyses of sequenced data and the annotations for gene ontology and pathways indicated that soybean adapts to salt stress through ABA biosynthesis and regulation of translation and signal transduction of structural genes. Manipulation of these pathways may mitigate the effect of salt stress thus enhancing salt tolerance. PMID:23209559
TEMPLE: analysing population genetic variation at transcription factor binding sites.

PubMed

Litovchenko, Maria; Laurent, Stefan

2016-11-01

Genetic variation occurring at the level of regulatory sequences can affect phenotypes and fitness in natural populations. This variation can be analysed in a population genetic framework to study how genetic drift and selection affect the evolution of these functional elements. However, doing this requires a good understanding of the location and nature of regulatory regions, which has long been a major hurdle. The current proliferation of genomewide profiling experiments of transcription factor occupancies greatly improves our ability to identify genomic regions involved in specific DNA-protein interactions. Although software exists for predicting transcription factor binding sites (TFBS), and the effects of genetic variants on TFBS specificity, there are no tools currently available for inferring this information jointly with the genetic variation at TFBS in natural populations. We developed the software Transcription Elements Mapping at the Population LEvel (TEMPLE), which predicts TFBS, evaluates the effects of genetic variants on TFBS specificity and summarizes the genetic variation occurring at TFBS in intraspecific sequence alignments. We demonstrate that TEMPLE's TFBS prediction algorithms give identical results to PATSER, a software distribution commonly used in the field. We also illustrate the unique features of TEMPLE by analysing TFBS diversity for the TF Senseless (SENS) in one ancestral and one cosmopolitan population of the fruit fly Drosophila melanogaster. TEMPLE can be used to localize TFBS that are characterized by strong genetic differentiation across natural populations. This will be particularly useful for studies aiming to identify adaptive mutations. TEMPLE is a java-based cross-platform software that easily maps the genetic diversity at predicted TFBSs using a graphical interface, or from the Unix command line. © 2016 John Wiley & Sons Ltd.
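
Predicting a TFBS and evaluating the effect of a variant on it can be sketched with a generic log-odds position weight matrix scan. This is a minimal illustration in Python and not the TEMPLE or PATSER implementation: the count matrix, the pseudocount, and the uniform background frequency are all assumptions chosen for the example.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append(
            {
                b: math.log2(((column[b] + pseudocount) / total) / background)
                for b in BASES
            }
        )
    return matrix


def scan(sequence, matrix):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(matrix)
    return [
        (start, sum(matrix[k][sequence[start + k]] for k in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


def best_site(sequence, matrix):
    """Return the highest-scoring (start, score) window."""
    return max(scan(sequence, matrix), key=lambda hit: hit[1])


# Hypothetical count matrix for a 3-bp motif that strongly prefers "TGA".
EXAMPLE_COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
```

Scoring the same window in two alleles (for example "TGA" versus "TGC") and comparing the scores gives a simple measure of how a variant changes predicted binding specificity.
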
Vander Lugt correlation of DNA sequence data

NASA Astrophysics Data System (ADS)

Christens-Barry, William A.; Hawk, James F.; Martin, James C.

1990-12-01

DNA, the molecule containing the genetic code of an organism, is a linear chain of subunits. It is the sequence of subunits, of which there are four kinds, that constitutes the unique blueprint of an individual. This sequence is the focus of a large number of analyses performed by an army of geneticists, biologists, and computer scientists. Most of these analyses entail searches for specific subsequences within the larger set of sequence data. Thus, most analyses are essentially pattern recognition or correlation tasks. Yet, there are special features to such analysis that influence the strategy and methods of an optical pattern recognition approach. While the serial processing employed in digital electronic computers remains the main engine of sequence analyses, there is no fundamental reason that more efficient parallel methods cannot be used. We describe an approach using optical pattern recognition (OPR) techniques based on matched spatial filtering. This allows parallel comparison of large blocks of sequence data. In this study we have simulated a Vander Lugt architecture implementing our approach. Searches for specific target sequence strings within a block of DNA sequence from the ColE1 plasmid are performed.
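
The view of subsequence search as a correlation task can be sketched digitally. The Python example below is a serial software analogue of a matched filter, not the optical Vander Lugt architecture simulated in the study: each base is encoded as one of four binary channels (an assumed encoding), and the correlation peak equals the target length exactly where the target occurs.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as four binary channels, one per base."""
    return [[1 if base == b else 0 for base in seq] for b in BASES]


def correlate(block, target):
    """Cross-correlate target against block, summed over the four channels.

    The value at each shift is the number of positions where block and
    target carry the same base.
    """
    block_channels = one_hot(block)
    target_channels = one_hot(target)
    n, m = len(block), len(target)
    scores = []
    for shift in range(n - m + 1):
        total = 0
        for block_ch, target_ch in zip(block_channels, target_channels):
            total += sum(block_ch[shift + k] * target_ch[k] for k in range(m))
        scores.append(total)
    return scores


def find_matches(block, target):
    """Return shifts where the correlation peak equals the target length."""
    return [
        shift
        for shift, score in enumerate(correlate(block, target))
        if score == len(target)
    ]
```

Lowering the peak threshold below the target length would report approximate matches as well, which is one reason a correlation formulation is attractive for sequence searches.
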
Childhood maternal care is associated with DNA methylation of the genes for brain-derived neurotrophic factor (BDNF) and oxytocin receptor (OXTR) in peripheral blood cells in adult men and women.

PubMed

Unternaehrer, Eva; Meyer, Andrea Hans; Burkhardt, Susan C A; Dempster, Emma; Staehli, Simon; Theill, Nathan; Lieb, Roselind; Meinlschmidt, Gunther

2015-01-01

In adults reporting low and high maternal care in childhood, we compared DNA methylation in two stress-associated genes (two target sequences in the oxytocin receptor gene, OXTR; one in the brain-derived neurotrophic factor gene, BDNF) in peripheral whole blood, in a cross-sectional study (University of Basel, Switzerland) during 2007-2008. We recruited 89 participants scoring < 27 (n = 47, 36 women) or > 33 (n = 42, 35 women) on the maternal care subscale of the Parental Bonding Instrument (PBI) at a previous assessment of a larger group (N = 709, range PBI maternal care = 0-36, age range = 19-66 years; median 24 years). 85 participants gave blood for DNA methylation analyses (Sequenom® EpiTYPER, San Diego, CA) and cell count (Sysmex PocH-100i™, Kobe, Japan). Mixed model statistical analysis showed greater DNA methylation in the low versus high maternal care group, in the BDNF target sequence [Likelihood-Ratio (1) = 4.47; p = 0.035] and in one OXTR target sequence [Likelihood-Ratio (1) = 4.33; p = 0.037], but not the second OXTR target sequence [Likelihood-Ratio (1) < 0.001; p = 0.995]. Mediation analyses indicated that differential blood cell count did not explain associations between low maternal care and BDNF (estimate = -0.005, 95% CI = -0.025 to 0.015; p = 0.626) or OXTR DNA methylation (estimate = -0.015, 95% CI = -0.038 to 0.008; p = 0.192). Hence, low maternal care in childhood was associated with greater DNA methylation in an OXTR and a BDNF target sequence in blood cells in adulthood. Although the study has limitations (cross-sectional, a wide age range, only three target sequences in two genes studied, small effects, uncertain relevance of changes in blood cells to gene methylation in brain), the findings may indicate components of the epiphenotype from early life stress.
Preservation of protein clefts in comparative models.

PubMed

Piedra, David; Lois, Sergi; de la Cruz, Xavier

2008-01-16

Comparative, or homology, modelling of protein structures is the most widely used prediction method when the target protein has homologues of known structure. Given that the quality of a model may vary greatly, several studies have been devoted to identifying the factors that influence modelling results. These studies usually consider the protein as a whole, and only a few provide a separate discussion of the behaviour of biologically relevant features of the protein. Given the value of the latter for many applications, here we extended previous work by analysing the preservation of native protein clefts in homology models. We chose to examine clefts because of their role in protein function/structure, as they are usually the locus of protein-protein interactions, host the enzymes' active site, or, in the case of protein domains, can also be the locus of domain-domain interactions that lead to the structure of the whole protein. We studied how the largest cleft of a protein varies in comparative models. To this end, we analysed a set of 53507 homology models that cover the whole sequence identity range, with a special emphasis on medium and low similarities. More precisely we examined how cleft quality - measured using six complementary parameters related to both global shape and local atomic environment, depends on the sequence identity between target and template proteins. In addition to this general analysis, we also explored the impact of a number of factors on cleft quality, and found that the relationship between quality and sequence identity varies depending on cleft rank amongst the set of protein clefts (when ordered according to size), and number of aligned residues. We have examined cleft quality in homology models at a range of seq.id. levels. Our results provide a detailed view of how quality is affected by distinct parameters and thus may help the user of comparative modelling to determine the final quality and applicability of his/her cleft models. 
In addition, the large variability in model quality that we observed within each sequence bin, with good models present even at low sequence identities (between 20% and 30%), indicates that properly developed identification methods could be used to recover good cleft models in this sequence range.
Osteoblast-specific factor 2: cloning of a putative bone adhesion protein with homology with the insect protein fasciclin I.

PubMed Central

Takeshita, S; Kikuno, R; Tezuka, K; Amann, E

1993-01-01

A cDNA library prepared from the mouse osteoblastic cell line MC3T3-E1 was screened for the presence of specifically expressed genes by employing a combined subtraction hybridization/differential screening approach. A cDNA was identified and sequenced which encodes a protein designated osteoblast-specific factor 2 (OSF-2) comprising 811 amino acids. OSF-2 has a typical signal sequence, followed by a cysteine-rich domain, a fourfold repeated domain and a C-terminal domain. The protein lacks a typical transmembrane region. The fourfold repeated domain of OSF-2 shows homology with the insect protein fasciclin I. RNA analyses revealed that OSF-2 is expressed in bone and to a lesser extent in lung, but not in other tissues. Mouse OSF-2 cDNA was subsequently used as a probe to clone the human counterpart. Mouse and human OSF-2 show a high amino acid sequence conservation except for the signal sequence and two regions in the C-terminal domain in which 'in-frame' insertions or deletions are observed, implying alternative splicing events. On the basis of the amino acid sequence homology with fasciclin I, we suggest that OSF-2 functions as a homophilic adhesion molecule in bone formation. PMID:8363580
Mobile Genome Express (MGE): A comprehensive automatic genetic analyses pipeline with a mobile device.

PubMed

Yoon, Jun-Hee; Kim, Thomas W; Mendez, Pedro; Jablons, David M; Kim, Il-Jin

2017-01-01

The development of next-generation sequencing (NGS) technology allows whole exomes or genomes to be sequenced. However, data analysis is still the biggest bottleneck for its wide implementation. Most laboratories still depend on manual procedures for data handling and analyses, which translates into a delay and decreased efficiency in the delivery of NGS results to doctors and patients. Thus, there is high demand for an automatic and easy-to-use NGS data analysis system. We developed a comprehensive, automatic genetic analysis controller named Mobile Genome Express (MGE) that works on smartphones or other mobile devices. MGE can handle all the steps for genetic analyses, such as sample information submission, sequencing run quality check from the sequencer, secured data transfer and results review. We sequenced an Actrometrix control DNA containing multiple proven human mutations using a targeted sequencing panel, and the whole analysis was managed by MGE and its data reviewing program, called ELECTRO. All steps were processed automatically except for the final sequencing review procedure with ELECTRO to confirm mutations. The data analysis process was completed within several hours. We confirmed that the mutations we identified were consistent with our previous results obtained by using multi-step, manual pipelines.
Molecular characterization and evolutionary insights into potential sex-determination genes in the western orchard predatory mite Metaseiulus occidentalis (Chelicerata: Arachnida: Acari: Phytoseiidae).

PubMed

Pomerantz, Aaron F; Hoy, Marjorie A; Kawahara, Akito Y

2015-01-01

Little is known about the process of sex determination at the molecular level in species belonging to the subclass Acari, a taxon of arachnids that contains mites and ticks. The recent sequencing of the transcriptome and genome of the western orchard predatory mite Metaseiulus occidentalis allows investigation of molecular mechanisms underlying the biological processes of sex determination in this predator of phytophagous pest mites. We identified four doublesex-and-mab-3-related transcription factor (dmrt) genes, one transformer-2 gene, one intersex gene, and two fruitless-like genes in M. occidentalis. Phylogenetic analyses were conducted to infer the molecular relationships to sequences from species of arthropods, including insects, crustaceans, acarines, and a centipede, using available genomic data. Comparative analyses revealed high sequence identity within functional domains and confirmed that the architecture for certain sex-determination genes is conserved in arthropods. This study provides a framework for identifying potential target genes that could be implicated in the process of sex determination in M. occidentalis and provides insight into the conservation and change of the molecular components of sex determination in arthropods.
Retention-error patterns in complex alphanumeric serial-recall tasks.

PubMed

Mathy, Fabien; Varré, Jean-Stéphane

2013-01-01

We propose a new method based on an algorithm usually dedicated to DNA sequence alignment in order to both reliably score short-term memory performance on immediate serial-recall tasks and analyse retention-error patterns. There can be considerable confusion on how performance on immediate serial list recall tasks is scored, especially when the to-be-remembered items are sampled with replacement. We discuss the utility of sequence-alignment algorithms to compare the stimuli to the participants' responses. The idea is that deletion, substitution, translocation, and insertion errors, which are typical in DNA, are also typical putative errors in short-term memory (respectively omission, confusion, permutation, and intrusion errors). We analyse four data sets in which alphanumeric lists included a few (or many) repetitions. After examining the method on two simple data sets, we show that sequence alignment offers 1) a compelling method for measuring capacity in terms of chunks when many regularities are introduced in the material (third data set) and 2) a reliable estimator of individual differences in short-term memory capacity. This study illustrates the difficulty of arriving at a good measure of short-term memory performance, and also attempts to characterise the primary factors underpinning remembering and forgetting.
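
The mapping between alignment operations and recall errors can be sketched with a standard edit-distance alignment. The Python example below is an illustration only, not the authors' scoring algorithm: it assumes unit costs for every error type, labels deletions as omissions, substitutions as confusions, and insertions as intrusions, and does not model translocation (permutation) errors.

```python
def align_recall(stimulus, response):
    """Align a response to its stimulus list and count error types.

    Uses unit-cost edit distance (an assumption); ties in the traceback are
    resolved in favour of match/confusion, then omission, then intrusion.
    """
    n, m = len(stimulus), len(response)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # every stimulus item omitted
    for j in range(1, m + 1):
        cost[0][j] = j  # every response item intruded
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = stimulus[i - 1] != response[j - 1]
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or confusion
                cost[i - 1][j] + 1,             # omission
                cost[i][j - 1] + 1,             # intrusion
            )

    counts = {"correct": 0, "confusion": 0, "omission": 0, "intrusion": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = stimulus[i - 1] != response[j - 1]
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                counts["confusion" if mismatch else "correct"] += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["omission"] += 1
            i -= 1
        else:
            counts["intrusion"] += 1
            j -= 1
    return counts
```

Because the alignment tolerates shifted positions, a single omitted item does not cause every later item to be scored as wrong, which is the main advantage over strict position-by-position scoring.
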
Enabling the democratization of the genomics revolution with a fully integrated web-based bioinformatics platform.

PubMed

Li, Po-E; Lo, Chien-Chi; Anderson, Joseph J; Davenport, Karen W; Bishop-Lilly, Kimberly A; Xu, Yan; Ahmed, Sanaa; Feng, Shihai; Mokashi, Vishwesh P; Chain, Patrick S G

2017-01-09

Continued advancements in sequencing technologies have fueled the development of new sequencing applications and promise to flood current databases with raw data. A number of factors prevent the seamless and easy use of these data, including the breadth of project goals, the wide array of tools that individually perform fractions of any given analysis, the large number of associated software/hardware dependencies, and the detailed expertise required to perform these analyses. To address these issues, we have developed an intuitive web-based environment with a wide assortment of integrated and cutting-edge bioinformatics tools in pre-configured workflows. These workflows, coupled with the ease of use of the environment, provide even novice next-generation sequencing users with the ability to perform many complex analyses with only a few mouse clicks and, within the context of the same environment, to visualize and further interrogate their results. This bioinformatics platform is an initial attempt at Empowering the Development of Genomics Expertise (EDGE) in a wide range of applications for microbial research. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Discordant genetic diversity and geographic patterns between Crassicutis cichlasomae (Digenea: Apocreadiidae) and its cichlid host, "Cichlasoma" urophthalmus (Osteichthyes: Cichlidae), in Middle-America.

PubMed

Razo-Mendivil, Ulises; Vázquez-Domínguez, Ella; de León, Gerardo Pérez-Ponce

2013-12-01

Genetic analyses of hosts and their parasites are key to understand the evolutionary patterns and processes that have shaped host-parasite associations. We evaluated the genetic structure of the digenean Crassicutis cichlasomae and its most common host, the Mayan cichlid "Cichlasoma" urophthalmus, encompassing most of their geographical range in Middle-America (river basins in southeastern Mexico, Belize, and Guatemala together with the Yucatan Peninsula). Genetic diversity and structure analyses were done based on 167 cytochrome c oxidase subunit 1 sequences (330 bp) for C. cichlasomae from 21 populations and 161 cytochrome b sequences (599 bp) for "C." urophthalmus from 26 populations. Analyses performed included phylogenetic tree estimation under Bayesian inference and maximum likelihood analysis, genetic diversity, distance and structure estimates, haplotype networks, and demographic evaluations. Crassicutis cichlasomae showed high genetic diversity values and genetic structuring, corresponding with 4 groups clearly differentiated and highly divergent. Conversely, "C." urophthalmus showed low levels of genetic diversity and genetic differentiation, defined as 2 groups with low divergence and with no correspondence with geographical distribution. Our results show that species of cichlids parasitized by C. cichlasomae other than "C." urophthalmus, along with multiple colonization events and subsequent isolation in different basins, are likely factors that shaped the genetic structure of the parasite. Meanwhile, historical long-distance dispersal and drought periods during the Holocene, with significant population size reductions and fragmentations, are factors that could have shaped the genetic structure of the Mayan cichlid.
The Eucalyptus grandis R2R3-MYB transcription factor family: evidence for woody growth-related evolution and function.

PubMed

Soler, Marçal; Camargo, Eduardo Leal Oliveira; Carocha, Victor; Cassan-Wang, Hua; San Clemente, Hélène; Savelli, Bruno; Hefer, Charles A; Paiva, Jorge A Pinto; Myburg, Alexander A; Grima-Pettenati, Jacqueline

2015-06-01

The R2R3-MYB family, one of the largest transcription factor families in higher plants, controls a wide variety of plant-specific processes including, notably, phenylpropanoid metabolism and secondary cell wall formation. We performed a genome-wide analysis of this superfamily in Eucalyptus, one of the most planted hardwood trees world-wide. A total of 141 predicted R2R3-MYB sequences identified in the Eucalyptus grandis genome sequence were subjected to comparative phylogenetic analyses with Arabidopsis thaliana, Oryza sativa, Populus trichocarpa and Vitis vinifera. We analysed features such as gene structure, conserved motifs and genome location. Transcript abundance patterns were assessed by RNAseq and validated by high-throughput quantitative PCR. We found some R2R3-MYB subgroups with expanded membership in E. grandis, V. vinifera and P. trichocarpa, and others preferentially found in woody species, suggesting diversification of specific functions in woody plants. By contrast, subgroups containing key genes regulating lignin biosynthesis and secondary cell wall formation are more conserved across all of the species analysed. In Eucalyptus, R2R3-MYB tandem gene duplications seem to disproportionately affect woody-preferential and woody-expanded subgroups. Interestingly, some of the genes belonging to woody-preferential subgroups show higher expression in the cambial region, suggesting a putative role in the regulation of secondary growth. © 2014 The Authors New Phytologist © 2014 New Phytologist Trust.
[Magnetic resonance in traumatic brain injury: A comparative study of the different conventional magnetic resonance imaging sequences and their diagnostic value in diffuse axonal injury].

PubMed

Cicuendez, Marta; Castaño-León, Ana; Ramos, Ana; Hilario, Amaya; Gómez, Pedro A; Lagares, Alfonso

To compare the identification capability of traumatic axonal injury (TAI) by different sequences on conventional magnetic resonance (MR) studies in traumatic brain injury (TBI) patients. We retrospectively analyzed 264 TBI patients in whom an MR study had been performed in the first 60 days after trauma. All clinical variables related to prognosis were registered, as well as the data from the initial computed tomography. The MR imaging protocol consisted of a 3-plane localizer sequence, T1-weighted and T2-weighted fast spin-echo, FLAIR and gradient-echo images (GRET2*). TAI lesions were classified according to Gentry and Firsching classifications. We calculated weighted kappa coefficients and the area under the ROC curve for each MR sequence. A multivariable analysis was performed to correlate MR findings in each sequence with the final outcome of the patients. TAI lesions were adequately visualized on T2, FLAIR and GRET2* sequences in more than 80% of the studies. Subcortical TAI lesions were well on FLAIR and GRET2* sequences visualized hemorrhagic TAI lesions. We saw that these MR sequences had a high inter-rater agreement for TAI diagnosis (0.8). The T2 sequence presented the highest value on the ROC curve in the Gentry (0.68, 95%CI: 0.61-0.76, p<0.001, Nagelkerke R² 0.26) and Firsching classifications (0.64, 95%CI 0.57-0.72, p<0.001, Nagelkerke R² 0.19), followed by FLAIR and GRET2* sequences. Both classifications determined by each of these sequences were associated with poor outcome after performing a multivariable analysis adjusted for prognostic factors (p<0.02). We recommend performing a conventional MR study in the subacute phase, including T2, FLAIR and GRET2* sequences, to visualize TAI lesions. These MR findings added prognostic information in TBI patients. Copyright © 2017 Sociedad Española de Neurocirugía. Publicado por Elsevier España, S.L.U. All rights reserved.
Analyses of transcriptome sequences reveal multiple ancient large-scale duplication events in the ancestor of Sphagnopsida (Bryophyta).

PubMed

Devos, Nicolas; Szövényi, Péter; Weston, David J; Rothfels, Carl J; Johnson, Matthew G; Shaw, A Jonathan

2016-07-01

The goal of this research was to investigate whether there has been a whole-genome duplication (WGD) in the ancestry of Sphagnum (peatmoss) or the class Sphagnopsida, and to determine if the timing of any such duplication(s) and patterns of paralog retention could help explain the rapid radiation and current ecological dominance of peatmosses. RNA sequencing (RNA-seq) data were generated for nine taxa in Sphagnopsida (Bryophyta). Analyses of frequency plots for synonymous substitutions per synonymous site (Ks) between paralogous gene pairs and reconciliation of 578 gene trees were conducted to assess evidence of large-scale or genome-wide duplication events in each transcriptome. Both Ks frequency plots and gene tree-based analyses indicate multiple duplication events in the history of the Sphagnopsida. The most recent WGD event predates divergence of Sphagnum from the two other genera of Sphagnopsida. Duplicate retention is highly variable across species, which might be best explained by local adaptation. Our analyses indicate that the last WGD could have been an important factor underlying the diversification of peatmosses and facilitated their rise to ecological dominance in peatlands. The timing of the duplication events and their significance in the evolutionary history of peat mosses are discussed. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

PubMed

Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

2015-01-01

In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced.

Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing

PubMed Central

Dasenko, Mark A.

2015-01-01

In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced. PMID:26716693
Discovery of Influenza A Virus Sequence Pairs and Their Combinations for Simultaneous Heterosubtypic Targeting that Hedge against Antiviral Resistance

PubMed Central

Lin, Jing; Pramono, Zacharias Aloysius Dwi; Maurer-Stroh, Sebastian

2016-01-01

The multiple circulating human influenza A virus subtypes, coupled with perpetual genomic mutations and segment reassortment events, challenge the development of effective therapeutics. The capacity to drug most RNAs motivates the investigation of viral RNA targets. 123,060 segment sequences from 35,938 strains of the most prevalent subtypes also infecting humans (H1N1, 2009 pandemic H1N1, H3N2, H5N1 and H7N9) were used to identify 1,183 conserved RNA target sequences (≥15-mer) in the internal segments. 100% theoretical coverage in simultaneous heterosubtypic targeting is achieved by pairing specific sequences from the same segment (“Duals”) or from two segments (“Doubles”); 1,662 Duals and 28,463 Doubles were identified. By combining specific Duals and/or Doubles to form a target graph, wherein an edge connecting two vertices (target sequences) represents a Dual or Double, it is possible to hedge against antiviral resistance while maintaining 100% heterosubtypic coverage. To evaluate the hedging potential, we define the hedge-factor as the minimum number of resistant target sequences that will render the graph resistant, i.e. eliminate all the edges therein; a target sequence or a graph is considered resistant when it cannot achieve 100% heterosubtypic coverage. In an n-vertex graph (n ≥ 3), the hedge-factor is maximal (= n − 1) when it is a complete graph, i.e. every distinct pair in the graph is either a Dual or Double. Computational analyses uncover an extensive number of complete graphs of different sizes. Monte Carlo simulations show that the mutation counts and time elapsed for a target graph to become resistant increase with the hedge-factor. Incidentally, target sequences which were reported to reduce virus titre in experiments are included in our target graphs.
The identity of target sequence pairs for heterosubtypic targeting and their combinations for hedging antiviral resistance are useful toolkits to construct target graphs for different therapeutic objectives. PMID:26771381
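The hedge-factor defined in the abstract above can be illustrated with a small sketch. Read literally, the smallest set of vertices whose loss removes every edge is a minimum vertex cover, so the sketch computes that by brute force. This is only an illustration under that reading, not the authors' implementation; the function names `hedge_factor` and `complete_graph` and the toy graphs are hypothetical.

```python
from itertools import combinations


def hedge_factor(vertices, edges):
    """Return the smallest number of vertices whose removal leaves no edges.

    Assumption: a "resistant" target sequence eliminates every edge
    (Dual or Double) it takes part in, so the hedge-factor equals the
    size of a minimum vertex cover. Brute force; small graphs only.
    """
    vertices = list(vertices)
    edge_sets = [frozenset(edge) for edge in edges]
    for k in range(len(vertices) + 1):
        for removed in combinations(vertices, k):
            removed = set(removed)
            # Every edge must touch at least one removed vertex.
            if all(edge & removed for edge in edge_sets):
                return k
    return len(vertices)


def complete_graph(n):
    """Hypothetical helper: vertices 0..n-1 with every pair connected."""
    nodes = list(range(n))
    return nodes, list(combinations(nodes, 2))


if __name__ == "__main__":
    nodes, edges = complete_graph(4)
    print(hedge_factor(nodes, edges))  # 3, i.e. n - 1 for a complete graph
    print(hedge_factor(["a", "b", "c"], [("a", "b"), ("b", "c")]))  # 1
```

The complete graph on four vertices gives 3, in line with the stated maximum of n − 1, whereas a three-vertex path gives only 1 because losing the middle vertex removes both edges.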
High rates of recombination in otitis media isolates of non-typeable Haemophilus influenzae

PubMed Central

Cody, Alison J.; Field, Dawn; Feil, Edward J.; Stringer, Suzanna; Deadman, Mary E.; Tsolaki, Anthony G.; Gratz, Brett; Bouchet, Valérie; Goldstein, Richard; Hood, Derek W.; Moxon, E. Richard

2008-01-01

Non-typeable (NT), or capsule-deficient, Haemophilus influenzae (Hi) is a common commensal of the upper respiratory tract of humans and can be pathogenic, resulting in diseases such as otitis media, sinusitis and pneumonia. The lipopolysaccharide (LPS) of NTHi is a major virulence factor that displays substantial intra-strain and inter-strain variation of its oligosaccharide structures. To investigate the genetic basis of LPS variation, we sequenced internal regions of each of seven genes required for the biosynthesis of either the inner or the outer core oligosaccharide structures. These sequences were obtained from 25 representative NTHi isolates from episodes of otitis media. We found abundant evidence of recombination among LPS genes of NTHi, a finding in marked contrast to previous analyses of biosynthetic genes for capsular polysaccharide, a well-documented virulence factor of Hi. We found mosaic sequences, linkage equilibrium between loci and a lack of congruence between gene trees. These high rates were not confined to LPS genes, since evidence for similar amounts of recombination was also found in eight housekeeping genes in a subset of the same 25 isolates. These findings provide a population-based foundation for a better understanding of the role of NTHi LPS as a virulence factor and its potential as a candidate vaccine. PMID:12797973
Genome Sequences of Pseudomonas spp. Isolated from Cereal Crops

PubMed Central

Stiller, Jiri; Covarelli, Lorenzo; Lindeberg, Magdalen; Shivas, Roger G.; Manners, John M.

2013-01-01

Compared to those of dicot-infecting bacteria, the available genome sequences of bacteria that infect wheat and barley are limited. Herein, we report the draft genome sequences of four pseudomonads originally isolated from these cereals. These genome sequences provide a useful resource for comparative analyses within the genus and for cross-kingdom analyses of plant pathogenesis. PMID:23661484
A novel high-resolution multilocus sequence typing of Giardia intestinalis Assemblage A isolates reveals zoonotic transmission, clonal outbreaks and recombination.

PubMed

Ankarklev, Johan; Lebbad, Marianne; Einarsson, Elin; Franzén, Oscar; Ahola, Harri; Troell, Karin; Svärd, Staffan G

2018-06-01

Molecular epidemiology and genotyping studies of the parasitic protozoan Giardia intestinalis have proven difficult due to multiple factors, such as low discriminatory power in the commonly used genotyping loci, which has hampered molecular analyses of outbreak sources, zoonotic transmission and virulence types. Here we have focused on assemblage A Giardia and developed a high-resolution assemblage-specific multilocus sequence typing (MLST) method. Analyses of sequenced G. intestinalis assemblage A genomes from different sub-assemblages identified a set of six genetic loci with high genetic variability. DNA samples from both humans (n = 44) and animals (n = 18) that harbored Giardia assemblage A infections were PCR amplified (557-700 bp products) and sequenced at the six novel genetic loci. Bioinformatic analyses showed five- to ten-fold higher levels of polymorphic sites than previously found among assemblage A samples using the classic genotyping loci. Phylogenetically, a division into two major clusters in assemblage A became apparent, separating samples of human and animal origin. A subset of human samples (n = 9) from a documented Giardia outbreak in a Swedish day-care center showed full complementarity at nine genetic loci (the six new and the standard BG, TPI and GDH loci), strongly suggesting one source of infection. Furthermore, three samples of human origin displayed MLST profiles that were phylogenetically more closely related to MLST profiles from animal-derived samples, suggesting zoonotic transmission. These new genotyping loci enabled us to detect events of recombination between different assemblage A isolates but also between assemblage A and E isolates. In summary, we present a novel and expanded MLST strategy with significantly improved sensitivity for molecular analyses of virulence types, zoonotic potential and source tracking for assemblage A Giardia. Copyright © 2018. Published by Elsevier B.V.
Adjacent DNA sequences modulate Sox9 transcriptional activation at paired Sox sites in three chondrocyte-specific enhancer elements

PubMed Central

Bridgewater, Laura C.; Walker, Marlan D.; Miller, Gwen C.; Ellison, Trevor A.; Holsinger, L. Daniel; Potter, Jennifer L.; Jackson, Todd L.; Chen, Reuben K.; Winkel, Vicki L.; Zhang, Zhaoping; McKinney, Sandra; de Crombrugghe, Benoit

2003-01-01

Expression of the type XI collagen gene Col11a2 is directed to cartilage by at least three chondrocyte-specific enhancer elements, two in the 5′ region and one in the first intron of the gene. The three enhancers each contain two heptameric sites with homology to the Sox protein-binding consensus sequence. The two sites are separated by 3 or 4 bp and arranged in opposite orientation to each other. Targeted mutational analyses of these three enhancers showed that in the intronic enhancer, as in the other two enhancers, both Sox sites in a pair are essential for enhancer activity. The transcription factor Sox9 binds as a dimer at the paired sites, and the introduction of insertion mutations between the sites demonstrated that physical interactions between the adjacently bound proteins are essential for enhancer activity. Additional mutational analyses demonstrated that although Sox9 binding at the paired Sox sites is necessary for enhancer activity, it alone is not sufficient. Adjacent DNA sequences in each enhancer are also required, and mutation of those sequences can eliminate enhancer activity without preventing Sox9 binding. The data suggest a new model in which adjacently bound proteins affect the DNA bend angle produced by Sox9, which in turn determines whether an active transcriptional enhancer complex is assembled. PMID:12595563
DNA mutation motifs in the genes associated with inherited diseases.

PubMed

Růžička, Michal; Kulhánek, Petr; Radová, Lenka; Čechová, Andrea; Špačková, Naďa; Fajkusová, Lenka; Réblová, Kamila

2017-01-01

Mutations in human genes can be responsible for inherited genetic disorders and cancer. Mutations can arise due to environmental factors or spontaneously. It has been shown that certain DNA sequences are more prone to mutate. These sites are termed hotspots and exhibit a higher mutation frequency than expected by chance. In contrast, DNA sequences with lower mutation frequencies than expected by chance are termed coldspots. Mutation hotspots are usually derived from a mutation spectrum, which reflects a particular population in which the effect of a common ancestor plays a role. To detect coldspots/hotspots unaffected by population bias, we analysed the presence of germline mutations obtained from the HGMD database in the 5-nucleotide segments repeatedly occurring in genes associated with common inherited disorders, in particular the PAH, LDLR, CFTR, F8, and F9 genes. Statistically significant sequences (mutational motifs) rarely associated with mutations (coldspots) and frequently associated with mutations (hotspots) exhibited characteristic sequence patterns, e.g. coldspots contained a purine tract while hotspots showed alternating purine-pyrimidine bases, often with the presence of a CpG dinucleotide. Using molecular dynamics simulations and free energy calculations, we analysed the global bending properties of two selected coldspots and two hotspots with a G/T mismatch. We observed that the coldspots were inherently more flexible than the hotspots. We assume that this property might be critical for effective mismatch repair, as DNA with a mutation recognized by MutSα protein is noticeably bent.
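The idea of tallying mutations per repeatedly occurring 5-nucleotide segment can be sketched as follows. This is a rough illustration only: the abstract does not say which position within a segment a mutation is attributed to, so the sketch assumes the central base, and it stops at raw counts without any significance testing. The function name `motif_mutation_rates` and the toy sequence and positions are hypothetical and are not data from the study.

```python
from collections import Counter


def motif_mutation_rates(sequence, mutated_positions, k=5):
    """Count, per k-mer, how often its central base is a mutated position.

    Assumption: a mutation is attributed to the k-mer centred on it.
    Returns {motif: (mutated_occurrences, total_occurrences, fraction)}.
    """
    mutated = set(mutated_positions)
    centre = k // 2
    total = Counter()
    hits = Counter()
    for start in range(len(sequence) - k + 1):
        motif = sequence[start:start + k]
        total[motif] += 1
        if start + centre in mutated:
            hits[motif] += 1
    return {m: (hits[m], total[m], hits[m] / total[m]) for m in total}


if __name__ == "__main__":
    # Toy input: a made-up 10-base sequence with mutations at indices 2 and 7.
    rates = motif_mutation_rates("ACGTACGTAC", [2, 7])
    for motif, (hit, seen, fraction) in sorted(rates.items()):
        print(motif, hit, seen, fraction)
```

Motifs with a fraction well above or below the background rate would be the hotspot and coldspot candidates; deciding which differences are statistically significant requires a test that this sketch leaves out.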
Transcriptome Analysis of the Entomopathogenic Oomycete Lagenidium giganteum Reveals Putative Virulence Factors

PubMed Central

Quiroz Velasquez, Paula F.; Abiff, Sumayyah K.; Fins, Katrina C.; Conway, Quincy B.; Salazar, Norma C.; Delgado, Ana Paula; Dawes, Jhanelle K.; Douma, Lauren G.

2014-01-01

A combination of 454 pyrosequencing and Sanger sequencing was used to sample and characterize the transcriptome of the entomopathogenic oomycete Lagenidium giganteum. More than 50,000 high-throughput reads were annotated through homology searches. Several selected reads served as seeds for the amplification and sequencing of full-length transcripts. Phylogenetic analyses inferred from full-length cellulose synthase alignments revealed that L. giganteum is nested within the peronosporalean galaxy and as such appears to have evolved from a phytopathogenic ancestor. In agreement with the phylogeny reconstructions, full-length L. giganteum oomycete effector orthologs, corresponding to the cellulose-binding elicitor lectin (CBEL), crinkler (CRN), and elicitin proteins, were characterized by domain organizations similar to those of pathogenicity factors of plant-pathogenic oomycetes. Importantly, the L. giganteum effectors provide a basis for detailing the roles of canonical CRN, CBEL, and elicitin proteins in the infectious process of an oomycete known principally as an animal pathogen. Finally, phylogenetic analyses and genome mining identified members of glycoside hydrolase family 5 subfamily 27 (GH5_27) as putative virulence factors active on the host insect cuticle, based in part on the fact that GH5_27 genes are shared by entomopathogenic oomycetes and fungi but are underrepresented in nonentomopathogenic genomes. The genomic resources gathered from the L. giganteum transcriptome analysis strongly suggest that filamentous entomopathogens (oomycetes and fungi) exhibit convergent evolution: they have evolved independently from plant-associated microbes, have retained genes indicative of plant associations, and may share similar cores of virulence factors, such as GH5_27 enzymes, that are absent from the genomes of their plant-pathogenic relatives. PMID:25107973
MaGelLAn 1.0: a software to facilitate quantitative and population genetic analysis of maternal inheritance by combination of molecular and pedigree information.

PubMed

Ristov, Strahil; Brajkovic, Vladimir; Cubric-Curik, Vlatka; Michieli, Ivan; Curik, Ino

2016-09-10

Identification of genes or even nucleotides that are responsible for quantitative and adaptive trait variation is a difficult task due to the complex interdependence between a large number of genetic and environmental factors. The polymorphism of the mitogenome is one of the factors that can contribute to quantitative trait variation. However, the effects of the mitogenome have not been comprehensively studied, since large numbers of mitogenome sequences and recorded phenotypes are required to reach adequate power of analysis. Current research in our group focuses on acquiring the necessary mitochondrial sequence information and analysing its influence on the phenotype of a quantitative trait. To facilitate these tasks we have produced software for processing pedigrees that is optimised for maternal lineage analysis. We present MaGelLAn 1.0 (maternal genealogy lineage analyser), a suite of four Python scripts (modules) that is designed to facilitate the analysis of the impact of mitogenome polymorphism on quantitative trait variation by combining molecular and pedigree information. MaGelLAn 1.0 is primarily used to: (1) optimise the sampling strategy for molecular analyses; (2) identify and correct pedigree inconsistencies; and (3) identify maternal lineages and assign the corresponding mitogenome sequences to all individuals in the pedigree, this information being used as input to any of the standard software for quantitative genetic (association) analysis. In addition, MaGelLAn 1.0 allows computing the mitogenome (maternal) effective population sizes and probability of mitogenome (maternal) identity that are useful for conservation management of small populations. MaGelLAn is the first tool for pedigree analysis that focuses on quantitative genetic analyses of mitogenome data. It is designed to significantly reduce the effort in handling and preparing large pedigrees for processing the information linked to maternal lines.
The software source code, along with the manual and the example files can be downloaded at http://lissp.irb.hr/software/magellan-1-0/ and https://github.com/sristov/magellan .
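Task (3) above, tracing maternal lineages and propagating a mitogenome sequence to every individual on the same line, can be sketched in a few lines. This is not MaGelLAn's code or interface; it is a minimal illustration that assumes the pedigree is available as a mapping from each individual to its dam. The names `maternal_founder` and `assign_haplotypes`, the animal identifiers and the haplotype label are all hypothetical.

```python
def maternal_founder(individual, mother_of):
    """Follow dam links upward until an individual with no recorded dam."""
    seen = set()
    current = individual
    while mother_of.get(current) is not None:
        if current in seen:
            # A loop is one kind of pedigree inconsistency worth reporting.
            raise ValueError(f"pedigree loop detected at {current!r}")
        seen.add(current)
        current = mother_of[current]
    return current


def assign_haplotypes(mother_of, sequenced):
    """Give every individual the haplotype observed on its maternal line.

    mother_of: dict mapping individual -> dam (None if unknown).
    sequenced: dict mapping sampled individual -> haplotype label.
    Individuals on lines with no sampled member get None.
    """
    line_haplotype = {}
    for individual, haplotype in sequenced.items():
        founder = maternal_founder(individual, mother_of)
        previous = line_haplotype.setdefault(founder, haplotype)
        if previous != haplotype:
            raise ValueError(f"conflicting haplotypes on line of {founder!r}")
    everyone = set(mother_of) | {m for m in mother_of.values() if m is not None}
    return {
        ind: line_haplotype.get(maternal_founder(ind, mother_of))
        for ind in everyone
    }


if __name__ == "__main__":
    pedigree = {"cow3": "cow2", "cow2": "cow1", "cow1": None,
                "cow5": "cow4", "cow4": None}
    print(assign_haplotypes(pedigree, {"cow3": "H1"}))
```

In this toy pedigree, sequencing `cow3` alone is enough to label `cow1` and `cow2`, which hints at why knowing the lineages first helps decide which animals to sample.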
Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

PubMed

Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

2014-09-18

Proteins that share high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. Using only the results of ordinary analyses such as BLAST and conservation analyses, it was hard to distinguish which structure a given sequence would adopt. However, by adding the analysis based on inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in structure formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.
Bacterial diversity in typical Italian salami at different ripening stages as revealed by high-throughput sequencing of 16S rRNA amplicons.

PubMed

Połka, Justyna; Rebecchi, Annalisa; Pisacane, Vincenza; Morelli, Lorenzo; Puglisi, Edoardo

2015-04-01

The bacterial diversity involved in food fermentations is one of the most important factors shaping the final characteristics of traditional foods. Knowledge about this diversity can be greatly improved by the application of high-throughput sequencing technologies (HTS) coupled to the PCR amplification of the 16S rRNA subunit. Here we investigated the bacterial diversity in batches of Salame Piacentino PDO (Protected Designation of Origin), a dry fermented sausage that is typical of a regional area of Northern Italy. Salami samples from 6 different local factories were analysed at 0, 21, 49 and 63 days of ripening; raw meat at time 0 and casing samples at 21 days of ripening were also analysed, and the effect of starter addition was included in the experimental set-up. Culture-based microbiological analyses and PCR-DGGE were carried out for comparison with HTS results. A total of 722,196 high quality sequences were obtained after trimming, paired-reads assembly and quality screening of raw reads obtained by Illumina MiSeq sequencing of the two bacterial 16S hypervariable regions V3 and V4; manual curation of the 16S database allowed a correct taxonomical classification at the species level for 99.5% of these reads. Results confirmed the presence of the main bacterial species involved in the fermentation of salami as assessed by PCR-DGGE, but with greater resolution and with quantitative assessments that are not possible by the mere analysis of gel banding patterns. Thirty-two different Staphylococcus and 33 Lactobacillus species were identified in the salami from different producers, while the whole data set obtained accounted for 13 main families and 98 rare ones, 23 of which were present in at least 10% of the investigated samples, with casings being the major sources of the observed diversity.
Multivariate analyses also showed that batches from 6 local producers tend to cluster together after 21 days of ripening, thus indicating that HTS has the potential for fine scale differentiation of local fermented foods. Copyright © 2014 Elsevier Ltd. All rights reserved.
Intra-Site Variability in the Still Bay Fauna at Blombos Cave: Implications for Explanatory Models of the Middle Stone Age Cultural and Technological Evolution

PubMed Central

Discamps, Emmanuel; Henshilwood, Christopher Stuart

2015-01-01

To explain cultural and technological innovations in the Middle Stone Age (MSA) of southern Africa, scholars invoke several factors. A major question in this research theme is whether MSA technocomplexes are adapted to a particular set of environmental conditions and subsistence strategies or, on the contrary, to a wide range of different foraging behaviours. While faunal studies provide key information for addressing these factors, most analyses do not assess intra-technocomplex variability of faunal exploitation (i.e. variability within MSA phases). In this study, we assess the spatial variability of the Still Bay fauna in one phase (M1) of the Blombos Cave sequence. Analyses of taxonomic composition, taphonomic alterations and combustion patterns reveal important faunal variability both across space (lateral variation in the post-depositional history of the deposits, spatial organisation of combustion features) and over time (fine-scale diachronic changes throughout a single phase). Our results show how grouping material prior to zooarchaeological interpretations (e.g. by layer or phase) can induce a loss of information. Finally, we discuss how multiple independent subdivisions of archaeological sequences can improve our understanding of both the timing of different changes (for example in technology, culture, subsistence, environment) and how they may be inter-related. PMID:26658195
Intra-Site Variability in the Still Bay Fauna at Blombos Cave: Implications for Explanatory Models of the Middle Stone Age Cultural and Technological Evolution.

PubMed

Discamps, Emmanuel; Henshilwood, Christopher Stuart

2015-01-01

To explain cultural and technological innovations in the Middle Stone Age (MSA) of southern Africa, scholars invoke several factors. A major question in this research theme is whether MSA technocomplexes are adapted to a particular set of environmental conditions and subsistence strategies or, on the contrary, to a wide range of different foraging behaviours. While faunal studies provide key information for addressing these factors, most analyses do not assess intra-technocomplex variability of faunal exploitation (i.e. variability within MSA phases). In this study, we assess the spatial variability of the Still Bay fauna in one phase (M1) of the Blombos Cave sequence. Analyses of taxonomic composition, taphonomic alterations and combustion patterns reveal important faunal variability both across space (lateral variation in the post-depositional history of the deposits, spatial organisation of combustion features) and over time (fine-scale diachronic changes throughout a single phase). Our results show how grouping material prior to zooarchaeological interpretations (e.g. by layer or phase) can induce a loss of information. Finally, we discuss how multiple independent subdivisions of archaeological sequences can improve our understanding of both the timing of different changes (for example in technology, culture, subsistence, environment) and how they may be inter-related.
The Personal Genome Project Canada: findings from whole genome sequences of the inaugural 56 participants

PubMed Central

Reuter, Miriam S.; Walker, Susan; Thiruvahindrapuram, Bhooma; Whitney, Joe; Cohn, Iris; Sondheimer, Neal; Yuen, Ryan K.C.; Trost, Brett; Paton, Tara A.; Pereira, Sergio L.; Herbrick, Jo-Anne; Wintle, Richard F.; Merico, Daniele; Howe, Jennifer; MacDonald, Jeffrey R.; Lu, Chao; Nalpathamkalam, Thomas; Sung, Wilson W.L.; Wang, Zhuozhi; Patel, Rohan V.; Pellecchia, Giovanna; Wei, John; Strug, Lisa J.; Bell, Sherilyn; Kellam, Barbara; Mahtani, Melanie M.; Bassett, Anne S.; Bombard, Yvonne; Weksberg, Rosanna; Shuman, Cheryl; Cohn, Ronald D.; Stavropoulos, Dimitri J.; Bowdin, Sarah; Hildebrandt, Matthew R.; Wei, Wei; Romm, Asli; Pasceri, Peter; Ellis, James; Ray, Peter; Meyn, M. Stephen; Monfared, Nasim; Hosseini, S. Mohsen; Joseph-George, Ann M.; Keeley, Fred W.; Cook, Ryan A.; Fiume, Marc; Lee, Hin C.; Marshall, Christian R.; Davies, Jill; Hazell, Allison; Buchanan, Janet A.; Szego, Michael J.; Scherer, Stephen W.

2018-01-01

BACKGROUND: The Personal Genome Project Canada is a comprehensive public data resource that integrates whole genome sequencing data and health information. We describe genomic variation identified in the initial recruitment cohort of 56 volunteers. METHODS: Volunteers were screened for eligibility and provided informed consent for open data sharing. Using blood DNA, we performed whole genome sequencing and identified all possible classes of DNA variants. A genetic counsellor explained the implication of the results to each participant. RESULTS: Whole genome sequencing of the first 56 participants identified 207 662 805 sequence variants and 27 494 copy number variations. We analyzed a prioritized disease-associated data set (n = 1606 variants) according to standardized guidelines, and interpreted 19 variants in 14 participants (25%) as having obvious health implications. Six of these variants (e.g., in BRCA1 or mosaic loss of an X chromosome) were pathogenic or likely pathogenic. Seven were risk factors for cancer, cardiovascular or neurobehavioural conditions. Four other variants — associated with cancer, cardiac or neurodegenerative phenotypes — remained of uncertain significance because of discrepancies among databases. We also identified a large structural chromosome aberration and a likely pathogenic mitochondrial variant. There were 172 recessive disease alleles (e.g., 5 individuals carried mutations for cystic fibrosis). Pharmacogenomics analyses revealed another 3.9 potentially relevant genotypes per individual. INTERPRETATION: Our analyses identified a spectrum of genetic variants with potential health impact in 25% of participants. When also considering recessive alleles and variants with potential pharmacologic relevance, all 56 participants had medically relevant findings. Although access is mostly limited to research, whole genome sequencing can provide specific and novel information with the potential of major impact for health care. PMID:29431110
The Personal Genome Project Canada: findings from whole genome sequences of the inaugural 56 participants.

PubMed

Reuter, Miriam S; Walker, Susan; Thiruvahindrapuram, Bhooma; Whitney, Joe; Cohn, Iris; Sondheimer, Neal; Yuen, Ryan K C; Trost, Brett; Paton, Tara A; Pereira, Sergio L; Herbrick, Jo-Anne; Wintle, Richard F; Merico, Daniele; Howe, Jennifer; MacDonald, Jeffrey R; Lu, Chao; Nalpathamkalam, Thomas; Sung, Wilson W L; Wang, Zhuozhi; Patel, Rohan V; Pellecchia, Giovanna; Wei, John; Strug, Lisa J; Bell, Sherilyn; Kellam, Barbara; Mahtani, Melanie M; Bassett, Anne S; Bombard, Yvonne; Weksberg, Rosanna; Shuman, Cheryl; Cohn, Ronald D; Stavropoulos, Dimitri J; Bowdin, Sarah; Hildebrandt, Matthew R; Wei, Wei; Romm, Asli; Pasceri, Peter; Ellis, James; Ray, Peter; Meyn, M Stephen; Monfared, Nasim; Hosseini, S Mohsen; Joseph-George, Ann M; Keeley, Fred W; Cook, Ryan A; Fiume, Marc; Lee, Hin C; Marshall, Christian R; Davies, Jill; Hazell, Allison; Buchanan, Janet A; Szego, Michael J; Scherer, Stephen W

2018-02-05

The Personal Genome Project Canada is a comprehensive public data resource that integrates whole genome sequencing data and health information. We describe genomic variation identified in the initial recruitment cohort of 56 volunteers. Volunteers were screened for eligibility and provided informed consent for open data sharing. Using blood DNA, we performed whole genome sequencing and identified all possible classes of DNA variants. A genetic counsellor explained the implication of the results to each participant. Whole genome sequencing of the first 56 participants identified 207 662 805 sequence variants and 27 494 copy number variations. We analyzed a prioritized disease-associated data set ( n = 1606 variants) according to standardized guidelines, and interpreted 19 variants in 14 participants (25%) as having obvious health implications. Six of these variants (e.g., in BRCA1 or mosaic loss of an X chromosome) were pathogenic or likely pathogenic. Seven were risk factors for cancer, cardiovascular or neurobehavioural conditions. Four other variants - associated with cancer, cardiac or neurodegenerative phenotypes - remained of uncertain significance because of discrepancies among databases. We also identified a large structural chromosome aberration and a likely pathogenic mitochondrial variant. There were 172 recessive disease alleles (e.g., 5 individuals carried mutations for cystic fibrosis). Pharmacogenomics analyses revealed another 3.9 potentially relevant genotypes per individual. Our analyses identified a spectrum of genetic variants with potential health impact in 25% of participants. When also considering recessive alleles and variants with potential pharmacologic relevance, all 56 participants had medically relevant findings. Although access is mostly limited to research, whole genome sequencing can provide specific and novel information with the potential of major impact for health care. © 2018 Joule Inc. or its licensors.
What can we learn about lyssavirus genomes using 454 sequencing?

PubMed

Höper, Dirk; Finke, Stefan; Freuling, Conrad M; Hoffmann, Bernd; Beer, Martin

2012-01-01

The main task of individual project number four, "Whole genome sequencing, virus-host adaptation, and molecular epidemiological analyses of lyssaviruses", within the network "Lyssaviruses--a potential re-emerging public health threat", is to provide high-quality complete genome sequences from lyssaviruses. These sequences are analysed in depth with regard to the diversity of the viral populations, as to both quasi-species and so-called defective interfering RNAs. Moreover, the sequence data will facilitate further epidemiological analyses, will provide insight into the evolution of lyssaviruses and will be the basis for the design of novel nucleic acid based diagnostics. The first results presented here indicate that not only can high-quality full-length lyssavirus genome sequences be generated, but efficient analysis of the viral population also becomes feasible.
STAT1:DNA sequence-dependent binding modulation by phosphorylation, protein:protein interactions and small-molecule inhibition

PubMed Central

Bonham, Andrew J.; Wenta, Nikola; Osslund, Leah M.; Prussin, Aaron J.; Vinkemeier, Uwe; Reich, Norbert O.

2013-01-01

The DNA-binding specificity and affinity of the dimeric human transcription factor (TF) STAT1 were assessed by total internal reflectance fluorescence protein-binding microarrays (TIRF-PBM) to evaluate the effects of protein phosphorylation, higher-order polymerization and small-molecule inhibition. Active, phosphorylated STAT1 showed binding preferences consistent with prior characterization, whereas unphosphorylated STAT1 showed a weak-binding preference for one-half of the GAS consensus site, consistent with recent models of STAT1 structure and function in response to phosphorylation. This altered-binding preference was further tested by use of the inhibitor LLL3, which we show to disrupt STAT1 binding in a sequence-dependent fashion. To determine if this sequence-dependence is specific to STAT1 and not a general feature of human TF biology, the TF Myc/Max was analysed and tested with the inhibitor Mycro3. Myc/Max inhibition by Mycro3 is sequence independent, suggesting that the sequence-dependent inhibition of STAT1 may be specific to this system and a useful target for future inhibitor design. PMID:23180800
Linking secondary metabolites to gene clusters through genome sequencing of six diverse Aspergillus species

DOE PAGES

Kjerbolling, Inge; Vesth, Tammi C.; Frisvad, Jens C.; ...

2018-01-09

The fungal genus Aspergillus is highly interesting, ranging from industrial cell factories and model organisms to human pathogens. In particular, this group has a prolific production of bioactive secondary metabolites (SMs). In this work, four diverse Aspergillus species (A. campestris, A. novofumigatus, A. ochraceoroseus and A. steynii) have been whole-genome PacBio sequenced to provide genetic references in three Aspergillus sections. Additionally, A. taichungensis and A. candidus were sequenced for SM elucidation. Thirteen Aspergillus genomes were analysed with comparative genomics to determine phylogeny and genetic diversity, showing that each new genome contains 15–27% genes not found in other sequenced Aspergilli. In particular, the new species A. novofumigatus was compared to the pathogenic species A. fumigatus. The comparison suggests that A. novofumigatus can produce most of the same allergens, virulence and pathogenicity factors as A. fumigatus, and could therefore be as pathogenic as A. fumigatus. Furthermore, SMs were linked to gene clusters based on biological and chemical knowledge and analysis, genome sequences and predictive algorithms.
The BaMM web server for de-novo motif discovery and regulatory sequence analysis.

PubMed

Kiesel, Anja; Roth, Christian; Ge, Wanwan; Wess, Maximilian; Meier, Markus; Söding, Johannes

2018-05-28

The BaMM web server offers four tools: (i) de-novo discovery of enriched motifs in a set of nucleotide sequences, (ii) scanning a set of nucleotide sequences with motifs to find motif occurrences, (iii) searching with an input motif for similar motifs in our BaMM database with motifs for >1000 transcription factors, trained from the GTRD ChIP-seq database and (iv) browsing and keyword searching the motif database. In contrast to most other servers, we represent sequence motifs not by position weight matrices (PWMs) but by Bayesian Markov Models (BaMMs) of order 4, which we showed previously to perform substantially better in ROC analyses than PWMs or first order models. To address the inadequacy of P- and E-values as measures of motif quality, we introduce the AvRec score, the average recall over the TP-to-FP ratio between 1 and 100. The BaMM server is freely accessible without registration at https://bammmotif.mpibpc.mpg.de.
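As a point of reference for tool (ii), motif scanning, the following Python sketch scans a sequence with a plain position weight matrix, i.e. the zeroth-order representation that the abstract contrasts with BaMMs. It is not the BaMM server's algorithm: the matrix values, the uniform background and the names `PWM`, `log_odds` and `scan` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical 3-position motif: per-position base probabilities (illustrative values only).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # uniform background frequency (an assumption)


def log_odds(window):
    """Sum of per-position log2(motif probability / background probability)."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PWM, window))


def scan(sequence, threshold=0.0):
    """Return (start, score) for every window scoring above the threshold."""
    width = len(PWM)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = log_odds(sequence[start:start + width])
        if score > threshold:
            hits.append((start, round(score, 3)))
    return hits


print(scan("TTAGCTT"))
```

A higher-order model such as the one the abstract describes would condition each position's probability on the preceding bases instead of treating positions independently, which this sketch does not attempt.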

Transfer of radionuclides to plants of natural ecosystems at the Semipalatinsk Test Site.

PubMed

Larionova, N V; Lukashenko, S N; Kabdyrakova, A M; Kunduzbayeva, A Ye; Panitskiy, A V; Ivanova, A R

2018-06-01

This paper summarises a systematic study of the radionuclides 137Cs, 90Sr, 241Am and 239+240Pu in the vegetation cover at several sites within the Semipalatinsk Test Site (STS), highlighting the main findings. The sites analysed are characterized by various types of radioactive contamination. Transfer factors (Tf), required for the quantitative description of radionuclide transfer from the soil to aboveground plant parts, were determined. On average, the minimum Tf for all the radionuclides concerned were found on the "Experimental Field" ground, followed by those in the "plumes" of radioactive fallout and in the conditionally "background" territories analysed. The highest transfer factors were characteristic of zones of radioactive streamflows and places of warfare radioactive agent (WRA) tests. Ordering the radionuclide transfer factors in descending order gave the sequence 90Sr Tf > 137Cs Tf > 239+240Pu Tf > 241Am Tf, with the 90Sr Tf, on average, exceeding the 137Cs Tf by 8 times and exceeding the 239+240Pu Tf by up to 16 times. 239+240Pu Tf values were up to 3 times higher than the 241Am Tf. The exception to this descending order was at places of WRA tests, where the Tf of the radionuclides of interest follows the sequence 90Sr > 239+240Pu > 137Cs. Copyright © 2017 Elsevier Ltd. All rights reserved.
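The abstract does not spell out how Tf is computed. Under the conventional definition, assumed here, it is the ratio of specific activity in the aboveground plant parts to that in the soil (both in Bq/kg dry mass). The symbols A_plant and A_soil are introduced only for this illustration; the inequalities restate the ratios reported in the abstract.

```latex
T_f = \frac{A_{\text{plant}}}{A_{\text{soil}}}
\qquad
T_f(^{90}\mathrm{Sr}) \approx 8\,T_f(^{137}\mathrm{Cs}), \quad
T_f(^{90}\mathrm{Sr}) \le 16\,T_f(^{239+240}\mathrm{Pu}), \quad
T_f(^{239+240}\mathrm{Pu}) \le 3\,T_f(^{241}\mathrm{Am})
```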
ChIP-seq and RNA-seq methods to study circadian control of transcription in mammals

PubMed Central

Takahashi, Joseph S.; Kumar, Vivek; Nakashe, Prachi; Koike, Nobuya; Huang, Hung-Chung; Green, Carla B.; Kim, Tae-Kyung

2015-01-01

Genome-wide analyses have revolutionized our ability to study the transcriptional regulation of circadian rhythms. The advent of next-generation sequencing methods has facilitated the use of two such technologies, ChIP-seq and RNA-seq. In this chapter, we describe detailed methods and protocols for these two techniques, with emphasis on their usage in circadian rhythm experiments in the mouse liver, a major target organ of the circadian clock system. Critical factors for these methods are highlighted and issues arising with time series samples for ChIP-seq and RNA-seq are discussed. Finally, detailed protocols for library preparation suitable for Illumina sequencing platforms are presented. PMID:25662462
A review of bioinformatic methods for forensic DNA analyses.

PubMed

Liu, Yao-Yuan; Harbison, SallyAnn

2018-03-01

Short tandem repeats, single nucleotide polymorphisms, and whole mitochondrial analyses are three classes of markers which will play an important role in the future of forensic DNA typing. The arrival of massively parallel sequencing platforms in forensic science reveals new information such as insights into the complexity and variability of the markers that were previously unseen, along with amounts of data too immense for analyses by manual means. Along with the sequencing chemistries employed, bioinformatic methods are required to process and interpret this new and extensive data. As more is learnt about the use of these new technologies for forensic applications, development and standardization of efficient, favourable tools for each stage of data processing is being carried out, and faster, more accurate methods that improve on the original approaches have been developed. As forensic laboratories search for the optimal pipeline of tools, sequencer manufacturers have incorporated pipelines into sequencer software to make analyses convenient. This review explores the current state of bioinformatic methods and tools used for the analyses of forensic markers sequenced on the massively parallel sequencing (MPS) platforms currently most widely used. Copyright © 2017 Elsevier B.V. All rights reserved.
Obtaining a more resolute teleost growth hormone phylogeny by the introduction of gaps in sequence alignment.

PubMed

Rubin, D A; Dores, R M

1995-06-01

In order to obtain a more resolute phylogeny of teleosts based on growth hormone (GH) sequences, phylogenetic analyses were performed in which deletions (gaps), which appear to be order specific, were upheld to maintain GH's structural information. Sequences were analyzed at 194 amino acid positions. In addition, the two closest genealogically related groups to the teleosts, Amia calva and Acipenser guldenstadti, were used as outgroups. Modified sequence alignments were also analyzed to determine clade stability. Analyses indicated, in the most parsimonious cladogram, that molecular and morphological relationships for the orders of fishes are congruent. With GH molecular sequence data it was possible to resolve all clades at the familial level. Analyses of the primary sequence data indicate that: (a) the halecomorphean and chondrostean GH sequences are the appropriate outgroups for generating the most parsimonious cladogram for teleosts; (b) proper alignment of teleost GH sequence by the inclusion of gaps is necessary for resolution of the Percomorpha; and (c) removal of sequence information by deleting improperly aligned sequence decreases the phylogenetic signal obtained.
Short branches lead to systematic artifacts when BLAST searches are used as surrogate for phylogenetic reconstruction.

PubMed

Dick, Amanda A; Harlow, Timothy J; Gogarten, J Peter

2017-02-01

Long Branch Attraction (LBA) is a well-known artifact in phylogenetic reconstruction when dealing with branch length heterogeneity. Here we show another phenomenon, Short Branch Attraction (SBA), which occurs when BLAST searches, a phenetic analysis, are used as a surrogate method for phylogenetic analysis. This error also results from branch length heterogeneity, but this time it is the short branches that are attracting. The SBA artifact is reciprocal and can be returned 100% of the time when multiple branches differ in length by a factor of more than two. SBA is an intended feature of BLAST searches, but becomes an issue when top-scoring BLAST hit analyses are used to infer Horizontal Gene Transfers (HGTs), assign taxonomic category with environmental sequence data in phylotyping, or gather homologous sequences for building gene families. SBA can lead researchers to believe that there has been an HGT event when only vertical descent has occurred, cause slowly evolving taxa to be over-represented and quickly evolving taxa to be under-represented in phylotyping, or systematically exclude quickly evolving taxa from analyses. SBA also contributes to the changing results of top-scoring BLAST hit analyses as the database grows, because more slowly evolving taxa, or short branches, are added over time, introducing more potential for SBA. SBA can be detected by examining reciprocal best BLAST hits among a larger group of taxa, including the known closest phylogenetic neighbors. Therefore, one should look for this phenomenon when conducting best BLAST hit analyses as a surrogate method to identify HGTs, in phylotyping, or when using BLAST to gather homologous sequences. Copyright © 2016 Elsevier Inc. All rights reserved.
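The reciprocal best hit check mentioned in the abstract can be sketched in Python for a single pair of taxa. This is a minimal illustration and not the authors' procedure: the hit tuples `(query, subject, bitscore)`, the toy identifiers and the function names are assumptions, and a real analysis would parse tabular BLAST output and compare many taxa.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Toy data: a2's best hit is b1, but b1's best hit is a1, so (a2, b1) is not reciprocal.
a_vs_b = [("a1", "b1", 200), ("a1", "b2", 150), ("a2", "b1", 180)]
b_vs_a = [("b1", "a1", 190), ("b2", "a2", 90)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

A one-directional best hit that is not confirmed in the reverse search, like `a2` to `b1` in the toy data, is the kind of result the abstract suggests examining before inferring an HGT.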
Identification of food and beverage spoilage yeasts from DNA sequence analyses

USDA-ARS's Scientific Manuscript database

Detection, identification, and classification of yeasts has undergone a major transformation in the last decade and a half following application of gene sequence analyses and genome comparisons. Development of a database (barcode) of easily determined DNA sequences from domains 1 and 2 (D1/D2) of th...
Enabling large-scale next-generation sequence assembly with Blacklight

PubMed Central

Couger, M. Brian; Pipes, Lenore; Squina, Fabio; Prade, Rolf; Siepel, Adam; Palermo, Robert; Katze, Michael G.; Mason, Christopher E.; Blood, Philip D.

2014-01-01

Summary A variety of extremely challenging biological sequence analyses were conducted on the XSEDE large shared memory resource Blacklight, using current bioinformatics tools and encompassing a wide range of scientific applications. These include genomic sequence assembly, very large metagenomic sequence assembly, transcriptome assembly, and sequencing error correction. The data sets used in these analyses included uncategorized fungal species, reference microbial data, very large soil and human gut microbiome sequence data, and primate transcriptomes, composed of both short-read and long-read sequence data. A new parallel command execution program was developed on the Blacklight resource to handle some of these analyses. These results, initially reported previously at XSEDE13 and expanded here, represent significant advances for their respective scientific communities. The breadth and depth of the results achieved demonstrate the ease of use, versatility, and unique capabilities of the Blacklight XSEDE resource for scientific analysis of genomic and transcriptomic sequence data, and the power of these resources, together with XSEDE support, in meeting the most challenging scientific problems. PMID:25294974
Low-pass sequencing for microbial comparative genomics

PubMed Central

Goo, Young Ah; Roach, Jared; Glusman, Gustavo; Baliga, Nitin S; Deutsch, Kerry; Pan, Min; Kennedy, Sean; DasSarma, Shiladitya; Victor Ng, Wailap; Hood, Leroy

2004-01-01

Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggests that genome structure and dynamic genome reorganization might be similar to that previously observed in the IS-element rich genome of H. sp. NRC-1. Identification of multiple TBP and TFB homologs in these four halophiles is consistent with the hypothesis that different types of complex transcriptional regulation may occur through multiple TBP-TFB combinations in response to rapidly changing environmental conditions. Low-pass shotgun sequence analyses of genomes permit extensive and diverse analyses, and should be generally useful for comparative microbial genomics. PMID:14718067
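The GC content figure quoted in the abstract (greater than sixty percent) is a simple per-sequence statistic. The Python sketch below shows one way to compute it. The function name `gc_content` and the toy read are hypothetical, and ignoring ambiguous bases such as N is an assumption, not the authors' stated procedure.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        return 0.0  # no unambiguous bases to count
    return sum(base in "GC" for base in counted) / len(counted)


# Toy read, not real data: 4 of 6 unambiguous bases are G or C.
read = "gcgcNNat"
print(f"{gc_content(read):.1%}")
```

For a low-pass survey like the one described, the same function could be applied to the concatenation of all single-pass reads from one genome to estimate its overall GC content.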
Analysis of resistance genes of clinical Pannonibacter phragmitetus strain 31801 by complete genome sequencing.

PubMed

Ming, De-Song; Chen, Qing-Qing; Chen, Xiao-Tin

2018-05-14

To clarify the resistance mechanisms of Pannonibacter phragmitetus 31801, isolated from the blood of a liver abscess patient, at the genomic level, we performed whole genomic sequencing using a PacBio RS II single-molecule real-time long-read sequencer. Bioinformatic analysis of the resulting sequence was then carried out to identify any possible resistance genes. Analyses included Basic Local Alignment Search Tool searches against the Antibiotic Resistance Genes Database, ResFinder analysis of the genome sequence, and Resistance Gene Identifier analysis within the Comprehensive Antibiotic Resistance Database. Prophages, clustered regularly interspaced short palindromic repeats (CRISPR), and other putative virulence factors were also identified using PHAST, CRISPRfinder, and the Virulence Factors Database, respectively. The circular chromosome and single plasmid of P. phragmitetus 31801 contained multiple antibiotic resistance genes, including those coding for three different types of β-lactamase [NPS β-lactamase (EC 3.5.2.6), β-lactamase class C, and a metal-dependent hydrolase of β-lactamase superfamily I]. In addition, genes coding for subunits of several multidrug-resistance efflux pumps were identified, including those targeting macrolides (adeJ, cmeB), tetracycline (acrB, adeAB), fluoroquinolones (acrF, ceoB), and aminoglycosides (acrD, amrB, ceoB, mexY, smeB). However, apart from the tripartite macrolide efflux pump macAB-tolC, the genome did not appear to contain the complete complement of subunit genes required for production of most of the major multidrug-resistance efflux pumps.
Identification and expression analysis of cDNA encoding insulin-like growth factor 2 in horses

PubMed Central

KIKUCHI, Kohta; SASAKI, Keisuke; AKIZAWA, Hiroki; TSUKAHARA, Hayato; BAI, Hanako; TAKAHASHI, Masashi; NAMBO, Yasuo; HATA, Hiroshi; KAWAHARA, Manabu

2017-01-01

Insulin-like growth factor 2 (IGF2) is responsible for a broad range of physiological processes during fetal development and adulthood, but genomic analyses of IGF2 containing the 5ʹ- and 3ʹ-untranslated regions (UTRs) in equines have been limited. In this study, we characterized the IGF2 mRNA containing the UTRs, and determined its expression pattern in the fetal tissues of horses. The complete equine IGF2 mRNA sequence harboring another exon approximately 2.8 kb upstream from the canonical transcription start site was identified as a new transcript variant. As this upstream exon did not contain the start codon, the amino acid sequence was identical to the canonical variant. Analysis of the deduced amino acid sequence revealed that the protein possessed two major domains, IlGF and IGF2_C, and analysis of IGF2 sequence polymorphism in fetal tissues of Hokkaido native horse and Thoroughbreds revealed a single nucleotide polymorphism (T to C transition) at position 398 in Thoroughbreds, which caused an amino acid substitution at position 133 in the IGF2 sequence. Furthermore, the expression pattern of the IGF2 mRNA in the fetal tissues of horses was determined for the first time, and was found to be consistent with those of other species. Taken together, these results suggested that the transcriptional and translational products of the IGF2 gene have conserved functions in the fetal development of mammals, including horses. PMID:29151450
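The link between nucleotide position 398 and amino acid position 133 is plain codon arithmetic. The Python sketch below makes it explicit. It assumes 1-based numbering from the first base of the start codon, which the abstract does not state; the function names are hypothetical.

```python
def codon_number(nt_position):
    """1-based codon index for a 1-based coding-sequence nucleotide position."""
    return (nt_position - 1) // 3 + 1


def position_in_codon(nt_position):
    """Position (1, 2 or 3) of the nucleotide within its codon."""
    return (nt_position - 1) % 3 + 1


# Nucleotide 398 falls in codon 133, at the second position of that codon.
print(codon_number(398), position_in_codon(398))
```

Under this numbering assumption, the reported nucleotide and amino acid positions are mutually consistent.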
Molecular cloning and characterization of beluga whale (Delphinapterus leucas) interleukin-1beta and tumor necrosis factor-alpha.

PubMed Central

Denis, F; Archambault, D

2001-01-01

Interleukin-1beta (IL-1beta) and tumor necrosis factor-alpha (TNF-alpha) are cytokines produced primarily by monocytes and macrophages with regulatory effects in inflammation and multiple aspects of the immune response. As yet, no molecular data have been reported for IL-1beta and TNF-alpha of the beluga whale. In this study, we cloned and determined the entire cDNA sequence encoding beluga whale IL-1beta and TNF-alpha. The genetic relationship of the cytokine sequences was then analyzed with those from several mammalian species, including the human and the pig. The homology of beluga whale IL-1beta nucleic acid and deduced amino acid sequences with those from these mammalian species ranged from 74.6 to 86.0% and 62.7 to 77.1%, respectively, whereas that of TNF-alpha varied from 79.3 to 90.8% and 75.3 to 87.7%, respectively. Phylogenetic analyses based on deduced amino acid sequences showed that the beluga whale IL-1beta and TNF-alpha were most closely related to those of the ruminant species (cattle, sheep, and deer). The beluga whale IL-1beta- and TNF-alpha-encoding sequences were thereafter successfully expressed in Escherichia coli as fusion proteins by using procaryotic expression vectors. The fusion proteins were used to produce beluga whale IL-1beta- and TNF-alpha-specific rabbit antisera. PMID:11768130
A few nucleotide polymorphisms are sufficient to recruit nuclear factors differentially to the intron 1 of HPV-16 intratypic variants.

PubMed

López-Urrutia, Eduardo; Valdés, Jesús; Bonilla-Moreno, Raúl; Martínez-Salazar, Martha; Martínez-Garcia, Martha; Berumen, Jaime; Villegas-Sepúlveda, Nicolás

2012-06-01

The HPV-16 E6/E7 genes, which contain intron 1, are processed by alternative splicing, and their transcripts are detected with a heterogeneous profile in tumour cells. Frequently, HPV-16-positive carcinoma cells bear viral variants that contain single nucleotide polymorphisms in their DNA sequence. We were interested in analysing the contribution of these polymorphisms to the heterogeneity in the pattern of the E6/E7 spliced transcripts. Using the E6/E7 sequences from three closely related HPV-16 variants, we have shown that a few nucleotide changes are sufficient to produce heterogeneity in the splicing profile. Furthermore, using mutants that contained a single SNP, we also showed that one nucleotide change was sufficient to reproduce the heterogeneous splicing profile. Additionally, a difference of two or three SNPs among these viral sequences was sufficient to recruit differentially several splicing factors to the polymorphic E6/E7 transcripts. Moreover, only one SNP was sufficient to alter the binding site of at least one splicing factor, changing the ability of splicing factors to bind the transcript. Finally, the factors that were differentially bound to the short form of intron 1 of one of these E6/E7 variants were identified as TIA1 and/or TIAR and U1-70k, while U2AF65, U5-52k and PTB were preferentially bound to the transcript of the other variants. Copyright © 2012 Elsevier B.V. All rights reserved.
Pathogenesis of Helicobacter pylori-Related Gastroduodenal Diseases from Molecular Epidemiological Studies.

PubMed

Yamaoka, Yoshio

2012-01-01

Helicobacter pylori is a major human pathogen that infects the stomach and produces inflammation that is responsible for various gastroduodenal diseases. Despite the high prevalence of H. pylori infections in Africa and South Asia, the incidence of gastric cancer in these areas is much lower than in other countries. The incidence of gastric cancer also tends to decrease from north to south in East Asia. Data from molecular epidemiological studies show that this variation in different geographic areas could be explained in part by different types of H. pylori virulence factors, especially CagA, VacA, and OipA. H. pylori infection is thought to be involved in both gastric cancer and duodenal ulcer, which are at opposite ends of the disease spectrum. This discrepancy can also be explained in part by another H. pylori factor, DupA, as well as by CagA typing (East Asian type versus Western type). H. pylori has a genome of approximately 1,600 genes; therefore, there might be other novel virulence factors. Because genome wide analyses using whole-genome sequencing technology give a broad view of the genome of H. pylori, we hope that next-generation sequencers will enable us to efficiently investigate novel virulence factors.
Repeated Evolution of the Pyrrolizidine Alkaloid–Mediated Defense System in Separate Angiosperm Lineages

PubMed Central

Reimann, Andreas; Nurhayati, Niknik; Backenköhler, Anita; Ober, Dietrich

2004-01-01

Species of several unrelated families within the angiosperms are able to constitutively produce pyrrolizidine alkaloids as a defense against herbivores. In pyrrolizidine alkaloid (PA) biosynthesis, homospermidine synthase (HSS) catalyzes the first specific step. HSS was recruited during angiosperm evolution from deoxyhypusine synthase (DHS), an enzyme involved in the posttranslational activation of eukaryotic initiation factor 5A. Phylogenetic analysis of 23 cDNA sequences coding for HSS and DHS of various angiosperm species revealed at least four independent recruitments of HSS from DHS: one within the Boraginaceae, one within the monocots, and two within the Asteraceae family. Furthermore, sequence analyses indicated elevated substitution rates within HSS-coding sequences after each gene duplication, with an increased level of nonsynonymous mutations. However, the contradiction between the polyphyletic origin of the first enzyme in PA biosynthesis and the structural identity of the final biosynthetic PA products needs clarification. PMID:15466410
RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity.

PubMed

Ishikawa, Sohta A; Inagaki, Yuji; Hashimoto, Tetsuo

2012-01-01

In phylogenetic analyses of nucleotide sequences, 'homogeneous' substitution models, which assume the stationarity of base composition across a tree, are widely used, albeit individual sequences may bear distinctive base frequencies. In the worst-case scenario, a homogeneous model-based analysis can yield an artifactual union of two distantly related sequences that achieved similar base frequencies in parallel. Such potential difficulty can be countered by two approaches, 'RY-coding' and 'non-homogeneous' models. The former approach converts four bases into purine and pyrimidine to normalize base frequencies across a tree, while the heterogeneity in base frequency is explicitly incorporated in the latter approach. The two approaches have been applied to real-world sequence data; however, their basic properties have not been fully examined by pioneering simulation studies. Here, we assessed the performances of the maximum-likelihood analyses incorporating RY-coding and a non-homogeneous model (RY-coding and non-homogeneous analyses) on simulated data with parallel convergence to similar base composition. Both RY-coding and non-homogeneous analyses showed superior performances compared with homogeneous model-based analyses. Curiously, the performance of RY-coding analysis appeared to be significantly affected by a setting of the substitution process for sequence simulation relative to that of non-homogeneous analysis. The performance of a non-homogeneous analysis was also validated by analyzing a real-world sequence data set with significant base heterogeneity.
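RY-coding as described in the abstract, collapsing the four bases into purine (R) and pyrimidine (Y), is a one-step recoding. The Python sketch below illustrates it. The function name `ry_code` and the toy sequences are hypothetical, and leaving gaps and ambiguity codes unchanged is an assumption.

```python
# Purines (A, G) become R; pyrimidines (C, T, and U for RNA) become Y.
RY_TABLE = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"})


def ry_code(sequence):
    """Recode a nucleotide sequence into purine/pyrimidine states.

    Characters other than A, C, G, T, U (gaps, N, ...) are left unchanged.
    """
    return sequence.upper().translate(RY_TABLE)


# Two toy sequences with opposite base composition share the same RY-coded form.
gc_rich = "GGGCCC"
at_rich = "AAATTT"
print(ry_code(gc_rich), ry_code(at_rich))
```

The toy pair shows why the recoding normalizes base frequencies: a GC-rich and an AT-rich sequence become indistinguishable once only the purine/pyrimidine state is kept, at the cost of discarding transitions.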
Generation and Analysis of a Large-Scale Expressed Sequence Tag Database from a Full-Length Enriched cDNA Library of Developing Leaves of Gossypium hirsutum L

PubMed Central

Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun

2013-01-01

Background Cotton (Gossypium hirsutum L.) is one of the world’s most economically important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. Conclusions/Significance These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence assembly and annotation in G. hirsutum and comparative genomics among Gossypium species. PMID:24146870
Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.

PubMed

Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio

2017-10-06

Single cell RNA sequencing (scRNA-seq) provides great potential in measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq allowed the characterisation of transcript sequence diversity of functionally relevant T cell subsets, and the identification of the full length T cell receptor (TCRαβ), which defines the specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, utilising 1,305 single cells from 8 publicly available scRNA-seq datasets, and simulation-based analyses. Gene expression was characterised by an increased number of unique genes identified with short read lengths (<50 bp), but these featured higher technical variability compared to profiles from longer reads. Successful TCRαβ reconstruction was achieved for 6 datasets (81% - 100%) with at least 0.25 million paired-end (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.
Analyzing endocrine system conservation and evolution.

PubMed

Bonett, Ronald M

2016-08-01

Analyzing variation in rates of evolution can provide important insights into the factors that constrain trait evolution, as well as those that promote diversification. Metazoan endocrine systems exhibit apparent variation in evolutionary rates of their constituent components at multiple levels, yet relatively few studies have quantified these patterns and analyzed them in a phylogenetic context. This may be in part due to historical and current data limitations for many endocrine components and taxonomic groups. However, recent technological advancements such as high-throughput sequencing provide the opportunity to collect large-scale comparative data sets for even non-model species. Such ventures will produce a fertile data landscape for evolutionary analyses of nucleic acid and amino acid based endocrine components. Here I summarize evolutionary rate analyses that can be applied to categorical and continuous endocrine traits, and also those for nucleic acid and protein-based components. I emphasize analyses that could be used to test whether other variables (e.g., ecology, ontogenetic timing of expression, etc.) are related to patterns of rate variation and endocrine component diversification. The application of phylogenetic-based rate analyses to comparative endocrine data will greatly enhance our understanding of the factors that have shaped endocrine system evolution. Copyright © 2016 Elsevier Inc. All rights reserved.
Quantifying Transmission.

PubMed

Woolhouse, Mark

2017-07-01

Transmissibility is the defining characteristic of infectious diseases. Quantifying transmission matters for understanding infectious disease epidemiology and designing evidence-based disease control programs. Tracing individual transmission events can be achieved by epidemiological investigation coupled with pathogen typing or genome sequencing. Individual infectiousness can be estimated by measuring pathogen loads, but few studies have directly estimated the ability of infected hosts to transmit to uninfected hosts. Individuals' opportunities to transmit infection are dependent on behavioral and other risk factors relevant given the transmission route of the pathogen concerned. Transmission at the population level can be quantified through knowledge of risk factors in the population or phylogeographic analysis of pathogen sequence data. Mathematical model-based approaches require estimation of the per capita transmission rate and basic reproduction number, obtained by fitting models to case data and/or analysis of pathogen sequence data. Heterogeneities in infectiousness, contact behavior, and susceptibility can have substantial effects on the epidemiology of an infectious disease, so estimates of only mean values may be insufficient. For some pathogens, super-shedders (infected individuals who are highly infectious) and super-spreaders (individuals with more opportunities to transmit infection) may be important. Future work on quantifying transmission should involve integrated analyses of multiple data sources.
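As a minimal illustration of how the model-based quantities in this abstract relate to each other, the sketch below assumes a simple SIR (susceptible-infectious-recovered) model with homogeneous mixing. The abstract does not name a specific model, so this choice, the function names, and the parameter values are all assumptions made for illustration. Under that assumption the basic reproduction number is the per capita transmission rate divided by the recovery rate, and the critical immune fraction is 1 - 1/R0.

```python
def basic_reproduction_number(beta: float, gamma: float) -> float:
    """R0 under an assumed simple SIR model: transmission rate / recovery rate."""
    if beta < 0 or gamma <= 0:
        raise ValueError("beta must be non-negative and gamma positive")
    return beta / gamma


def critical_immune_fraction(r0: float) -> float:
    """Fraction that must be immune to prevent sustained spread (0 if R0 <= 1)."""
    if r0 <= 1:
        return 0.0
    return 1.0 - 1.0 / r0


# Hypothetical rates per day, chosen only for illustration.
r0 = basic_reproduction_number(beta=0.5, gamma=0.25)
print(r0, critical_immune_fraction(r0))
```

Because this sketch uses only mean rates, it ignores the heterogeneities in infectiousness and contact behavior (super-shedders, super-spreaders) that the abstract identifies as potentially important.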
Genome Sequences of Marine Shrimp Exopalaemon carinicauda Holthuis Provide Insights into Genome Size Evolution of Caridea.

PubMed

Yuan, Jianbo; Gao, Yi; Zhang, Xiaojun; Wei, Jiankai; Liu, Chengzhang; Li, Fuhua; Xiang, Jianhai

2017-07-05

Crustacea, particularly Decapoda, contains many economically important species, such as shrimps and crabs. Crustaceans exhibit enormous (nearly 500-fold) variability in genome size. However, limited genome resources are available for investigating these species. Exopalaemon carinicauda Holthuis, an economically important caridean shrimp, is a potentially ideal experimental animal for research on crustaceans. In this study, we performed low-coverage sequencing and de novo assembly of the E. carinicauda genome. The assembly covers more than 95% of coding regions. E. carinicauda possesses a large, complex genome (5.73 Gb), with a size twice that of many decapod shrimps. As such, comparative genomic analyses were employed to investigate factors affecting genome size evolution of decapods. However, clues associated with genome duplication were not identified, and few horizontally transferred sequences were detected. Ultimately, the burst of transposable elements, especially retrotransposons, was determined as the major factor influencing genome expansion. A total of 2 Gb repeats were identified, and RTE-BovB, Jockey, Gypsy, and DIRS were the four major retrotransposons that significantly expanded. Retrotransposons of both recent (Jockey and Gypsy) and ancestral (DIRS) origin were responsible for the genome evolution. The E. carinicauda genome also exhibited potential for the genomic and experimental research of shrimps.

The pineapple AcMADS1 promoter confers high level expression in tomato and arabidopsis flowering and fruiting tissues, but AcMADS1 does not complement the tomato LeMADS-RIN (rin) mutant

USDA-ARS's Scientific Manuscript database

A previous EST study identified a MADS box transcription factor coding sequence, AcMADS1, that is strongly induced during non-climacteric pineapple fruit ripening. Phylogenetic analyses place the AcMADS1 protein in the same superclade as LeMADS-RIN, a master regulator of fruit ripening upstream of e...
Genetic diversity of Babesia bovis in virulent and attenuated strains.

PubMed

Mazuz, M L; Molad, T; Fish, L; Leibovitz, B; Wolkomirsky, R; Fleiderovitz, L; Shkap, V

2012-03-01

The aim of this study was to compare the genetic diversity of the single copy Bv80 gene sequences of Babesia bovis in populations of attenuated and virulent parasites. PCR/RT-PCR followed by cloning and sequence analyses of 4 attenuated and 4 virulent strains were performed. Multiple fragments in the range of 420 to 744 bp were amplified by PCR or RT-PCR. Cloning of the PCR fragments and sequence analyses revealed the presence of mixed subpopulations in either virulent or attenuated parasites with a total of 19 variants with 12 different sequences that differed in number and type of tandem repeats. High levels of intra- and inter-strain diversity of the Bv80 gene, with the presence of mixed populations of parasites were found in both the virulent field isolates and the attenuated vaccine strains. In addition, during the attenuation process, sequence analyses showed changes in the pattern of the parasite subpopulations. Despite high polymorphism found by sequence analyses, the patterns observed and the number of repeats, order, or motifs found could not discriminate between virulent field isolates and attenuated vaccine strains of the parasite.
Simple Shared Motifs (SSM) in conserved region of promoters: a new approach to identify co-regulation patterns.

PubMed

Gruel, Jérémy; LeBorgne, Michel; LeMeur, Nolwenn; Théret, Nathalie

2011-09-12

Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks.
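The idea of counting short motifs shared by a pair of promoters can be sketched as follows. This is not the authors' SSM algorithm, which groups sequences by length and similarity within conserved promoter regions; it is a simplified stand-in that counts exact k-mers common to two sequences. The function names, the choice of exact matching, and the example sequences are assumptions made for illustration.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct k-length substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_count(promoter_a: str, promoter_b: str, k: int) -> int:
    """Count distinct exact k-mers present in both sequences (simplified proxy)."""
    return len(kmers(promoter_a, k) & kmers(promoter_b, k))


# Hypothetical toy promoters; real inputs would be conserved promoter regions.
print(shared_kmer_count("ACGTAC", "TTACGT", k=4))
```

Deciding whether a count is "exceptional" would additionally require a background distribution, for example counts from randomly paired genes, which this sketch does not attempt.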
Simple Shared Motifs (SSM) in conserved region of promoters: a new approach to identify co-regulation patterns

PubMed Central

2011-01-01

Background Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Results Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Conclusions Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks. PMID:21910886
From algae to angiosperms–inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes

PubMed Central

2014-01-01

Background Next-generation sequencing has provided a wealth of plastid genome sequence data from an increasingly diverse set of green plants (Viridiplantae). Although these data have helped resolve the phylogeny of numerous clades (e.g., green algae, angiosperms, and gymnosperms), their utility for inferring relationships across all green plants is uncertain. Viridiplantae originated 700-1500 million years ago and may comprise as many as 500,000 species. This clade represents a major source of photosynthetic carbon and contains an immense diversity of life forms, including some of the smallest and largest eukaryotes. Here we explore the limits and challenges of inferring a comprehensive green plant phylogeny from available complete or nearly complete plastid genome sequence data. Results We assembled protein-coding sequence data for 78 genes from 360 diverse green plant taxa with complete or nearly complete plastid genome sequences available from GenBank. Phylogenetic analyses of the plastid data recovered well-supported backbone relationships and strong support for relationships that were not observed in previous analyses of major subclades within Viridiplantae. However, there also is evidence of systematic error in some analyses. In several instances we obtained strongly supported but conflicting topologies from analyses of nucleotides versus amino acid characters, and the considerable variation in GC content among lineages and within single genomes affected the phylogenetic placement of several taxa. Conclusions Analyses of the plastid sequence data recovered a strongly supported framework of relationships for green plants. 
This framework includes: i) the placement of Zygnematophyceae as sister to land plants (Embryophyta), ii) a clade of extant gymnosperms (Acrogymnospermae) with cycads + Ginkgo sister to remaining extant gymnosperms and with gnetophytes (Gnetophyta) sister to non-Pinaceae conifers (Gnecup trees), and iii) within the monilophyte clade (Monilophyta), Equisetales + Psilotales are sister to Marattiales + leptosporangiate ferns. Our analyses also highlight the challenges of using plastid genome sequences in deep-level phylogenomic analyses, and we provide suggestions for future analyses that will likely incorporate plastid genome sequence data for thousands of species. We particularly emphasize the importance of exploring the effects of different partitioning and character coding strategies. PMID:24533922
Analyses of Methylomes Derived from Meso-American Common Bean (Phaseolus vulgaris L.) Using MeDIP-Seq and Whole Genome Sodium Bisulfite-Sequencing.

PubMed

Crampton, Mollee; Sripathi, Venkateswara R; Hossain, Khwaja; Kalavacharla, Venu

2016-01-01

Common bean (Phaseolus vulgaris L.) is economically important for its high protein, fiber, and micronutrient contents, with a relatively small genome size of ∼587 Mb. Common bean is genetically diverse with two major gene pools, Meso-American and Andean. The phenotypic variability within common bean is partly attributed to the genetic diversity and epigenetic changes that are largely influenced by environmental factors. It is well established that an important epigenetic regulator of gene expression is DNA methylation. Here, we present results generated from two high-throughput sequencing technologies, methylated DNA immunoprecipitation-sequencing (MeDIP-seq) and whole genome bisulfite-sequencing (BS-Seq). Our analyses revealed that this Meso-American common bean displays methylation patterns similar to those of other previously published plant methylomes, with CG ∼50%, CHG ∼30%, and CHH ∼2.7% methylation; however, these differ from the common bean reference methylome of Andean origin. We identified higher CG methylation levels in both promoter and genic regions than CHG and CHH contexts. Moreover, we found relatively higher CG methylation levels in genes than in promoters. Conversely, the CHG and CHH methylation levels were higher in promoters than in genes. This is the first genome-wide DNA methylation profiling study in a Meso-American common bean cultivar ("Sierra") using NGS approaches. Our long-term goal is to generate genome-wide epigenomic maps in common bean focusing on chromatin accessibility, histone modifications, and DNA methylation.
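The three sequence contexts reported above (CG, CHG, CHH, where H is A, C, or T) can be made concrete with a short sketch that classifies each cytosine on a single strand by the two bases that follow it. The function names and the toy sequence are assumptions for illustration; this is not the pipeline used in the study, and it ignores the reverse strand and the methylation calls themselves.

```python
from collections import Counter
from typing import Optional

H_BASES = "ACT"  # IUPAC code H: any base except G


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the cytosine at index i as 'CG', 'CHG', or 'CHH'.

    Returns None when there is too little downstream sequence or an
    ambiguous base (such as N) prevents classification.
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position i is not a cytosine")
    downstream = seq[i + 1:i + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) < 2 or downstream[0] not in H_BASES:
        return None
    if downstream[1] == "G":
        return "CHG"
    if downstream[1] in H_BASES:
        return "CHH"
    return None


def count_contexts(seq: str) -> Counter:
    """Tally cytosine contexts along the forward strand of a sequence."""
    counts = Counter()
    for i, base in enumerate(seq.upper()):
        if base == "C":
            context = cytosine_context(seq, i)
            if context is not None:
                counts[context] += 1
    return counts


print(count_contexts("CGCAGCTT"))
```

Per-context methylation levels such as those quoted in the abstract would then be the fraction of methylated reads among all reads covering cytosines in each context.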
ACGT-containing abscisic acid response element (ABRE) and coupling element 3 (CE3) are functionally equivalent.

PubMed

Hobo, T; Asada, M; Kowyama, Y; Hattori, T

1999-09-01

ACGT-containing ABA response elements (ABREs) have been functionally identified in the promoters of various genes. In addition, single copies of ABRE have been found to require a cis-acting, coupling element to achieve ABA induction. A coupling element 3 (CE3) sequence, originally identified as such in the barley HVA1 promoter, is found approximately 30 bp downstream of motif A (ACGT-containing ABRE) in the promoter of the Osem gene. The relationship between these two elements was further defined by linker-scan analyses of a 55 bp fragment of the Osem promoter, which is sufficient for ABA-responsiveness and VP1 activation. The analyses revealed that both motif A and the CE3 sequence were required not only for ABA-responsiveness but also for VP1 activation. Since the sequences of motif A and CE3 were found to be similar, motif-exchange experiments were carried out. The experiments demonstrated that motif A and CE3 were interchangeable with each other with respect to both ABA and VP1 regulation. In addition, both sequences were shown to be recognized by a VP1-interacting, ABA-responsive bZIP factor TRAB1. These results indicate that ACGT-containing ABREs and CE3 are functionally equivalent cis-acting elements. Furthermore, TRAB1 was shown to bind two other non-ACGT ABREs. Based on these results, all these ABREs including CE3 are proposed to be categorized into a single class of cis-acting elements.
Analyses of Methylomes Derived from Meso-American Common Bean (Phaseolus vulgaris L.) Using MeDIP-Seq and Whole Genome Sodium Bisulfite-Sequencing

PubMed Central

Crampton, Mollee; Sripathi, Venkateswara R.; Hossain, Khwaja; Kalavacharla, Venu

2016-01-01

Common bean (Phaseolus vulgaris L.) is economically important for its high protein, fiber, and micronutrient contents, with a relatively small genome size of ∼587 Mb. Common bean is genetically diverse with two major gene pools, Meso-American and Andean. The phenotypic variability within common bean is partly attributed to the genetic diversity and epigenetic changes that are largely influenced by environmental factors. It is well established that an important epigenetic regulator of gene expression is DNA methylation. Here, we present results generated from two high-throughput sequencing technologies, methylated DNA immunoprecipitation-sequencing (MeDIP-seq) and whole genome bisulfite-sequencing (BS-Seq). Our analyses revealed that this Meso-American common bean displays methylation patterns similar to those of other previously published plant methylomes, with CG ∼50%, CHG ∼30%, and CHH ∼2.7% methylation; however, these differ from the common bean reference methylome of Andean origin. We identified higher CG methylation levels in both promoter and genic regions than CHG and CHH contexts. Moreover, we found relatively higher CG methylation levels in genes than in promoters. Conversely, the CHG and CHH methylation levels were higher in promoters than in genes. This is the first genome-wide DNA methylation profiling study in a Meso-American common bean cultivar (“Sierra”) using NGS approaches. Our long-term goal is to generate genome-wide epigenomic maps in common bean focusing on chromatin accessibility, histone modifications, and DNA methylation. PMID:27199997
Pooled-DNA Sequencing for Elucidating New Genomic Risk Factors, Rare Variants Underlying Alzheimer's Disease.

PubMed

Jin, Sheng Chih; Benitez, Bruno A; Deming, Yuetiva; Cruchaga, Carlos

2016-01-01

Analyses of genome-wide association studies (GWAS) for complex disorders usually identify common variants with a relatively small effect size that only explain a small proportion of phenotypic heritability. Several studies have suggested that a significant fraction of heritability may be explained by low-frequency (minor allele frequency (MAF) of 1-5 %) and rare variants that are not contained in the commercial GWAS genotyping arrays (Schork et al., Curr Opin Genet Dev 19:212, 2009). Rare variants can also have relatively large effects on risk for developing human diseases or disease phenotype (Cruchaga et al., PLoS One 7:e31039, 2012). However, it is necessary to perform next-generation sequencing (NGS) studies in a large population (>4,000 samples) to detect a significant rare-variant association. Several NGS methods, such as custom capture sequencing and amplicon-based sequencing, are designed to screen a small proportion of the genome, but most of these methods are limited in the number of samples that can be multiplexed (i.e. most sequencing kits only provide 96 distinct indices). Additionally, the sequencing library preparation for 4,000 samples remains expensive and thus conducting NGS studies with the aforementioned methods is not feasible for most research laboratories. The need for low-cost large scale rare-variant detection makes pooled-DNA sequencing an ideally efficient and cost-effective technique to identify rare variants in target regions by sequencing hundreds to thousands of samples. Our recent work has demonstrated that pooled-DNA sequencing can accurately detect rare variants in targeted regions in multiple DNA samples with high sensitivity and specificity (Jin et al., Alzheimers Res Ther 4:34, 2012).
In these studies we used a well-established pooled-DNA sequencing approach and a computational package, SPLINTER (short indel prediction by large deviation inference and nonlinear true frequency estimation by recursion) (Vallania et al., Genome Res 20:1711, 2010), for accurate identification of rare variants in large DNA pools. Given an average sequencing coverage of 30× per haploid genome, SPLINTER can detect rare variants and short indels up to 4 base pairs (bp) with high sensitivity and specificity (up to 1 haploid allele in a pool as large as 500 individuals). Step-by-step instructions on how to conduct pooled-DNA sequencing experiments and data analyses are described in this chapter.
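The coverage figures quoted above can be turned into a small worked calculation. The sketch below assumes diploid individuals and evenly distributed coverage (both assumptions, not statements from the chapter) and computes the total depth implied by 30× per haploid genome in a pool of 500 individuals, along with the expected number of reads carrying a variant present on a single haploid allele. The function name and returned keys are hypothetical, and the code does not model SPLINTER itself.

```python
def pooled_expectations(n_individuals: int,
                        coverage_per_haploid: float,
                        ploidy: int = 2) -> dict:
    """Back-of-envelope expectations for a pooled-DNA sequencing design."""
    haploid_genomes = n_individuals * ploidy
    total_depth = coverage_per_haploid * haploid_genomes
    return {
        "haploid_genomes": haploid_genomes,
        "total_depth": total_depth,
        # Frequency of a variant carried by exactly one haploid genome.
        "singleton_frequency": 1 / haploid_genomes,
        # Expected reads supporting that variant at evenly spread coverage.
        "expected_variant_reads": total_depth / haploid_genomes,
    }


print(pooled_expectations(n_individuals=500, coverage_per_haploid=30))
```

Under these assumptions a singleton in a 500-person pool sits at a frequency of 0.1% and is supported by about 30 reads out of 30,000, which is why the error model of the variant caller matters at this depth.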
The nonlinear, complex sequential organization of behavior in schizophrenic patients: neurocognitive strategies and clinical correlations.

PubMed

Paulus, M P; Perry, W; Braff, D L

1999-09-01

Thought disorder is a hallmark of schizophrenia and can be inferred from disorganized behavior. Measures of the sequential organization of behavior are important because they reflect the cognitive processes of the selection and sequencing of behavioral elements, which generate observable and analyzable behavioral patterns. In this context, sequences of choices generated by schizophrenic patients in a two-choice guessing task fluctuate significantly, which reflects an "oscillating dysregulation" between highly predictable and highly unpredictable subsequences within a single test session. In this study, we aimed to clarify the significance of dysregulation by seeing whether demographic, clinical, neuropsychological, and psychological measures predict the degree of dysregulation observed on this two-choice task. Thirty schizophrenic patients repeatedly performed a LEFT or RIGHT key press that was followed by a stimulus, which occurred randomly on the left or right side of the computer screen. Thus, the stimulus location had nothing to do with the key press behavior. The range of key press sequence predictabilities as measured by the dynamical entropy was used to quantify the dysregulation of response sequences and reflects the range of fixity and randomness of the responses. A factor analysis was performed and step-wise multiple regression analyses were used to relate the factor scores to demographic, clinical, symptomatic, Wisconsin Card Sorting Test (WCST), and Rorschach variables. The LEFT/RIGHT key press sequences were determined by three factors: 1) the degree of win-stay/lose-shift strategy; 2) the degree of contextual influence on the current choice; and 3) the degree of dysregulation on the choice task. Demographic and clinical variables did not predict any of the three response patterns on the choice task. In contrast, the WCST and Rorschach test predicted performance on various factors of choice task response patterns. 
Schizophrenic patients employ several rules, i.e., "win-stay/lose-shift" and "decide according to the previous choice," that fluctuate significantly when generating sequences on this task, confirming that a basic behavioral dysregulation occurs in a single schizophrenic subject across a single test session. The organization or the "temporal architecture" of the behavioral sequences is not related to symptoms per se, but is related to deficits in executive functioning, problem solving, and perceptual organizational abilities.
Three RNA recognition motifs participate in RNA recognition and structural organization by the pro-apoptotic factor TIA-1

PubMed Central

Bauer, William J.; Heath, Jason; Jenkins, Jermaine L.; Kielkopf, Clara L.

2012-01-01

T-cell intracellular antigen-1 (TIA-1) regulates developmental and stress-responsive pathways through distinct activities at the levels of alternative pre-mRNA splicing and mRNA translation. The TIA-1 polypeptide contains three RNA recognition motifs (RRMs). The central RRM2 and C-terminal RRM3 associate with cellular mRNAs. The N-terminal RRM1 enhances interactions of a C-terminal Q-rich domain of TIA-1 with the U1-C splicing factor, despite linear separation of the domains in the TIA-1 sequence. Given the expanded functional repertoire of the RRM family, it was unknown whether TIA-1 RRM1 contributes to RNA binding as well as documented protein interactions. To address this question, we used isothermal titration calorimetry and small-angle X-ray scattering (SAXS) to dissect the roles of the TIA-1 RRMs in RNA recognition. Notably, the fas RNA exhibited two binding sites with indistinguishable affinities for TIA-1. Analyses of TIA-1 variants established that RRM1 was dispensable for binding AU-rich fas sites, yet all three RRMs were required to bind a polyU RNA with high affinity. SAXS analyses demonstrated a 'V' shape for a TIA-1 construct comprising the three RRMs, and revealed that its dimensions became more compact in the RNA-bound state. The sequence-selective involvement of TIA-1 RRM1 in RNA recognition suggests a possible role for RNA sequences in regulating the distinct functions of TIA-1. Further implications for U1-C recruitment by the adjacent TIA-1 binding sites of the fas pre-mRNA and the bent TIA-1 shape, which organizes the N- and C-termini on the same side of the protein, are discussed. PMID:22154808
DMINDA: an integrated web server for DNA motif identification and analyses

PubMed Central

Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

2014-01-01

DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. PMID:24753419
Evolution of epigenetic regulation in vertebrate genomes

PubMed Central

Lowdon, Rebecca F.; Jang, Hyo Sik; Wang, Ting

2016-01-01

Empirical models of sequence evolution have spurred progress in the field of evolutionary genetics for decades. We are now realizing the importance and complexity of the eukaryotic epigenome. While epigenome analysis has been applied to genomes from single cell eukaryotes to human, comparative analyses are still relatively few, and computational algorithms to quantify epigenome evolution remain scarce. Accordingly, a quantitative model of epigenome evolution remains to be established. Here we review the comparative epigenomics literature and synthesize its overarching themes. We also suggest one mechanism, transcription factor binding site turnover, which relates sequence evolution to epigenetic conservation or divergence. Lastly, we propose a framework for how the field can move forward to build a coherent quantitative model of epigenome evolution. PMID:27080453
Differential pleiotropy and HOX functional organization.

PubMed

Sivanantharajah, Lovesha; Percival-Smith, Anthony

2015-02-01

Key studies led to the idea that transcription factors are composed of defined modular protein motifs or domains, each with separable, unique function. During evolution, the recombination of these modular domains could give rise to transcription factors with new properties, as has been shown using recombinant molecules. This archetypic, modular view of transcription factor organization is based on the analyses of a few transcription factors such as GAL4, which may represent extreme exemplars rather than an archetype or the norm. Recent work with a set of Homeotic selector (HOX) proteins has revealed differential pleiotropy: the observation that highly conserved HOX protein motifs and domains make small, additive, tissue-specific contributions to HOX activity. Many of these differentially pleiotropic HOX motifs may represent plastic sequence elements called short linear motifs (SLiMs). The coupling of differential pleiotropy with SLiMs suggests that protein sequence changes in HOX transcription factors may have had a greater impact on morphological diversity during evolution than previously believed. Furthermore, differential pleiotropy may be the genetic consequence of an ensemble nature of HOX transcription factor allostery, where HOX proteins exist as an ensemble of states with the capacity to integrate an extensive array of developmental information. Given a new structural model for HOX functional domain organization, the properties of the archetypic TF may require reassessment. Copyright © 2014 Elsevier Inc. All rights reserved.
Sequence Divergence and Conservation in Genomes of Helicobacter cetorum Strains from a Dolphin and a Whale

PubMed Central

Kersulyte, Dangeruta; Rossi, Mirko; Berg, Douglas E.

2013-01-01

Background and Objectives Strains of Helicobacter cetorum have been cultured from several marine mammals and have been found to be closely related in 16S rDNA sequence to the human gastric pathogen H. pylori, but their genomes were not characterized further. Methods The genomes of H. cetorum strains from a dolphin and a whale were sequenced completely using 454 technology and PCR and capillary sequencing. Results These genomes are 1.8 and 1.95 Mb in size, some 7–26% larger than H. pylori genomes, and differ markedly from one another in gene content, and sequences and arrangements of shared genes. However, each strain is more related overall to H. pylori and its descendant H. acinonychis than to other known species. These H. cetorum strains lack cag pathogenicity islands, but contain novel alleles of the virulence-associated vacuolating cytotoxin (vacA) gene. Of particular note are (i) an extra triplet of vacA genes with ≤50% protein-level identity to each other in the 5′ two-thirds of the gene needed for host factor interaction; (ii) divergent sets of outer membrane protein genes; (iii) several metabolic genes distinct from those of H. pylori; (iv) genes for an iron-cofactored urease related to those of Helicobacter species from terrestrial carnivores, in addition to genes for a nickel co-factored urease; and (v) members of the slr multigene family, some of which modulate host responses to infection and improve Helicobacter growth with mammalian cells. Conclusions Our genome sequence data provide a glimpse into the novelty and great genetic diversity of marine helicobacters. These data should aid further analyses of microbial genome diversity and evolution and infection and disease mechanisms in vast and often fragile ocean ecosystems. PMID:24358262
Analyses of Hypomethylated Oil Palm Gene Space

PubMed Central

Jayanthi, Nagappan; Mohd-Amin, Ab Halim; Azizi, Norazah; Chan, Kuang-Lim; Maqbool, Nauman J.; Maclean, Paul; Brauning, Rudi; McCulloch, Alan; Moraga, Roger; Ong-Abdullah, Meilina; Singh, Rajinder

2014-01-01

Demand for palm oil has been increasing by an average of ∼8% over the past decade, and palm oil currently accounts for about 59% of the world's vegetable oil market. This drives the need to increase palm oil production. Nevertheless, due to the increasing need for sustainable production, it is imperative to increase productivity rather than the area cultivated. Studies on the oil palm genome are essential to help identify genes or markers that are associated with important processes or traits, such as flowering, yield and disease resistance. To achieve this, 294,115 and 150,744 sequences from the hypomethylated or gene-rich regions of the Elaeis guineensis and E. oleifera genomes were sequenced and assembled into contigs. An additional 16,427 shot-gun sequences and 176 bacterial artificial chromosomes (BAC) were also generated to check the quality of libraries constructed. Comparison of these sequences revealed that although the methylation-filtered libraries were sequenced at low coverage, they still tagged at least 66% of the RefSeq supported genes in the BAC and had a filtration power of at least 2.0. A total of 33,752 microsatellites and 40,820 high-quality single nucleotide polymorphism (SNP) markers were identified. These represent the most comprehensive collection of microsatellites and SNPs to date and would be an important resource for genetic mapping and association studies. The gene models predicted from the assembled contigs were mined for genes of interest, and 242, 65 and 14 oil palm transcription factors, resistance genes and miRNAs were identified respectively. Examples of the transcriptional factors tagged include those associated with floral development and tissue culture, such as homeodomain proteins, MADS, Squamosa and Apetala2. The E. guineensis and E. oleifera hypomethylated sequences provide an important resource to understand the molecular mechanisms associated with important agronomic traits in oil palm. PMID:24497974
Interactions between the R2R3-MYB Transcription Factor, AtMYB61, and Target DNA Binding Sites

PubMed Central

Prouse, Michael B.; Campbell, Malcolm M.

2013-01-01

Despite the prominent roles played by R2R3-MYB transcription factors in the regulation of plant gene expression, little is known about the details of how these proteins interact with their DNA targets. For example, while Arabidopsis thaliana R2R3-MYB protein AtMYB61 is known to alter transcript abundance of a specific set of target genes, little is known about the specific DNA sequences to which AtMYB61 binds. To address this gap in knowledge, DNA sequences bound by AtMYB61 were identified using cyclic amplification and selection of targets (CASTing). The DNA targets identified using this approach corresponded to AC elements, sequences enriched in adenosine and cytosine nucleotides. The preferred target sequence that bound with the greatest affinity to AtMYB61 recombinant protein was ACCTAC, the AC-I element. Mutational analyses based on the AC-I element showed that ACC nucleotides in the AC-I element served as the core recognition motif, critical for AtMYB61 binding. Molecular modelling predicted interactions between AtMYB61 amino acid residues and corresponding nucleotides in the DNA targets. The affinity between AtMYB61 and specific target DNA sequences did not correlate with AtMYB61-driven transcriptional activation with each of the target sequences. CASTing-selected motifs were found in the regulatory regions of genes previously shown to be regulated by AtMYB61. Taken together, these findings are consistent with the hypothesis that AtMYB61 regulates transcription from specific cis-acting AC elements in vivo. The results shed light on the specifics of DNA binding by an important family of plant-specific transcriptional regulators. PMID:23741471
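As a small illustration of the final step described above (locating selected motifs in regulatory regions), the sketch below scans a DNA string for the AC-I element ACCTAC on both strands. The function names and the example promoter string are hypothetical and chosen only for illustration; this is not the authors' pipeline.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_motif(seq: str, motif: str = "ACCTAC"):
    """Return sorted (0-based start, strand) hits of `motif` on both strands.

    A "-" hit means the reverse complement of the motif occurs at that
    position on the given (forward) sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


# Hypothetical promoter fragment, made up for this example.
promoter = "TTGACCTACGGATGTAGGTCA"
print(find_motif(promoter))
```

A real analysis would typically score degenerate AC elements with a position weight matrix rather than an exact string match; the exact match is used here only to keep the sketch short.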
Identification and expression analysis of ERF transcription factor genes in petunia during flower senescence and in response to hormone treatments.

PubMed

Liu, Juanxu; Li, Jingyu; Wang, Huinan; Fu, Zhaodi; Liu, Juan; Yu, Yixun

2011-01-01

Ethylene-responsive element-binding factor (ERF) genes constitute one of the largest transcription factor gene families in plants. In Arabidopsis and rice, only a few ERF genes have been characterized so far. Flower senescence is associated with increased ethylene production in many flowers. However, the characterization of ERF genes in flower senescence has not been reported. In this study, 13 ERF cDNAs were cloned from petunia. Based on the sequence characterization, these PhERFs could be classified into four of the 12 known ERF families. Their predicted amino acid sequences exhibited similarities to ERFs from other plant species. Expression analyses of PhERF mRNAs were performed in corollas and gynoecia of petunia flower. The 13 PhERF genes displayed differential expression patterns and levels during natural flower senescence. Exogenous ethylene accelerates the transcription of the various PhERF genes, and silver thiosulphate (STS) decreased the transcription of several PhERF genes in corollas and gynoecia. PhERF genes of group VII showed a strong association with the rise in ethylene production in both petals and gynoecia, and might be associated particularly with flower senescence in petunia. The effect of sugar, methyl jasmonate, and the plant hormones abscisic acid, salicylic acid, and 6-benzyladenine in regulating the different PhERF transcripts was investigated. Functional nuclear localization signal analyses of two PhERF proteins (PhERF2 and PhERF3) were carried out using fluorescence microscopy. These results supported a role for petunia PhERF genes in transcriptional regulation of petunia flower senescence processes.
Population Genomics of Paramecium Species.

PubMed

Johri, Parul; Krenek, Sascha; Marinov, Georgi K; Doak, Thomas G; Berendonk, Thomas U; Lynch, Michael

2017-05-01

Population-genomic analyses are essential to understanding factors shaping genomic variation and lineage-specific sequence constraints. The dearth of such analyses for unicellular eukaryotes prompted us to assess genomic variation in Paramecium, one of the most well-studied ciliate genera. The Paramecium aurelia complex consists of ∼15 morphologically indistinguishable species that diverged subsequent to two rounds of whole-genome duplications (WGDs, as long as 320 MYA) and possess extremely streamlined genomes. We examine patterns of both nuclear and mitochondrial polymorphism, by sequencing whole genomes of 10-13 worldwide isolates of each of three species belonging to the P. aurelia complex: P. tetraurelia, P. biaurelia, P. sexaurelia, as well as two outgroup species that do not share the WGDs: P. caudatum and P. multimicronucleatum. An apparent absence of global geographic population structure suggests continuous or recent dispersal of Paramecium over long distances. Intergenic regions are highly constrained relative to coding sequences, especially in P. caudatum and P. multimicronucleatum that have shorter intergenic distances. Sequence diversity and divergence are reduced up to ∼100-150 bp both upstream and downstream of genes, suggesting strong constraints imposed by the presence of densely packed regulatory modules. In addition, comparison of sequence variation at non-synonymous and synonymous sites suggests similar recent selective pressures on paralogs within and orthologs across the deeply diverging species. This study presents the first genome-wide population-genomic analysis in ciliates and provides a valuable resource for future studies in evolutionary and functional genetics in Paramecium. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Global ecological pattern of ammonia-oxidizing archaea.

PubMed

Cao, Huiluo; Auguet, Jean-Christophe; Gu, Ji-Dong

2013-01-01

The global distribution of ammonia-oxidizing archaea (AOA), which play a pivotal role in the nitrification process, has been confirmed through numerous ecological studies. Though newly available amoA (ammonia monooxygenase subunit A) gene sequences from new environments are accumulating rapidly in public repositories, a lack of information on the ecological and evolutionary factors shaping community assembly of AOA on the global scale is apparent. We conducted a meta-analysis on uncultured AOA using over ca. 6,200 archaeal amoA gene sequences, so as to reveal their community distribution patterns along a wide spectrum of physicochemical conditions and habitat types. The sequences were dereplicated at 95% identity level resulting in a dataset containing 1,476 archaeal amoA gene sequences from eight habitat types: namely soil, freshwater, freshwater sediment, estuarine sediment, marine water, marine sediment, geothermal system, and symbiosis. The updated comprehensive amoA phylogeny was composed of three major monophyletic clusters (i.e. Nitrosopumilus, Nitrosotalea, Nitrosocaldus) and a non-monophyletic cluster constituted mostly by soil and sediment sequences that we named Nitrososphaera. Diversity measurements indicated that marine and estuarine sediments as well as symbionts might be the largest reservoirs of AOA diversity. Phylogenetic analyses were further carried out using macroevolutionary analyses to explore the diversification pattern and rates of nitrifying archaea. In contrast to other habitats that displayed constant diversification rates, marine planktonic AOA interestingly exhibit a very recent and accelerating diversification rate congruent with the lowest phylogenetic diversity observed in their habitats. This result suggested the existence of AOA communities with different evolutionary history in the different habitats. Based on an up-to-date amoA phylogeny, this analysis provided insights into the possible evolutionary mechanisms and environmental parameters that shape AOA community assembly at global scale.
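The dereplication step mentioned above (collapsing sequences at a 95% identity level) can be sketched as a greedy clustering pass. The abstract does not say which tool was used, so this is only a generic illustration: it assumes the sequences are already aligned to equal length, and the names `identity` and `dereplicate` are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def dereplicate(seqs, threshold: float = 0.95):
    """Greedy dereplication: keep a sequence only if it is below `threshold`
    identity to every representative kept so far (input order matters)."""
    representatives = []
    for s in seqs:
        if not any(identity(s, r) >= threshold for r in representatives):
            representatives.append(s)
    return representatives


# Toy 20-nt sequences, made up for this example.
s1 = "ACGTACGTACGTACGTACGT"
s2 = "ACGTACGTACGTACGTACGA"  # 1 mismatch vs s1: 95% identity, collapsed into s1
s3 = "TCGTACGTACGTACGTACGA"  # 2 mismatches vs s1: 90% identity, kept
print(len(dereplicate([s1, s2, s3])))
```

Dedicated clustering programs use alignment-based identity and length sorting; the fixed-length comparison here is a simplifying assumption.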

Comparative and evolutionary studies of vertebrate ALDH1A-like genes and proteins.

PubMed

Holmes, Roger S

2015-06-05

Vertebrate ALDH1A-like genes encode cytosolic enzymes capable of metabolizing all-trans-retinaldehyde to retinoic acid which is a molecular 'signal' guiding vertebrate development and adipogenesis. Bioinformatic analyses of vertebrate and invertebrate genomes were undertaken using known ALDH1A1, ALDH1A2 and ALDH1A3 amino acid sequences. Comparative analyses of the corresponding human genes provided evidence for distinct modes of gene regulation and expression with putative transcription factor binding sites (TFBS), CpG islands and micro-RNA binding sites identified for the human genes. ALDH1A-like sequences were identified for all mammalian, bird, lizard and frog genomes examined, whereas fish genomes displayed a more restricted distribution pattern for ALDH1A1 and ALDH1A3 genes. The ALDH1A1 gene was absent in many bony fish genomes examined, with the ALDH1A3 gene also absent in the medaka and tilapia genomes. Multiple ALDH1A1-like genes were identified in mouse, rat and marsupial genomes. Vertebrate ALDH1A1, ALDH1A2 and ALDH1A3 subunit sequences were highly conserved throughout vertebrate evolution. Comparative amino acid substitution rates showed that mammalian ALDH1A2 sequences were more highly conserved than for the ALDH1A1 and ALDH1A3 sequences. Phylogenetic studies supported an hypothesis for ALDH1A2 as a likely primordial gene originating in invertebrate genomes and undergoing sequential gene duplication to generate two additional genes, ALDH1A1 and ALDH1A3, in most vertebrate genomes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Sequences of Normative Evaluation in Two Telecollaboration Projects: A Comparative Study of Multimodal Feedback through Desktop Videoconference

ERIC Educational Resources Information Center

Cappellini, Marco; Azaoui, Brahim

2017-01-01

In our study we analyse how the same interactional dynamic is produced in two different pedagogical settings exploiting a desktop videoconference system. We propose to focus our attention on a specific type of conversational side sequence, known in the Francophone literature as sequences of normative evaluation. More particularly, we analyse data…
High prevalence of mutations affecting the splicing process in a Spanish cohort with autosomal dominant retinitis pigmentosa

PubMed Central

Ezquerra-Inchausti, Maitane; Barandika, Olatz; Anasagasti, Ander; Irigoyen, Cristina; López de Munain, Adolfo; Ruiz-Ederra, Javier

2017-01-01

Retinitis pigmentosa is the most frequent group of inherited retinal dystrophies. It is highly heterogeneous, with more than 80 disease-causing genes, 27 of which are known to cause autosomal dominant RP (adRP), having been identified. In this study a total of 29 index cases were ascertained based on a family tree compatible with adRP. A custom panel of 31 adRP genes was analysed by targeted next-generation sequencing using the Ion PGM platform in combination with Sanger sequencing. This allowed us to detect putative disease-causing mutations in 14 out of the 29 (48.28%) families analysed. Remarkably, around 38% of all adRP cases analysed showed mutations affecting the splicing process, mainly due to mutations in genes coding for spliceosome factors (SNRNP200 and PRPF8) but also due to splice-site mutations in RHO. Twelve of the 14 mutations found had been reported previously and two were novel mutations found in PRPF8 in two unrelated patients. In conclusion, our results will lead to more accurate genetic counselling and will contribute to a better characterisation of the disease. In addition, they may have a therapeutic impact in the future given the large number of studies currently underway based on targeted RNA splicing for therapeutic purposes. PMID:28045043
Diversity, Physiochemical and Phylogenetic Analyses of Bacteria Isolated from Various Drinking Water Sources.

PubMed

Eid, Neveen H; Al Doghaither, Huda A; Kumosani, Taha A; Gull, Munazza

2017-01-01

This study evaluated the indigenous bacterial strains of drinking water from the most common commercial water types, including bottled and filtered water, currently used in Saudi Arabia. Thirty randomly selected commercial brands of bottled water were purchased from Saudi local markets. Moreover, samples from tap water and filtered water were collected in sterilized glass bottles and stored at 4°C. Biochemical analyses covered pH, temperature, lactose fermentation test (LAC), indole test (IND), methyl red test (MR), Voges-Proskauer test (VP), urease test (URE), catalase test (CAT), and aerobic and anaerobic test (Ae/An). Molecular identification and comparative sequence analyses were done by full length 16S rRNA gene sequences using gene bank databases, and phylogenetic trees were constructed to assess the relatedness between bacterial strains. Among 30 water samples tested, 18 were found positive for bacterial growth. Molecular identification of four selected bacterial strains indicated the alarming presence of pathogenic bacteria Bacillus spp. in the most common commercial types of drinking water used in Saudi Arabia. Lack of awareness about good sanitation, poor personal hygiene practices and failures in safe water management and supply are important factors behind the poor drinking water quality in these sources and need to be addressed.
msgbsR: An R package for analysing methylation-sensitive restriction enzyme sequencing data.

PubMed

Mayne, Benjamin T; Leemaqz, Shalem Y; Buckberry, Sam; Rodriguez Lopez, Carlos M; Roberts, Claire T; Bianco-Miotto, Tina; Breen, James

2018-02-01

Genotyping-by-sequencing (GBS) or restriction-site associated DNA marker sequencing (RAD-seq) is a practical and cost-effective method for analysing large genomes from high diversity species. This method of sequencing, coupled with methylation-sensitive enzymes (often referred to as methylation-sensitive restriction enzyme sequencing or MRE-seq), is an effective tool to study DNA methylation in parts of the genome that are inaccessible in other sequencing techniques or are not annotated in microarray technologies. Current software tools do not fulfil all methylation-sensitive restriction sequencing assays for determining differences in DNA methylation between samples. To fill this computational need, we present msgbsR, an R package that contains tools for the analysis of methylation-sensitive restriction enzyme sequencing experiments. msgbsR can be used to identify and quantify read counts at methylated sites directly from alignment files (BAM files) and enables verification of restriction enzyme cut sites with the correct recognition sequence of the individual enzyme. In addition, msgbsR assesses DNA methylation based on read coverage, similar to RNA sequencing experiments, rather than methylation proportion and is a useful tool in analysing differential methylation on large populations. The package is fully documented and available freely online as a Bioconductor package ( https://bioconductor.org/packages/release/bioc/html/msgbsR.html ).
Comparative and genetic analysis of the four sequenced Paenibacillus polymyxa genomes reveals a diverse metabolism and conservation of genes relevant to plant-growth promotion and competitiveness.

PubMed

Eastman, Alexander W; Heinrichs, David E; Yuan, Ze-Chun

2014-10-03

Members of the genus Paenibacillus are important plant growth-promoting rhizobacteria that can serve as bio-reactors. Paenibacillus polymyxa promotes the growth of a variety of economically important crops. Our lab recently completed the genome sequence of Paenibacillus polymyxa CR1. As of January 2014, four P. polymyxa genomes have been completely sequenced but no comparative genomic analyses have been reported. Here we report the comparative and genetic analyses of four sequenced P. polymyxa genomes, which revealed a significantly conserved core genome. Complex metabolic pathways and regulatory networks were highly conserved and allow P. polymyxa to rapidly respond to dynamic environmental cues. Genes responsible for phytohormone synthesis, phosphate solubilization, iron acquisition, transcriptional regulation, σ-factors, stress responses, transporters and biomass degradation were well conserved, indicating an intimate association with plant hosts and the rhizosphere niche. In addition, genes responsible for antimicrobial resistance and non-ribosomal peptide/polyketide synthesis are present in both the core and accessory genome of each strain. Comparative analyses also reveal variations in the accessory genome, including large plasmids present in strains M1 and SC2. Furthermore, a considerable number of strain-specific genes and genomic islands are irregularly distributed throughout each genome. Although a variety of plant-growth promoting traits are encoded by all strains, only P. polymyxa CR1 encodes the unique nitrogen fixation cluster found in other Paenibacillus sp. Our study revealed that genomic loci relevant to host interaction and ecological fitness are highly conserved within the P. polymyxa genomes analysed, despite variations in the accessory genome. This work suggests that plant-growth promotion by P. polymyxa is mediated largely through phytohormone production, increased nutrient availability and bio-control mechanisms. This study provides an in-depth understanding of the genome architecture of this species, thus facilitating future genetic engineering and applications in agriculture, industry and medicine. Furthermore, this study highlights the current gap in our understanding of complex plant biomass metabolism in Gram-positive bacteria.
The Specificity of Innate Immune Responses Is Enforced by Repression of Interferon Response Elements by NF-κB p50

PubMed Central

Cheng, Christine S.; Feldman, Kristyn E.; Lee, James; Verma, Shilpi; Huang, De-Bin; Huynh, Kim; Chang, Mikyoung; Ponomarenko, Julia V.; Sun, Shao-Cong; Benedict, Chris A.; Ghosh, Gourisankar; Hoffmann, Alexander

2011-01-01

The specific binding of transcription factors to cognate sequence elements is thought to be critical for the generation of specific gene expression programs. Members of the nuclear factor κB (NF-κB) and interferon (IFN) regulatory factor (IRF) transcription factor families bind to the κB site and the IFN response element (IRE), respectively, of target genes, and they are activated in macrophages after exposure to pathogens. However, how these factors produce pathogen-specific inflammatory and immune responses remains poorly understood. Combining top-down and bottom-up systems biology approaches, we have identified the NF-κB p50 homodimer as a regulator of IRF responses. Unbiased genome-wide expression and biochemical and structural analyses revealed that the p50 homodimer repressed a subset of IFN-inducible genes through a previously uncharacterized subclass of guanine-rich IRE (G-IRE) sequences. Mathematical modeling predicted that the p50 homodimer might enforce the stimulus specificity of composite promoters. Indeed, the production of the antiviral regulator IFN-β was rendered stimulus-specific by the binding of the p50 homodimer to the G-IRE–containing IFNβ enhancer to suppress cytotoxic IFN signaling. Specifically, a deficiency in p50 resulted in the inappropriate production of IFN-β in response to bacterial DNA sensed by Toll-like receptor 9. This role for the NF-κB p50 homodimer in enforcing the specificity of the cellular response to pathogens by binding to a subset of IRE sequences alters our understanding of how the NF-κB and IRF signaling systems cooperate to regulate antimicrobial immunity. PMID:21343618
High-resolution sedimentological and subsidence analysis of the Late Neogene, Pannonian Basin, Hungary

USGS Publications Warehouse

Juhasz, E.; Muller, P.; Toth-Makk, A.; Hamor, T.; Farkas-Bulla, J.; Suto-Szentai, M.; Phillips, R.L.; Ricketts, B.

1996-01-01

Detailed sedimentological and paleontological analyses were carried out on more than 13,000 m of core from ten boreholes in the Late Neogene sediments of the Pannonian Basin, Hungary. These data provide the basis for determining the character of high-order depositional cycles and their stacking patterns. In the Late Neogene sediments of the Pannonian Basin there are two third-order sequences: the Late Miocene and the Pliocene ones. The Miocene sequence shows a regressive, upward-coarsening trend. There are four distinguishable sedimentary units in this sequence: the basal transgressive, the lower aggradational, the progradational and the upper aggradational units. The Pliocene sequence is also of aggradational character. The progradation does not coincide in time in the wells within the basin. The character of the relative water-level curves is similar throughout the basin but shows only very faint similarity to the sea-level curve. Therefore, it is unlikely that eustasy played any significant role in the pattern of basin filling. Rather, the dominant controls were the rapidly changing basin subsidence and high sedimentation rates, together with possible climatic factors.
The complete mitochondrial genomes of three parasitic nematodes of birds: a unique gene order and insights into nematode phylogeny

PubMed Central

2013-01-01

Background Analyses of mitochondrial (mt) genome sequences in recent years challenge the current working hypothesis of Nematoda phylogeny proposed from morphology, ecology and nuclear small subunit rRNA gene sequences, and raise the need to sequence additional mt genomes for a broad range of nematode lineages. Results We sequenced the complete mt genomes of three Ascaridia species (family Ascaridiidae) that infest chickens, pigeons and parrots, respectively. These three Ascaridia species have an identical arrangement of mt genes to each other but differ substantially from other nematodes. Phylogenetic analyses of the mt genome sequences of the Ascaridia species, together with 62 other nematode species, support the monophylies of seven high-level taxa of the phylum Nematoda: 1) the subclass Dorylaimia; 2) the orders Rhabditida, Trichinellida and Mermithida; 3) the suborder Rhabditina; and 4) the infraorders Spiruromorpha and Oxyuridomorpha. Analyses of mt genome sequences, however, reject the monophylies of the suborders Spirurina and Tylenchina, and the infraorders Rhabditomorpha, Panagrolaimomorpha and Tylenchomorpha. Monophyly of the infraorder Ascaridomorpha varies depending on the methods of phylogenetic analysis. The Ascaridomorpha was more closely related to the infraorders Rhabditomorpha and Diplogasteromorpha (suborder Rhabditina) than it was to the other two infraorders of the Spirurina: Oxyuridomorpha and Spiruromorpha. The closer relationship among Ascaridomorpha, Rhabditomorpha and Diplogasteromorpha was also supported by a shared common pattern of mitochondrial gene arrangement. Conclusions Analyses of mitochondrial genome sequences and gene arrangement have provided novel insights into the phylogenetic relationships among several major lineages of nematodes. Many lineages of nematodes, however, are underrepresented or not represented in these analyses. Expanding taxon sampling is necessary for future phylogenetic studies of nematodes with mt genome sequences. PMID:23800363
Genome-wide identification and analysis of the chicken basic helix-loop-helix factors.

PubMed

Liu, Wu-Yi; Zhao, Chun-Jiang

2010-01-01

Members of the basic helix-loop-helix (bHLH) family of transcription factors play important roles in a wide range of developmental processes. In this study, we conducted a genome-wide survey using the chicken (Gallus gallus) genomic database, and identified 104 bHLH sequences belonging to 42 gene families in an effort to characterize the chicken bHLH transcription factor family. Phylogenetic analyses revealed that chicken has 50, 21, 15, 4, 8, and 3 bHLH members in groups A, B, C, D, E, and F, respectively, while three members belonging to none of these groups were classified as "orphans". A comparison between chicken and human bHLH repertoires suggested that both organisms have a number of lineage-specific bHLH members in the proteomes. Chromosome distribution patterns and phylogenetic analyses strongly suggest that the bHLH members should have arisen through gene duplication at an early date. Gene Ontology (GO) enrichment statistics showed 51 top GO annotations of biological processes counted in the frequency. The present study deepens our understanding of the chicken bHLH transcription factor family and provides much useful information for further studies using chicken as a model system.
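The group counts reported above can be checked against the stated total of 104 bHLH sequences with a few lines of arithmetic. The numbers are taken directly from the abstract; the variable names are only illustrative.

```python
# Group sizes as reported in the abstract.
group_counts = {"A": 50, "B": 21, "C": 15, "D": 4, "E": 8, "F": 3}
orphans = 3

classified = sum(group_counts.values())   # members assigned to groups A-F
total = classified + orphans              # all identified bHLH sequences

print(classified, total)
```

The six groups sum to 101, and adding the three orphans gives the 104 sequences stated in the abstract.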
Genomic investigation of a suspected outbreak of Legionella pneumophila ST82 reveals undetected heterogeneity by the present gold-standard methods, Denmark, July to November 2014

PubMed Central

Schjørring, Susanne; Stegger, Marc; Kjelsø, Charlotte; Lilje, Berit; Bangsborg, Jette M; Petersen, Randi F; David, Sophia; Uldum, Søren A

2017-01-01

Between July and November 2014, 15 community-acquired cases of Legionnaires' disease (LD), including four with Legionella pneumophila serogroup 1 sequence type (ST) 82, were diagnosed in Northern Zealand, Denmark. An outbreak was suspected. No ST82 isolates were found in environmental samples and no external source was established. Four putative-outbreak ST82 isolates were retrospectively subjected to whole genome sequencing (WGS) followed by phylogenetic analyses with epidemiologically unrelated ST82 sequences. The four putative-outbreak ST82 sequences fell into two clades, which were separated by ca 1,700 single nucleotide polymorphisms (SNPs) when recombination regions were included but by only 12 to 21 SNPs when these were removed. A single putative-outbreak ST82 isolate sequence segregated in the first clade. The other three clustered in the second clade, where all included sequences had < 5 SNP differences between them. Intriguingly, this clade also comprised epidemiologically unrelated isolate sequences from the UK and Denmark dating back as early as 2011. The study confirms that recombination plays a major role in L. pneumophila evolution. On the other hand, strains belonging to the same ST can have only few SNP differences despite being sampled over both large timespans and geographic distances. These are two important factors to consider in outbreak investigations. PMID:28662761
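The contrast reported above, between SNP distances computed with and without recombination regions, can be illustrated with a minimal pairwise SNP counter. This is a sketch under stated assumptions: the sequences are already aligned to equal length, recombinant regions are supplied as hypothetical half-open (start, end) intervals, and the function name `snp_distance` is invented for this example.

```python
def snp_distance(a: str, b: str, masked=()) -> int:
    """Count positions where two aligned sequences differ.

    Positions inside any half-open (start, end) interval in `masked`
    (e.g. predicted recombination regions) are ignored, as are positions
    where either sequence has a gap or ambiguous base.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    skip = set()
    for start, end in masked:
        skip.update(range(start, end))
    return sum(
        1
        for i, (x, y) in enumerate(zip(a.upper(), b.upper()))
        if i not in skip and x != y and x in "ACGT" and y in "ACGT"
    )


# Toy aligned sequences, made up for this example.
iso1 = "ACGTACGTAC"
iso2 = "ACGAACGTTT"
print(snp_distance(iso1, iso2))                    # all differences
print(snp_distance(iso1, iso2, masked=[(8, 10)]))  # recombinant tract masked
```

In practice the masked intervals would come from a recombination-detection tool, and distances would be computed over whole-genome alignments rather than short strings.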
Characterization of Dermanyssus gallinae (Acarina: Dermanissydae) by sequence analysis of the ribosomal internal transcribed spacer regions.

PubMed

Potenza, L; Cafiero, M A; Camarda, A; La Salandra, G; Cucchiarini, L; Dachà, M

2009-10-01

In the present work, mites previously identified as Dermanyssus gallinae De Geer (Acari, Mesostigmata) using morphological keys were investigated by molecular tools. The complete internal transcribed spacer 1 (ITS1), 5.8S ribosomal DNA, and ITS2 region of the ribosomal DNA from mites were amplified and sequenced to examine the level of sequence variations and to explore the feasibility of using this region in the identification of this mite. Conserved primers located at the 3' end of the 18S and at the 5' start of the 28S rRNA genes were used first, and amplified fragments were sequenced. Sequence analyses showed no variation in the 5.8S and ITS2 regions, while slight intraspecific variations involving substitutions as well as deletions were concentrated in the ITS1 region. Based on the sequence analyses, a nested PCR of the ITS2 region followed by RFLP analyses has been set up in an attempt to provide a rapid molecular diagnostic tool for D. gallinae.
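The idea behind the RFLP step above, that sequence differences at a restriction site change the fragment pattern, can be sketched as an in silico digest. The abstract does not name the enzyme used, so the EcoRI-style site (GAATTC, cut after the first base) and both toy amplicons below are assumptions chosen purely for illustration.

```python
def fragment_lengths(seq: str, site: str, cut_offset: int):
    """Return fragment lengths after cutting `seq` at every occurrence of `site`.

    `cut_offset` is the number of bases from the start of the site to the
    cut position on the given strand (1 for G^AATTC).
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Two hypothetical 24-bp amplicons; the second has lost one recognition site.
amplicon_a = "AAAAGAATTCTTTTTTGAATTCCC"
amplicon_b = "AAAAGAATTCTTTTTTGAGTTCCC"
print(fragment_lengths(amplicon_a, "GAATTC", 1))
print(fragment_lengths(amplicon_b, "GAATTC", 1))
```

The two amplicons differ by a single base, yet give three fragments versus two, which is the kind of difference an RFLP gel would reveal.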
Pathogenesis of Helicobacter pylori-Related Gastroduodenal Diseases from Molecular Epidemiological Studies

PubMed Central

Yamaoka, Yoshio

2012-01-01

Helicobacter pylori is a major human pathogen that infects the stomach and produces inflammation that is responsible for various gastroduodenal diseases. Despite the high prevalence of H. pylori infections in Africa and South Asia, the incidence of gastric cancer in these areas is much lower than in other countries. The incidence of gastric cancer also tends to decrease from north to south in East Asia. Data from molecular epidemiological studies show that this variation in different geographic areas could be explained in part by different types of H. pylori virulence factors, especially CagA, VacA, and OipA. H. pylori infection is thought to be involved in both gastric cancer and duodenal ulcer, which are at opposite ends of the disease spectrum. This discrepancy can also be explained in part by another H. pylori factor, DupA, as well as by CagA typing (East Asian type versus Western type). H. pylori has a genome of approximately 1,600 genes; therefore, there might be other novel virulence factors. Because genome wide analyses using whole-genome sequencing technology give a broad view of the genome of H. pylori, we hope that next-generation sequencers will enable us to efficiently investigate novel virulence factors. PMID:22829807
The spread of hepatitis C virus genotype 1a in North America: a retrospective phylogenetic study.

PubMed

Joy, Jeffrey B; McCloskey, Rosemary M; Nguyen, Thuy; Liang, Richard H; Khudyakov, Yury; Olmstead, Andrea; Krajden, Mel; Ward, John W; Harrigan, P Richard; Montaner, Julio S G; Poon, Art F Y

2016-06-01

The timing of the initial spread of hepatitis C virus genotype 1a in North America is controversial. In particular, how and when hepatitis C virus reached extraordinary prevalence in specific demographic groups remains unclear. We quantified, using all available hepatitis C virus sequence data and phylodynamic methods, the timing of the spread of hepatitis C virus genotype 1a in North America. We screened 45 316 publicly available sequences of hepatitis C virus genotype 1a for location and genotype, and then did phylogenetic analyses of available North American sequences from five hepatitis C virus genes (E1, E2, NS2, NS4B, NS5B), with an emphasis on including as many sequences with early collection dates as possible. We inferred the historical population dynamics of this epidemic for all five gene regions using Bayesian skyline plots. Most of the spread of genotype 1a in North America occurred before 1965, and the hepatitis C virus epidemic has undergone relatively little expansion since then. The effective population size of the North American epidemic stabilised around 1960. These results were robust across all five gene regions analysed, although analyses of each gene separately show substantial variation in estimates of the timing of the early exponential growth, ranging roughly from 1940 for NS2, to 1965 for NS4B. The expansion of genotype 1a before 1965 suggests that nosocomial or iatrogenic factors rather than past sporadic behavioural risk (ie, experimentation with injection drug use, unsafe tattooing, high risk sex, travel to high endemic areas) were key contributors to the hepatitis C virus epidemic in North America. Our results might reduce stigmatisation around screening and diagnosis, potentially increasing rates of screening and treatment for hepatitis C virus. Funding: The Canadian Institutes of Health Research, Michael Smith Foundation for Health Research, and BC Centre for Excellence in HIV/AIDS. Copyright © 2016 Elsevier Ltd. All rights reserved.
5S ribosomal ribonucleic acid sequences in Bacteroides and Fusobacterium: evolutionary relationships within these genera and among eubacteria in general

NASA Technical Reports Server (NTRS)

Van den Eynde, H.; De Baere, R.; Shah, H. N.; Gharbia, S. E.; Fox, G. E.; Michalik, J.; Van de Peer, Y.; De Wachter, R.

1989-01-01

The 5S ribosomal ribonucleic acid (rRNA) sequences were determined for Bacteroides fragilis, Bacteroides thetaiotaomicron, Bacteroides capillosus, Bacteroides veroralis, Porphyromonas gingivalis, Anaerorhabdus furcosus, Fusobacterium nucleatum, Fusobacterium mortiferum, and Fusobacterium varium. A dendrogram constructed by a clustering algorithm from these sequences, which were aligned with all other hitherto known eubacterial 5S rRNA sequences, showed differences as well as similarities with respect to results derived from 16S rRNA analyses. In the 5S rRNA dendrogram, Bacteroides clustered together with Cytophaga and Fusobacterium, as in 16S rRNA analyses. Intraphylum relationships deduced from 5S rRNAs suggested that Bacteroides is specifically related to Cytophaga rather than to Fusobacterium, as was suggested by 16S rRNA analyses. Previous taxonomic considerations concerning the genus Bacteroides, based on biochemical and physiological data, were confirmed by the 5S rRNA sequence analysis.
A sequential factorial analysis approach to characterize the effects of uncertainties for supporting air quality management

NASA Astrophysics Data System (ADS)

Wang, S.; Huang, G. H.; Veawab, A.

2013-03-01

This study proposes a sequential factorial analysis (SFA) approach for supporting regional air quality management under uncertainty. SFA is capable not only of examining the interactive effects of input parameters, but also of analyzing the effects of constraints. When there are too many factors involved in practical applications, SFA has the advantage of conducting a sequence of factorial analyses for characterizing the effects of factors in a systematic manner. The factor-screening strategy employed in SFA is effective in greatly reducing the computational effort. The proposed SFA approach is applied to a regional air quality management problem for demonstrating its applicability. The results indicate that the effects of factors are evaluated quantitatively, which can help decision makers identify the key factors that have significant influence on system performance and explore the valuable information that may be veiled beneath their interrelationships.
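As a rough illustration of what one two-level factorial analysis estimates, the Python sketch below computes main effects and two-factor interaction effects from a full 2^k design. The response function `toy_cost` and the factor names are hypothetical and are not taken from the study; the screening and sequencing steps that make up the SFA approach itself are not reproduced here.

```python
from itertools import combinations, product

def factorial_effects(factors, response):
    """Estimate main and two-factor interaction effects from a full 2^k design.

    factors: list of factor names; each is run at coded levels -1 and +1.
    response: callable mapping a dict {name: level} to a number.
    """
    runs = [dict(zip(factors, levels))
            for levels in product((-1, 1), repeat=len(factors))]
    ys = [response(run) for run in runs]
    half = len(runs) / 2

    def effect(names):
        # Contrast: the sign of each run is the product of its coded levels.
        total = 0.0
        for run, y in zip(runs, ys):
            sign = 1
            for name in names:
                sign *= run[name]
            total += sign * y
        return total / half

    effects = {(f,): effect((f,)) for f in factors}
    for pair in combinations(factors, 2):
        effects[pair] = effect(pair)
    return effects

# Hypothetical response (assumption): a made-up cost with one interaction term.
def toy_cost(run):
    a, b = run["emission_cap"], run["demand"]
    return 10 + 3 * a - 2 * b + 1.5 * a * b

effects = factorial_effects(["emission_cap", "demand", "wind"], toy_cost)
```

With coded levels of -1 and +1, each estimated effect is twice the corresponding coefficient of the toy response, and a factor that does not enter the response (here `wind`) gets an effect of zero, which is the kind of signal a screening step could use to drop it.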
Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses

PubMed Central

Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

2014-01-01

Due to the upcoming data deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analyses tools, efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements, therefore biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analyses workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analyses tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach. PMID:24462600
Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.

PubMed

Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

2014-06-01

Due to the upcoming data deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analyses tools, efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements, therefore biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analyses workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analyses tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach. Copyright © 2014 Elsevier Inc. All rights reserved.
DMINDA: an integrated web server for DNA motif identification and analyses.

PubMed

Ma, Qin; Zhang, Hanyuan; Mao, Xizeng; Zhou, Chuan; Liu, Bingqiang; Chen, Xin; Xu, Ying

2014-07-01

DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
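Function (ii), scanning a sequence for instances of a query motif, can be pictured with a minimal Python sketch. It assumes the query motif is given as an IUPAC consensus string and searches the forward strand only; the function name `scan_motif` and the example sequence are illustrative, and this is not DMINDA's own algorithm or scoring scheme.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def scan_motif(sequence, consensus):
    """Return 0-based start positions of consensus matches (forward strand)."""
    pattern = "".join(IUPAC[ch] for ch in consensus.upper())
    # A lookahead is used so that overlapping instances are all reported.
    regex = re.compile(f"(?=({pattern}))")
    return [m.start() for m in regex.finditer(sequence.upper())]

# Toy promoter-like sequence (made up for illustration).
hits = scan_motif("ttgacaggtataatTTGACG", "TTGACR")
```

A fuller scan would also search the reverse complement and would typically score matches against a position weight matrix rather than a strict consensus.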
Sequence variation and phylogenetic analysis of envelope glycoprotein of hepatitis G virus.

PubMed

Lim, M Y; Fry, K; Yun, A; Chong, S; Linnen, J; Fung, K; Kim, J P

1997-11-01

A transfusion-transmissible agent provisionally designated hepatitis G virus (HGV) was recently identified. In this study, we examined the variability of the HGV genome by analysing sequences in the putative envelope region from 72 isolates obtained from diverse geographical sources. The 1561 nucleotide sequence of the E1/E2/NS2a region of HGV was determined from 12 isolates, and compared with three published sequences. The most variability was observed in 400 nucleotides at the N terminus of E2. We next analysed this 400 nucleotide envelope variable region (EV) from an additional 60 HGV isolates. This sequence varied considerably among the 75 isolates, with overall identity ranging from 79.3% to 99.5% at the nucleotide level, and from 83.5% to 100% at the amino acid level. However, hypervariable regions were not identified. Phylogenetic analyses indicated that the 75 HGV isolates belong to a single genotype. A single-tier distribution of evolutionary distances was observed among the 15 E1/E2/NS2a sequences and the 75 EV sequences. In contrast, 11 isolates of HCV were analysed and showed a three-tiered distribution, representing genotypes, subtypes, and isolates. The 75 isolates of HGV fell into four clusters on the phylogenetic tree. Tight geographical clustering was observed among the HGV isolates from Japan and Korea.
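The identity ranges quoted above are pairwise statistics over aligned sequences. The Python sketch below shows one common way to compute them, assuming the sequences are already aligned to equal length and that columns containing a gap in either sequence are skipped; the abstract does not state how gaps were handled, so that choice is an assumption, and the three toy sequences are made up.

```python
from itertools import combinations

def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are excluded
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

def identity_range(aligned):
    """Return (lowest, highest) pairwise identity for a dict of aligned sequences."""
    values = [percent_identity(aligned[i], aligned[j])
              for i, j in combinations(aligned, 2)]
    return min(values), max(values)

# Toy alignment (hypothetical isolates, not data from the study).
toy = {"iso1": "ACGTACGTAC", "iso2": "ACGTACGTAT", "iso3": "ACG-ACGAAT"}
low, high = identity_range(toy)
```

The same function applies to amino acid alignments, since it only compares characters column by column.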

From clinical sample to complete genome: Comparing methods for the extraction of HIV-1 RNA for high-throughput deep sequencing.

PubMed

Cornelissen, Marion; Gall, Astrid; Vink, Monique; Zorgdrager, Fokla; Binter, Špela; Edwards, Stephanie; Jurriaans, Suzanne; Bakker, Margreet; Ong, Swee Hoe; Gras, Luuk; van Sighem, Ard; Bezemer, Daniela; de Wolf, Frank; Reiss, Peter; Kellam, Paul; Berkhout, Ben; Fraser, Christophe; van der Kuyl, Antoinette C

2017-07-15

The BEEHIVE (Bridging the Evolution and Epidemiology of HIV in Europe) project aims to analyse nearly-complete viral genomes from >3000 HIV-1 infected Europeans using high-throughput deep sequencing techniques to investigate the virus genetic contribution to virulence. Following the development of a computational pipeline, including a new de novo assembler for RNA virus genomes, to generate larger contiguous sequences (contigs) from the abundance of short sequence reads that characterise the data, another area that determines genome sequencing success is the quality and quantity of the input RNA. A pilot experiment with 125 patient plasma samples was performed to investigate the optimal method for isolation of HIV-1 viral RNA for long amplicon genome sequencing. Manual isolation with the QIAamp Viral RNA Mini Kit (Qiagen) was superior over robotically extracted RNA using either the QIAcube robotic system, the mSample Preparation Systems RNA kit with automated extraction by the m2000sp system (Abbott Molecular), or the MagNA Pure 96 System in combination with the MagNA Pure 96 Instrument (Roche Diagnostics). We scored amplification of a set of four HIV-1 amplicons of ∼1.9, 3.6, 3.0 and 3.5kb, and subsequent recovery of near-complete viral genomes. Subsequently, 616 BEEHIVE patient samples were analysed to determine factors that influence successful amplification of the genome in four overlapping amplicons using the QIAamp Viral RNA Kit for viral RNA isolation. Both low plasma viral load and high sample age (stored before 1999) negatively influenced the amplification of viral amplicons >3kb. A plasma viral load of >100,000 copies/ml resulted in successful amplification of all four amplicons for 86% of the samples, this value dropped to only 46% for samples with viral loads of <20,000 copies/ml. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Severe Hemophilia A in a Male Old English Sheep Dog with a C→T Transition that Created a Premature Stop Codon in Factor VIII

PubMed Central

Lozier, Jay N; Kloos, Mark T; Merricks, Elizabeth P; Lemoine, Nathaly; Whitford, Margaret H; Raymer, Robin A; Bellinger, Dwight A; Nichols, Timothy C

2016-01-01

Animals with hemophilia are models for gene therapy, factor replacement, and inhibitor development in humans. We have actively sought dogs with severe hemophilia A that have novel factor VIII mutations unlike the previously described factor VIII intron 22 inversion. A male Old English Sheepdog with recurrent soft-tissue hemorrhage and hemarthrosis was diagnosed with severe hemophilia A (factor VIII activity less than 1% of normal). We purified genomic DNA from this dog and ruled out the common intron 22 inversion; we then sequenced all 26 exons. Comparing the results with the normal canine factor VIII sequence revealed a C→T transition in exon 12 of the factor VIII gene that created a premature stop codon at amino acid 577 in the A2 domain of the protein. In addition, 2 previously described polymorphisms that do not cause hemophilia were present at amino acids 909 and 1184. The hemophilia mutation creates a new TaqI site that facilitates rapid genotyping of affected offspring by PCR and restriction endonuclease analyses. This mutation is analogous to the previously described human factor VIII mutation at Arg583, which likewise is a CpG dinucleotide transition causing a premature stop codon in exon 12. Thus far, despite extensive treatment with factor VIII, this dog has not developed neutralizing antibodies (‘inhibitors’) to the protein. This novel mutation in a dog gives rise to severe hemophilia A analogous to a mutation seen in humans. This model will be useful for studies of the treatment of hemophilia. PMID:27780008
Phylogenetic Copy-Number Factorization of Multiple Tumor Samples.

PubMed

Zaccaria, Simone; El-Kebir, Mohammed; Klau, Gunnar W; Raphael, Benjamin J

2018-04-16

Cancer is an evolutionary process driven by somatic mutations. This process can be represented as a phylogenetic tree. Constructing such a phylogenetic tree from genome sequencing data is a challenging task due to the many types of mutations in cancer and the fact that nearly all cancer sequencing is of a bulk tumor, measuring a superposition of somatic mutations present in different cells. We study the problem of reconstructing tumor phylogenies from copy-number aberrations (CNAs) measured in bulk-sequencing data. We introduce the Copy-Number Tree Mixture Deconvolution (CNTMD) problem, which aims to find the phylogenetic tree with the fewest number of CNAs that explain the copy-number data from multiple samples of a tumor. We design an algorithm for solving the CNTMD problem and apply the algorithm to both simulated and real data. On simulated data, we find that our algorithm outperforms existing approaches that either perform deconvolution/factorization of mixed tumor samples or build phylogenetic trees assuming homogeneous tumor samples. On real data, we analyze multiple samples from a prostate cancer patient, identifying clones within these samples and a phylogenetic tree that relates these clones and their differing proportions across samples. This phylogenetic tree provides a higher resolution view of copy-number evolution of this cancer than published analyses.
Proteomic analysis of the phytopathogenic soilborne fungus Verticillium dahliae reveals differential protein expression in isolates that differ in aggressiveness.

PubMed

El-Bebany, Ahmed F; Rampitsch, Christof; Daayf, Fouad

2010-01-01

Verticillium dahliae is a soilborne fungus that causes a vascular wilt disease of plants and losses in a broad range of economically important crops worldwide. In this study, we compared the proteomes of highly (Vd1396-9) and weakly (Vs06-14) aggressive isolates of V. dahliae to identify protein factors that may contribute to pathogenicity. Twenty-five protein spots were consistently observed as differential in the proteome profiles of the two isolates. The protein sequences in the spots were identified by LC-ESI-MS/MS and MASCOT database searches. Some of the identified sequences shared homology with fungal proteins that have roles in stress response, colonization, melanin biosynthesis, microsclerotia formation, antibiotic resistance, and fungal penetration. These are important functions for infection of the host and survival of the pathogen in soil. One protein found only in the highly aggressive isolate was identified as isochorismatase hydrolase, a potential plant-defense suppressor. This enzyme may inhibit the production of salicylic acid, which is important for plant defense response signaling. Other sequences corresponding to potential pathogenicity factors were identified in the highly aggressive isolate. This work indicates that, in combination with functional genomics, proteomics-based analyses can provide additional insights into pathogenesis and potential management strategies for this disease.
The genetic landscape of paediatric de novo acute myeloid leukaemia as defined by single nucleotide polymorphism array and exon sequencing of 100 candidate genes.

PubMed

Olsson, Linda; Zettermark, Sofia; Biloglav, Andrea; Castor, Anders; Behrendtz, Mikael; Forestier, Erik; Paulsson, Kajsa; Johansson, Bertil

2016-07-01

Cytogenetic analyses of a consecutive series of 67 paediatric (median age 8 years; range 0-17) de novo acute myeloid leukaemia (AML) patients revealed aberrations in 55 (82%) cases. The most common subgroups were KMT2A rearrangement (29%), normal karyotype (15%), RUNX1-RUNX1T1 (10%), deletions of 5q, 7q and/or 17p (9%), myeloid leukaemia associated with Down syndrome (7%), PML-RARA (7%) and CBFB-MYH11 (5%). Single nucleotide polymorphism array (SNP-A) analysis and exon sequencing of 100 genes, performed in 52 and 40 cases, respectively (39 overlapping), revealed ≥1 aberration in 89%; when adding cytogenetic data, this frequency increased to 98%. Uniparental isodisomies (UPIDs) were detected in 13% and copy number aberrations (CNAs) in 63% (median 2/case); three UPIDs and 22 CNAs were recurrent. Twenty-two genes were targeted by focal CNAs, including AEBP2 and PHF6 deletions and genes involved in AML-associated gene fusions. Deep sequencing identified mutations in 65% of cases (median 1/case). In total, 60 mutations were found in 30 genes, primarily those encoding signalling proteins (47%), transcription factors (25%), or epigenetic modifiers (13%). Twelve genes (BCOR, CEBPA, FLT3, GATA1, KIT, KRAS, NOTCH1, NPM1, NRAS, PTPN11, SMC3 and TP53) were recurrently mutated. We conclude that SNP-A and deep sequencing analyses complement the cytogenetic diagnosis of paediatric AML. © 2016 John Wiley & Sons Ltd.
Combined molecular and morphological phylogenetic analyses of the New Zealand wolf spider genus Anoteropsis (Araneae: Lycosidae).

PubMed

Vink, Cor J; Paterson, Adrian M

2003-09-01

Datasets from the mitochondrial gene regions NADH dehydrogenase subunit I (ND1) and cytochrome c oxidase subunit I (COI) of the 20 species in the New Zealand wolf spider (Lycosidae) genus Anoteropsis were generated. Sequence data were phylogenetically analysed using parsimony and maximum likelihood analyses. The phylogenies generated from the ND1 and COI sequence data and a previously generated morphological dataset were significantly congruent (p<0.001). Sequence data were combined with morphological data and phylogenetically analysed using parsimony. The ND1 region sequenced included part of tRNA(Leu(CUN)), which appears to have an unstable amino-acyl arm and no TpsiC arm in lycosids. Analyses supported the existence of five species groups within Anoteropsis and the monophyly of species represented by multiple samples. A radiation of Anoteropsis species within the last five million years is inferred from the ND1 and COI likelihood phylograms, habitat and geological data, which also indicates that Anoteropsis arrived in New Zealand some time after it separated from Gondwana.
In Silico Detection of Sequence Variations Modifying Transcriptional Regulation

PubMed Central

Andersen, Malin C; Engström, Pär G; Lithwick, Stuart; Arenillas, David; Eriksson, Per; Lenhard, Boris; Wasserman, Wyeth W; Odeberg, Jacob

2008-01-01

Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation. PMID:18208319
“Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes”

PubMed Central

Neafsey, Daniel E.; Waterhouse, Robert M.; Abai, Mohammad R.; Aganezov, Sergey S.; Alekseyev, Max A.; Allen, James E.; Amon, James; Arcà, Bruno; Arensburger, Peter; Artemov, Gleb; Assour, Lauren A.; Basseri, Hamidreza; Berlin, Aaron; Birren, Bruce W.; Blandin, Stephanie A.; Brockman, Andrew I.; Burkot, Thomas R.; Burt, Austin; Chan, Clara S.; Chauve, Cedric; Chiu, Joanna C.; Christensen, Mikkel; Costantini, Carlo; Davidson, Victoria L.M.; Deligianni, Elena; Dottorini, Tania; Dritsou, Vicky; Gabriel, Stacey B.; Guelbeogo, Wamdaogo M.; Hall, Andrew B.; Han, Mira V.; Hlaing, Thaung; Hughes, Daniel S.T.; Jenkins, Adam M.; Jiang, Xiaofang; Jungreis, Irwin; Kakani, Evdoxia G.; Kamali, Maryam; Kemppainen, Petri; Kennedy, Ryan C.; Kirmitzoglou, Ioannis K.; Koekemoer, Lizette L.; Laban, Njoroge; Langridge, Nicholas; Lawniczak, Mara K.N.; Lirakis, Manolis; Lobo, Neil F.; Lowy, Ernesto; MacCallum, Robert M.; Mao, Chunhong; Maslen, Gareth; Mbogo, Charles; McCarthy, Jenny; Michel, Kristin; Mitchell, Sara N.; Moore, Wendy; Murphy, Katherine A.; Naumenko, Anastasia N.; Nolan, Tony; Novoa, Eva M.; O'Loughlin, Samantha; Oringanje, Chioma; Oshaghi, Mohammad A.; Pakpour, Nazzy; Papathanos, Philippos A.; Peery, Ashley N.; Povelones, Michael; Prakash, Anil; Price, David P.; Rajaraman, Ashok; Reimer, Lisa J.; Rinker, David C.; Rokas, Antonis; Russell, Tanya L.; Sagnon, N'Fale; Sharakhova, Maria V.; Shea, Terrance; Simão, Felipe A.; Simard, Frederic; Slotman, Michel A.; Somboon, Pradya; Stegniy, Vladimir; Struchiner, Claudio J.; Thomas, Gregg W.C.; Tojo, Marta; Topalis, Pantelis; Tubio, José M.C.; Unger, Maria F.; Vontas, John; Walton, Catherine; Wilding, Craig S.; Willis, Judith H.; Wu, Yi-Chieh; Yan, Guiyun; Zdobnov, Evgeny M.; Zhou, Xiaofan; Catteruccia, Flaminia; Christophides, George K.; Collins, Frank H.; Cornman, Robert S.; Crisanti, Andrea; Donnelly, Martin J.; Emrich, Scott J.; Fontaine, Michael C.; Gelbart, William; Hahn, Matthew W.; Hansen, Immo A.; Howell, Paul I.; Kafatos, Fotis C.; Kellis, Manolis; Lawson, Daniel; Louis, Christos; Luckhart, Shirley; Muskavitch, Marc A.T.; Ribeiro, José M.; Riehle, Michael A.; Sharakhov, Igor V.; Tu, Zhijian; Zwiebel, Laurence J.; Besansky, Nora J.

2015-01-01

Variation in vectorial capacity for human malaria among Anopheles mosquito species is determined by many factors, including behavior, immunity, and life history. To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, we sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila. Some determinants of vectorial capacity, such as chemosensory genes, do not show elevated turnover, but instead diversify through protein-sequence changes. This dynamism of anopheline genes and genomes may contribute to their flexible capacity to take advantage of new ecological niches, including adapting to humans as primary hosts. PMID:25554792
Vertical Distribution of Bacterial Communities in the Indian Ocean as Revealed by Analyses of 16S rRNA and nasA Genes.

PubMed

Jiang, Xuexia; Jiao, Nianzhi

2016-09-01

Bacteria play an important role in the marine biogeochemical cycles. However, research on the bacterial community structure of the Indian Ocean is scarce, particularly within the vertical dimension. In this study, we investigated the bacterial diversity of the pelagic, mesopelagic and bathypelagic zones of the southwestern Indian Ocean (50.46°E, 37.71°S). The clone libraries constructed from 16S rRNA gene sequences revealed that most phylotypes retrieved from the Indian Ocean were highly divergent from those retrieved from other oceans. Vertical differences were observed based on the analysis of natural bacterial community populations derived from the 16S rRNA gene sequences. Based on the analysis of the nasA gene sequences from the GenBank database, a pair of general primers was developed and used to amplify the bacterial nitrate-assimilating populations. Canonical correlation analysis revealed that environmental factors play an important role in mediating the bacterial communities in the Indian Ocean.
A sequence analysis of patterns in self-harm in young people with and without experience of being looked after in care.

PubMed

Wadman, Ruth; Clarke, David; Sayal, Kapil; Armstrong, Marie; Harroe, Caroline; Majumder, Pallab; Vostanis, Panos; Townsend, Ellen

2017-11-01

Young people in the public care system ('looked-after' young people) have high levels of self-harm. This paper reports the first detailed study of factors leading to self-harm over time in looked-after young people in England, using sequence analyses of the Card Sort Task for Self-harm (CaTS). Young people in care (looked-after group: n = 24; 14-21 years) and young people who had never been in care (contrast group: n = 21; 13-21 years) completed the CaTS, describing sequences of factors leading to their first and most recent episodes of self-harm. Lag sequential analysis determined patterns of significant transitions between factors (thoughts, feelings, behaviours, events) leading to self-harm across 6 months. Young people in care reported feeling better immediately following their first episode of self-harm. However, fearlessness of death, impulsivity, and access to means were reported most proximal to recent self-harm. Although difficult negative emotions were salient to self-harm sequences in both groups, young people with no experience of being in care reported a greater range of negative emotions and transitions between them. For the contrast group, feelings of depression and sadness were a significant starting point of the self-harm sequence 6 months prior to most recent self-harm. Sequences of factors leading to self-harm can change and evolve over time, so regular monitoring and assessment of each self-harm episode are needed. Support around easing and dealing with emotional distress is required. Restricting access to means to carry out potentially fatal self-harm attempts, particularly for the young persons with experience of being in care, is recommended. Self-harm (and factors associated with self-harm) can change and evolve over time; assessments need to reflect this. Looked-after young people reported feeling better after first self-harm; fearlessness of death, access to means, and impulsivity were reported as key in recent self-harm. Underlying emotional distress, particularly depression and self-hatred were important in both first and most recent self-harm. Looked-after young people should undergo regular monitoring and assessment of each self-harm episode and access to potentially fatal means should be restricted. The CaTS would have clinical utility as an assessment tool. Recruiting participants can be a significant challenge in studies with looked-after children and young people. Future research with larger clinical samples would be valuable. © 2017 The British Psychological Society.
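The counting step behind a lag sequential analysis can be sketched in a few lines of Python. The example below tallies lag-1 transitions between factor categories and converts them to conditional probabilities; the toy sequences use only the four category labels named in the abstract plus an end state and are invented for illustration, and the significance testing that the study applied to its transitions is deliberately left out.

```python
from collections import Counter, defaultdict

def lag_transitions(sequences, lag=1):
    """Count how often factor `a` is followed `lag` steps later by factor `b`."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - lag):
            counts[(seq[i], seq[i + lag])] += 1
    return counts

def transition_probabilities(counts):
    """Convert transition counts to P(next factor | given factor)."""
    totals = defaultdict(int)
    for (given, _), n in counts.items():
        totals[given] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}

# Toy card-sort sequences (hypothetical, category level only).
toy_sequences = [
    ["event", "feeling", "thought", "self-harm"],
    ["feeling", "thought", "behaviour", "self-harm"],
    ["event", "feeling", "self-harm"],
]
counts = lag_transitions(toy_sequences)
probs = transition_probabilities(counts)
```

Setting `lag=2` counts transitions that skip one intermediate card, which is how longer-range dependencies in a sequence can be examined with the same function.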
Tracing the temporal-spatial transcriptome landscapes of the human fetal digestive tract using single-cell RNA-sequencing.

PubMed

Gao, Shuai; Yan, Liying; Wang, Rui; Li, Jingyun; Yong, Jun; Zhou, Xin; Wei, Yuan; Wu, Xinglong; Wang, Xiaoye; Fan, Xiaoying; Yan, Jie; Zhi, Xu; Gao, Yun; Guo, Hongshan; Jin, Xiao; Wang, Wendong; Mao, Yunuo; Wang, Fengchao; Wen, Lu; Fu, Wei; Ge, Hao; Qiao, Jie; Tang, Fuchou

2018-06-01

The development of the digestive tract is critical for proper food digestion and nutrient absorption. Here, we analyse the main organs of the digestive tract, including the oesophagus, stomach, small intestine and large intestine, from human embryos between 6 and 25 weeks of gestation as well as the large intestine from adults using single-cell RNA-seq analyses. In total, 5,227 individual cells are analysed and 40 cell types clearly identified. Their crucial biological features, including developmental processes, signalling pathways, cell cycle, nutrient digestion and absorption metabolism, and transcription factor networks, are systematically revealed. Moreover, the differentiation and maturation processes of the large intestine are thoroughly investigated by comparing the corresponding transcriptome profiles between embryonic and adult stages. Our work offers a rich resource for investigating the gene regulation networks of the human fetal digestive tract and adult large intestine at single-cell resolution.
Formation of mushrooms and lignocellulose degradation encoded in the genome sequence of Schizophyllum commune

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ohm, Robin A.; de Jong, Jan F.; Lugones, Luis G.

2010-07-12

The wood degrading fungus Schizophyllum commune is a model system for mushroom development. Here, we describe the 38.5 Mb assembled genome of this basidiomycete and application of whole genome expression analysis to study the 13,210 predicted genes. Comparative analyses of the S. commune genome revealed unique wood degrading machinery and mating type loci with the highest number of reported genes. Gene expression analyses revealed that one third of the 471 identified transcription factor genes were differentially expressed during sexual development. Two of these transcription factor genes were deleted. Inactivation of fst4 resulted in the inability to form mushrooms, whereas inactivation of fst3 resulted in more but smaller mushrooms than wild-type. These data illustrate that mechanisms underlying mushroom formation can be dissected using S. commune as a model. This will impact commercial production of mushrooms and the industrial use of these fruiting bodies to produce enzymes and pharmaceuticals.
Methods and compositions for regulating gene expression in plant cells

NASA Technical Reports Server (NTRS)

Dai, Shunhong (Inventor); Beachy, Roger N. (Inventor); Luis, Maria Isabel Ordiz (Inventor)

2010-01-01

Novel chimeric plant promoter sequences are provided, together with plant gene expression cassettes comprising such sequences. In certain preferred embodiments, the chimeric plant promoters comprise the BoxII cis element and/or derivatives thereof. In addition, novel transcription factors are provided, together with nucleic acid sequences encoding such transcription factors and plant gene expression cassettes comprising such nucleic acid sequences. In certain preferred embodiments, the novel transcription factors comprise the acidic domain, or fragments thereof, of the RF2a transcription factor. Methods for using the chimeric plant promoter sequences and novel transcription factors in regulating the expression of at least one gene of interest are provided, together with transgenic plants comprising such chimeric plant promoter sequences and novel transcription factors.
Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome

PubMed Central

Margulies, Elliott H.; Cooper, Gregory M.; Asimenos, George; Thomas, Daryl J.; Dewey, Colin N.; Siepel, Adam; Birney, Ewan; Keefe, Damian; Schwartz, Ariel S.; Hou, Minmei; Taylor, James; Nikolaev, Sergey; Montoya-Burgos, Juan I.; Löytynoja, Ari; Whelan, Simon; Pardi, Fabio; Massingham, Tim; Brown, James B.; Bickel, Peter; Holmes, Ian; Mullikin, James C.; Ureta-Vidal, Abel; Paten, Benedict; Stone, Eric A.; Rosenbloom, Kate R.; Kent, W. James; Bouffard, Gerard G.; Guan, Xiaobin; Hansen, Nancy F.; Idol, Jacquelyn R.; Maduro, Valerie V.B.; Maskeri, Baishali; McDowell, Jennifer C.; Park, Morgan; Thomas, Pamela J.; Young, Alice C.; Blakesley, Robert W.; Muzny, Donna M.; Sodergren, Erica; Wheeler, David A.; Worley, Kim C.; Jiang, Huaiyang; Weinstock, George M.; Gibbs, Richard A.; Graves, Tina; Fulton, Robert; Mardis, Elaine R.; Wilson, Richard K.; Clamp, Michele; Cuff, James; Gnerre, Sante; Jaffe, David B.; Chang, Jean L.; Lindblad-Toh, Kerstin; Lander, Eric S.; Hinrichs, Angie; Trumbower, Heather; Clawson, Hiram; Zweig, Ann; Kuhn, Robert M.; Barber, Galt; Harte, Rachel; Karolchik, Donna; Field, Matthew A.; Moore, Richard A.; Matthewson, Carrie A.; Schein, Jacqueline E.; Marra, Marco A.; Antonarakis, Stylianos E.; Batzoglou, Serafim; Goldman, Nick; Hardison, Ross; Haussler, David; Miller, Webb; Pachter, Lior; Green, Eric D.; Sidow, Arend

2007-01-01

A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization. PMID:17567995
Sockeye: A 3D Environment for Comparative Genomics

PubMed Central

Montgomery, Stephen B.; Astakhova, Tamara; Bilenky, Mikhail; Birney, Ewan; Fu, Tony; Hassel, Maik; Melsopp, Craig; Rak, Marcin; Robertson, A. Gordon; Sleumer, Monica; Siddiqui, Asim S.; Jones, Steven J.M.

2004-01-01

Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases, the ability to perform large-scale comparative analyses has become increasingly relevant. In addition, the growing complexity of genomic feature annotation means that new approaches to genomic visualization need to be explored. We have developed a Java-based application called Sockeye that uses three-dimensional (3D) graphics technology to facilitate the visualization of annotation and conservation across multiple sequences. This software uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization. PMID:15123592
Genome-based identification of spliceosomal proteins in the silk moth Bombyx mori.

PubMed

Somarelli, Jason A; Mesa, Annia; Fuller, Myron E; Torres, Jacqueline O; Rodriguez, Carol E; Ferrer, Christina M; Herrera, Rene J

2010-12-01

Pre-messenger RNA splicing is a highly conserved eukaryotic cellular function that takes place by way of a large, RNA-protein assembly known as the spliceosome. In the mammalian system, nearly 300 proteins associate with uridine-rich small nuclear (sn)RNAs to form this complex. Some of these splicing factors are ubiquitously present in the spliceosome, whereas others are involved only in the processing of specific transcripts. Several proteomics analyses have delineated the proteins of the spliceosome in several species. In this study, we mine multiple sequence data sets of the silk moth Bombyx mori in an attempt to identify the entire set of known spliceosomal proteins. Five data sets were utilized, including the 3X, 6X, and Build 2.0 genomic contigs as well as the expressed sequence tag and protein libraries. While homologs for 88% of vertebrate splicing factors were delineated in the Bombyx mori genome, there appear to be several spliceosomal polypeptides absent in Bombyx mori and seven additional insect species. This apparent increase in spliceosomal complexity in vertebrates may reflect the tissue-specific and developmental stage-specific alternative pre-mRNA splicing requirements in vertebrates. Phylogenetic analyses of 15 eukaryotic taxa using the core splicing factors suggest that the essential functional units of the pre-mRNA processing machinery have remained highly conserved from yeast to humans. The Sm and LSm proteins are the most conserved, whereas proteins of the U1 small nuclear ribonucleoprotein particle are the most divergent. These data highlight both the differential conservation and relative phylogenetic signals of the essential spliceosomal components throughout evolution. © 2010 Wiley Periodicals, Inc.
RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets.

PubMed

Scheuch, Matthias; Höper, Dirk; Beer, Martin

2015-03-03

Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck. To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS - Reliable Information Extraction from Metagenomic Sequence datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets. RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.
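The cascading idea described in the RIEMS abstract (assign what can be assigned under strict criteria first, then pass only the leftover reads to progressively looser stages) can be sketched in a few lines. This is a toy illustration, not the RIEMS implementation: the function names, the position-wise identity score, the thresholds, and the example sequences are all assumptions made for the sketch, and a real pipeline would call different alignment tools at each stage.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length strings."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cascade_classify(reads, references, thresholds=(1.0, 0.9, 0.7)):
    """Assign each read to a taxon using stages of decreasing stringency.

    Hypothetical sketch: every stage reuses the same toy scorer and only
    relaxes the acceptance threshold. Reads are used as dictionary keys,
    so duplicate reads collapse into one entry.
    """
    assignments = {}
    remaining = list(reads)
    for stage, threshold in enumerate(thresholds, start=1):
        still_unassigned = []
        for read in remaining:
            # Find the best-scoring reference for this read.
            best_taxon, best_score = None, 0.0
            for taxon, ref in references.items():
                score = identity(read, ref)
                if score > best_score:
                    best_taxon, best_score = taxon, score
            if best_taxon is not None and best_score >= threshold:
                assignments[read] = (best_taxon, stage)
            else:
                # Not confident enough at this stringency: defer the read.
                still_unassigned.append(read)
        remaining = still_unassigned
    # Anything left after the loosest stage stays unclassified.
    for read in remaining:
        assignments[read] = ("unclassified", None)
    return assignments


if __name__ == "__main__":
    refs = {"virus_A": "ACGTACGTAC", "bacterium_B": "TTTTGGGGCC"}
    reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGACA", "GGGGGGGGGG"]
    for read, result in cascade_classify(reads, refs).items():
        print(read, result)
```

Recording the stage alongside the taxon keeps track of how stringent the evidence for each assignment was, which is the kind of information a taxonomically organised result protocol could report.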
Interim Reliability Evaluation Program: analysis of the Browns Ferry, Unit 1, nuclear plant. Main report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mays, S.E.; Poloski, J.P.; Sullivan, W.H.

1982-07-01

A probabilistic risk assessment (PRA) was made of the Browns Ferry, Unit 1, nuclear plant as part of the Nuclear Regulatory Commission's Interim Reliability Evaluation Program (IREP). Specific goals of the study were to identify the dominant contributors to core melt, develop a foundation for more extensive use of PRA methods, expand the cadre of experienced PRA practitioners, and apply procedures for extension of IREP analyses to other domestic light water reactors. Event tree and fault tree analyses were used to estimate the frequency of accident sequences initiated by transients and loss of coolant accidents. External events such as floods, fires, earthquakes, and sabotage were beyond the scope of this study and were, therefore, excluded. From these sequences, the dominant contributors to probable core melt frequency were chosen. Uncertainty and sensitivity analyses were performed on these sequences to better understand the limitations associated with the estimated sequence frequencies. Dominant sequences were grouped according to common containment failure modes and corresponding release categories on the basis of comparison with analyses of similar designs rather than on the basis of detailed plant-specific calculations.
Functional Analysis of Maize Silk-Specific ZmbZIP25 Promoter.

PubMed

Li, Wanying; Yu, Dan; Yu, Jingjuan; Zhu, Dengyun; Zhao, Qian

2018-03-12

ZmbZIP25 ( Zea mays bZIP (basic leucine zipper) transcription factor 25) is a function-unknown protein that belongs to the D group of the bZIP transcription factor family. RNA-seq data showed that the expression of ZmbZIP25 was tissue-specific in maize silks, and this specificity was confirmed by RT-PCR (reverse transcription-polymerase chain reaction). In situ RNA hybridization showed that ZmbZIP25 was expressed exclusively in the xylem of maize silks. A 5' RACE (rapid amplification of cDNA ends) assay identified an adenine residue as the transcription start site of the ZmbZIP25 gene. To characterize this silk-specific promoter, we isolated and analyzed a 2450 bp (from -2083 to +367) and a 2600 bp sequence of ZmbZIP25 (from -2083 to +517, the transcription start site was denoted +1). Stable expression assays in Arabidopsis showed that the expression of the reporter gene GUS driven by the 2450 bp ZmbZIP25 5'-flanking fragment occurred exclusively in the papillae of Arabidopsis stigmas. Furthermore, transient expression assays in maize indicated that GUS and GFP expression driven by the 2450 bp ZmbZIP25 5'-flanking sequences occurred only in maize silks and not in other tissues. However, no GUS or GFP expression was driven by the 2600 bp ZmbZIP25 5'-flanking sequences in either stable or transient expression assays. A series of deletion analyses of the 2450 bp ZmbZIP25 5'-flanking sequence was performed in transgenic Arabidopsis plants, and probable elements prediction analysis revealed the possible presence of negative regulatory elements within the 161 bp region from -1117 to -957 that were responsible for the specificity of the ZmbZIP25 5'-flanking sequence.
Functional Analysis of Maize Silk-Specific ZmbZIP25 Promoter

PubMed Central

Li, Wanying; Yu, Dan; Yu, Jingjuan; Zhu, Dengyun; Zhao, Qian

2018-01-01

ZmbZIP25 (Zea mays bZIP (basic leucine zipper) transcription factor 25) is a function-unknown protein that belongs to the D group of the bZIP transcription factor family. RNA-seq data showed that the expression of ZmbZIP25 was tissue-specific in maize silks, and this specificity was confirmed by RT-PCR (reverse transcription-polymerase chain reaction). In situ RNA hybridization showed that ZmbZIP25 was expressed exclusively in the xylem of maize silks. A 5′ RACE (rapid amplification of cDNA ends) assay identified an adenine residue as the transcription start site of the ZmbZIP25 gene. To characterize this silk-specific promoter, we isolated and analyzed a 2450 bp (from −2083 to +367) and a 2600 bp sequence of ZmbZIP25 (from −2083 to +517, the transcription start site was denoted +1). Stable expression assays in Arabidopsis showed that the expression of the reporter gene GUS driven by the 2450 bp ZmbZIP25 5′-flanking fragment occurred exclusively in the papillae of Arabidopsis stigmas. Furthermore, transient expression assays in maize indicated that GUS and GFP expression driven by the 2450 bp ZmbZIP25 5′-flanking sequences occurred only in maize silks and not in other tissues. However, no GUS or GFP expression was driven by the 2600 bp ZmbZIP25 5′-flanking sequences in either stable or transient expression assays. A series of deletion analyses of the 2450 bp ZmbZIP25 5′-flanking sequence was performed in transgenic Arabidopsis plants, and probable elements prediction analysis revealed the possible presence of negative regulatory elements within the 161 bp region from −1117 to −957 that were responsible for the specificity of the ZmbZIP25 5′-flanking sequence. PMID:29534529

Identification of rare X-linked neuroligin variants by massively parallel sequencing in males with autism spectrum disorder.

PubMed

Steinberg, Karyn Meltz; Ramachandran, Dhanya; Patel, Viren C; Shetty, Amol C; Cutler, David J; Zwick, Michael E

2012-09-28

Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes, by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. We found seven rare variants at evolutionary conserved sites in our study population. Functional analyses of the three 3' UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects.
Identification of rare X-linked neuroligin variants by massively parallel sequencing in males with autism spectrum disorder

PubMed Central

2012-01-01

Background: Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. Methods: We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes, by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. Results: We found seven rare variants at evolutionary conserved sites in our study population. Functional analyses of the three 3’ UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. Conclusions: These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects. PMID:23020841
Defining objective clusters for rabies virus sequences using affinity propagation clustering

PubMed Central

Fischer, Susanne; Freuling, Conrad M.; Pfaff, Florian; Bodenhofer, Ulrich; Höper, Dirk; Fischer, Mareike; Marston, Denise A.; Fooks, Anthony R.; Mettenleiter, Thomas C.; Conraths, Franz J.; Homeier-Bachmann, Timo

2018-01-01

Rabies is caused by lyssaviruses, and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV), from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses face technical difficulties in defining verifiable criteria for cluster allocation in phylogenetic trees, which calls for a more rational approach. Therefore, we applied a relatively new mathematical clustering algorithm named ‘affinity propagation clustering’ (AP) to propose a standardized sub-species classification utilizing full-genome RABV sequences. Because AP is computationally fast and works for any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics for the analysis of microarray and gene expression data. However, cluster analysis of sequences is still in its infancy. Existing (516) and original (46) full genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e. New World cluster, Arctic/Arctic-like, Cosmopolitan, and Asian as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform transparent manner, not only for RABV, but also for other comparative sequence analyses. PMID:29357361
Parallel computation of genome-scale RNA secondary structure to detect structural constraints on human genome.

PubMed

Kawaguchi, Risa; Kiryu, Hisanori

2016-05-06

RNA secondary structure around splice sites is known to assist normal splicing by promoting spliceosome recognition. However, analyzing the structural properties of entire intronic regions or pre-mRNA sequences has been difficult hitherto, owing to serious experimental and computational limitations, such as low read coverage and numerical problems. Our novel software, "ParasoR", is designed to run on a computer cluster and enables the exact computation of various structural features of long RNA sequences under the constraint of maximal base-pairing distance. ParasoR divides dynamic programming (DP) matrices into smaller pieces, such that each piece can be computed by a separate computer node without losing the connectivity information between the pieces. ParasoR directly computes the ratios of DP variables to avoid the reduction of numerical precision caused by the cancellation of a large number of Boltzmann factors. The structural preferences of mRNAs computed by ParasoR show a high concordance with those determined by high-throughput sequencing analyses. Using ParasoR, we investigated the global structural preferences of transcribed regions in the human genome. A genome-wide folding simulation indicated that transcribed regions are significantly more structural than intergenic regions after removing repeat sequences and k-mer frequency bias. In particular, we observed a highly significant preference for base pairing over entire intronic regions as compared to their antisense sequences, as well as to intergenic regions. A comparison between pre-mRNAs and mRNAs showed that coding regions become more accessible after splicing, indicating constraints for translational efficiency. Such changes are correlated with gene expression levels, as well as GC content, and are enriched among genes associated with cytoskeleton and kinase functions.
We have shown that ParasoR is very useful for analyzing the structural properties of long RNA sequences such as mRNAs, pre-mRNAs, and long non-coding RNAs whose lengths can be more than a million bases in the human genome. In our analyses, transcribed regions including introns are indicated to be subject to various types of structural constraints that cannot be explained from simple sequence composition biases. ParasoR is freely available at https://github.com/carushi/ParasoR.
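The numerical point in the abstract above, working with ratios of DP variables rather than with the variables themselves, can be illustrated on a toy recurrence. The sketch below is not ParasoR's algorithm. It assumes a hypothetical partition function Z_i = Z_(i-1) + b * Z_(i-2) with Z_0 = Z_1 = 1, where b stands in for a Boltzmann weight, and the function names are invented for the illustration. Because r_i = Z_i / Z_(i-1) satisfies r_i = 1 + b / r_(i-1), the ratios stay bounded and log Z_n is recovered as a sum of log r_i, even when Z_n itself is far too large for a floating-point number.

```python
import math


def log_partition_by_ratios(n, b):
    """Return log Z_n for the toy recurrence Z_i = Z_(i-1) + b * Z_(i-2).

    Only the ratio r_i = Z_i / Z_(i-1) is propagated, so no intermediate
    value grows with n. Assumes Z_0 = Z_1 = 1 and b > 0 (hypothetical model).
    """
    if n < 2:
        return 0.0  # Z_0 = Z_1 = 1, so log Z = 0
    log_z = 0.0
    ratio = 1.0  # r_1 = Z_1 / Z_0
    for _ in range(2, n + 1):
        ratio = 1.0 + b / ratio  # r_i = 1 + b / r_(i-1)
        log_z += math.log(ratio)  # log Z_i = log Z_(i-1) + log r_i
    return log_z


def partition_exact(n, b):
    """Reference value of Z_n using Python's arbitrary-precision integers."""
    prev, cur = 1, 1
    for _ in range(2, n + 1):
        prev, cur = cur, cur + b * prev
    return cur


if __name__ == "__main__":
    n, b = 5000, 3
    exact = partition_exact(n, b)
    # Z_n is too large to convert to a float, but its logarithm is fine.
    print(log_partition_by_ratios(n, b), math.log(exact))
```

The integer reference is only possible because the toy weight b is an integer; with real-valued Boltzmann factors no such exact check exists, which is why a ratio-based (or log-space) formulation is attractive for long sequences.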
Acoustic sequences in non-human animals: a tutorial review and prospectus.

PubMed

Kershenbaum, Arik; Blumstein, Daniel T; Roch, Marie A; Akçay, Çağlar; Backus, Gregory; Bee, Mark A; Bohn, Kirsten; Cao, Yan; Carter, Gerald; Cäsar, Cristiane; Coen, Michael; DeRuiter, Stacy L; Doyle, Laurance; Edelman, Shimon; Ferrer-i-Cancho, Ramon; Freeberg, Todd M; Garland, Ellen C; Gustison, Morgan; Harley, Heidi E; Huetz, Chloé; Hughes, Melissa; Hyland Bruno, Julia; Ilany, Amiyaal; Jin, Dezhe Z; Johnson, Michael; Ju, Chenghui; Karnowski, Jeremy; Lohr, Bernard; Manser, Marta B; McCowan, Brenda; Mercado, Eduardo; Narins, Peter M; Piel, Alex; Rice, Megan; Salmi, Roberta; Sasahara, Kazutoshi; Sayigh, Laela; Shiu, Yu; Taylor, Charles; Vallejo, Edgar E; Waller, Sara; Zamora-Gutierrez, Veronica

2016-02-01

Animal acoustic communication often takes the form of complex sequences, made up of multiple distinct acoustic units. Apart from the well-known example of birdsong, other animals such as insects, amphibians, and mammals (including bats, rodents, primates, and cetaceans) also generate complex acoustic sequences. Occasionally, such as with birdsong, the adaptive role of these sequences seems clear (e.g. mate attraction and territorial defence). More often however, researchers have only begun to characterise - let alone understand - the significance and meaning of acoustic sequences. Hypotheses abound, but there is little agreement as to how sequences should be defined and analysed. Our review aims to outline suitable methods for testing these hypotheses, and to describe the major limitations to our current and near-future knowledge on questions of acoustic sequences. This review and prospectus is the result of a collaborative effort between 43 scientists from the fields of animal behaviour, ecology and evolution, signal processing, machine learning, quantitative linguistics, and information theory, who gathered for a 2013 workshop entitled, 'Analysing vocal sequences in animals'. Our goal is to present not just a review of the state of the art, but to propose a methodological framework that summarises what we suggest are the best practices for research in this field, across taxa and across disciplines. We also provide a tutorial-style introduction to some of the most promising algorithmic approaches for analysing sequences. We divide our review into three sections: identifying the distinct units of an acoustic sequence, describing the different ways that information can be contained within a sequence, and analysing the structure of that sequence. Each of these sections is further subdivided to address the key questions and approaches in that area. 
We propose a uniform, systematic, and comprehensive approach to studying sequences, with the goal of clarifying research terms used in different fields, and facilitating collaboration and comparative studies. Allowing greater interdisciplinary collaboration will facilitate the investigation of many important questions in the evolution of communication and sociality. © 2014 Cambridge Philosophical Society.
Acoustic sequences in non-human animals: a tutorial review and prospectus

PubMed Central

Kershenbaum, Arik; Blumstein, Daniel T.; Roch, Marie A.; Akçay, Çağlar; Backus, Gregory; Bee, Mark A.; Bohn, Kirsten; Cao, Yan; Carter, Gerald; Cäsar, Cristiane; Coen, Michael; DeRuiter, Stacy L.; Doyle, Laurance; Edelman, Shimon; Ferrer-i-Cancho, Ramon; Freeberg, Todd M.; Garland, Ellen C.; Gustison, Morgan; Harley, Heidi E.; Huetz, Chloé; Hughes, Melissa; Bruno, Julia Hyland; Ilany, Amiyaal; Jin, Dezhe Z.; Johnson, Michael; Ju, Chenghui; Karnowski, Jeremy; Lohr, Bernard; Manser, Marta B.; McCowan, Brenda; Mercado, Eduardo; Narins, Peter M.; Piel, Alex; Rice, Megan; Salmi, Roberta; Sasahara, Kazutoshi; Sayigh, Laela; Shiu, Yu; Taylor, Charles; Vallejo, Edgar E.; Waller, Sara; Zamora-Gutierrez, Veronica

2015-01-01

Animal acoustic communication often takes the form of complex sequences, made up of multiple distinct acoustic units. Apart from the well-known example of birdsong, other animals such as insects, amphibians, and mammals (including bats, rodents, primates, and cetaceans) also generate complex acoustic sequences. Occasionally, such as with birdsong, the adaptive role of these sequences seems clear (e.g. mate attraction and territorial defence). More often however, researchers have only begun to characterise – let alone understand – the significance and meaning of acoustic sequences. Hypotheses abound, but there is little agreement as to how sequences should be defined and analysed. Our review aims to outline suitable methods for testing these hypotheses, and to describe the major limitations to our current and near-future knowledge on questions of acoustic sequences. This review and prospectus is the result of a collaborative effort between 43 scientists from the fields of animal behaviour, ecology and evolution, signal processing, machine learning, quantitative linguistics, and information theory, who gathered for a 2013 workshop entitled, “Analysing vocal sequences in animals”. Our goal is to present not just a review of the state of the art, but to propose a methodological framework that summarises what we suggest are the best practices for research in this field, across taxa and across disciplines. We also provide a tutorial-style introduction to some of the most promising algorithmic approaches for analysing sequences. We divide our review into three sections: identifying the distinct units of an acoustic sequence, describing the different ways that information can be contained within a sequence, and analysing the structure of that sequence. Each of these sections is further subdivided to address the key questions and approaches in that area. 
We propose a uniform, systematic, and comprehensive approach to studying sequences, with the goal of clarifying research terms used in different fields, and facilitating collaboration and comparative studies. Allowing greater interdisciplinary collaboration will facilitate the investigation of many important questions in the evolution of communication and sociality. PMID:25428267
Fusarium proliferatum - Causal agent of garlic bulb rot in Spain: Genetic variability and mycotoxin production.

PubMed

Gálvez, Laura; Urbaniak, Monika; Waśkiewicz, Agnieszka; Stępień, Łukasz; Palmero, Daniel

2017-10-01

Fusarium proliferatum is a fungal pathogen that occurs worldwide and affects several crops, including garlic. In Spain, this is the most frequent pathogenic fungus associated with garlic rot during storage. Moreover, F. proliferatum is an important mycotoxigenic species, producing a broad range of toxins, which may pose a risk for food safety. The aim of this study is to assess the intraspecific variability of the garlic pathogen in Spain implied by analyses of translation elongation factor (tef-1α) and FUM1 gene sequences as well as the differences in growth rates. Phylogenetic characterization has been complemented with the characterization of mating type alleles as well as the species' potential as a toxin producer. Phylogenetic trees based on the sequence of the translation elongation factor and FUM1 genes from seventy-nine isolates from garlic revealed a considerable intraspecific variability as well as a high level of diversity in growth speed. Based on the MAT alleles amplified by PCR, F. proliferatum isolates were separated into different groups on both trees. All isolates collected from garlic in Spain proved to be fumonisin B1, B2, and B3 producers. Quantitative analyses of fumonisins, beauvericin and moniliformin (common secondary metabolites of F. proliferatum) showed no correlation with either the phylogenetic analysis or mycelial growth. This pathogen presents a high intraspecific variability within the same geographical region and host, which must be considered in the management of the disease. Copyright © 2017 Elsevier Ltd. All rights reserved.
DNA/RNA hybrid substrates modulate the catalytic activity of purified AID.

PubMed

Abdouni, Hala S; King, Justin J; Ghorbani, Atefeh; Fifield, Heather; Berghuis, Lesley; Larijani, Mani

2018-01-01

Activation-induced cytidine deaminase (AID) converts cytidine to uridine at Immunoglobulin (Ig) loci, initiating somatic hypermutation and class switching of antibodies. In vitro, AID acts on single stranded DNA (ssDNA), but neither double-stranded DNA (dsDNA) oligonucleotides nor RNA, and it is believed that transcription is the in vivo generator of ssDNA targeted by AID. It is also known that the Ig loci, particularly the switch (S) regions targeted by AID are rich in transcription-generated DNA/RNA hybrids. Here, we examined the binding and catalytic behavior of purified AID on DNA/RNA hybrid substrates bearing either random sequences or GC-rich sequences simulating Ig S regions. If substrates were made up of a random sequence, AID preferred substrates composed entirely of DNA over DNA/RNA hybrids. In contrast, if substrates were composed of S region sequences, AID preferred to mutate DNA/RNA hybrids over substrates composed entirely of DNA. Accordingly, AID exhibited a significantly higher affinity for binding DNA/RNA hybrid substrates composed specifically of S region sequences, than any other substrates composed of DNA. Thus, in the absence of any other cellular processes or factors, AID itself favors binding and mutating DNA/RNA hybrids composed of S region sequences. AID:DNA/RNA complex formation and supporting mutational analyses suggest that recognition of DNA/RNA hybrids is an inherent structural property of AID. Copyright © 2017 Elsevier Ltd. All rights reserved.
Barcoded NS31/AML2 primers for sequencing of arbuscular mycorrhizal communities in environmental samples

PubMed Central

Morgan, Benjamin S. T.; Egerton-Warburton, Louise M.

2017-01-01

Premise of the study: Arbuscular mycorrhizal fungi (AMF) are globally important root symbioses that enhance plant growth and nutrition and influence ecosystem structure and function. To better characterize levels of AMF diversity relevant to ecosystem function, deeper sequencing depth in environmental samples is needed. In this study, Illumina barcoded primers and a bioinformatics pipeline were developed and applied to study AMF diversity and community structure in environmental samples. Methods: Libraries of small subunit ribosomal RNA fragment amplicons were amplified from environmental DNA using a single-step PCR reaction with barcoded NS31/AML2 primers. Amplicons were sequenced on an Illumina MiSeq sequencer using version 2, 2 × 250-bp paired-end chemistry, and analyzed using QIIME and RDP Classifier. Results: Sequencing captured 196 to 6416 operational taxonomic units (OTUs; depending on clustering parameters) representing nine AMF genera. Regardless of clustering parameters, ∼20 OTUs dominated AMF communities (78–87% reads) with the remaining reads distributed among other OTUs. Analyses also showed significant biogeographic differences in AMF communities and that community composition could be linked to specific edaphic factors. Discussion: Barcoded NS31/AML2 primers and Illumina MiSeq sequencing provide a powerful approach to address AMF diversity and variations in fungal assemblages across host plants, ecosystems, and responses to environmental drivers including global change. PMID:28924511
Deep Sequencing of the Medicago truncatula Root Transcriptome Reveals a Massive and Early Interaction between Nodulation Factor and Ethylene Signals

PubMed Central

Larrainzar, Estíbaliz; Riely, Brendan K.; Kim, Sang Cheol; Carrasquilla-Garcia, Noelia; Yu, Hee-Ju; Hwang, Hyun-Ju; Oh, Mijin; Kim, Goon Bo; Surendrarao, Anandkumar K.; Chasman, Deborah; Siahpirani, Alireza F.; Penmetsa, Ramachandra V.; Lee, Gang-Seob; Kim, Namshin; Roy, Sushmita; Mun, Jeong-Hwan; Cook, Douglas R.

2015-01-01

The legume-rhizobium symbiosis is initiated through the activation of the Nodulation (Nod) factor-signaling cascade, leading to a rapid reprogramming of host cell developmental pathways. In this work, we combine transcriptome sequencing with molecular genetics and network analysis to quantify and categorize the transcriptional changes occurring in roots of Medicago truncatula from minutes to days after inoculation with Sinorhizobium medicae. To identify the nature of the inductive and regulatory cues, we employed mutants with absent or decreased Nod factor sensitivities (i.e. Nodulation factor perception and Lysine motif domain-containing receptor-like kinase3, respectively) and an ethylene (ET)-insensitive, Nod factor-hypersensitive mutant (sickle). This unique data set encompasses nine time points, allowing observation of the symbiotic regulation of diverse biological processes with high temporal resolution. Among the many outputs of the study is the early Nod factor-induced, ET-regulated expression of ET signaling and biosynthesis genes. Coupled with the observation of massive transcriptional derepression in the ET-insensitive background, these results suggest that Nod factor signaling activates ET production to attenuate its own signal. Promoter:β-glucuronidase fusions report ET biosynthesis both in root hairs responding to rhizobium as well as in meristematic tissue during nodule organogenesis and growth, indicating that ET signaling functions at multiple developmental stages during symbiosis. In addition, we identified thousands of novel candidate genes undergoing Nod factor-dependent, ET-regulated expression. We leveraged the power of this large data set to model Nod factor- and ET-regulated signaling networks using MERLIN, a regulatory network inference algorithm. These analyses predict key nodes regulating the biological process impacted by Nod factor perception. We have made these results available to the research community through a searchable online resource. PMID:26175514
Bayesian Nonparametric Ordination for the Analysis of Microbial Communities.

PubMed

Ren, Boyu; Bacallado, Sergio; Favaro, Stefano; Holmes, Susan; Trippa, Lorenzo

2017-01-01

Human microbiome studies use sequencing technologies to measure the abundance of bacterial species or Operational Taxonomic Units (OTUs) in samples of biological material. Typically the data are organized in contingency tables with OTU counts across heterogeneous biological samples. In the microbial ecology community, ordination methods are frequently used to investigate latent factors or clusters that capture and describe variations of OTU counts across biological samples. It remains important to evaluate how uncertainty in estimates of each biological sample's microbial distribution propagates to ordination analyses, including visualization of clusters and projections of biological samples on low dimensional spaces. We propose a Bayesian analysis for dependent distributions to endow frequently used ordinations with estimates of uncertainty. A Bayesian nonparametric prior for dependent normalized random measures is constructed, which is marginally equivalent to the normalized generalized Gamma process, a well-known prior for nonparametric analyses. In our prior, the dependence and similarity between microbial distributions is represented by latent factors that concentrate in a low dimensional space. We use a shrinkage prior to tune the dimensionality of the latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies. Specifically, by combining them with multivariate data analysis techniques we can visualize credible regions in ecological ordination plots. The characteristics of the proposed model are illustrated through a simulation study and applications in two microbiome datasets.
Taxonomic and functional metagenomic profiling of gastrointestinal tract microbiome of the farmed adult turbot (Scophthalmus maximus).

PubMed

Xing, Mengxin; Hou, Zhanhui; Yuan, Jianbo; Liu, Yuan; Qu, Yanmei; Liu, Bin

2013-12-01

Metagenomics combined with 16S rRNA gene sequence analyses was applied to unveil the taxonomic composition and functional diversity of the farmed adult turbot gastrointestinal (GI) microbiome. Proteobacteria and Firmicutes, which were present in both GI content and mucus, dominated the turbot GI microbiome. 16S rRNA gene sequence analyses also indicated that the turbot GI tract may harbor some bacteria that originated from the associated seawater. Functional analyses indicated that the clustering-based subsystem and many metabolic subsystems were dominant in the turbot GI metagenome. Compared with other gut metagenomes, quorum sensing and biofilm formation were overabundant in the turbot GI metagenome. Genes associated with quorum sensing and biofilm formation were found in species within Vibrio, including Vibrio vulnificus, Vibrio cholerae and Vibrio parahaemolyticus. In farmed fish gut metagenomes, the stress response and protein folding subsystems were over-represented and several genes concerning antibiotic and heavy metal resistance were also detected. These data suggested that the turbot GI microbiome may be affected by human factors in aquaculture. Additionally, iron acquisition and the metabolism subsystem were more abundant in the turbot GI metagenome when compared with freshwater fish gut metagenomes, suggesting that unique metabolic potential may be observed in marine animal GI microbiomes. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.
From reads to regions: a Bioconductor workflow to detect differential binding in ChIP-seq data

PubMed Central

Lun, Aaron T. L.; Smyth, Gordon K.

2016-01-01

Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) is widely used to identify the genomic binding sites for a protein of interest. Most conventional approaches to ChIP-seq data analysis involve the detection of the absolute presence (or absence) of a binding site. However, an alternative strategy is to identify changes in the binding intensity between two biological conditions, i.e., differential binding (DB). This may yield more relevant results than conventional analyses, as changes in binding can be associated with the biological difference being investigated. The aim of this article is to facilitate the implementation of DB analyses, by comprehensively describing a computational workflow for the detection of DB regions from ChIP-seq data. The workflow is based primarily on R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, from alignment of read sequences to interpretation and visualization of putative DB regions. In particular, detection of DB regions will be conducted using the counts for sliding windows from the csaw package, with statistical modelling performed using methods in the edgeR package. Analyses will be demonstrated on real histone mark and transcription factor data sets. This will provide readers with practical usage examples that can be applied in their own studies. PMID:26834993
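The sliding-window idea behind the workflow can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the csaw or edgeR API (both are R packages). The function names, the window width and step, and the pseudocount are all assumptions made for this example, and the naive log fold change stands in for the statistical modelling that the article performs with edgeR.

```python
import math

def window_counts(read_starts, width, step, chrom_len):
    """Count reads whose start position falls in each window [start, start + width).

    Hypothetical helper: windows slide along one chromosome of length chrom_len.
    """
    counts = []
    for start in range(0, chrom_len - width + 1, step):
        end = start + width
        n = sum(1 for pos in read_starts if start <= pos < end)
        counts.append((start, end, n))
    return counts

def log2_fold_change(count_a, count_b, pseudo=0.5):
    """Naive log2 ratio of two window counts; the pseudocount avoids division by zero.

    This ignores library size, replicates and dispersion, which a real
    differential binding analysis has to model.
    """
    return math.log2((count_b + pseudo) / (count_a + pseudo))

# Toy read start positions for two conditions on a 50-bp "chromosome".
cond_a = [5, 12, 18, 40]
cond_b = [5, 41, 42, 43, 44]
windows_a = window_counts(cond_a, width=20, step=10, chrom_len=50)
windows_b = window_counts(cond_b, width=20, step=10, chrom_len=50)
for (start, end, a), (_, _, b) in zip(windows_a, windows_b):
    print(start, end, a, b, round(log2_fold_change(a, b), 2))
```

Overlapping windows (step smaller than width) mean neighbouring counts are correlated, which is one reason adjacent windows are typically merged into regions before results are interpreted.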
First genome report on novel sequence types of Neisseria meningitidis: ST12777 and ST12778.

PubMed

Veeraraghavan, Balaji; Lal, Binesh; Devanga Ragupathi, Naveen Kumar; Neeravi, Iyyan Raj; Jeyaraman, Ranjith; Varghese, Rosemol; Paul, Miracle Magdalene; Baskaran, Ashtawarthani; Ranjan, Ranjini

2018-03-01

Neisseria meningitidis is an important causative agent of meningitis and/or sepsis with high morbidity and mortality. Baseline genome data on N. meningitidis, especially from developing countries such as India, are lacking. This study aimed to investigate the whole genome sequences of N. meningitidis isolates from a tertiary care centre in India. Whole-genome sequencing was performed using an Ion Torrent™ Personal Genome Machine™ (PGM) with 400-bp chemistry. Data were assembled de novo using SPAdes Genome Assembler v.5.0.0.0. Sequence annotation was performed through PATRIC, RAST and the NCBI PGAAP server. Downstream analysis of the isolates was performed using the Center for Genomic Epidemiology databases for antimicrobial resistance genes and sequence types. Virulence factors and CRISPR were analysed using the PubMLST database and CRISPRFinder, respectively. This study reports the whole genome shotgun sequences of eight N. meningitidis isolates from bloodstream infections. The genome data revealed two novel sequence types (ST12777 and ST12778), along with ST11, ST437 and ST6928. The virulence profile of the isolates matched their sequence types. All isolates were negative for plasmid-mediated resistance genes. To the best of our knowledge, this is the first report of ST11 and ST437 N. meningitidis isolates in India along with two novel sequence types (ST12777 and ST12778). These results indicate that the sequence types circulating in India are diverse and require continuous monitoring. Further studies strengthening the genome data on N. meningitidis are required to understand the prevalence, spread, exact resistance and virulence mechanisms along with serotypes. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Generation and analysis of expressed sequence tags from the bone marrow of Chinese Sika deer.

PubMed

Yao, Baojin; Zhao, Yu; Zhang, Mei; Li, Juan

2012-03-01

Sika deer is one of the best-known and highly valued animals of China. Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project for Sika deer to date. With the ultimate goal of sequencing the complete genome of this organism, we first established a bone marrow cDNA library for Sika deer and generated a total of 2,025 reads. After processing the sequences, 2,017 high-quality expressed sequence tags (ESTs) were obtained. These ESTs were assembled into 1,157 unigenes, including 238 contigs and 919 singletons. Comparative analyses indicated that 888 (76.75%) of the unigenes had significant matches to sequences in the non-redundant protein database. In addition to highly expressed genes, such as stearoyl-CoA desaturase, cytochrome c oxidase, adipocyte-type fatty acid-binding protein, adiponectin and thymosin beta-4, we also obtained vascular endothelial growth factor-A and heparin-binding growth-associated molecule, both of which are of great importance for angiogenesis research. There were 244 (21.09%) unigenes with no significant match to any sequence in current protein or nucleotide databases, and these sequences may represent genes with unknown function in Sika deer. Open reading frame analysis of the sequences was performed using the getorf program. In addition, the sequences were functionally classified using the gene ontology hierarchy, clusters of orthologous groups of proteins and Kyoto encyclopedia of genes and genomes databases. Analysis of ESTs described in this paper provides an important resource for the transcriptome exploration of Sika deer, and will also facilitate further studies on functional genomics, gene discovery and genome annotation of Sika deer.
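The counts reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the variable names are illustrative and not taken from the paper.

```python
# Figures as stated in the abstract.
contigs = 238
singletons = 919
unigenes_reported = 1157
protein_hits = 888   # unigenes matching the non-redundant protein database
no_hits = 244        # unigenes with no match in protein or nucleotide databases

# Contigs plus singletons should equal the number of unigenes.
unigenes = contigs + singletons

def percent_of_unigenes(n):
    """Share of all unigenes, as a percentage rounded to two decimals."""
    return round(100 * n / unigenes, 2)

print(unigenes == unigenes_reported)
print(percent_of_unigenes(protein_hits), percent_of_unigenes(no_hits))
```

The two percentages need not sum to 100, because the first refers to matches in the protein database only, while the second excludes matches in both protein and nucleotide databases.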
Hepatitis E virus and fulminant hepatitis--a virus or host-specific pathology?

PubMed

Smith, Donald B; Simmonds, Peter

2015-04-01

Fulminant hepatitis is a rare outcome of infection with hepatitis E virus. Several recent reports suggest that virus variation is an important determinant of disease progression. The aim of this study was to critically examine the evidence that virus-specific factors underlie the development of fulminant hepatitis following hepatitis E virus infection. Published sequence information of hepatitis E virus isolates from patients with and without fulminant hepatitis was collected and analysed using statistical tests to identify associations between virus polymorphisms and disease outcome. Fulminant hepatitis has been reported following infection with all four hepatitis E virus genotypes that infect humans, comprising multiple phylogenetic lineages within genotypes 1, 3 and 4. Analysis of virus sequences from individuals infected by a common source did not detect any common substitutions associated with progression to fulminant hepatitis. Re-analysis of previously reported associations between virus substitutions and fulminant hepatitis suggests that these were probably the result of sampling biases. Host-specific factors rather than virus genotype, variants or specific substitutions appear to be responsible for the development of fulminant hepatitis. © 2014 The Authors. Liver International Published by John Wiley & Sons Ltd.
Uncovering potential “herbal probiotics” in Juzen-taiho-to through the study of associated bacterial populations

PubMed Central

Montenegro, Diego; Kalpana, Kriti; Chrissian, Christine; Sharma, Ashutosh; Takaoka, Anna; Iacovidou, Maria; Soll, Clifford E.; Aminova, Olga; Heguy, Adriana; Cohen, Lisa; Shen, Steven

2014-01-01

Juzen-taiho-to (JTT) is an immune-boosting formulation of ten medicinal herbs. It is used clinically in East Asia to boost human immune functions. The active factors in JTT have not been clarified. However, existing evidence suggests that lipopolysaccharide (LPS)-like factors contribute to the activity. To examine this possibility, JTT was subjected to a series of analyses, including high resolution mass spectrometry, which suggested the presence of structural variants of LPS. This finding opened the possibility that JTT contains immune-boosting bacteria. As the first step to characterize the bacteria in JTT, 16S ribosomal RNA sequencing was carried out for Angelica sinensis (dried root), one of the most potent immunostimulatory herbs in JTT. The sequencing revealed a total of 519 bacterial genera in A. sinensis. The most abundant genus was Rahnella, which is widely distributed in water and plants. The abundance of Rahnella appeared to correlate with the immunostimulatory activity of A. sinensis. In conclusion, the current study provided new pieces of evidence supporting the emerging theory of bacterial contribution in immune-boosting herbs. PMID:25547935
gyrB as a phylogenetic discriminator for members of the Bacillus anthracis-cereus-thuringiensis group

NASA Technical Reports Server (NTRS)

La Duc, Myron T.; Satomi, Masataka; Agata, Norio; Venkateswaran, Kasthuri

2004-01-01

Bacillus anthracis, the causative agent of the human disease anthrax, Bacillus cereus, a food-borne pathogen capable of causing human illness, and Bacillus thuringiensis, a well-characterized insecticidal toxin producer, all cluster together within a very tight clade (B. cereus group) phylogenetically and are indistinguishable from one another via 16S rDNA sequence analysis. As new pathogens are continually emerging, it is imperative to devise a system capable of rapidly and accurately differentiating closely related, yet phenotypically distinct species. Although the gyrB gene has proven useful in discriminating closely related species, its sequence analysis has not yet been validated by DNA:DNA hybridization, the taxonomically accepted "gold standard". We phylogenetically characterized the gyrB sequences of various species and serotypes encompassed in the "B. cereus group," including lab strains and environmental isolates. Results were compared to those obtained from analyses of phenotypic characteristics, 16S rDNA sequence, DNA:DNA hybridization, and virulence factors. The gyrB gene proved more highly differential than 16S, while, at the same time, as analytical as costly and laborious DNA:DNA hybridization techniques in differentiating species within the B. cereus group.
Sooty mangabey genome sequence provides insight into AIDS resistance in a natural SIV host.

PubMed

Palesch, David; Bosinger, Steven E; Tharp, Gregory K; Vanderford, Thomas H; Paiardini, Mirko; Chahroudi, Ann; Johnson, Zachary P; Kirchhoff, Frank; Hahn, Beatrice H; Norgren, Robert B; Patel, Nirav B; Sodora, Donald L; Dawoud, Reem A; Stewart, Caro-Beth; Seepo, Sara M; Harris, R Alan; Liu, Yue; Raveendran, Muthuswamy; Han, Yi; English, Adam; Thomas, Gregg W C; Hahn, Matthew W; Pipes, Lenore; Mason, Christopher E; Muzny, Donna M; Gibbs, Richard A; Sauter, Daniel; Worley, Kim; Rogers, Jeffrey; Silvestri, Guido

2018-01-03

In contrast to infections with human immunodeficiency virus (HIV) in humans and simian immunodeficiency virus (SIV) in macaques, SIV infection of a natural host, sooty mangabeys (Cercocebus atys), is non-pathogenic despite high viraemia. Here we sequenced and assembled the genome of a captive sooty mangabey. We conducted genome-wide comparative analyses of transcript assemblies from C. atys and AIDS-susceptible species, such as humans and macaques, to identify candidates for host genetic factors that influence susceptibility. We identified several immune-related genes in the genome of C. atys that show substantial sequence divergence from macaques or humans. One of these sequence divergences, a C-terminal frameshift in the toll-like receptor-4 (TLR4) gene of C. atys, is associated with a blunted in vitro response to TLR-4 ligands. In addition, we found a major structural change in exons 3-4 of the immune-regulatory protein intercellular adhesion molecule 2 (ICAM-2); expression of this variant leads to reduced cell surface expression of ICAM-2. These data provide a resource for comparative genomic studies of HIV and/or SIV pathogenesis and may help to elucidate the mechanisms by which SIV-infected sooty mangabeys avoid AIDS.
Sooty mangabey genome sequence provides insight into AIDS resistance in a natural SIV host

PubMed Central

Palesch, David; Bosinger, Steven E.; Tharp, Gregory K.; Vanderford, Thomas H.; Paiardini, Mirko; Chahroudi, Ann; Johnson, Zachary P.; Kirchhoff, Frank; Hahn, Beatrice H.; Norgren, Robert B.; Patel, Nirav B.; Sodora, Donald L.; Dawoud, Reem A.; Stewart, Caro-Beth; Seepo, Sara M.; Harris, R. Alan; Liu, Yue; Raveendran, Muthuswamy; Han, Yi; English, Adam; Thomas, Gregg W. C.; Hahn, Matthew W.; Pipes, Lenore; Mason, Christopher E.; Muzny, Donna M.; Gibbs, Richard A.; Sauter, Daniel; Worley, Kim; Rogers, Jeffrey; Silvestri, Guido

2018-01-01

In contrast to infections with human immunodeficiency virus (HIV) in humans and simian immunodeficiency virus (SIV) in macaques, SIV infection of a natural host, sooty mangabeys (Cercocebus atys), is non-pathogenic despite high viraemia. Here we sequenced and assembled the genome of a captive sooty mangabey. We conducted genome-wide comparative analyses of transcript assemblies from C. atys and AIDS-susceptible species, such as humans and macaques, to identify candidates for host genetic factors that influence susceptibility. We identified several immune-related genes in the genome of C. atys that show substantial sequence divergence from macaques or humans. One of these sequence divergences, a C-terminal frameshift in the toll-like receptor-4 (TLR4) gene of C. atys, is associated with a blunted in vitro response to TLR-4 ligands. In addition, we found a major structural change in exons 3–4 of the immune-regulatory protein intercellular adhesion molecule 2 (ICAM-2); expression of this variant leads to reduced cell surface expression of ICAM-2. These data provide a resource for comparative genomic studies of HIV and/or SIV pathogenesis and may help to elucidate the mechanisms by which SIV-infected sooty mangabeys avoid AIDS. PMID:29300007

Differential principal component analysis of ChIP-seq.

PubMed

Ji, Hongkai; Li, Xia; Wang, Qian-fei; Ning, Yang

2013-04-23

We propose differential principal component analysis (dPCA) for analyzing multiple ChIP-sequencing datasets to identify differential protein-DNA interactions between two biological conditions. dPCA integrates unsupervised pattern discovery, dimension reduction, and statistical inference into a single framework. It uses a small number of principal components to summarize concisely the major multiprotein synergistic differential patterns between the two conditions. For each pattern, it detects and prioritizes differential genomic loci by comparing the between-condition differences with the within-condition variation among replicate samples. dPCA provides a unique tool for efficiently analyzing large amounts of ChIP-sequencing data to study dynamic changes of gene regulation across different biological conditions. We demonstrate this approach through analyses of differential chromatin patterns at transcription factor binding sites and promoters as well as allele-specific protein-DNA interactions.
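The core dimension-reduction step described above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it builds a locus-by-dataset matrix of between-condition mean differences and extracts its leading principal direction by power iteration on the uncentred Gram matrix. The function names, the data layout and the choice of power iteration are assumptions for this example, and the statistical inference step (comparing differences with within-condition replicate variation) is omitted.

```python
import math

def difference_matrix(cond1, cond2):
    """Return D[g][d] = mean(cond1[g][d]) - mean(cond2[g][d]).

    cond1 and cond2 are nested lists indexed as [locus][dataset] -> replicate values.
    """
    return [
        [sum(a) / len(a) - sum(b) / len(b) for a, b in zip(row1, row2)]
        for row1, row2 in zip(cond1, cond2)
    ]

def first_component(D, iters=200):
    """Leading eigenvector of D^T D, found by power iteration.

    Assumes the uniform start vector is not orthogonal to the leading eigenvector.
    """
    m = len(D[0])
    gram = [[sum(row[i] * row[j] for row in D) for j in range(m)] for i in range(m)]
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        w = [sum(gram[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

def scores(D, v):
    """Project each locus onto the component; large absolute scores mark candidate loci."""
    return [sum(d * x for d, x in zip(row, v)) for row in D]

# Toy data: 2 loci, 2 datasets, 2 replicates per condition.
cond1 = [[[4.0, 4.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]]
cond2 = [[[2.0, 2.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]]
D = difference_matrix(cond1, cond2)
pc1 = first_component(D)
print(D, pc1, scores(D, pc1))
```

In the toy data both datasets change in the same direction at each locus, so the leading component weights them equally and the first locus receives the larger absolute score.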
Barriopsis iraniana and Phaeobotryon cupressi: two new species of the Botryosphaeriaceae from trees in Iran.

PubMed

Abdollahzadeh, J; Mohammadi Goltapeh, E; Javadi, A; Shams-Bakhsh, M; Zare, R; Phillips, A J L

2009-12-01

Species in the Botryosphaeriaceae are well known as pathogens and saprobes of woody hosts, but little is known about the species that occur in Iran. In a recent survey of this family in Iran two fungi with diplodia-like anamorphs were isolated from various tree hosts. These two fungi were fully characterised in terms of morphology of the anamorphs in culture, and sequences of the ITS1/ITS2 regions of the ribosomal DNA operon and partial sequences of the translation elongation factor 1-alpha. Phylogenetic analyses placed them within a clade consisting of Barriopsis and Phaeobotryon species, but they were clearly distinct from known species in these genera. Therefore, they are described here as two new species, namely Barriopsis iraniana on Citrus, Mangifera and Olea, and Phaeobotryon cupressi on Cupressus sempervirens.
Identification of Blastocystis Subtype 1 Variants in the Home for Girls, Bangkok, Thailand

PubMed Central

Thathaisong, Umaporn; Siripattanapipong, Suradej; Mungthin, Mathirut; Pipatsatitpong, Duangnate; Tan-ariya, Peerapan; Naaglor, Tawee; Leelayoova, Saovanee

2013-01-01

A cross-sectional study of Blastocystis infection was conducted to evaluate the prevalence, risk factors, and subtypes of Blastocystis at the Home for Girls, Bangkok, Thailand in November 2008. Of 370 stool samples, 118 (31.9%) were positive for Blastocystis. Genotypic characterization of Blastocystis was performed by polymerase chain reaction and sequence analysis of the partial small subunit ribosomal RNA (SSU rRNA) gene. Subtype 1 was the most predominant (94.8%), followed by subtype 6 (3.5%) and subtype 2 (1.7%). Sequence analyses revealed nucleotide polymorphisms for Blastocystis subtype 1, which were described as subtype 1/variant 1 and subtype 1/variant 2. Blastocystis subtype 1/variant 1 was the most predominant infection, occurring in almost every house. The results showed that subtype analysis of Blastocystis was useful for molecular epidemiological study. PMID:23166199
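The prevalence figure and the subtype breakdown in this abstract can be checked with simple arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative.

```python
# Figures as stated in the abstract.
samples_total = 370
samples_positive = 118

# Prevalence as a percentage, rounded to one decimal as in the abstract.
prevalence_pct = round(100 * samples_positive / samples_total, 1)

# Reported subtype shares among the genotyped samples.
subtype_pct = {"subtype 1": 94.8, "subtype 6": 3.5, "subtype 2": 1.7}
subtype_total = round(sum(subtype_pct.values()), 1)

print(prevalence_pct, subtype_total)
```

The abstract does not state how many of the positive samples were genotyped, so the subtype percentages are checked only for summing to 100 and are not converted back to counts.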
Mosquito genomics. Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes.

PubMed

Neafsey, Daniel E; Waterhouse, Robert M; Abai, Mohammad R; Aganezov, Sergey S; Alekseyev, Max A; Allen, James E; Amon, James; Arcà, Bruno; Arensburger, Peter; Artemov, Gleb; Assour, Lauren A; Basseri, Hamidreza; Berlin, Aaron; Birren, Bruce W; Blandin, Stephanie A; Brockman, Andrew I; Burkot, Thomas R; Burt, Austin; Chan, Clara S; Chauve, Cedric; Chiu, Joanna C; Christensen, Mikkel; Costantini, Carlo; Davidson, Victoria L M; Deligianni, Elena; Dottorini, Tania; Dritsou, Vicky; Gabriel, Stacey B; Guelbeogo, Wamdaogo M; Hall, Andrew B; Han, Mira V; Hlaing, Thaung; Hughes, Daniel S T; Jenkins, Adam M; Jiang, Xiaofang; Jungreis, Irwin; Kakani, Evdoxia G; Kamali, Maryam; Kemppainen, Petri; Kennedy, Ryan C; Kirmitzoglou, Ioannis K; Koekemoer, Lizette L; Laban, Njoroge; Langridge, Nicholas; Lawniczak, Mara K N; Lirakis, Manolis; Lobo, Neil F; Lowy, Ernesto; MacCallum, Robert M; Mao, Chunhong; Maslen, Gareth; Mbogo, Charles; McCarthy, Jenny; Michel, Kristin; Mitchell, Sara N; Moore, Wendy; Murphy, Katherine A; Naumenko, Anastasia N; Nolan, Tony; Novoa, Eva M; O'Loughlin, Samantha; Oringanje, Chioma; Oshaghi, Mohammad A; Pakpour, Nazzy; Papathanos, Philippos A; Peery, Ashley N; Povelones, Michael; Prakash, Anil; Price, David P; Rajaraman, Ashok; Reimer, Lisa J; Rinker, David C; Rokas, Antonis; Russell, Tanya L; Sagnon, N'Fale; Sharakhova, Maria V; Shea, Terrance; Simão, Felipe A; Simard, Frederic; Slotman, Michel A; Somboon, Pradya; Stegniy, Vladimir; Struchiner, Claudio J; Thomas, Gregg W C; Tojo, Marta; Topalis, Pantelis; Tubio, José M C; Unger, Maria F; Vontas, John; Walton, Catherine; Wilding, Craig S; Willis, Judith H; Wu, Yi-Chieh; Yan, Guiyun; Zdobnov, Evgeny M; Zhou, Xiaofan; Catteruccia, Flaminia; Christophides, George K; Collins, Frank H; Cornman, Robert S; Crisanti, Andrea; Donnelly, Martin J; Emrich, Scott J; Fontaine, Michael C; Gelbart, William; Hahn, Matthew W; Hansen, Immo A; Howell, Paul I; Kafatos, Fotis C; Kellis, Manolis; Lawson, Daniel; Louis, Christos; Luckhart, Shirley; Muskavitch, Marc A T; Ribeiro, José M; Riehle, Michael A; Sharakhov, Igor V; Tu, Zhijian; Zwiebel, Laurence J; Besansky, Nora J

2015-01-02

Variation in vectorial capacity for human malaria among Anopheles mosquito species is determined by many factors, including behavior, immunity, and life history. To investigate the genomic basis of vectorial capacity and explore new avenues for vector control, we sequenced the genomes of 16 anopheline mosquito species from diverse locations spanning ~100 million years of evolution. Comparative analyses show faster rates of gene gain and loss, elevated gene shuffling on the X chromosome, and more intron losses, relative to Drosophila. Some determinants of vectorial capacity, such as chemosensory genes, do not show elevated turnover but instead diversify through protein-sequence changes. This dynamism of anopheline genes and genomes may contribute to their flexible capacity to take advantage of new ecological niches, including adapting to humans as primary hosts. Copyright © 2015, American Association for the Advancement of Science.
Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks

PubMed Central

2017-01-01

Whole-genome sequencing of pathogens from host samples is becoming increasingly routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees. PMID:28545083
Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks.

PubMed

Klinkenberg, Don; Backer, Jantien A; Didelot, Xavier; Colijn, Caroline; Wallinga, Jacco

2017-05-01

Whole-genome sequencing of pathogens from host samples is becoming increasingly routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees.
A novel NOTCH3 mutation identified in patients with oral cancer by whole exome sequencing.

PubMed

Yi, Yanjun; Tian, Zhuowei; Ju, Houyu; Ren, Guoxin; Hu, Jingzhou

2017-06-01

Oral cancer is a serious disease caused by environmental factors and/or susceptible genes. In the present study, in order to identify useful genetic biomarkers for cancer prediction and prevention, and for personalized treatment, we detected somatic mutations in 5 pairs of oral cancer tissues and blood samples using whole exome sequencing (WES). Finally, we confirmed a novel nonsense single-nucleotide polymorphism (SNP; chr19:15288426A>C) in the NOTCH3 gene with Sanger sequencing, which resulted in a N1438T mutation in the protein sequence. Using multiple in silico analyses, this variant was found to have mildly damaging effects on the NOTCH3 gene, which was supported by the results from analyses using PANTHER, SNAP and SNPs&GO. However, further analysis using Mutation Taster revealed that this SNP had a probability of 0.9997 to be 'disease causing'. In addition, we performed 3D structure simulation analysis and the results suggested that this variant had little effect on the solubility and hydrophobicity of the protein and thus on its function; however, it decreased the stability of the protein by increasing the total energy following minimization (-1,051.39 kcal/mol for the mutant and -1,229.84 kcal/mol for the native) and decreasing one stabilizing residue of the protein. Less stability of the N1438T mutant was also supported by analysis using I-Mutant with a DDG value of -1.67. Overall, the present study identified and confirmed a novel mutation in the NOTCH3 gene, which may decrease the stability of NOTCH3, and may thus prove to be helpful in cancer prognosis.
Sequence and expression variation in SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1): homeolog evolution in Indian Brassicas.

PubMed

Sri, Tanu; Mayee, Pratiksha; Singh, Anandita

2015-09-01

Whole genome sequence analyses allow unravelling such evolutionary consequences of meso-triplication event in Brassicaceae (∼14-20 million years ago (MYA)) as differential gene fractionation and diversification in homeologous sub-genomes. This study presents a simple gene-centric approach involving microsynteny and natural genetic variation analysis for understanding SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1) homeolog evolution in Brassica. Analysis of microsynteny in Brassica rapa homeologous regions containing SOC1 revealed differential gene fractionation correlating to reported fractionation status of sub-genomes of origin, viz. least fractionated (LF), moderately fractionated 1 (MF1) and most fractionated (MF2), respectively. Screening 18 cultivars of 6 Brassica species led to the identification of 8 genomic and 27 transcript variants of SOC1, including splice-forms. Co-occurrence of both interrupted and intronless SOC1 genes was detected in few Brassica species. In silico analysis characterised Brassica SOC1 as MADS intervening, K-box, C-terminal (MIKC(C)) transcription factor, with highly conserved MADS and I domains relative to K-box and C-terminal domain. Phylogenetic analyses and multiple sequence alignments depicting shared pattern of silent/non-silent mutations assigned Brassica SOC1 homologs into groups based on shared diploid base genome. In addition, a sub-genome structure in uncharacterised Brassica genomes was inferred. Expression analysis of putative MF2 and LF (Brassica diploid base genome A (AA)) sub-genome-specific SOC1 homeologs of Brassica juncea revealed near identical expression pattern. However, MF2-specific homeolog exhibited significantly higher expression implying regulatory diversification. In conclusion, evidence for polyploidy-induced sequence and regulatory evolution in Brassica SOC1 is being presented wherein differential homeolog expression is implied in functional diversification.
Heterogeneous Rates of Molecular Evolution and Diversification Could Explain the Triassic Age Estimate for Angiosperms.

PubMed

Beaulieu, Jeremy M; O'Meara, Brian C; Crane, Peter; Donoghue, Michael J

2015-09-01

Dating analyses based on molecular data imply that crown angiosperms existed in the Triassic, long before their undisputed appearance in the fossil record in the Early Cretaceous. Following a re-analysis of the age of angiosperms using updated sequences and fossil calibrations, we use a series of simulations to explore the possibility that the older age estimates are a consequence of (i) major shifts in the rate of sequence evolution near the base of the angiosperms and/or (ii) the representative taxon sampling strategy employed in such studies. We show that both of these factors do tend to yield substantially older age estimates. These analyses do not prove that younger age estimates based on the fossil record are correct, but they do suggest caution in accepting the older age estimates obtained using current relaxed-clock methods. Although we have focused here on the angiosperms, we suspect that these results will shed light on dating discrepancies in other major clades. ©The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Barium Stars: Theoretical Interpretation

NASA Astrophysics Data System (ADS)

Husti, Laura; Gallino, Roberto; Bisterzo, Sara; Straniero, Oscar; Cristallo, Sergio

2009-09-01

Barium stars are extrinsic Asymptotic Giant Branch (AGB) stars. They present the s-enhancement characteristic for AGB and post-AGB stars, but are in an earlier evolutionary stage (main sequence dwarfs, subgiants, red giants). They are believed to form in binary systems, where a more massive companion evolved faster, produced the s-elements during its AGB phase, polluted the present barium star through stellar winds and became a white dwarf. The samples of barium stars of Allen & Barbuy (2006) and of Smiljanic et al. (2007) are analysed here. Spectra of both samples were obtained at high-resolution and high S/N. We compare these observations with AGB nucleosynthesis models using different initial masses and a spread of 13C-pocket efficiencies. Once a consistent solution is found for the whole elemental distribution of abundances, a proper dilution factor is applied. This dilution is explained by the fact that the s-rich material transferred from the AGB to the nowadays observed stars is mixed with the envelope of the accretor. We also analyse the mass transfer process, and obtain the wind velocity for giants and subgiants with known orbital period. We find evidence that thermohaline mixing is acting inside main sequence dwarfs and we present a method for estimating its depth.
WHOLE-GENOME SEQUENCING OF SALIVARY GLAND ADENOID CYSTIC CARCINOMA

PubMed Central

Rettig, Eleni M; Talbot, C Conover; Sausen, Mark; Jones, Sian; Bishop, Justin A; Wood, Laura D; Tokheim, Collin; Niknafs, Noushin; Karchin, Rachel; Fertig, Elana J; Wheelan, Sarah J; Marchionni, Luigi; Considine, Michael; Ling, Shizhang; Fakhry, Carole; Papadopoulos, Nickolas; Kinzler, Kenneth W; Vogelstein, Bert; Ha, Patrick K; Agrawal, Nishant

2016-01-01

Adenoid cystic carcinomas (ACCs) of the salivary glands are challenging to understand, treat, and cure. To better understand the genetic alterations underlying the pathogenesis of these tumors, we performed comprehensive genome analyses of 25 fresh-frozen tumors, including whole genome sequencing, expression and pathway analyses. In addition to the well-described MYB-NFIB fusion which was found in 11 tumors (44%), we observed five different rearrangements involving the NFIB transcription factor gene in seven tumors (28%). Taken together, NFIB translocations occurred in 15 of 25 samples (60%, 95%CI=41–77%). In addition, mRNA expression analysis of 17 tumors revealed overexpression of NFIB in ACC tumors compared with normal tissues (p=0.002). There was no difference in NFIB mRNA expression in tumors with NFIB fusions compared to those without. We also report somatic mutations of genes involved in the axonal guidance and Rho family signaling pathways. Finally, we confirm previously described alterations in genes related to chromatin regulation and Notch signaling. Our findings suggest a separate role for NFIB in ACC oncogenesis and highlight important signaling pathways for future functional characterization and potential therapeutic targeting. PMID:26862087
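The abstract above reports NFIB translocations in 15 of 25 samples with a 95% confidence interval but does not say how the interval was computed. As a minimal sketch, assuming a Wilson score interval (one common choice for small-sample proportions, and only an assumption here), the calculation can be written as follows. The function name `wilson_interval` is illustrative and does not come from the study.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion.

    z defaults to the two-sided 95% normal quantile. The choice of the
    Wilson method is an assumption; the abstract does not name a method.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# 15 of 25 tumors carried an NFIB translocation.
low, high = wilson_interval(15, 25)
print(f"{15 / 25:.0%} ({low:.0%}-{high:.0%})")
```

Under this assumption the bounds round to 41% and 77%, in line with the reported interval. Other methods, such as the exact Clopper-Pearson interval, would give slightly different bounds.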
Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis.

PubMed

Ma, Chuang; Wang, Xiangfeng

2012-09-01

One of the computational challenges in plant systems biology is to accurately infer transcriptional regulation relationships based on correlation analyses of gene expression patterns. Despite several correlation methods that are applied in biology to analyze microarray data, concerns regarding the compatibility of these methods with the gene expression data profiled by high-throughput RNA transcriptome sequencing (RNA-Seq) technology have been raised. These concerns are mainly due to the fact that the distribution of read counts in RNA-Seq experiments is different from that of fluorescence intensities in microarray experiments. Therefore, a comprehensive evaluation of the existing correlation methods and, if necessary, introduction of novel methods into biology is appropriate. In this study, we compared four existing correlation methods used in microarray analysis and one novel method called the Gini correlation coefficient on previously published microarray-based and sequencing-based gene expression data in Arabidopsis (Arabidopsis thaliana) and maize (Zea mays). The comparisons were performed on more than 11,000 regulatory relationships in Arabidopsis, including 8,929 pairs of transcription factors and target genes. Our analyses pinpointed the strengths and weaknesses of each method and indicated that the Gini correlation can compensate for the shortcomings of the Pearson correlation, the Spearman correlation, the Kendall correlation, and the Tukey's biweight correlation. The Gini correlation method, with the other four evaluated methods in this study, was implemented as an R package named rsgcc that can be utilized as an alternative option for biologists to perform clustering analyses of gene expression patterns or transcriptional network analyses.
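For readers unfamiliar with the Gini correlation named in the abstract above, the following is a minimal sketch of its standard rank-weighted definition: the values of one variable are weighted by 2i - n - 1 after being ordered by the ranks of the other variable, and the result is normalized by the same sum taken in the variable's own order. It is not the rsgcc implementation, the function name `gini_correlation` is hypothetical, and ties are handled only by Python's stable sort.

```python
def gini_correlation(x, y):
    """Gini correlation of x with respect to the ranks of y.

    Numerator:   sum of (2i - n - 1) * x ordered by ascending y.
    Denominator: sum of (2i - n - 1) * x ordered by ascending x.
    The measure is asymmetric: gini_correlation(x, y) need not
    equal gini_correlation(y, x).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    weights = [2 * i - n - 1 for i in range(1, n + 1)]
    x_in_own_order = sorted(x)
    order_by_y = sorted(range(n), key=lambda j: y[j])
    x_in_y_order = [x[j] for j in order_by_y]
    numerator = sum(w * v for w, v in zip(weights, x_in_y_order))
    denominator = sum(w * v for w, v in zip(weights, x_in_own_order))
    if denominator == 0:
        raise ValueError("x is constant; Gini correlation is undefined")
    return numerator / denominator


# Hypothetical expression profiles of a regulator and a target gene.
regulator = [1, 2, 3, 10]
target = [2, 1, 3, 4]
print(gini_correlation(regulator, target))  # x weighted by ranks of y
print(gini_correlation(target, regulator))  # y weighted by ranks of x
```

Because the measure mixes values of one variable with ranks of the other, the two directions generally differ, so both are usually worth inspecting.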
Application of the Gini Correlation Coefficient to Infer Regulatory Relationships in Transcriptome Analysis

PubMed Central

Ma, Chuang; Wang, Xiangfeng

2012-01-01

One of the computational challenges in plant systems biology is to accurately infer transcriptional regulation relationships based on correlation analyses of gene expression patterns. Despite several correlation methods that are applied in biology to analyze microarray data, concerns regarding the compatibility of these methods with the gene expression data profiled by high-throughput RNA transcriptome sequencing (RNA-Seq) technology have been raised. These concerns are mainly due to the fact that the distribution of read counts in RNA-Seq experiments is different from that of fluorescence intensities in microarray experiments. Therefore, a comprehensive evaluation of the existing correlation methods and, if necessary, introduction of novel methods into biology is appropriate. In this study, we compared four existing correlation methods used in microarray analysis and one novel method called the Gini correlation coefficient on previously published microarray-based and sequencing-based gene expression data in Arabidopsis (Arabidopsis thaliana) and maize (Zea mays). The comparisons were performed on more than 11,000 regulatory relationships in Arabidopsis, including 8,929 pairs of transcription factors and target genes. Our analyses pinpointed the strengths and weaknesses of each method and indicated that the Gini correlation can compensate for the shortcomings of the Pearson correlation, the Spearman correlation, the Kendall correlation, and the Tukey’s biweight correlation. The Gini correlation method, with the other four evaluated methods in this study, was implemented as an R package named rsgcc that can be utilized as an alternative option for biologists to perform clustering analyses of gene expression patterns or transcriptional network analyses. PMID:22797655
Molecular systematics of Indian Alysicarpus (Fabaceae) based on analyses of nuclear ribosomal DNA sequences.

PubMed

Gholami, Akram; Subramaniam, Shweta; Geeta, R; Pandey, Arun K

2017-06-01

Alysicarpus Necker ex Desvaux (Fabaceae, Desmodieae) consists of ~30 species that are distributed in tropical and subtropical regions of the world. In India, the genus is represented by ca. 18 species, of which seven are endemic. Sequences of the nuclear internal transcribed spacer (ITS) from 38 accessions representing 16 Indian species were subjected to phylogenetic analyses. The ITS sequence data strongly support the monophyly of the genus Alysicarpus. Analyses revealed four major well-supported clades within Alysicarpus. Ancestral state reconstructions were done for two morphological characters, namely calyx length in relation to pod (macrocalyx and microcalyx) and pod surface ornamentation (transversely rugose and nonrugose). The present study is the first report on molecular systematics of Indian Alysicarpus.
Genomewide analysis of TCP transcription factor gene family in Malus domestica.

PubMed

Xu, Ruirui; Sun, Peng; Jia, Fengjuan; Lu, Longtao; Li, Yuanyuan; Zhang, Shizhong; Huang, Jinguang

2014-12-01

Teosinte branched 1/cycloidea/proliferating cell factor 1 (TCP) proteins are a large family of transcriptional regulators in angiosperms. They are involved in various biological processes, including development and plant metabolism pathways. In this study, a total of 52 TCP genes were identified in the apple (Malus domestica) genome. Bioinformatic methods were employed to predict and analyse their relevant gene classification, gene structure, chromosome location, sequence alignment and conserved domains of MdTCP proteins. Expression analysis from microarray data showed that the expression levels of 28 and 51 MdTCP genes changed during the ripening and rootstock-scion interaction processes, respectively. The expression patterns of 12 selected MdTCP genes were analysed in different tissues and in response to abiotic stresses. All of the selected genes were detected in at least one of the tissues tested, and most of them were modulated by adverse treatments, indicating that the MdTCPs were involved in various developmental and physiological processes. To the best of our knowledge, this is the first study of a genomewide analysis of the apple TCP gene family. These results provide valuable information for studies on functions of the TCP transcription factor genes in apple.
Dehydration-induced WRKY genes from tobacco and soybean respond to jasmonic acid treatments in BY-2 cell culture.

PubMed

Rabara, Roel C; Tripathi, Prateek; Lin, Jun; Rushton, Paul J

2013-02-15

Drought is one of the important environmental factors affecting crop production worldwide and therefore understanding the molecular response of plants to stress is an important step in crop improvement. WRKY transcription factors are one of the 10 largest transcription factor families across the green lineage. In this study, highly upregulated dehydration-induced WRKY and enzyme-coding genes from tobacco and soybean were selected from microarray data for promoter analyses. Putative stress-related cis-regulatory elements such as the TGACG motif, ABRE-like elements, and W and G-like sequences were identified by in silico analyses of the promoter regions of the selected genes. GFP quantification of transgenic BY-2 cell culture showed these promoters direct higher expression in response to 100 μM JA treatment compared to 100 μM ABA, 10% PEG and 85 mM NaCl treatments. Thus promoter activity upon JA treatment and enrichment of MeJA-responsive elements in the promoters of the selected genes suggest that these genes are jasmonic acid responsive, with the potential to mediate cross-talk during dehydration responses. Copyright © 2013 Elsevier Inc. All rights reserved.
Transmission Bottleneck Size Estimation from Pathogen Deep-Sequencing Data, with an Application to Human Influenza A Virus.

PubMed

Sobel Leonard, Ashley; Weissman, Daniel B; Greenbaum, Benjamin; Ghedin, Elodie; Koelle, Katia

2017-07-15

The bottleneck governing infectious disease transmission describes the size of the pathogen population transferred from the donor to the recipient host. Accurate quantification of the bottleneck size is particularly important for rapidly evolving pathogens such as influenza virus, as narrow bottlenecks reduce the amount of transferred viral genetic diversity and, thus, may decrease the rate of viral adaptation. Previous studies have estimated bottleneck sizes governing viral transmission by using statistical analyses of variants identified in pathogen sequencing data. These analyses, however, did not account for variant calling thresholds and stochastic viral replication dynamics within recipient hosts. Because these factors can skew bottleneck size estimates, we introduce a new method for inferring bottleneck sizes that accounts for these factors. Through the use of a simulated data set, we first show that our method, based on beta-binomial sampling, accurately recovers transmission bottleneck sizes, whereas other methods fail to do so. We then apply our method to a data set of influenza A virus (IAV) infections for which viral deep-sequencing data from transmission pairs are available. We find that the IAV transmission bottleneck size estimates in this study are highly variable across transmission pairs, while the mean bottleneck size of 196 virions is consistent with a previous estimate for this data set. Furthermore, regression analysis shows a positive association between estimated bottleneck size and donor infection severity, as measured by temperature. These results support findings from experimental transmission studies showing that bottleneck sizes across transmission events can be variable and influenced in part by epidemiological factors. IMPORTANCE The transmission bottleneck size describes the size of the pathogen population transferred from the donor to the recipient host and may affect the rate of pathogen adaptation within host populations. 
Recent advances in sequencing technology have enabled bottleneck size estimation from pathogen genetic data, although there is not yet a consistency in the statistical methods used. Here, we introduce a new approach to infer the bottleneck size that accounts for variant identification protocols and noise during pathogen replication. We show that failing to account for these factors leads to an underestimation of bottleneck sizes. We apply this method to an existing data set of human influenza virus infections, showing that transmission is governed by a loose, but highly variable, transmission bottleneck whose size is positively associated with the severity of infection of the donor. Beyond advancing our understanding of influenza virus transmission, we hope that this work will provide a standardized statistical approach for bottleneck size estimation for viral pathogens. Copyright © 2017 Sobel Leonard et al.
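The abstract above describes a bottleneck estimator based on beta-binomial sampling. The sketch below illustrates the general idea under strong simplifying assumptions and is not the authors' implementation: the number of founding virions carrying a variant is taken as binomial given the donor frequency, and the recipient frequency is taken as beta-distributed given that count. Variant calling thresholds, which the abstract says the full method accounts for, are ignored here, and the terms for counts 0 and Nb are dropped because the beta density is undefined there. All function names are hypothetical.

```python
import math


def log_beta_pdf(x, a, b):
    """Log density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    return ((a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))


def log_binom_pmf(k, n, p):
    """Log probability of k successes in n trials, for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def variant_likelihood(nu_donor, nu_recipient, nb):
    """Likelihood of one variant's recipient frequency for bottleneck nb.

    Sums over k founders carrying the variant. Simplification: k = 0 and
    k = nb are skipped, and calling thresholds are not modelled.
    """
    return sum(
        math.exp(log_beta_pdf(nu_recipient, k, nb - k)
                 + log_binom_pmf(k, nb, nu_donor))
        for k in range(1, nb)
    )


def log_likelihood(pairs, nb):
    """Joint log-likelihood over (donor, recipient) frequency pairs."""
    return sum(math.log(variant_likelihood(d, r, nb)) for d, r in pairs)


def estimate_bottleneck(pairs, max_nb=200):
    """Grid-search maximum-likelihood bottleneck size in 2..max_nb."""
    return max(range(2, max_nb + 1), key=lambda nb: log_likelihood(pairs, nb))


# Hypothetical variant frequencies (donor, recipient) for one pair of hosts.
pairs = [(0.5, 0.05), (0.5, 0.95)]
print(estimate_bottleneck(pairs, max_nb=50))
```

In this toy model, recipient frequencies that diverge sharply from donor frequencies favour a small bottleneck, while closely matching frequencies favour a large one. A grid search is used only for transparency; it is not meant to reflect how the published method performs its inference.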
Human Lineage-Specific Transcriptional Regulation through GA-Binding Protein Transcription Factor Alpha (GABPa)

PubMed Central

Perdomo-Sabogal, Alvaro; Nowick, Katja; Piccini, Ilaria; Sudbrak, Ralf; Lehrach, Hans; Yaspo, Marie-Laure; Warnatz, Hans-Jörg; Querfurth, Robert

2016-01-01

A substantial fraction of phenotypic differences between closely related species are likely caused by differences in gene regulation. While this has already been postulated over 30 years ago, only few examples of evolutionary changes in gene regulation have been verified. Here, we identified and investigated binding sites of the transcription factor GA-binding protein alpha (GABPa) aiming to discover cis-regulatory adaptations on the human lineage. By performing chromatin immunoprecipitation-sequencing experiments in a human cell line, we found 11,619 putative GABPa binding sites. Through sequence comparisons of the human GABPa binding regions with orthologous sequences from 34 mammals, we identified substitutions that have resulted in 224 putative human-specific GABPa binding sites. To experimentally assess the transcriptional impact of those substitutions, we selected four promoters for promoter-reporter gene assays using human and African green monkey cells. We compared the activities of wild-type promoters to mutated forms, where we have introduced one or more substitutions to mimic the ancestral state devoid of the GABPa consensus binding sequence. Similarly, we introduced the human-specific substitutions into chimpanzee and macaque promoter backgrounds. Our results demonstrate that the identified substitutions are functional, both in human and nonhuman promoters. In addition, we performed GABPa knock-down experiments and found 1,215 genes as strong candidates for primary targets. Further analyses of our data sets link GABPa to cognitive disorders, diabetes, KRAB zinc finger (KRAB-ZNF), and human-specific genes. Thus, we propose that differences in GABPa binding sites played important roles in the evolution of human-specific phenotypes. PMID:26814189
Phylogeny and origin of 82 zygomycetes from all 54 genera of the Mucorales and Mortierellales based on combined analysis of actin and translation elongation factor EF-1alpha genes.

PubMed

Voigt, K; Wöstemeyer, J

2001-05-30

True fungi (Eumycota) are heterotrophic eukaryotic microorganisms encompassing ascomycetes, basidiomycetes, chytridiomycetes and zygomycetes. The natural systematics of the latter group, Zygomycota, are very poorly understood due to the lack of distinguishing morphological characters. We have determined sequences for the nuclear-encoded genes actin (act) from 82 zygomycetes representing all 54 currently recognized genera from the two zygomycetous orders Mucorales and Mortierellales. We also determined sequences for translation elongation factor EF-1alpha (tef) from 16 zygomycetes (total of 96,837 bp). Phylogenetic analysis in the context of available sequence data (total 2,062 nucleotide positions per species) revealed that current classification schemes for the mucoralean fungi are highly unnatural at the family and, to a large extent, at the genus level. The data clearly indicate a deep, ancient and distinct dichotomy of the orders Mucorales and Mortierellales, which are recognized only in some zygomycete systems. Yet at the same time the data show that two genera - Umbelopsis and Micromucor - previously placed within the Mortierellales on the basis of their weakly developed columella (a morphological structure of the sporangiophore well-developed within all Mucorales) are in fact members of the Mucorales. Phylogenetic analyses of the encoded amino acid sequences in the context of homologues from eukaryotes and archaebacterial outgroups indicate that the Eumycota studied here are a natural group but provide little or no support for the monophyly of either zygomycetes, ascomycetes or basidiomycetes. The data clearly indicate that a complete revision of zygomycete natural systematics is necessary.
Endodontic Microbiology and Pathobiology: Current State of Knowledge.

PubMed

Fouad, Ashraf F

2017-01-01

Newer research tools and basic science knowledge base have allowed the exploration of endodontic diseases in the pulp and periapical tissues in novel ways. The use of next generation sequencing, bioinformatics analyses, genome-wide association studies, to name just a few of these innovations, has allowed the identification of hundreds of microorganisms and of host response factors. This review addresses recent advances in endodontic microbiology and the host response and discusses the potential for future innovations in this area. Copyright © 2016 Elsevier Inc. All rights reserved.

Gene sequence analyses and other DNA-based methods for yeast species recognition

USDA-ARS's Scientific Manuscript database

DNA sequence analyses, as well as other DNA-based methodologies, have transformed the way in which yeasts are identified. The focus of this chapter will be on the resolution of species using various types of DNA comparisons. In other chapters in this book, Rozpedowska, Piškur and Wolfe discuss mul...
HIV-1 Full-Genome Phylogenetics of Generalized Epidemics in Sub-Saharan Africa: Impact of Missing Nucleotide Characters in Next-Generation Sequences

PubMed Central

Wymant, Chris; Colijn, Caroline; Danaviah, Siva; Essex, Max; Frost, Simon; Gall, Astrid; Gaseitsiwe, Simani; Grabowski, Mary K.; Gray, Ronald; Guindon, Stephane; von Haeseler, Arndt; Kaleebu, Pontiano; Kendall, Michelle; Kozlov, Alexey; Manasa, Justen; Minh, Bui Quang; Moyo, Sikhulile; Novitsky, Vlad; Nsubuga, Rebecca; Pillay, Sureshnee; Quinn, Thomas C.; Serwadda, David; Ssemwanga, Deogratius; Stamatakis, Alexandros; Trifinopoulos, Jana; Wawer, Maria; Brown, Andy Leigh; de Oliveira, Tulio; Kellam, Paul; Pillay, Deenan; Fraser, Christophe

2017-01-01

To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the “Phylogenetics and Networks for Generalised HIV Epidemics in Africa” consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n = 2,833; MRC/UVRI Uganda, n = 701; Mochudi Prevention Project, n = 359; Africa Health Research Institute Resistance Cohort, n = 92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3′ end of the genome, was not associated with subtype diversity except HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences (NGS) has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected. PMID:28540766
HIV-1 full-genome phylogenetics of generalized epidemics in sub-Saharan Africa: impact of missing nucleotide characters in next-generation sequences.

PubMed

Ratmann, Oliver; Wymant, Chris; Colijn, Caroline; Danaviah, Siva; Essex, M; Frost, Simon D W; Gall, Astrid; Gaiseitsiwe, Simani; Grabowski, Mary; Gray, Ronald; Guindon, Stephane; von Haeseler, Arndt; Kaleebu, Pontiano; Kendall, Michelle; Kozlov, Alexey; Manasa, Justen; Minh, Bui Quang; Moyo, Sikhulile; Novitsky, Vladimir; Nsubuga, Rebecca; Pillay, Sureshnee; Quinn, Thomas C; Serwadda, David; Ssemwanga, Deogratius; Stamatakis, Alexandros; Trifinopoulos, Jana; Wawer, Maria; Leigh Brown, Andrew; de Oliveira, Tulio; Kellam, Paul; Pillay, Deenan; Fraser, Christophe

2017-05-25

To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the 'Phylogenetics and Networks for Generalised HIV Epidemics in Africa' consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n=2,833; MRC/UVRI Uganda, n=701; Mochudi Prevention Project, n=359; Africa Health Research Institute Resistance Cohort, n=92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3' end of the genome, was not associated with subtype diversity except HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phylogeny reconstruction is expected.
Genome-Wide Analysis Reveals Novel Regulators of Growth in Drosophila melanogaster

PubMed Central

Vonesch, Sibylle Chantal; Lamparter, David; Mackay, Trudy F. C.; Bergmann, Sven; Hafen, Ernst

2016-01-01

Organismal size depends on the interplay between genetic and environmental factors. Genome-wide association (GWA) analyses in humans have implied many genes in the control of height but suffer from the inability to control the environment. Genetic analyses in Drosophila have identified conserved signaling pathways controlling size; however, how these pathways control phenotypic diversity is unclear. We performed GWA of size traits using the Drosophila Genetic Reference Panel of inbred, sequenced lines. We find that the top associated variants differ between traits and sexes; do not map to canonical growth pathway genes, but can be linked to these by epistasis analysis; and are enriched for genes and putative enhancers. Performing GWA on well-studied developmental traits under controlled conditions expands our understanding of developmental processes underlying phenotypic diversity. PMID:26751788
PipeCraft: Flexible open-source toolkit for bioinformatics analysis of custom high-throughput amplicon sequencing data.

PubMed

Anslan, Sten; Bahram, Mohammad; Hiiesalu, Indrek; Tedersoo, Leho

2017-11-01

High-throughput sequencing methods have become a routine analysis tool in environmental sciences as well as in the public and private sectors. These methods provide a vast amount of data, which need to be analysed in several steps. Although the bioinformatics may be applied using several public tools, many analytical pipelines allow too few options for the optimal analysis for more complicated or customized designs. Here, we introduce PipeCraft, a flexible and handy bioinformatics pipeline with a user-friendly graphical interface that links several public tools for analysing amplicon sequencing data. Users are able to customize the pipeline by selecting the most suitable tools and options to process raw sequences from Illumina, Pacific Biosciences, Ion Torrent and Roche 454 sequencing platforms. We described the design and options of PipeCraft and evaluated its performance by analysing the data sets from three different sequencing platforms. We demonstrated that PipeCraft is able to process large data sets within 24 hr. The graphical user interface and the automated links between various bioinformatics tools enable easy customization of the workflow. All analytical steps and options are recorded in log files and are easily traceable. © 2017 John Wiley & Sons Ltd.
Molecular population dynamics of DNA structures in a bcl-2 promoter sequence is regulated by small molecules and the transcription factor hnRNP LL

PubMed Central

Cui, Yunxi; Koirala, Deepak; Kang, HyunJin; Dhakal, Soma; Yangyuoru, Philip; Hurley, Laurence H.; Mao, Hanbin

2014-01-01

Minute differences in the free energy change of unfolding among structures in an oligonucleotide sequence can lead to a complex population equilibrium, which is challenging for ensemble techniques to decipher. Herein, we introduce a new method, molecular population dynamics (MPD), to describe the intricate equilibrium among non-B deoxyribonucleic acid (DNA) structures. Using mechanical unfolding in laser tweezers, we identified six DNA species in a cytosine (C)-rich bcl-2 promoter sequence. Population patterns of these species with and without a small molecule (IMC-76 or IMC-48) or the transcription factor hnRNP LL are compared to reveal the MPD of different species. With a pattern recognition algorithm, we found that IMC-48 and hnRNP LL share 80% similarity in stabilizing i-motifs with 60 s incubation. In contrast, IMC-76 demonstrates the opposite behavior, preferring flexible DNA hairpins. With 120–180 s incubation, IMC-48 and hnRNP LL destabilize i-motifs, which has been previously proposed to activate bcl-2 transcription. These results provide strong support, from the population equilibrium perspective, that small molecules and hnRNP LL can modulate bcl-2 transcription through interaction with i-motifs. The excellent agreement with biochemical results firmly validates the MPD analyses, which, we expect, can be widely applicable to investigate complex equilibrium of biomacromolecules. PMID:24609386
New insights into the genetic diversity of Leishmania RNA Virus 1 and its species-specific relationship with Leishmania parasites.

PubMed

Cantanhêde, Lilian Motta; Fernandes, Flavia Gonçalves; Ferreira, Gabriel Eduardo Melim; Porrozzi, Renato; Ferreira, Ricardo de Godoi Mattos; Cupolillo, Elisa

2018-01-01

Cutaneous leishmaniasis is a neglected parasitic disease that manifests in infected individuals under different phenotypes, with a range of factors contributing to its broad clinical spectrum. One factor, Leishmania RNA Virus 1 (LRV1), has been described as an endosymbiont present in different species of Leishmania. LRV1 significantly worsens the lesion, exacerbating the immune response in both experimentally infected animals and infected individuals. Little is known about the composition and genetic diversity of these viruses. Here, we investigated the relationship between the genetic composition of LRV1 detected in strains of Leishmania (Viannia) braziliensis and L. (V.) guyanensis and the interaction between the endosymbiont and the parasitic species, analyzing an approximately 850 base pair region of the viral genome. We also included one LRV1 sequence detected in L. (V.) shawi, representing the first report of LRV1 in a species other than L. braziliensis and L. guyanensis. The results illustrate the genetic diversity of the LRV1 strains analyzed here, with smaller divergences detected among viral sequences from the same parasite species. Phylogenetic analyses showed that the LRV1 sequences are grouped according to the parasite species and possibly according to the population of the parasite in which the virus was detected, corroborating the hypothesis of joint evolution of the viruses with the speciation of Leishmania parasites.
Phylogeny and differentiation of reptilian and amphibian ranaviruses detected in Europe.

PubMed

Stöhr, Anke C; López-Bueno, Alberto; Blahak, Silvia; Caeiro, Maria F; Rosa, Gonçalo M; Alves de Matos, António Pedro; Martel, An; Alejo, Alí; Marschang, Rachel E

2015-01-01

Ranaviruses in amphibians and fish are considered emerging pathogens and several isolates have been extensively characterized in different studies. Ranaviruses have also been detected in reptiles with increasing frequency, but the role of reptilian hosts is still unclear and only limited sequence data has been provided. In this study, we characterized a number of ranaviruses detected in wild and captive animals in Europe based on sequence data from six genomic regions (major capsid protein (MCP), DNA polymerase (DNApol), ribonucleoside diphosphate reductase alpha and beta subunit-like proteins (RNR-α and -β), viral homolog of the alpha subunit of eukaryotic initiation factor 2, eIF-2α (vIF-2α) genes and microsatellite region). A total of ten different isolates from reptiles (tortoises, lizards, and a snake) and four ranaviruses from amphibians (anurans, urodeles) were included in the study. Furthermore, the complete genome sequences of three reptilian isolates were determined and a new PCR for rapid classification of the different variants of the genomic arrangement was developed. All ranaviruses showed slight variations on the partial nucleotide sequences from the different genomic regions (92.6–100%). Some very similar isolates could be distinguished by the size of the band from the microsatellite region. Three of the lizard isolates had a truncated vIF-2α gene; the other ranaviruses had full-length genes. In the phylogenetic analyses of concatenated sequences from different genes (3223 nt/10287 aa), the reptilian ranaviruses were often more closely related to amphibian ranaviruses than to each other, and most clustered together with previously detected ranaviruses from the same geographic region of origin. Comparative analyses show that among the closely related amphibian-like ranaviruses (ALRVs) described to date, three recently split and independently evolving distinct genetic groups can be distinguished. These findings underline the wide host range of ranaviruses and the emergence of pathogen pollution via animal trade of ectothermic vertebrates.
Phylogeny and Differentiation of Reptilian and Amphibian Ranaviruses Detected in Europe

PubMed Central

Stöhr, Anke C.; López-Bueno, Alberto; Blahak, Silvia; Caeiro, Maria F.; Rosa, Gonçalo M.; Alves de Matos, António Pedro; Martel, An; Alejo, Alí; Marschang, Rachel E.

2015-01-01

Ranaviruses in amphibians and fish are considered emerging pathogens and several isolates have been extensively characterized in different studies. Ranaviruses have also been detected in reptiles with increasing frequency, but the role of reptilian hosts is still unclear and only limited sequence data has been provided. In this study, we characterized a number of ranaviruses detected in wild and captive animals in Europe based on sequence data from six genomic regions (major capsid protein (MCP), DNA polymerase (DNApol), ribonucleoside diphosphate reductase alpha and beta subunit-like proteins (RNR-α and -β), viral homolog of the alpha subunit of eukaryotic initiation factor 2, eIF-2α (vIF-2α) genes and microsatellite region). A total of ten different isolates from reptiles (tortoises, lizards, and a snake) and four ranaviruses from amphibians (anurans, urodeles) were included in the study. Furthermore, the complete genome sequences of three reptilian isolates were determined and a new PCR for rapid classification of the different variants of the genomic arrangement was developed. All ranaviruses showed slight variations on the partial nucleotide sequences from the different genomic regions (92.6–100%). Some very similar isolates could be distinguished by the size of the band from the microsatellite region. Three of the lizard isolates had a truncated vIF-2α gene; the other ranaviruses had full-length genes. In the phylogenetic analyses of concatenated sequences from different genes (3223 nt/10287 aa), the reptilian ranaviruses were often more closely related to amphibian ranaviruses than to each other, and most clustered together with previously detected ranaviruses from the same geographic region of origin. Comparative analyses show that among the closely related amphibian-like ranaviruses (ALRVs) described to date, three recently split and independently evolving distinct genetic groups can be distinguished. These findings underline the wide host range of ranaviruses and the emergence of pathogen pollution via animal trade of ectothermic vertebrates. PMID:25706285
Translating genomics into practice for real-time surveillance and response to carbapenemase-producing Enterobacteriaceae: evidence from a complex multi-institutional KPC outbreak.

PubMed

Kwong, Jason C; Lane, Courtney R; Romanes, Finn; Gonçalves da Silva, Anders; Easton, Marion; Cronin, Katie; Waters, Mary Jo; Tomita, Takehiro; Stevens, Kerrie; Schultz, Mark B; Baines, Sarah L; Sherry, Norelle L; Carter, Glen P; Mu, Andre; Sait, Michelle; Ballard, Susan A; Seemann, Torsten; Stinear, Timothy P; Howden, Benjamin P

2018-01-01

Until recently, Klebsiella pneumoniae carbapenemase (KPC)-producing Enterobacteriaceae were rarely identified in Australia. Following an increase in the number of incident cases across the state of Victoria, we undertook a real-time combined genomic and epidemiological investigation. The scope of this study included identifying risk factors and routes of transmission, and investigating the utility of genomics to enhance traditional field epidemiology for informing management of established widespread outbreaks. All KPC-producing Enterobacteriaceae isolates referred to the state reference laboratory from 2012 onwards were included. Whole-genome sequencing was performed in parallel with a detailed descriptive epidemiological investigation of each case, using Illumina sequencing on each isolate. This was complemented with PacBio long-read sequencing on selected isolates to establish high-quality reference sequences and interrogate characteristics of KPC-encoding plasmids. Initial investigations indicated that the outbreak was widespread, with 86 KPC-producing Enterobacteriaceae isolates (K. pneumoniae 92%) identified from 35 different locations across metropolitan and rural Victoria between 2012 and 2015. Initial combined analyses of the epidemiological and genomic data resolved the outbreak into distinct nosocomial transmission networks, and identified healthcare facilities at the epicentre of KPC transmission. New cases were assigned to transmission networks in real-time, allowing focussed infection control efforts. PacBio sequencing confirmed a secondary transmission network arising from inter-species plasmid transmission. Insights from Bayesian transmission inference and analyses of within-host diversity informed the development of state-wide public health and infection control guidelines, including interventions such as an intensive approach to screening contacts following new case detection to minimise unrecognised colonisation. A real-time combined epidemiological and genomic investigation proved critical to identifying and defining multiple transmission networks of KPC Enterobacteriaceae, while data from either investigation alone were inconclusive. The investigation was fundamental to informing infection control measures in real-time and the development of state-wide public health guidelines on carbapenemase-producing Enterobacteriaceae surveillance and management.
FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery

PubMed Central

Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo

2012-01-01

Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser by using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. In all cases, our tool was able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2–ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data. PMID:22570408
FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery.

PubMed

Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo

2012-09-01

Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser by using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. In all cases, our tool was able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2-ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data.
Software for optimization of SNP and PCR-RFLP genotyping to discriminate many genomes with the fewest assays

PubMed Central

Gardner, Shea N; Wagner, Mark C

2005-01-01

Background Microbial forensics is important in tracking the source of a pathogen, whether the disease is a naturally occurring outbreak or part of a criminal investigation. Results A method and the SPR Opt (SNP and PCR-RFLP Optimization) software to perform a comprehensive, whole-genome analysis to forensically discriminate multiple sequences are presented. Tools for the optimization of forensic typing using Single Nucleotide Polymorphism (SNP) and PCR-Restriction Fragment Length Polymorphism (PCR-RFLP) analyses across multiple isolate sequences of a species are described. The PCR-RFLP analysis includes prediction and selection of optimal primers and restriction enzymes to enable maximum isolate discrimination based on sequence information. SPR Opt calculates all SNP or PCR-RFLP variations present in the sequences, groups them into haplotypes according to their co-segregation across those sequences, and performs combinatoric analyses to determine which sets of haplotypes provide maximal discrimination among all the input sequences. Those set combinations requiring that membership in the fewest haplotypes be queried (i.e. the fewest assays be performed) are found. These analyses highlight variable regions based on existing sequence data. These markers may be heterogeneous among unsequenced isolates as well, and thus may be useful for characterizing the relationships among unsequenced as well as sequenced isolates. The predictions are multi-locus. Analyses of mumps and SARS viruses are summarized. Phylogenetic trees created based on SNPs, PCR-RFLPs, and full genomes are compared for SARS virus, illustrating that purported phylogenies based only on SNP or PCR-RFLP variations do not match those based on multiple sequence alignment of the full genomes. Conclusion This is the first software to optimize the selection of forensic markers to maximize information gained from the fewest assays, accepting whole or partial genome sequence data as input. As more sequence data becomes available for multiple strains and isolates of a species, automated, computational approaches such as those described here will be essential to make sense of large amounts of information, and to guide and optimize efforts in the laboratory. The software and source code for SPR Opt are publicly available and free for non-profit use at . PMID:15904493
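The marker-selection idea summarized in this abstract (group variable sites by co-segregation, then find the fewest groups that still separate every sequence) can be illustrated with a minimal Python sketch. This is not the SPR Opt implementation: the function names `snp_patterns` and `smallest_discriminating_sets`, the toy alignment, and the brute-force search over combinations are all assumptions made for illustration, and the exhaustive search would only be practical for small inputs.

```python
from itertools import combinations


def snp_patterns(seqs):
    """Group variable alignment columns by how they partition the sequences.

    `seqs` maps a (hypothetical) isolate name to an aligned sequence of equal length.
    Columns that split the isolates identically co-segregate and share one group.
    """
    names = sorted(seqs)
    length = len(seqs[names[0]])
    groups = {}
    for pos in range(length):
        column = [seqs[n][pos] for n in names]
        if len(set(column)) < 2:
            continue  # invariant column, carries no discriminating information
        # Relabel alleles by order of first appearance so that the label
        # describes the partition of isolates, not the specific bases.
        relabel = {}
        pattern = tuple(relabel.setdefault(base, len(relabel)) for base in column)
        groups.setdefault(pattern, []).append(pos)
    return names, groups


def smallest_discriminating_sets(seqs):
    """Return every smallest set of co-segregating groups that achieves the
    maximum discrimination possible with all variable sites (brute force)."""
    names, groups = snp_patterns(seqs)
    patterns = list(groups)

    def resolved(chosen):
        # Number of distinct allele profiles across the chosen groups.
        return len({tuple(p[i] for p in chosen) for i in range(len(names))})

    best = resolved(patterns)
    for k in range(1, len(patterns) + 1):
        hits = [c for c in combinations(patterns, k) if resolved(c) == best]
        if hits:
            return [[groups[p] for p in c] for c in hits]
    return []


# Toy alignment (made up): columns 0 and 2 co-segregate, column 1 is invariant.
toy = {"A": "AGAC", "B": "AGAT", "C": "TGTC", "D": "TGTT"}
result = smallest_discriminating_sets(toy)
print(result)  # [[[0, 2], [3]]]
```

In this toy case two assays suffice: one site from the group at columns 0 and 2, plus the site at column 3, separate all four isolates.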
Systematics of Plant-Pathogenic and Related Streptomyces Species Based on Phylogenetic Analyses of Multiple Gene Loci

USDA-ARS's Scientific Manuscript database

The 10 species of Streptomyces implicated as the etiological agents in scab disease of potatoes or soft rot disease of sweet potatoes are distributed among 7 different phylogenetic clades in analyses based on 16S rRNA gene sequences, but high sequence similarity of this gene among Streptomyces speci...
Phylogenetic and Functional Analysis of Metagenome Sequence from High-Temperature Archaeal Habitats Demonstrate Linkages between Metabolic Potential and Geochemistry

PubMed Central

Inskeep, William P.; Jay, Zackary J.; Herrgard, Markus J.; Kozubal, Mark A.; Rusch, Douglas B.; Tringe, Susannah G.; Macur, Richard E.; Jennings, Ryan deM.; Boyd, Eric S.; Spear, John R.; Roberto, Francisco F.

2013-01-01

Geothermal habitats in Yellowstone National Park (YNP) provide an unparalleled opportunity to understand the environmental factors that control the distribution of archaea in thermal habitats. Here we describe, analyze, and synthesize metagenomic and geochemical data collected from seven high-temperature sites that contain microbial communities dominated by archaea relative to bacteria. The specific objectives of the study were to use metagenome sequencing to determine the structure and functional capacity of thermophilic archaeal-dominated microbial communities across a pH range from 2.5 to 6.4 and to discuss specific examples where the metabolic potential correlated with measured environmental parameters and geochemical processes occurring in situ. Random shotgun metagenome sequence (∼40–45 Mb Sanger sequencing per site) was obtained from environmental DNA extracted from high-temperature sediments and/or microbial mats and subjected to numerous phylogenetic and functional analyses. Analysis of individual sequences (e.g., MEGAN and G + C content) and assemblies from each habitat type revealed the presence of dominant archaeal populations in all environments, 10 of whose genomes were largely reconstructed from the sequence data. Analysis of protein family occurrence, particularly of those involved in energy conservation, electron transport, and autotrophic metabolism, revealed significant differences in metabolic strategies across sites consistent with differences in major geochemical attributes (e.g., sulfide, oxygen, pH). These observations provide an ecological basis for understanding the distribution of indigenous archaeal lineages across high-temperature systems of YNP. PMID:23720654
RNA sequencing on Amomum villosum Lour. induced by MeJA identifies the genes of WRKY and terpene synthases involved in terpene biosynthesis.

PubMed

He, Xueying; Wang, Huan; Yang, Jinfen; Deng, Ke; Wang, Teng

2018-02-01

Amomum villosum Lour. is an important Chinese medicinal plant that has diverse medicinal functions and mainly contains volatile terpenes. This study aims to explore the WRKY transcription factors (TFs) and terpene synthase (TPS) unigenes that might be involved in terpene biosynthesis in A. villosum, thereby providing new information on the regulation of terpenes in plants. RNA sequencing of A. villosum induced by methyl jasmonate (MeJA) revealed that the WRKY family was the second largest TF family in the transcriptome. Thirty-six complete WRKY domain sequences were expressed in response to MeJA. Further, six WRKY unigenes were highly correlated with eight deduced TPS unigenes. Ultimately, we combined the terpene abundance with the expression of candidate WRKY TFs and TPS unigenes to propose a possible model wherein AvWRKY61, AvWRKY28, and AvWRKY40 might coordinately trans-activate the AvNeoD promoter. We propose an approach to further investigate TF unigenes that might be involved in terpenoid biosynthesis, and identified four unigenes for further analyses.
Brettanomyces acidodurans sp. nov., a new acetic acid producing yeast species from olive oil.

PubMed

Péter, Gábor; Dlauchy, Dénes; Tóbiás, Andrea; Fülöp, László; Podgoršek, Martina; Čadež, Neža

2017-05-01

Two yeast strains representing a hitherto undescribed yeast species were isolated from olive oil and spoiled olive oil originating from Spain and Israel, respectively. Both strains are strong acetic acid producers, equipped with considerable tolerance to acetic acid. The cultures are not short-lived. Cellobiose is fermented as well as several other sugars. The sequences of their large subunit (LSU) rRNA gene D1/D2 domain are very divergent from the sequences available in GenBank. They differ from the closest hit, Brettanomyces naardenensis, by about 27%, mainly substitutions. Sequence analyses of the concatenated dataset from genes of the small subunit (SSU) rRNA, LSU rRNA and translation elongation factor-1α (EF-1α) placed the two strains as an early diverging member of the Brettanomyces/Dekkera clade with high bootstrap support. Sexual reproduction was not observed. The name Brettanomyces acidodurans sp. nov. (holotype: NCAIM Y.02178T; isotypes: CBS 14519T = NRRL Y-63865T = ZIM 2626T, MycoBank no.: MB 819608) is proposed for this highly divergent new yeast species.
Whole-genome phylogenies of the family Bacillaceae and expansion of the sigma factor gene family in the Bacillus cereus species-group

PubMed Central

2011-01-01

Background The Bacillus cereus sensu lato group consists of six species (B. anthracis, B. cereus, B. mycoides, B. pseudomycoides, B. thuringiensis, and B. weihenstephanensis). While classical microbial taxonomy proposed these organisms as distinct species, newer molecular phylogenies and comparative genome sequencing suggest that these organisms should be classified as a single species (thus, we will refer to these organisms collectively as the Bc species-group). How do we account for the underlying similarity of these phenotypically diverse microbes? It has been established for some time that the most rapidly evolving and evolutionarily flexible portions of the bacterial genome are regulatory sequences and transcriptional networks. Other studies have suggested that the sigma factor gene family of these organisms has diverged and expanded significantly relative to their ancestors; sigma factors are those portions of the bacterial transcriptional apparatus that control RNA polymerase recognition for promoter selection. Thus, examining sigma factor divergence in these organisms would concurrently examine both regulatory sequences and transcriptional networks important for divergence. We began this examination by comparison to the sigma factor gene set of B. subtilis. Results Phylogenetic analysis of the Bc species-group utilizing 157 single-copy genes of the family Bacillaceae suggests that several taxonomic revisions of the genus Bacillus should be considered. Within the Bc species-group there is little indication that the currently recognized species form related sub-groupings, suggesting that they are members of the same species. The sigma factor gene family encoded by the Bc species-group appears to be the result of a dynamic gene-duplication and gene-loss process; previous analyses underestimated the true heterogeneity of the sigma factor content in the Bc species-group. Conclusions Expansion of the sigma factor gene family appears to have preferentially occurred within the extracytoplasmic function (ECF) sigma factor genes, while the primary alternative (PA) sigma factor genes are, in general, highly conserved with those found in B. subtilis. Divergence of the sigma-controlled transcriptional regulons among various members of the Bc species-group likely has a major role in explaining the diversity of phenotypic characteristics seen in members of the Bc species-group. PMID:21864360
Exome Sequencing in Suspected Monogenic Dyslipidemias

PubMed Central

Stitziel, Nathan O.; Peloso, Gina M.; Abifadel, Marianne; Cefalu, Angelo B.; Fouchier, Sigrid; Motazacker, M. Mahdi; Tada, Hayato; Larach, Daniel B.; Awan, Zuhier; Haller, Jorge F.; Pullinger, Clive R.; Varret, Mathilde; Rabès, Jean-Pierre; Noto, Davide; Tarugi, Patrizia; Kawashiri, Masa-aki; Nohara, Atsushi; Yamagishi, Masakazu; Risman, Marjorie; Deo, Rahul; Ruel, Isabelle; Shendure, Jay; Nickerson, Deborah A.; Wilson, James G.; Rich, Stephen S.; Gupta, Namrata; Farlow, Deborah N.; Neale, Benjamin M.; Daly, Mark J.; Kane, John P.; Freeman, Mason W.; Genest, Jacques; Rader, Daniel J.; Mabuchi, Hiroshi; Kastelein, John J.P.; Hovingh, G. Kees; Averna, Maurizio R.; Gabriel, Stacey; Boileau, Catherine; Kathiresan, Sekar

2015-01-01

Background Exome sequencing is a promising tool for gene mapping in Mendelian disorders. We utilized this technique in an attempt to identify novel genes underlying monogenic dyslipidemias. Methods and Results We performed exome sequencing on 213 selected family members from 41 kindreds with suspected Mendelian inheritance of extreme levels of low-density lipoprotein (LDL) cholesterol (after candidate gene sequencing excluded known genetic causes for high LDL cholesterol families) or high-density lipoprotein (HDL) cholesterol. We used standard analytic approaches to identify candidate variants and also assigned a polygenic score to each individual in order to account for their burden of common genetic variants known to influence lipid levels. In nine families, we identified likely pathogenic variants in known lipid genes (ABCA1, APOB, APOE, LDLR, LIPA, and PCSK9); however, we were unable to identify obvious genetic etiologies in the remaining 32 families despite follow-up analyses. We identified three factors that limited novel gene discovery: (1) imperfect sequencing coverage across the exome hid potentially causal variants; (2) large numbers of shared rare alleles within families obfuscated causal variant identification; and (3) individuals from 15% of families carried a significant burden of common lipid-related alleles, suggesting complex inheritance can masquerade as monogenic disease. Conclusions We identified the genetic basis of disease in nine of 41 families; however, none of these represented novel gene discoveries. Our results highlight the promise and limitations of exome sequencing as a discovery technique in suspected monogenic dyslipidemias. Considering the confounders identified may inform the design of future exome sequencing studies. PMID:25632026
PathogenFinder--distinguishing friend from foe using bacterial whole genome sequence data.

PubMed

Cosentino, Salvatore; Voldby Larsen, Mette; Møller Aarestrup, Frank; Lund, Ole

2013-01-01

Although the majority of bacteria are harmless or even beneficial to their host, others are highly virulent and can cause serious diseases, and even death. Due to the constantly decreasing cost of high-throughput sequencing there are now many completely sequenced genomes available from both human pathogenic and innocuous strains. The data can be used to identify gene families that correlate with pathogenicity and to develop tools to predict the pathogenicity of newly sequenced strains, investigations that previously were mainly done by means of more expensive and time-consuming experimental approaches. We describe PathogenFinder (http://cge.cbs.dtu.dk/services/PathogenFinder/), a web-server for the prediction of bacterial pathogenicity by analysing the input proteome, genome, or raw reads provided by the user. The method relies on groups of proteins, created without regard to their annotated function or known involvement in pathogenicity. The method has been built to work with all taxonomic groups of bacteria and, using the entire training-set, achieved an accuracy of 88.6% on an independent test-set by correctly classifying 398 out of 449 completely sequenced bacteria. The approach proposed here is not biased toward sets of genes known to be associated with pathogenicity, so it could aid the discovery of novel pathogenicity factors. Furthermore, the pathogenicity prediction web-server could be used to isolate the potential pathogenic features of both known and unknown strains.
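As a quick arithmetic check of the accuracy reported in this abstract, the two counts it gives (398 correct classifications out of 449 test genomes) can be turned into a percentage. The variable names below are illustrative only and do not come from PathogenFinder.

```python
# Counts taken from the abstract; names are illustrative.
correct, total = 398, 449

accuracy = correct / total
print(f"{accuracy:.1%}")  # 88.6%
```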

Whole-Exome Sequencing Identifies Rare and Low-Frequency Coding Variants Associated with LDL Cholesterol

PubMed Central

Lange, Leslie A.; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M.; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M.; Smith, Joshua D.; Turner, Emily H.; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A.; Holmen, Oddgeir L.; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A.; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C.; Correa, Adolfo; Griswold, Michael E.; Jakobsdottir, Johanna; Smith, Albert V.; Schreiner, Pamela J.; Feitosa, Mary F.; Zhang, Qunyuan; Huffman, Jennifer E.; Crosby, Jacy; Wassel, Christina L.; Do, Ron; Franceschini, Nora; Martin, Lisa W.; Robinson, Jennifer G.; Assimes, Themistocles L.; Crosslin, David R.; Rosenthal, Elisabeth A.; Tsai, Michael; Rieder, Mark J.; Farlow, Deborah N.; Folsom, Aaron R.; Lumley, Thomas; Fox, Ervin R.; Carlson, Christopher S.; Peters, Ulrike; Jackson, Rebecca D.; van Duijn, Cornelia M.; Uitterlinden, André G.; Levy, Daniel; Rotter, Jerome I.; Taylor, Herman A.; Gudnason, Vilmundur; Siscovick, David S.; Fornage, Myriam; Borecki, Ingrid B.; Hayward, Caroline; Rudan, Igor; Chen, Y. Eugene; Bottinger, Erwin P.; Loos, Ruth J.F.; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M.; Gabriel, Stacey B.; O’Donnell, Christopher J.; Post, Wendy S.; North, Kari E.; Reiner, Alexander P.; Boerwinkle, Eric; Psaty, Bruce M.; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P.; Cupples, L. 
Adrienne; Kooperberg, Charles; Wilson, James G.; Nickerson, Deborah A.; Abecasis, Goncalo R.; Rich, Stephen S.; Tracy, Russell P.; Willer, Cristen J.; Gabriel, Stacey B.; Altshuler, David M.; Abecasis, Gonçalo R.; Allayee, Hooman; Cresci, Sharon; Daly, Mark J.; de Bakker, Paul I.W.; DePristo, Mark A.; Do, Ron; Donnelly, Peter; Farlow, Deborah N.; Fennell, Tim; Garimella, Kiran; Hazen, Stanley L.; Hu, Youna; Jordan, Daniel M.; Jun, Goo; Kathiresan, Sekar; Kang, Hyun Min; Kiezun, Adam; Lettre, Guillaume; Li, Bingshan; Li, Mingyao; Newton-Cheh, Christopher H.; Padmanabhan, Sandosh; Peloso, Gina; Pulit, Sara; Rader, Daniel J.; Reich, David; Reilly, Muredach P.; Rivas, Manuel A.; Schwartz, Steve; Scott, Laura; Siscovick, David S.; Spertus, John A.; Stitziel, Nathaniel O.; Stoletzki, Nina; Sunyaev, Shamil R.; Voight, Benjamin F.; Willer, Cristen J.; Rich, Stephen S.; Akylbekova, Ermeg; Atwood, Larry D.; Ballantyne, Christie M.; Barbalic, Maja; Barr, R. Graham; Benjamin, Emelia J.; Bis, Joshua; Boerwinkle, Eric; Bowden, Donald W.; Brody, Jennifer; Budoff, Matthew; Burke, Greg; Buxbaum, Sarah; Carr, Jeff; Chen, Donna T.; Chen, Ida Y.; Chen, Wei-Min; Concannon, Pat; Crosby, Jacy; Cupples, L. Adrienne; D’Agostino, Ralph; DeStefano, Anita L.; Dreisbach, Albert; Dupuis, Josée; Durda, J. 
Peter; Ellis, Jaclyn; Folsom, Aaron R.; Fornage, Myriam; Fox, Caroline S.; Fox, Ervin; Funari, Vincent; Ganesh, Santhi K.; Gardin, Julius; Goff, David; Gordon, Ora; Grody, Wayne; Gross, Myron; Guo, Xiuqing; Hall, Ira M.; Heard-Costa, Nancy L.; Heckbert, Susan R.; Heintz, Nicholas; Herrington, David M.; Hickson, DeMarc; Huang, Jie; Hwang, Shih-Jen; Jacobs, David R.; Jenny, Nancy S.; Johnson, Andrew D.; Johnson, Craig W.; Kawut, Steven; Kronmal, Richard; Kurz, Raluca; Lange, Ethan M.; Lange, Leslie A.; Larson, Martin G.; Lawson, Mark; Lewis, Cora E.; Levy, Daniel; Li, Dalin; Lin, Honghuang; Liu, Chunyu; Liu, Jiankang; Liu, Kiang; Liu, Xiaoming; Liu, Yongmei; Longstreth, William T.; Loria, Cay; Lumley, Thomas; Lunetta, Kathryn; Mackey, Aaron J.; Mackey, Rachel; Manichaikul, Ani; Maxwell, Taylor; McKnight, Barbara; Meigs, James B.; Morrison, Alanna C.; Musani, Solomon K.; Mychaleckyj, Josyf C.; Nettleton, Jennifer A.; North, Kari; O’Donnell, Christopher J.; O’Leary, Daniel; Ong, Frank; Palmas, Walter; Pankow, James S.; Pankratz, Nathan D.; Paul, Shom; Perez, Marco; Person, Sharina D.; Polak, Joseph; Post, Wendy S.; Psaty, Bruce M.; Quinlan, Aaron R.; Raffel, Leslie J.; Ramachandran, Vasan S.; Reiner, Alexander P.; Rice, Kenneth; Rotter, Jerome I.; Sanders, Jill P.; Schreiner, Pamela; Seshadri, Sudha; Shea, Steve; Sidney, Stephen; Silverstein, Kevin; Smith, Nicholas L.; Sotoodehnia, Nona; Srinivasan, Asoke; Taylor, Herman A.; Taylor, Kent; Thomas, Fridtjof; Tracy, Russell P.; Tsai, Michael Y.; Volcik, Kelly A.; Wassel, Chrstina L.; Watson, Karol; Wei, Gina; White, Wendy; Wiggins, Kerri L.; Wilk, Jemma B.; Williams, O. 
Dale; Wilson, Gregory; Wilson, James G.; Wolf, Phillip; Zakai, Neil A.; Hardy, John; Meschia, James F.; Nalls, Michael; Singleton, Andrew; Worrall, Brad; Bamshad, Michael J.; Barnes, Kathleen C.; Abdulhamid, Ibrahim; Accurso, Frank; Anbar, Ran; Beaty, Terri; Bigham, Abigail; Black, Phillip; Bleecker, Eugene; Buckingham, Kati; Cairns, Anne Marie; Caplan, Daniel; Chatfield, Barbara; Chidekel, Aaron; Cho, Michael; Christiani, David C.; Crapo, James D.; Crouch, Julia; Daley, Denise; Dang, Anthony; Dang, Hong; De Paula, Alicia; DeCelie-Germana, Joan; Drumm, Allen DozorMitch; Dyson, Maynard; Emerson, Julia; Emond, Mary J.; Ferkol, Thomas; Fink, Robert; Foster, Cassandra; Froh, Deborah; Gao, Li; Gershan, William; Gibson, Ronald L.; Godwin, Elizabeth; Gondor, Magdalen; Gutierrez, Hector; Hansel, Nadia N.; Hassoun, Paul M.; Hiatt, Peter; Hokanson, John E.; Howenstine, Michelle; Hummer, Laura K.; Kanga, Jamshed; Kim, Yoonhee; Knowles, Michael R.; Konstan, Michael; Lahiri, Thomas; Laird, Nan; Lange, Christoph; Lin, Lin; Lin, Xihong; Louie, Tin L.; Lynch, David; Make, Barry; Martin, Thomas R.; Mathai, Steve C.; Mathias, Rasika A.; McNamara, John; McNamara, Sharon; Meyers, Deborah; Millard, Susan; Mogayzel, Peter; Moss, Richard; Murray, Tanda; Nielson, Dennis; Noyes, Blakeslee; O’Neal, Wanda; Orenstein, David; O’Sullivan, Brian; Pace, Rhonda; Pare, Peter; Parker, H. 
Worth; Passero, Mary Ann; Perkett, Elizabeth; Prestridge, Adrienne; Rafaels, Nicholas M.; Ramsey, Bonnie; Regan, Elizabeth; Ren, Clement; Retsch-Bogart, George; Rock, Michael; Rosen, Antony; Rosenfeld, Margaret; Ruczinski, Ingo; Sanford, Andrew; Schaeffer, David; Sell, Cindy; Sheehan, Daniel; Silverman, Edwin K.; Sin, Don; Spencer, Terry; Stonebraker, Jackie; Tabor, Holly K.; Varlotta, Laurie; Vergara, Candelaria I.; Weiss, Robert; Wigley, Fred; Wise, Robert A.; Wright, Fred A.; Wurfel, Mark M.; Zanni, Robert; Zou, Fei; Nickerson, Deborah A.; Rieder, Mark J.; Green, Phil; Shendure, Jay; Akey, Joshua M.; Bustamante, Carlos D.; Crosslin, David R.; Eichler, Evan E.; Fox, P. Keolu; Fu, Wenqing; Gordon, Adam; Gravel, Simon; Jarvik, Gail P.; Johnsen, Jill M.; Kan, Mengyuan; Kenny, Eimear E.; Kidd, Jeffrey M.; Lara-Garduno, Fremiet; Leal, Suzanne M.; Liu, Dajiang J.; McGee, Sean; O’Connor, Timothy D.; Paeper, Bryan; Robertson, Peggy D.; Smith, Joshua D.; Staples, Jeffrey C.; Tennessen, Jacob A.; Turner, Emily H.; Wang, Gao; Yi, Qian; Jackson, Rebecca; Peters, Ulrike; Carlson, Christopher S.; Anderson, Garnet; Anton-Culver, Hoda; Assimes, Themistocles L.; Auer, Paul L.; Beresford, Shirley; Bizon, Chris; Black, Henry; Brunner, Robert; Brzyski, Robert; Burwen, Dale; Caan, Bette; Carty, Cara L.; Chlebowski, Rowan; Cummings, Steven; Curb, J. 
David; Eaton, Charles B.; Ford, Leslie; Franceschini, Nora; Fullerton, Stephanie M.; Gass, Margery; Geller, Nancy; Heiss, Gerardo; Howard, Barbara V.; Hsu, Li; Hutter, Carolyn M.; Ioannidis, John; Jiao, Shuo; Johnson, Karen C.; Kooperberg, Charles; Kuller, Lewis; LaCroix, Andrea; Lakshminarayan, Kamakshi; Lane, Dorothy; Lasser, Norman; LeBlanc, Erin; Li, Kuo-Ping; Limacher, Marian; Lin, Dan-Yu; Logsdon, Benjamin A.; Ludlam, Shari; Manson, JoAnn E.; Margolis, Karen; Martin, Lisa; McGowan, Joan; Monda, Keri L.; Kotchen, Jane Morley; Nathan, Lauren; Ockene, Judith; O’Sullivan, Mary Jo; Phillips, Lawrence S.; Prentice, Ross L.; Robbins, John; Robinson, Jennifer G.; Rossouw, Jacques E.; Sangi-Haghpeykar, Haleh; Sarto, Gloria E.; Shumaker, Sally; Simon, Michael S.; Stefanick, Marcia L.; Stein, Evan; Tang, Hua; Taylor, Kira C.; Thomson, Cynthia A.; Thornton, Timothy A.; Van Horn, Linda; Vitolins, Mara; Wactawski-Wende, Jean; Wallace, Robert; Wassertheil-Smoller, Sylvia; Zeng, Donglin; Applebaum-Bowden, Deborah; Feolo, Michael; Gan, Weiniu; Paltoo, Dina N.; Sholinsky, Phyliss; Sturcke, Anne

2014-01-01

Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. PMID:24507775
Gene discovery using next-generation pyrosequencing to develop ESTs for Phalaenopsis orchids

PubMed Central

2011-01-01

Background Orchids are one of the most diversified angiosperms, but few genomic resources are available for these non-model plants. In addition to its ecological significance, Phalaenopsis is economically important to the floriculture industry worldwide. We aimed to use massively parallel 454 pyrosequencing for a global characterization of the Phalaenopsis transcriptome. Results To maximize sequence diversity, we pooled RNA from 10 samples of different tissues, various developmental stages, and biotic- or abiotic-stressed plants. We obtained 206,960 expressed sequence tags (ESTs) with an average read length of 228 bp. These reads were assembled into 8,233 contigs and 34,630 singletons. The unigenes were searched against the NCBI non-redundant (NR) protein database. Based on sequence similarity with known proteins, these analyses identified 22,234 different genes (E-value cutoff, e-7). Assembled sequences were annotated with Gene Ontology, Gene Family and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Among these annotations, over 780 unigenes encoding putative transcription factors were identified. Conclusion Pyrosequencing was effective in identifying a large set of unigenes from Phalaenopsis. The informative EST dataset we developed constitutes a much-needed resource for discovery of genes involved in various biological processes in Phalaenopsis and other orchid species. These transcribed sequences will narrow the gap between the study of model organisms with many genomic resources and that of species that are important for ecological and evolutionary studies. PMID:21749684
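The assembly figures in this record imply totals that the abstract does not state outright. The following is a minimal sketch of that arithmetic, assuming (as is conventional, though the record does not confirm it) that the unigene set is the contigs plus the singletons; the variable names are illustrative only.

```python
# Figures copied from the abstract above.
reads = 206_960          # expressed sequence tags obtained
mean_read_length = 228   # average read length in bp
contigs = 8_233
singletons = 34_630

# Assumption: unigenes = contigs + singletons (not stated in the record).
unigenes = contigs + singletons

# Approximate raw sequence yield implied by read count and mean length.
total_bases = reads * mean_read_length

print(f"unigenes: {unigenes}")
print(f"approximate yield: {total_bases / 1e6:.1f} Mb")
```

Under that assumption the dataset would hold 42,863 unigenes from roughly 47 Mb of raw sequence.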
Whole-exome sequencing identifies rare and low-frequency coding variants associated with LDL cholesterol.

PubMed

Lange, Leslie A; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M; Smith, Joshua D; Turner, Emily H; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-Ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A; Holmen, Oddgeir L; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C; Correa, Adolfo; Griswold, Michael E; Jakobsdottir, Johanna; Smith, Albert V; Schreiner, Pamela J; Feitosa, Mary F; Zhang, Qunyuan; Huffman, Jennifer E; Crosby, Jacy; Wassel, Christina L; Do, Ron; Franceschini, Nora; Martin, Lisa W; Robinson, Jennifer G; Assimes, Themistocles L; Crosslin, David R; Rosenthal, Elisabeth A; Tsai, Michael; Rieder, Mark J; Farlow, Deborah N; Folsom, Aaron R; Lumley, Thomas; Fox, Ervin R; Carlson, Christopher S; Peters, Ulrike; Jackson, Rebecca D; van Duijn, Cornelia M; Uitterlinden, André G; Levy, Daniel; Rotter, Jerome I; Taylor, Herman A; Gudnason, Vilmundur; Siscovick, David S; Fornage, Myriam; Borecki, Ingrid B; Hayward, Caroline; Rudan, Igor; Chen, Y Eugene; Bottinger, Erwin P; Loos, Ruth J F; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M; Gabriel, Stacey B; O'Donnell, Christopher J; Post, Wendy S; North, Kari E; Reiner, Alexander P; Boerwinkle, Eric; Psaty, Bruce M; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P; Cupples, L Adrienne; Kooperberg, Charles; Wilson, James G; Nickerson, Deborah A; Abecasis, Goncalo R; Rich, Stephen S; Tracy, Russell P; Willer, Cristen J

2014-02-06

Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
The First Complete Mitochondrial Genome Sequences for Stomatopod Crustaceans: Implications for Phylogeny

DOE Office of Scientific and Technical Information (OSTI.GOV)

Swinstrom, Kirsten; Caldwell, Roy; Fourcade, H. Matthew

2005-09-07

We report the first complete mitochondrial genome sequences of stomatopods and compare their features to each other and to those of other crustaceans. Phylogenetic analyses of the concatenated mitochondrial protein-coding sequences were used to explore relationships within the Stomatopoda, within the malacostracan crustaceans, and among crustaceans and insects. Although these analyses support the monophyly of both Malacostraca and, within it, Stomatopoda, they also confirm the view of a paraphyletic Crustacea, with Malacostraca being more closely related to insects than to the branchiopod crustaceans.
Kinetic, Thermodynamic, and Structural Characterizations of the Association between Nrf2-DLGex Degron and Keap1

PubMed Central

Fukutomi, Toshiaki; Takagi, Kenji; Mizushima, Tsunehiro; Ohuchi, Noriaki

2014-01-01

Transcription factor Nrf2 (NF-E2-related factor 2) coordinately regulates cytoprotective gene expression, but under unstressed conditions, Nrf2 is degraded rapidly through Keap1 (Kelch-like ECH-associated protein 1)-mediated ubiquitination. Nrf2 harbors two Keap1-binding motifs, DLG and ETGE. Interactions between these two motifs and Keap1 constitute a key regulatory nexus for cellular Nrf2 activity through the formation of a two-site binding hinge-and-latch mechanism. In this study, we determined the minimum Keap1-binding sequence of the DLG motif, the low-affinity latch site, and defined a new DLGex motif that covers a sequence much longer than that previously defined. We have successfully clarified the crystal structure of the Keap1-DC-DLGex complex at 1.6 Å. DLGex possesses a complicated helix structure, which interprets well the human-cancer-derived loss-of-function mutations in DLGex. In thermodynamic analyses, Keap1-DLGex binding is characterized as enthalpy and entropy driven, while Keap1-ETGE binding is characterized as purely enthalpy driven. In kinetic analyses, Keap1-DLGex binding follows a fast-association and fast-dissociation model, while Keap1-ETGE binding contains a slow-reaction step that leads to a stable conformation. These results demonstrate that the mode of DLGex binding to Keap1 is distinct from that of ETGE structurally, thermodynamically, and kinetically and support our contention that the DLGex motif serves as a converter transmitting environmental stress to Nrf2 induction as the latch site. PMID:24366543
Genome-Wide Classification and Evolutionary and Expression Analyses of Citrus MYB Transcription Factor Families in Sweet Orange

PubMed Central

Hou, Xiao-Jin; Li, Si-Bei; Liu, Sheng-Rui; Hu, Chun-Gen; Zhang, Jin-Zhi

2014-01-01

MYB family genes are widely distributed in plants and comprise one of the largest transcription factor families involved in various developmental processes and defense responses of plants. To date, few MYB genes and little expression profiling have been reported for citrus. Here, we describe and classify 177 members of the sweet orange MYB gene (CsMYB) family in terms of their genomic gene structures and similarity to their putative Arabidopsis orthologs. According to these analyses, these CsMYBs were categorized into four groups (4R-MYB, 3R-MYB, 2R-MYB and 1R-MYB). Gene structure analysis revealed that 1R-MYB genes possess relatively more introns as compared with 2R-MYB genes. Investigation of their chromosomal localizations revealed that these CsMYBs are distributed across nine chromosomes. Sweet orange includes a relatively small number of MYB genes compared with the 198 members in Arabidopsis, presumably due to a paralog reduction related to repetitive sequence insertion into the promoter and non-coding transcribed regions of the genes. Comparative studies of CsMYBs and Arabidopsis showed that CsMYBs had fewer gene duplication events. Expression analysis revealed that the MYB gene family has a wide expression profile in sweet orange development and plays important roles in development and stress responses. In addition, 337 new putative microsatellites with flanking sequences sufficient for primer design were also identified from the 177 CsMYBs. These results provide a useful reference for the selection of candidate MYB genes for cloning and further functional analysis for citrus. PMID:25375352
The origins and impact of primate segmental duplications.

PubMed

Marques-Bonet, Tomas; Girirajan, Santhosh; Eichler, Evan E

2009-10-01

Duplicated sequences are substrates for the emergence of new genes and are an important source of genetic instability associated with rare and common diseases. Analyses of primate genomes have shown an increase in the proportion of interspersed segmental duplications (SDs) within the genomes of humans and great apes. This contrasts with other mammalian genomes that seem to have their recently duplicated sequences organized in a tandem configuration. In this review, we focus on the mechanistic origin and impact of this difference with respect to evolution, genetic diversity and primate phenotype. Although many genomes will be sequenced in the future, resolution of this aspect of genomic architecture still requires high quality sequences and detailed analyses.
Genome sequencing elucidates Sardinian genetic architecture and augments association analyses for lipid and blood inflammatory markers

PubMed Central

Zoledziewska, Magdalena; Mulas, Antonella; Pistis, Giorgio; Steri, Maristella; Danjou, Fabrice; Kwong, Alan; Ortega del Vecchyo, Vicente Diego; Chiang, Charleston W. K.; Bragg-Gresham, Jennifer; Pitzalis, Maristella; Nagaraja, Ramaiah; Tarrier, Brendan; Brennan, Christine; Uzzau, Sergio; Fuchsberger, Christian; Atzeni, Rossano; Reinier, Frederic; Berutti, Riccardo; Huang, Jie; Timpson, Nicholas J; Toniolo, Daniela; Gasparini, Paolo; Malerba, Giovanni; Dedoussis, George; Zeggini, Eleftheria; Soranzo, Nicole; Jones, Chris; Lyons, Robert; Angius, Andrea; Kang, Hyun M.; Novembre, John; Sanna, Serena; Schlessinger, David; Cucca, Francesco; Abecasis, Gonçalo R

2015-01-01

We report ~17.6M genetic variants from whole-genome sequencing of 2,120 Sardinians; 22% are absent from prior sequencing-based compilations and enriched for predicted functional consequence. Furthermore, ~76K variants common in our sample (frequency >5%) are rare elsewhere (<0.5% in the 1000 Genomes Project). We assessed the impact of these variants on circulating lipid levels and five inflammatory biomarkers. Fourteen signals, including two major new loci, were observed for lipid levels, and 19, including two novel loci, for inflammatory markers. New associations would be missed in analyses based on 1000 Genomes data, underlining the advantages of large-scale sequencing in this founder population. PMID:26366554
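The frequency contrast described in this record (common in the Sardinian sample at >5%, rare elsewhere at <0.5%) amounts to a two-threshold filter. The sketch below illustrates that filter on toy data; the record layout, field names and variant identifiers are hypothetical and are not taken from the study.

```python
def population_enriched(variants, sample_min=0.05, reference_max=0.005):
    """Keep variants common in the study sample but rare in a reference panel.

    `variants` is an iterable of dicts with hypothetical keys
    'id', 'sample_af' and 'reference_af' (allele frequencies in [0, 1]).
    The default thresholds mirror the >5% and <0.5% cutoffs in the abstract.
    """
    return [
        v for v in variants
        if v["sample_af"] > sample_min and v["reference_af"] < reference_max
    ]


# Toy records for illustration only; these are not real variants.
toy_variants = [
    {"id": "var_a", "sample_af": 0.12, "reference_af": 0.001},   # kept
    {"id": "var_b", "sample_af": 0.30, "reference_af": 0.250},   # common everywhere
    {"id": "var_c", "sample_af": 0.01, "reference_af": 0.0005},  # rare in both
]

kept_ids = [v["id"] for v in population_enriched(toy_variants)]
print(kept_ids)
```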
A comprehensive aligned nifH gene database: a multipurpose tool for studies of nitrogen-fixing bacteria.

PubMed

Gaby, John Christian; Buckley, Daniel H

2014-01-01

We describe a nitrogenase gene sequence database that facilitates analysis of the evolution and ecology of nitrogen-fixing organisms. The database contains 32 954 aligned nitrogenase nifH sequences linked to phylogenetic trees and associated sequence metadata. The database includes 185 linked multigene entries including full-length nifH, nifD, nifK and 16S ribosomal RNA (rRNA) gene sequences. Evolutionary analyses enabled by the multigene entries support an ancient horizontal transfer of nitrogenase genes between Archaea and Bacteria and provide evidence that nifH has a different history of horizontal gene transfer from the nifDK enzyme core. Further analyses show that lineages in nitrogenase cluster I and cluster III have different rates of substitution within nifD, suggesting that nifD is under different selection pressure in these two lineages. Finally, we find that the genetic divergence of nifH and 16S rRNA genes does not correlate well at sequence dissimilarity values used commonly to define microbial species, as strains having <3% sequence dissimilarity in their 16S rRNA genes can have up to 23% dissimilarity in nifH. The nifH database has a number of uses including phylogenetic and evolutionary analyses, the design and assessment of primers/probes and the evaluation of nitrogenase sequence diversity. Database URL: http://www.css.cornell.edu/faculty/buckley/nifh.htm.
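The comparison of 16S rRNA and nifH divergence in this record rests on pairwise sequence dissimilarity. The record does not say how dissimilarity was computed, so the following is only a sketch of the simplest measure, the uncorrected proportion of differing sites between two aligned sequences; the function name and gap handling are assumptions.

```python
def percent_dissimilarity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Uncorrected pairwise dissimilarity (%) between two aligned sequences.

    Alignment columns in which either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


# Toy aligned fragments (not real nifH or 16S sequences).
d = percent_dissimilarity("ATGCGTACGT", "ATGAGTAC-T")
print(f"{d:.1f}% dissimilar")
```

In the toy pair, one of the nine gap-free columns differs, giving about 11.1%.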
Development of PCR primers specific for the amplification and direct sequencing of gyrB genes from microbacteria, order Actinomycetales.

PubMed

Richert, Kathrin; Brambilla, Evelyne; Stackebrandt, Erko

2005-01-01

PCR primer sets were developed for the specific amplification and sequence analysis of the gene encoding gyrase subunit B (gyrB) in members of the family Microbacteriaceae, class Actinobacteria. The family contains species highly related by 16S rRNA gene sequence analyses. In order to test if the gene sequence analysis of gyrB is appropriate to discriminate between closely related species, we evaluated the 16S rRNA gene phylogeny of its members. As the published universal primer set for gyrB failed to amplify the corresponding gene of the majority of the 80 type strains of the family, three new primer sets were identified that generated fragments with a composite sequence length of about 900 nt. However, the amplification of all three fragments was successful in only 25% of the 80 type strains. In this study, the substitution frequencies in genes encoding gyrase and 16S rDNA were compared for 10 strains of nine genera. The frequency of gyrB nucleotide substitution is significantly higher than that of the 16S rDNA, and no linear correlation exists between the similarities of both molecules among members of the Microbacteriaceae. The phylogenetic analyses using the gyrB sequences provide higher resolution than those using 16S rDNA sequences and seem able to discriminate between closely related species.
A comprehensive aligned nifH gene database: a multipurpose tool for studies of nitrogen-fixing bacteria

PubMed Central

Gaby, John Christian; Buckley, Daniel H.

2014-01-01

We describe a nitrogenase gene sequence database that facilitates analysis of the evolution and ecology of nitrogen-fixing organisms. The database contains 32 954 aligned nitrogenase nifH sequences linked to phylogenetic trees and associated sequence metadata. The database includes 185 linked multigene entries including full-length nifH, nifD, nifK and 16S ribosomal RNA (rRNA) gene sequences. Evolutionary analyses enabled by the multigene entries support an ancient horizontal transfer of nitrogenase genes between Archaea and Bacteria and provide evidence that nifH has a different history of horizontal gene transfer from the nifDK enzyme core. Further analyses show that lineages in nitrogenase cluster I and cluster III have different rates of substitution within nifD, suggesting that nifD is under different selection pressure in these two lineages. Finally, we find that the genetic divergence of nifH and 16S rRNA genes does not correlate well at sequence dissimilarity values used commonly to define microbial species, as strains having <3% sequence dissimilarity in their 16S rRNA genes can have up to 23% dissimilarity in nifH. The nifH database has a number of uses including phylogenetic and evolutionary analyses, the design and assessment of primers/probes and the evaluation of nitrogenase sequence diversity. Database URL: http://www.css.cornell.edu/faculty/buckley/nifh.htm PMID:24501396
Species classifier choice is a key consideration when analysing low-complexity food microbiome data.

PubMed

Walsh, Aaron M; Crispie, Fiona; O'Sullivan, Orla; Finnegan, Laura; Claesson, Marcus J; Cotter, Paul D

2018-03-20

The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads. Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there were no significant differences between the compositional results generated by the different sequencers (p = 0.693, R2 = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R2 = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the greatest false-positive rate. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy was improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample. Again, the outputs from functional profiling analysis using SUPER-FOCUS were generally concordant between the platforms at different sequencing depths. Finally, and expectedly, metagenome assembly completeness was significantly lower on the MiSeq than on either the NextSeq (p = 0.03) or the Proton (p = 0.011), and it improved with increased sequencing depth. Our results demonstrate a remarkable similarity in the results generated by the three sequencing platforms at different sequencing depths, and, in fact, the choice of bioinformatics methodology had a more evident impact on results than the choice of sequencer did.
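The depth comparison in this record depends on randomly subsampling reads. The tool the authors used is not named in the abstract, so the following is only a generic sketch of subsampling without replacement using Python's standard library; the function name, the seed parameter and the toy read identifiers are assumptions.

```python
import random


def subsample_reads(reads, depth, seed=0):
    """Draw `depth` reads at random without replacement.

    Returns every read when fewer than `depth` are available. A fixed
    seed keeps the draw reproducible between runs.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    pool = list(reads)
    if depth >= len(pool):
        return pool
    return random.Random(seed).sample(pool, depth)


# Toy read identifiers standing in for a sequencing run.
all_reads = [f"read_{i}" for i in range(1000)]
for depth in (100, 500, 1000):
    subset = subsample_reads(all_reads, depth, seed=42)
    print(depth, len(subset))
```

Running a downstream analysis on each subset, rather than just counting it, would show how results change with depth.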
First Report and Characterization of Pestalotiopsis ellipsospora Causing Canker on Acanthopanax divaricatus

PubMed Central

Yun, Yeo Hong; Ahn, Geum Ran

2015-01-01

Acanthopanax divaricatus, a member of the Araliaceae family, has been used as an invigorant in traditional Korean medicine. During disease monitoring, a stem with small, irregular, brown lesions was sampled at a farm in Cheonan in 2011. The symptoms seen were sunken cankers and reddish-brown needles on the infected twig. The isolated fungal colonies were whitish, having crenated edges and aerial mycelium on the surface, and with black gregarious fruiting bodies. The reverse plate was creamy white. Conidia were 17~22 × 3.5~4.2 µm, fusiform, 4-septate, and straight to slightly curved. The nucleotide sequence of the partial translation elongation factor 1 alpha gene of the fungal isolate shares 99% sequence identity with that of known Pestalotiopsis ellipsospora. Based on the results of the morphological and molecular analyses, the fungal isolate was identified as P. ellipsospora. In Korea, this is the first report of canker on A. divaricatus. PMID:26539058
First Report and Characterization of Pestalotiopsis ellipsospora Causing Canker on Acanthopanax divaricatus.

PubMed

Yun, Yeo Hong; Ahn, Geum Ran; Kim, Seong Hwan

2015-09-01

Acanthopanax divaricatus, a member of the Araliaceae family, has been used as an invigorant in traditional Korean medicine. During disease monitoring, a stem with small, irregular, brown lesions was sampled at a farm in Cheonan in 2011. The symptoms seen were sunken cankers and reddish-brown needles on the infected twig. The isolated fungal colonies were whitish, having crenated edges and aerial mycelium on the surface, and with black gregarious fruiting bodies. The reverse plate was creamy white. Conidia were 17~22 × 3.5~4.2 µm, fusiform, 4-septate, and straight to slightly curved. The nucleotide sequence of the partial translation elongation factor 1 alpha gene of the fungal isolate shares 99% sequence identity with that of known Pestalotiopsis ellipsospora. Based on the results of the morphological and molecular analyses, the fungal isolate was identified as P. ellipsospora. In Korea, this is the first report of canker on A. divaricatus.
Modulation of tissue repair by regeneration enhancer elements.

PubMed

Kang, Junsu; Hu, Jianxin; Karra, Ravi; Dickson, Amy L; Tornini, Valerie A; Nachtrab, Gregory; Gemberling, Matthew; Goldman, Joseph A; Black, Brian L; Poss, Kenneth D

2016-04-14

How tissue regeneration programs are triggered by injury has received limited research attention. Here we investigate the existence of enhancer regulatory elements that are activated in regenerating tissue. Transcriptomic analyses reveal that leptin b (lepb) is highly induced in regenerating hearts and fins of zebrafish. Epigenetic profiling identified a short DNA sequence element upstream and distal to lepb that acquires open chromatin marks during regeneration and enables injury-dependent expression from minimal promoters. This element could activate expression in injured neonatal mouse tissues and was divisible into tissue-specific modules sufficient for expression in regenerating zebrafish fins or hearts. Simple enhancer-effector transgenes employing lepb-linked sequences upstream of pro- or anti-regenerative factors controlled the efficacy of regeneration in zebrafish. Our findings provide evidence for 'tissue regeneration enhancer elements' (TREEs) that trigger gene expression in injury sites and can be engineered to modulate the regenerative potential of vertebrate organs.
Whole-genome resequencing reveals signatures of selection and timing of duck domestication.

PubMed

Zhang, Zebin; Jia, Yaxiong; Almeida, Pedro; Mank, Judith E; van Tuinen, Marcel; Wang, Qiong; Jiang, Zhihua; Chen, Yu; Zhan, Kai; Hou, Shuisheng; Zhou, Zhengkui; Li, Huifang; Yang, Fangxi; He, Yong; Ning, Zhonghua; Yang, Ning; Qu, Lujiang

2018-04-01

The genetic basis of animal domestication remains poorly understood, and systems with substantial phenotypic differences between wild and domestic populations are useful for elucidating the genetic basis of adaptation to new environments as well as the genetic basis of rapid phenotypic change. Here, we sequenced the whole genome of 78 individual ducks, from two wild and seven domesticated populations, with an average sequencing depth of 6.42X per individual. Our population and demographic analyses indicate a complex history of domestication, with early selection for separate meat and egg lineages. Genomic comparison of wild to domesticated populations suggests that genes that affect brain and neuronal development have undergone strong positive selection during domestication. Our FST analysis also indicates that the duck white plumage is the result of selection at the melanogenesis-associated transcription factor locus. Our results advance the understanding of animal domestication and selection for complex phenotypic traits.
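The plumage result in this record comes from an FST analysis contrasting wild and domesticated populations. The abstract does not name the estimator or any windowing scheme, so the following is only a textbook sketch of Wright's FST at a single biallelic site for two equally weighted populations, computed from heterozygosities; it is not the authors' pipeline.

```python
def wright_fst(p1: float, p2: float) -> float:
    """Wright's FST at one biallelic site for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    FST = (HT - HS) / HT, where HT is the expected heterozygosity of the
    pooled population and HS the mean within-population heterozygosity.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # site is monomorphic overall; no differentiation to measure
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


# Toy frequencies: no difference, a moderate shift, and fixed differences.
for p_wild, p_domestic in [(0.5, 0.5), (0.2, 0.8), (0.0, 1.0)]:
    print(p_wild, p_domestic, round(wright_fst(p_wild, p_domestic), 3))
```

Values near 1 at a locus, against a low genome-wide background, are the kind of signal such scans look for; genome scans typically average estimates over windows of many sites.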
Tidying Up International Nucleotide Sequence Databases: Ecological, Geographical and Sequence Quality Annotation of ITS Sequences of Mycorrhizal Fungi

PubMed Central

Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R. Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M.; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

2011-01-01

Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi. PMID:21949797
Cost analysis of whole genome sequencing in German clinical practice.

PubMed

Plöthner, Marika; Frank, Martin; von der Schulenburg, J-Matthias Graf

2017-06-01

Whole genome sequencing (WGS) is an emerging tool in clinical diagnostics. However, little has been said about its procedure costs, owing to a dearth of related cost studies. This study helps fill this research gap by analyzing the execution costs of WGS within the setting of German clinical practice. First, to estimate costs, a sequencing process related to clinical practice was undertaken. Once relevant resources were identified, a quantification and monetary evaluation was conducted using data and information from expert interviews with clinical geneticists and personnel at private enterprises and hospitals. This study focuses on identifying the costs associated with the standard sequencing process, and the procedure costs for a single WGS were analyzed on the basis of two sequencing platforms, namely the HiSeq 2500 and HiSeq Xten, both by Illumina, Inc. In addition, sensitivity analyses were performed to assess the influence of various uses of sequencing platforms and various coverage values on a fixed-cost degression. In the base case scenario, which features 80% utilization and 30-times coverage, the cost of a single WGS analysis with the HiSeq 2500 was estimated at €3858.06. The cost of sequencing materials was estimated at €2848.08; related personnel costs of €396.94 and acquisition/maintenance costs of €607.39 were also found. In comparison, the cost of sequencing that uses the latest technology (i.e., HiSeq Xten) was approximately 63% cheaper, at €1411.20. The estimated costs of WGS currently exceed the prediction of a 'US$1000 per genome' by more than a factor of 3.8. In particular, the material costs in themselves exceed this predicted cost.
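The cost figures in this record can be checked with a few lines of arithmetic. The sketch below uses only the euro amounts quoted in the abstract; the variable names are illustrative.

```python
# Euro amounts quoted in the abstract (base case: 80% utilization, 30x coverage).
total_hiseq_2500 = 3858.06
total_hiseq_xten = 1411.20
components_2500 = {
    "materials": 2848.08,
    "personnel": 396.94,
    "acquisition_maintenance": 607.39,
}

# Relative saving of the HiSeq Xten over the HiSeq 2500.
reduction = 1 - total_hiseq_xten / total_hiseq_2500

# Sum of the itemized components and the share taken by materials.
component_sum = sum(components_2500.values())
materials_share = components_2500["materials"] / total_hiseq_2500

print(f"Xten saving: {reduction:.1%}")
print(f"itemized components: EUR {component_sum:.2f}")
print(f"materials share of total: {materials_share:.1%}")
```

The saving works out to about 63.4%, matching the "approximately 63%" in the abstract, and materials account for roughly 74% of the HiSeq 2500 total. The three itemized components sum to €3852.41, slightly below the reported €3858.06; the abstract does not itemize the remainder.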
Sequencing and characterizing the genome of Estrella lausannensis as an undergraduate project: training students and biological insights.

PubMed

Bertelli, Claire; Aeby, Sébastien; Chassot, Bérénice; Clulow, James; Hilfiker, Olivier; Rappo, Samuel; Ritzmann, Sébastien; Schumacher, Paolo; Terrettaz, Céline; Benaglio, Paola; Falquet, Laurent; Farinelli, Laurent; Gharib, Walid H; Goesmann, Alexander; Harshman, Keith; Linke, Burkhard; Miyazaki, Ryo; Rivolta, Carlo; Robinson-Rechavi, Marc; van der Meer, Jan Roelof; Greub, Gilbert

2015-01-01

With the widespread availability of high-throughput sequencing technologies, sequencing projects have become pervasive in the molecular life sciences. The huge bulk of data generated daily must be analyzed further by biologists with skills in bioinformatics and by "embedded bioinformaticians," i.e., bioinformaticians integrated in wet lab research groups. Thus, students interested in molecular life sciences must be trained in the main steps of genomics: sequencing, assembly, annotation and analysis. To reach that goal, a practical course has been set up for master students at the University of Lausanne: the "Sequence a genome" class. At the beginning of the academic year, a few bacterial species whose genome is unknown are provided to the students, who sequence and assemble the genome(s) and perform manual annotation. Here, we report the progress of the first class from September 2010 to June 2011 and the results obtained by seven master students who specifically assembled and annotated the genome of Estrella lausannensis, an obligate intracellular bacterium related to Chlamydia. The draft genome of Estrella is composed of 29 scaffolds encompassing 2,819,825 bp that encode for 2233 putative proteins. Estrella also possesses a 9136 bp plasmid that encodes for 14 genes, among which we found an integrase and a toxin/antitoxin module. Like all other members of the Chlamydiales order, Estrella possesses a highly conserved type III secretion system, considered as a key virulence factor. The annotation of the Estrella genome also allowed the characterization of the metabolic abilities of this strictly intracellular bacterium. Altogether, the students provided the scientific community with the Estrella genome sequence and a preliminary understanding of the biology of this recently-discovered bacterial genus, while learning to use cutting-edge technologies for sequencing and to perform bioinformatics analyses.
Personalized genomic analyses for cancer mutation discovery and interpretation

PubMed Central

Jones, Siân; Anagnostou, Valsamo; Lytle, Karli; Parpart-Li, Sonya; Nesselbush, Monica; Riley, David R.; Shukla, Manish; Chesnick, Bryan; Kadan, Maura; Papp, Eniko; Galens, Kevin G.; Murphy, Derek; Zhang, Theresa; Kann, Lisa; Sausen, Mark; Angiuoli, Samuel V.; Diaz, Luis A.; Velculescu, Victor E.

2015-01-01

Massively parallel sequencing approaches are beginning to be used clinically to characterize individual patient tumors and to select therapies based on the identified mutations. A major question in these analyses is the extent to which these methods identify clinically actionable alterations and whether the examination of the tumor tissue alone is sufficient or whether matched normal DNA should also be analyzed to accurately identify tumor-specific (somatic) alterations. To address these issues, we comprehensively evaluated 815 tumor-normal paired samples from patients of 15 tumor types. We identified genomic alterations using next-generation sequencing of whole exomes or 111 targeted genes that were validated with sensitivities >95% and >99%, respectively, and specificities >99.99%. These analyses revealed an average of 140 and 4.3 somatic mutations per exome and targeted analysis, respectively. More than 75% of cases had somatic alterations in genes associated with known therapies or current clinical trials. Analyses of matched normal DNA identified germline alterations in cancer-predisposing genes in 3% of patients with apparently sporadic cancers. In contrast, a tumor-only sequencing approach could not definitively identify germline changes in cancer-predisposing genes and led to additional false-positive findings comprising 31% and 65% of alterations identified in targeted and exome analyses, respectively, including in potentially actionable genes. These data suggest that matched tumor-normal sequencing analyses are essential for precise identification and interpretation of somatic and germline alterations and have important implications for the diagnostic and therapeutic management of cancer patients. PMID:25877891

Genome-wide-analyses of Listeria monocytogenes from food-processing plants reveal clonal diversity and date the emergence of persisting sequence types.

PubMed

Knudsen, Gitte M; Nielsen, Jesper Boye; Marvig, Rasmus L; Ng, Yin; Worning, Peder; Westh, Henrik; Gram, Lone

2017-08-01

Whole genome sequencing is increasingly used in epidemiology, e.g. for tracing outbreaks of food-borne diseases. This requires in-depth understanding of pathogen emergence, persistence and genomic diversity along the food production chain including in food processing plants. We sequenced the genomes of 80 isolates of Listeria monocytogenes sampled from Danish food processing plants over a time-period of 20 years, and analysed the sequences together with 10 publicly available reference genomes to advance our understanding of interplant and intraplant genomic diversity of L. monocytogenes. Except for three persisting sequence types (ST) based on Multi Locus Sequence Typing being ST7, ST8 and ST121, long-term persistence of clonal groups was limited, and new clones were introduced continuously, potentially from raw materials. No particular gene could be linked to the persistence phenotype. Using time-based phylogenetic analyses of the persistent STs, we estimate the L. monocytogenes evolutionary rate to be 0.18-0.35 single nucleotide polymorphisms/year, suggesting that the persistent STs emerged approximately 100 years ago, which correlates with the onset of industrialization and globalization of the food market. © 2017 Society for Applied Microbiology and John Wiley & Sons Ltd.
The Neandertal genome and ancient DNA authenticity

PubMed Central

Green, Richard E; Briggs, Adrian W; Krause, Johannes; Prüfer, Kay; Burbano, Hernán A; Siebauer, Michael; Lachmann, Michael; Pääbo, Svante

2009-01-01

Recent advances in high-throughput DNA sequencing have made genome-scale analyses of genomes of extinct organisms possible. With these new opportunities come new difficulties in assessing the authenticity of the DNA sequences retrieved. We discuss how these difficulties can be addressed, particularly with regard to analyses of the Neandertal genome. We argue that only direct assays of DNA sequence positions in which Neandertals differ from all contemporary humans can serve as a reliable means to estimate human contamination. Indirect measures, such as the extent of DNA fragmentation, nucleotide misincorporations, or comparison of derived allele frequencies in different fragment size classes, are unreliable. Fortunately, interim approaches based on mtDNA differences between Neandertals and current humans, detection of male contamination through Y chromosomal sequences, and repeated sequencing from the same fossil to detect autosomal contamination allow initial large-scale sequencing of Neandertal genomes. This will result in the discovery of fixed differences in the nuclear genome between Neandertals and current humans that can serve as future direct assays for contamination. For analyses of other fossil hominins, which may become possible in the future, we suggest a similar ‘boot-strap’ approach in which interim approaches are applied until sufficient data for more definitive direct assays are acquired. PMID:19661919
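A direct assay of the kind argued for above comes down to counting reads at diagnostic positions, that is, positions where Neandertals differ from all contemporary humans. The sketch below is a minimal illustration of that counting, not the authors' procedure. The read counts are hypothetical, and the Wilson score interval is an assumption chosen here as a standard binomial interval.

```python
import math


def contamination_estimate(human_reads: int, total_reads: int, z: float = 1.96):
    """Point estimate and Wilson score interval for the contaminating fraction.

    human_reads: reads carrying the modern-human allele at diagnostic positions.
    total_reads: all reads covering those positions.
    z: normal quantile (1.96 gives an approximate 95% interval).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    n = total_reads
    p = human_reads / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts, for illustration only (not data from the paper).
estimate, low, high = contamination_estimate(human_reads=3, total_reads=400)
print(f"contamination ~ {estimate:.2%} (95% CI {low:.2%} to {high:.2%})")
```

With few diagnostic reads the interval is wide, which is one way to see why the abstract treats the interim approaches as a starting point until more fixed differences are known.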
Exploring Insight: Focus on Shifts of Attention

ERIC Educational Resources Information Center

Palatnik, Alik; Koichu, Boris

2015-01-01

The paper presents and analyses a sequence of events that preceded an insight solution to a challenging problem in the context of numerical sequences. A three-week-long solution process by a pair of ninth-grade students is analysed by means of the theory of shifts of attention. The goal for this article is to reveal the potential of this theory…
Sequence stratigraphic analysis of Cenomanian greenhouse palaeosols: A case study from southern Patagonia, Argentina

NASA Astrophysics Data System (ADS)

Varela, Augusto N.; Veiga, Gonzalo D.; Poiré, Daniel G.

2012-10-01

The aim of this contribution is to analyse extrinsic (i.e., tectonics, climate and eustasy) and intrinsic (i.e., palaeotopography, palaeodrainage and relative sedimentation rates) factors that controlled palaeosol development in the Cenomanian Mata Amarilla Formation (Austral foreland basin, southwestern Patagonia, Argentina). Detailed sedimentological logs, facies analysis, pedofeatures and palaeosol horizon identification led to the definition of six pedotypes, which represent Histosols, acid sulphate Histosols, Vertisols, hydromorphic Vertisols, Inceptisols and vertic Alfisols. Small- and large-scale changes in palaeosol development were recognised throughout the units. Small-scale or high-frequency variations, identified within the middle section are represented by the lateral and vertical superimposition of Inceptisols, Vertisols and hydromorphic Vertisols. Lateral changes are interpreted as the result of intrinsic factors to the depositional systems, such as the relative position within the floodplain and the distance from the main channels, that condition the nature of parent material, the sedimentation rate and eventually the palaeotopographic position. Vertical stacking of different soil types is linked to avulsion processes and the relatively abrupt change in the distance to main channels as the system aggraded. The large-scale or low-frequency vertical variations in palaeosol type occurring in the Mata Amarilla Formation are related to long-term changes in depositional environments. The lower and upper sections of the studied logs are characterised by Histosols and acid sulphate Histosols, and few hydromorphic Vertisols associated with low-gradient coastal environments (i.e., lagoons, estuaries and distal fluvial systems). At the lower boundary of the middle section, a thick palaeosol succession composed of vertic Alfisols occurs. 
The rest of the middle section is characterised by Vertisols, hydromorphic Vertisols and Inceptisols occurring on distal and proximal fluvial floodplains, respectively. The palaeosol succession for the Mata Amarilla Formation can be analysed within a sequence stratigraphic scheme considering changes in depositional environments in relation to accommodation/supply conditions. The results contrast with classical models, mainly in that the palaeosols of the Mata Amarilla Formation are relatively well-developed throughout the whole sequence, including transgressive periods of relatively high aggradation rate. Also, during regressive episodes, when a thick palaeosol succession that marks the sequence boundary is developed in the classical models, the lack of incised valleys in this succession led to the preservation of thick palaeosol successions during lowstand conditions. The vertical and lateral palaeosol distribution identified in the Mata Amarilla Formation could eventually be extrapolated to other sequences deposited during climate optimums.
Whole-genome sequencing and analyses identify high genetic heterogeneity, diversity and endemicity of rotavirus genotype P[6] strains circulating in Africa.

PubMed

Nyaga, Martin M; Tan, Yi; Seheri, Mapaseka L; Halpin, Rebecca A; Akopov, Asmik; Stucker, Karla M; Fedorova, Nadia B; Shrivastava, Susmita; Duncan Steele, A; Mwenda, Jason M; Pickett, Brett E; Das, Suman R; Jeffrey Mphahlele, M

2018-05-18

Rotavirus A (RVA) exhibits a wide genotype diversity globally. Little is known about the genetic composition of genotype P[6] from Africa. This study investigated possible evolutionary mechanisms leading to genetic diversity of genotype P[6] VP4 sequences. Phylogenetic analyses on 167 P[6] VP4 full-length sequences were conducted, which included six porcine-origin sequences. Of the 167 sequences, 57 were newly acquired through whole genome sequencing as part of this study. The other 110 sequences were all publicly-available global P[6] VP4 full-length sequences downloaded from GenBank. The strength of association between the phenotypic features and the phylogeny was also determined. A number of reassortment and mixed infections of RVA genotype P[6] strains were observed in this study. Phylogenetic analyses demonstrated the extensive genetic diversity that exists among human P[6] strains, porcine-like strains, their concomitant clades/subclades and estimated that P[6] VP4 gene has a higher substitution rate with the mean of 1.05E-3 substitutions/site/year. Further, the phylogenetic analyses indicated that genotype P[6] strains were endemic in Africa, characterised by an extensive genetic diversity and long-time local evolution of the viruses. This was also supported by phylogeographic clustering and G-genotype clustering of the P[6] strains when Bayesian Tip-association Significance testing (BaTS) was applied, clearly supporting that the viruses evolved locally in Africa instead of spatial mixing among different regions. Overall, the results demonstrated that multiple mechanisms such as reassortment events, various mutations and possibly interspecies transmission account for the enormous diversity of genotype P[6] strains in Africa. These findings highlight the need for continued global surveillance of rotavirus diversity. Copyright © 2018 Elsevier B.V. All rights reserved.
Rates of evolution in stress-related genes are associated with habitat preference in two Cardamine lineages

PubMed Central

2012-01-01

Background Elucidating the selective and neutral forces underlying molecular evolution is fundamental to understanding the genetic basis of adaptation. Plants have evolved a suite of adaptive responses to cope with variable environmental conditions, but relatively little is known about which genes are involved in such responses. Here we studied molecular evolution on a genome-wide scale in two species of Cardamine with distinct habitat preferences: C. resedifolia, found at high altitudes, and C. impatiens, found at low altitudes. Our analyses focussed on genes that are involved in stress responses to two factors that differentiate the high- and low-altitude habitats, namely temperature and irradiation. Results High-throughput sequencing was used to obtain gene sequences from C. resedifolia and C. impatiens. Using the available A. thaliana gene sequences and annotation, we identified nearly 3,000 triplets of putative orthologues, including genes involved in cold response, photosynthesis or in general stress responses. By comparing estimated rates of molecular substitution, codon usage, and gene expression in these species with those of Arabidopsis, we were able to evaluate the role of positive and relaxed selection in driving the evolution of Cardamine genes. Our analyses revealed a statistically significant higher rate of molecular substitution in C. resedifolia than in C. impatiens, compatible with more efficient positive selection in the former. Conversely, the genome-wide level of selective pressure is compatible with more relaxed selection in C. impatiens. Moreover, levels of selective pressure were heterogeneous between functional classes and between species, with cold responsive genes evolving particularly fast in C. resedifolia, but not in C. impatiens. 
Conclusions Overall, our comparative genomic analyses revealed that differences in effective population size might contribute to the differences in the rate of protein evolution and in the levels of selective pressure between the C. impatiens and C. resedifolia lineages. The within-species analyses also revealed evolutionary patterns associated with habitat preference of two Cardamine species. We conclude that the selective pressures associated with the habitats typical of C. resedifolia may have caused the rapid evolution of genes involved in cold response. PMID:22257588
Across the Gap: Geochronological and Sedimentological Analyses from the Late Pleistocene-Holocene Sequence of Goda Buticha, Southeastern Ethiopia

PubMed Central

Asrat, Asfawossen; Bahain, Jean-Jacques; Chapon, Cécile; Douville, Eric; Fragnol, Carole; Hernandez, Marion; Hovers, Erella; Leplongeon, Alice; Martin, Loïc; Pleurdeau, David; Pearson, Osbjorn; Puaud, Simon; Assefa, Zelalem

2017-01-01

Goda Buticha is a cave site near Dire Dawa in southeastern Ethiopia that contains an archaeological sequence sampling the late Pleistocene and Holocene of the region. The sedimentary sequence displays complex cultural, chronological and sedimentological histories that seem incongruent with one another. A first set of radiocarbon ages suggested a long sedimentological gap from the end of Marine Isotopic Stage (MIS) 3 to the mid-Holocene. Macroscopic observations suggest that the main sedimentological change does not coincide with the chronostratigraphic hiatus. The cultural sequence shows technological continuity with a late persistence of artifacts that are usually attributed to the Middle Stone Age into the younger parts of the stratigraphic sequence, yet become increasingly associated with lithic artifacts typically related to the Later Stone Age. While not a unique case, this combination of features is unusual in the Horn of Africa. In order to evaluate the possible implications of these observations, sedimentological analyses combined with optically stimulated luminescence (OSL) were conducted. The OSL data now extend the radiocarbon chronology up to 63 ± 7 ka; they also confirm the existence of the chronological gap between 24.8 ± 2.6 ka and 7.5 ± 0.3 ka. The sedimentological analyses suggest that the origin and mode of deposition were largely similar throughout the whole sequence, although the anthropic and faunal activities increased in the younger levels. Regional climatic records are used to support the sedimentological observations and interpretations. We discuss the implications of the sedimentological and dating analyses for understanding cultural processes in the region. PMID:28125597
Across the Gap: Geochronological and Sedimentological Analyses from the Late Pleistocene-Holocene Sequence of Goda Buticha, Southeastern Ethiopia.

PubMed

Tribolo, Chantal; Asrat, Asfawossen; Bahain, Jean-Jacques; Chapon, Cécile; Douville, Eric; Fragnol, Carole; Hernandez, Marion; Hovers, Erella; Leplongeon, Alice; Martin, Loïc; Pleurdeau, David; Pearson, Osbjorn; Puaud, Simon; Assefa, Zelalem

2017-01-01

Goda Buticha is a cave site near Dire Dawa in southeastern Ethiopia that contains an archaeological sequence sampling the late Pleistocene and Holocene of the region. The sedimentary sequence displays complex cultural, chronological and sedimentological histories that seem incongruent with one another. A first set of radiocarbon ages suggested a long sedimentological gap from the end of Marine Isotopic Stage (MIS) 3 to the mid-Holocene. Macroscopic observations suggest that the main sedimentological change does not coincide with the chronostratigraphic hiatus. The cultural sequence shows technological continuity with a late persistence of artifacts that are usually attributed to the Middle Stone Age into the younger parts of the stratigraphic sequence, yet become increasingly associated with lithic artifacts typically related to the Later Stone Age. While not a unique case, this combination of features is unusual in the Horn of Africa. In order to evaluate the possible implications of these observations, sedimentological analyses combined with optically stimulated luminescence (OSL) were conducted. The OSL data now extend the radiocarbon chronology up to 63 ± 7 ka; they also confirm the existence of the chronological gap between 24.8 ± 2.6 ka and 7.5 ± 0.3 ka. The sedimentological analyses suggest that the origin and mode of deposition were largely similar throughout the whole sequence, although the anthropic and faunal activities increased in the younger levels. Regional climatic records are used to support the sedimentological observations and interpretations. We discuss the implications of the sedimentological and dating analyses for understanding cultural processes in the region.
Evolution and molecular epidemiology of classical swine fever virus during a multi-annual outbreak amongst European wild boar.

PubMed

Goller, Katja V; Gabriel, Claudia; Dimna, Mireille Le; Le Potier, Marie-Frédérique; Rossi, Sophie; Staubach, Christoph; Merboth, Matthias; Beer, Martin; Blome, Sandra

2016-03-01

Classical swine fever is a viral disease of pigs that carries tremendous socio-economic impact. In outbreak situations, genetic typing is carried out for the purpose of molecular epidemiology in both domestic pigs and wild boar. These analyses are usually based on harmonized partial sequences. However, for high-resolution analyses towards the understanding of genetic variability and virus evolution, full-genome sequences are more appropriate. In this study, a unique set of representative virus strains was investigated that was collected during an outbreak in French free-ranging wild boar in the Vosges-du-Nord mountains between 2003 and 2007. Comparative sequence and evolutionary analyses of the nearly full-length sequences showed only slow evolution of classical swine fever virus strains over the years and no impact of vaccination on mutation rates. However, substitution rates varied amongst protein genes; furthermore, a spatial and temporal pattern could be observed whereby two separate clusters were formed that coincided with physical barriers.
An analysis of the sequence of the BAD gene among patients with maturity-onset diabetes of the young (MODY).

PubMed

Antosik, Karolina; Gnyś, Piotr; Jarosz-Chobot, Przemysława; Myśliwiec, Małgorzata; Szadkowska, Agnieszka; Małecki, Maciej; Młynarski, Wojciech; Borowiec, Maciej

2017-01-01

Monogenic diabetes is a rare disease caused by single gene mutations. Maturity onset diabetes of the young (MODY) is one of the major forms of monogenic diabetes recognised in the paediatric population. To date, 13 genes have been related to MODY development. The aim of the study was to analyse the sequence of the BCL2-associated agonist of cell death (BAD) gene in patients with clinical suspicion of GCK-MODY, but who were negative for glucokinase (GCK) gene mutations. A group of 122 diabetic patients were recruited from the "Polish Registry for Paediatric and Adolescent Diabetes - nationwide genetic screening for monogenic diabetes" project. The molecular testing was performed by Sanger sequencing. A total of 10 sequence variants of the BAD gene were identified in 122 analysed diabetic patients. Among the analysed patients suspected of MODY, one possible pathogenic variant was identified in one patient; however, further confirmation is required for a certain identification.
Phylogenetic study on Shiraia bambusicola by rDNA sequence analyses.

PubMed

Cheng, Tian-Fan; Jia, Xiao-Ming; Ma, Xiao-Hang; Lin, Hai-Ping; Zhao, Yu-Hua

2004-01-01

In this study, 18S rDNA and ITS-5.8S rDNA regions of four Shiraia bambusicola isolates collected from different species of bamboos were amplified by PCR with universal primer pairs NS1/NS8 and ITS5/ITS4, respectively, and sequenced. Phylogenetic analyses were conducted on three selected datasets of rDNA sequences. Maximum parsimony, distance and maximum likelihood criteria were used to infer trees. Morphological characteristics were also observed. The positioning of Shiraia in the order Pleosporales was well supported by bootstrap, which agreed with the placement by Amano (1980) according to their morphology. We did not find significant inter-hostal differences among these four isolates from different species of bamboos. From the results of analyses and comparison of their rDNA sequences, we conclude that Shiraia should be classified into Pleosporales as Amano (1980) proposed and suggest that it might be positioned in the family Phaeosphaeriaceae. Copyright 2004 WILEY-VCH Verlag GmbH & Co.
Sputnik: a database platform for comparative plant genomics.

PubMed

Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F X

2003-01-01

Two million plant ESTs, from 20 different plant species, and totalling more than 1000 Mbp of DNA sequence, represent a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silico comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state-of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics.
Sputnik: a database platform for comparative plant genomics

PubMed Central

Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F.X.

2003-01-01

Two million plant ESTs, from 20 different plant species, and totalling more than 1000 Mbp of DNA sequence, represent a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silico comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state-of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics. PMID:12519965
Genome-wide analyses of the bHLH superfamily in crustaceans: reappraisal of higher-order groupings and evidence for lineage-specific duplications

PubMed Central

2018-01-01

The basic helix-loop-helix (bHLH) proteins represent a key group of transcription factors implicated in numerous eukaryotic developmental and signal transduction processes. Characterization of bHLHs from model species such as humans, fruit flies, nematodes and plants has yielded important information on their functions and evolutionary origin. However, relatively little is known about bHLHs in non-model organisms despite the availability of a vast number of high-throughput sequencing datasets, enabling previously intractable genome-wide and cross-species analyses to be now performed. We extensively searched for bHLHs in 126 crustacean species represented across major Crustacea taxa and identified 3777 putative bHLH orthologues. We have also included seven whole-genome datasets representative of major arthropod lineages to obtain a more accurate prediction of the full bHLH gene complement. With focus on important food crop species from Decapoda, we further defined higher-order groupings and have successfully recapitulated previous observations in other animals. Importantly, we also observed evidence for lineage-specific bHLH expansions in two basal crustaceans (branchiopod and copepod), suggesting a mode of evolution through gene duplication as an adaptation to changing environments. In-depth analysis on bHLH-PAS members confirms the phenomenon coined as ‘modular evolution’ (independently evolved domains) typically seen in multidomain proteins. With the amphipod Parhyale hawaiensis as the exception, our analyses have focused on crustacean transcriptome datasets. Hence, there is a clear requirement for future analyses on whole-genome sequences to overcome potential limitations associated with transcriptome mining. Nonetheless, the present work will serve as a key resource for future mechanistic and biochemical studies on bHLHs in economically important crustacean food crop species. PMID:29657824
Telomerecat: A ploidy-agnostic method for estimating telomere length from whole genome sequencing data.

PubMed

Farmery, James H R; Smith, Mike L; Lynch, Andy G

2018-01-22

Telomere length is a risk factor in disease and the dynamics of telomere length are crucial to our understanding of cell replication and vitality. The proliferation of whole genome sequencing represents an unprecedented opportunity to glean new insights into telomere biology on a previously unimaginable scale. To this end, a number of approaches for estimating telomere length from whole-genome sequencing data have been proposed. Here we present Telomerecat, a novel approach to the estimation of telomere length. Previous methods have been dependent on the number of telomeres present in a cell being known, which may be problematic when analysing aneuploid cancer data and non-human samples. Telomerecat is designed to be agnostic to the number of telomeres present, making it suited for the purpose of estimating telomere length in cancer studies. Telomerecat also accounts for interstitial telomeric reads and presents a novel approach to dealing with sequencing errors. We show that Telomerecat performs well at telomere length estimation when compared to leading experimental and computational methods. Furthermore, we show that it detects expected patterns in longitudinal data, repeated measurements, and cross-species comparisons. We also apply the method to cancer cell data, uncovering an interesting relationship with the underlying telomerase genotype.
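Estimating telomere length from whole genome sequencing data starts with recognising reads that contain telomeric repeat sequence. The sketch below illustrates only that first step and is not Telomerecat's algorithm. It assumes the canonical vertebrate repeat TTAGGG. The names `repeat_count` and `is_telomeric` and the threshold of four repeats are hypothetical choices made for this example.

```python
# Canonical vertebrate telomeric hexamer and its reverse complement.
TELOMERE_MOTIFS = ("TTAGGG", "CCCTAA")


def repeat_count(read: str) -> int:
    """Number of non-overlapping copies of the more frequent motif in the read."""
    read = read.upper()
    return max(read.count(motif) for motif in TELOMERE_MOTIFS)


def is_telomeric(read: str, min_repeats: int = 4) -> bool:
    """Flag a read as telomere-like; min_repeats is an illustrative threshold."""
    return repeat_count(read) >= min_repeats


reads = [
    "TTAGGG" * 6,                     # pure telomeric repeat
    "ACGTACGTTAGGGACGTACGTACGTACGT",  # a single chance occurrence of the motif
    "CCCTAA" * 5 + "GATTACA",         # telomeric read from the opposite strand
]
flags = [is_telomeric(r) for r in reads]
print(flags)  # [True, False, True]
```

An exact-match count like this one ignores sequencing errors and cannot tell interstitial telomeric reads from true chromosome-end reads, the two issues the abstract says Telomerecat is designed to handle.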
A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets.

PubMed

Koren, Omry; Knights, Dan; Gonzalez, Antonio; Waldron, Levi; Segata, Nicola; Knight, Rob; Huttenhower, Curtis; Ley, Ruth E

2013-01-01

Recent analyses of human-associated bacterial diversity have categorized individuals into 'enterotypes' or clusters based on the abundances of key bacterial genera in the gut microbiota. There is a lack of consensus, however, on the analytical basis for enterotypes and on the interpretation of these results. We tested how the following factors influenced the detection of enterotypes: clustering methodology, distance metrics, OTU-picking approaches, sequencing depth, data type (whole genome shotgun (WGS) vs. 16S rRNA gene sequence data), and 16S rRNA region. We included 16S rRNA gene sequences from the Human Microbiome Project (HMP) and from 16 additional studies and WGS sequences from the HMP and MetaHIT. In most body sites, we observed smooth abundance gradients of key genera without discrete clustering of samples. Some body habitats displayed bimodal (e.g., gut) or multimodal (e.g., vagina) distributions of sample abundances, but not all clustering methods and workflows accurately highlight such clusters. Because identifying enterotypes in datasets depends not only on the structure of the data but is also sensitive to the methods applied to identifying clustering strength, we recommend that multiple approaches be used and compared when testing for enterotypes.
A Guide to Enterotypes across the Human Body: Meta-Analysis of Microbial Community Structures in Human Microbiome Datasets

PubMed Central

Waldron, Levi; Segata, Nicola; Knight, Rob; Huttenhower, Curtis; Ley, Ruth E.

2013-01-01

Recent analyses of human-associated bacterial diversity have categorized individuals into ‘enterotypes’ or clusters based on the abundances of key bacterial genera in the gut microbiota. There is a lack of consensus, however, on the analytical basis for enterotypes and on the interpretation of these results. We tested how the following factors influenced the detection of enterotypes: clustering methodology, distance metrics, OTU-picking approaches, sequencing depth, data type (whole genome shotgun (WGS) vs. 16S rRNA gene sequence data), and 16S rRNA region. We included 16S rRNA gene sequences from the Human Microbiome Project (HMP) and from 16 additional studies and WGS sequences from the HMP and MetaHIT. In most body sites, we observed smooth abundance gradients of key genera without discrete clustering of samples. Some body habitats displayed bimodal (e.g., gut) or multimodal (e.g., vagina) distributions of sample abundances, but not all clustering methods and workflows accurately highlight such clusters. Because identifying enterotypes in datasets depends not only on the structure of the data but is also sensitive to the methods applied to identifying clustering strength, we recommend that multiple approaches be used and compared when testing for enterotypes. PMID:23326225
Accounting for biases in riboprofiling data indicates a major role for proline in stalling translation.

PubMed

Artieri, Carlo G; Fraser, Hunter B

2014-12-01

The recent advent of ribosome profiling (sequencing of short ribosome-bound fragments of mRNA) has offered an unprecedented opportunity to interrogate the sequence features responsible for modulating translational rates. Nevertheless, numerous analyses of the first riboprofiling data set have produced equivocal and often incompatible results. Here we analyze three independent yeast riboprofiling data sets, including two with much higher coverage than previously available, and find that all three show substantial technical sequence biases that confound interpretations of ribosomal occupancy. After accounting for these biases, we find no effect of previously implicated factors on ribosomal pausing. Rather, we find that incorporation of proline, whose unique side-chain stalls peptide synthesis in vitro, also slows the ribosome in vivo. We also reanalyze a method that implicated positively charged amino acids as the major determinant of ribosomal stalling and demonstrate that it produces false signals of stalling in low-coverage data. Our results suggest that any analysis of riboprofiling data should account for sequencing biases and sparse coverage. To this end, we establish a robust methodology that enables analysis of ribosome profiling data without prior assumptions regarding which positions spanned by the ribosome cause stalling. © 2014 Artieri and Fraser; Published by Cold Spring Harbor Laboratory Press.
Saturnia jonasii Butler, 1877 on Jejudo Island, a new saturniid moth of South Korea with DNA data and morphology (Lepidoptera: Saturniidae).

PubMed

Kim, Min Jee; Choi, Sei-Woong; Kim, Iksoo

2015-04-10

Saturnia (Rinaca) jonasii Butler, 1877 is distributed in Japan, including Tsushima Island and Taiwan, whereas S. boisduvalii Eversmann, 1846 is distributed in northern areas, such as China, Russia, and South Korea. In the present study we found that the specimens from Mt. Hallasan on Jejudo, a southern remote offshore island, were S. jonasii, rather than S. boisduvalii based on morphology, DNA barcode, and nuclear elongation factor 1 alpha (EF-1α) sequences. The major morphological differences between the two species included the shape of wing pattern elements of fore- and hindwings and male and female genitalia. A DNA barcode analysis of the sequences of the Jejudo specimens and S. boisduvalii, along with those of Saturnia species obtained from a public database showed a minimum sequence divergence of 4.26% (28 bp). A phylogenetic analysis also showed clustering of the Jejudo specimens with S. jonasii, separating S. boisduvalii (Bayesian posterior probability = 0.99). The EF-1α-based sequence and phylogenetic analyses of the two species from Jejudo Island and the Korean mainland showed the uniqueness of the Jejudo specimens from S. boisduvalii collected on the Korean mainland, indicating distribution of S. jonasii on Jejudo Island in South Korea, instead of S. boisduvalii.
Viruses of invasive Argentine ants from the European Main supercolony: characterization, interactions and evolution.

PubMed

Viljakainen, Lumi; Holmberg, Ida; Abril, Sílvia; Jurvansuu, Jaana

2018-06-25

The Argentine ant (Linepithema humile) is a highly invasive pest, yet very little is known about its viruses. We analysed individual RNA-sequencing data from 48 Argentine ant queens to identify and characterise their viruses. We discovered eight complete RNA virus genomes - all from different virus families - and one putative partial entomopoxvirus genome. Seven of the nine virus sequences were found from ant samples spanning 7 years, suggesting that these viruses may cause long-term infections within the super-colony. Although all nine viruses successfully infect Argentine ants, they have very different characteristics, such as genome organization, prevalence, loads, activation frequencies and rates of evolution. The eight RNA viruses constituted in total 23 different virus combinations which, based on statistical analysis, were non-random, suggesting that virus compatibility is a factor in infections. We also searched for virus sequences from New Zealand and Californian Argentine ant RNA-sequencing data and discovered that many of the viruses are found on different continents, yet some viruses are prevalent only in certain colonies. The viral loads described here most probably present a normal asymptomatic level of infection; nevertheless, detailed knowledge of Argentine ant viruses may enable the design of viral biocontrol methods against this pest.

TaxI: a software tool for DNA barcoding using distance methods

PubMed Central

Steinke, Dirk; Vences, Miguel; Salzburger, Walter; Meyer, Axel

2005-01-01

DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid, but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments, this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as well as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding. PMID:16214755
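The query-versus-reference comparison described in this abstract can be illustrated with a minimal sketch. It is not TaxI's implementation: it assumes a plain Needleman-Wunsch global alignment with a linear gap penalty and an uncorrected p-distance that ignores gapped columns, whereas the abstract mentions several models of sequence evolution. The function names, scoring values and toy sequences are all hypothetical.

```python
from typing import Dict, List, Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty (assumed scoring)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(aln_a: str, aln_b: str) -> float:
    """Proportion of differing sites among aligned columns without gaps."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def rank_references(query: str, references: Dict[str, str]) -> List[Tuple[str, float]]:
    """Align the query to each reference separately and sort by divergence."""
    hits = [(name, p_distance(*global_align(query, seq)))
            for name, seq in references.items()]
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    refs = {"species_A": "ACGTACGT", "species_B": "ACGTTCGT"}
    for name, dist in rank_references("ACGTACGT", refs):
        print(f"{name}\t{dist:.3f}")
```

Because each reference is aligned to the query on its own, an indel in one reference does not disturb the comparison with any other, which is the property the abstract highlights for indel-rich rRNA sequences.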
Soil bacterial diversity patterns and drivers along an elevational gradient on Shennongjia Mountain, China

PubMed Central

Zhang, Yuguang; Cong, Jing; Lu, Hui; Li, Guangliang; Xue, Yadong; Deng, Ye; Li, Hui; Zhou, Jizhong; Li, Diqiang

2015-01-01

Understanding elevational patterns of biological diversity and their driving factors is indispensable for developing ecological theory. An elevational gradient may minimize the impact of environmental factors and is an ideal setting in which to study soil microbial elevational patterns. In this study, we selected four typical vegetation types from 1000 to 2800 m above sea level on the northern slope of Shennongjia Mountain in central China, and analysed the soil bacterial community composition, elevational patterns and the relationship between soil bacterial diversity and environmental factors using 16S rRNA Illumina sequencing and multivariate statistical analysis. The results revealed that the dominant bacterial phyla were Acidobacteria, Actinobacteria, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria and Verrucomicrobia, which accounted for over 75% of the bacterial sequences obtained from tested samples, and that soil bacterial operational taxonomic unit (OTU) richness showed a significant monotonic decrease (P < 0.01) with increasing elevation. The similarity of soil bacterial population composition decreased significantly (P < 0.01) with increasing elevational distance, as measured by the Jaccard and Bray–Curtis indices. Canonical correspondence analysis and Mantel test analysis indicated that plant diversity and soil pH were significantly correlated (P < 0.01) with the soil bacterial community. Therefore, the soil bacterial diversity on Shennongjia Mountain showed a significant and distinct elevational pattern, and plant diversity and soil pH may be the key factors in shaping the soil bacterial spatial pattern. PMID:26032124
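The two community-comparison indices named in this abstract can be sketched as follows. This is a minimal illustration using the standard definitions, not the authors' pipeline; the OTU labels and counts are invented, and both functions return dissimilarities, so similarity is one minus the value.

```python
from typing import Dict


def jaccard_dissimilarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """1 - |shared taxa| / |taxa present in either sample| (presence/absence only)."""
    present_a = {taxon for taxon, count in a.items() if count > 0}
    present_b = {taxon for taxon, count in b.items() if count > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


def bray_curtis_dissimilarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Sum of absolute abundance differences divided by total abundance."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical OTU counts for a low- and a high-elevation sample.
    low_site = {"otu1": 10, "otu2": 5, "otu3": 0}
    high_site = {"otu1": 4, "otu2": 0, "otu3": 6}
    print(round(jaccard_dissimilarity(low_site, high_site), 3))
    print(round(bray_curtis_dissimilarity(low_site, high_site), 3))
```

Jaccard uses only presence and absence, while Bray–Curtis weights taxa by abundance, so reporting both shows whether a distance-decay pattern is driven by taxon turnover or by shifts in abundance.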
HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

PubMed

Wan, Shixiang; Zou, Quan

2017-01-01

Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The rapid growth of next-generation sequencing has produced a shortage of efficient ultra-large biological sequence alignment approaches that can cope with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implemented a highly cost-efficient and time-efficient tool, HAlign-II, to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on large-scale DNA and protein data sets (files larger than 1 GB) showed that HAlign-II could save time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
Novel application of the MSSCP method in biodiversity studies.

PubMed

Tomczyk-Żak, Karolina; Kaczanowski, Szymon; Górecka, Magdalena; Zielenkiewicz, Urszula

2012-02-01

Analysis of 16S rRNA sequence diversity is widely performed for characterizing the biodiversity of microbial samples. The number of determined sequences has a considerable impact on complete results. Although the cost of mass sequencing is decreasing, it is often still too high for individual projects. We applied the multi-temperature single-strand conformational polymorphism (MSSCP) method to decrease the number of analysed sequences. This was a novel application of this method. As a control, the same sample was analysed using random sequencing. In this paper, we adapted the MSSCP technique for screening of unique sequences of the 16S rRNA gene library and bacterial strains isolated from biofilms growing on the walls of an ancient gold mine in Poland and determined whether the results obtained by both methods differed and whether random sequencing could be replaced by MSSCP. Although it was biased towards the detection of rare sequences in the samples, the qualitative results of MSSCP were not different than those of random sequencing. Unambiguous discrimination of unique clones and strains creates an opportunity to effectively estimate the biodiversity of natural communities, especially in populations which are numerous but species poor. Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Transcriptome landscape of Lactococcus lactis reveals many novel RNAs including a small regulatory RNA involved in carbon uptake and metabolism.

PubMed

van der Meulen, Sjoerd B; de Jong, Anne; Kok, Jan

2016-01-01

RNA sequencing has revolutionized genome-wide transcriptome analyses, and the identification of non-coding regulatory RNAs in bacteria has thus increased concurrently. Here we reveal the transcriptome map of the lactic acid bacterial paradigm Lactococcus lactis MG1363 by employing differential RNA sequencing (dRNA-seq) and a combination of manual and automated transcriptome mining. This resulted in a high-resolution genome annotation of L. lactis and the identification of 60 cis-encoded antisense RNAs (asRNAs), 186 trans-encoded putative regulatory RNAs (sRNAs) and 134 novel small ORFs. Based on the putative targets of asRNAs, a novel classification is proposed. Several transcription factor DNA binding motifs were identified in the promoter sequences of (a)sRNAs, providing insight in the interplay between lactococcal regulatory RNAs and transcription factors. The presence and lengths of 14 putative sRNAs were experimentally confirmed by differential Northern hybridization, including the abundant RNA 6S that is differentially expressed depending on the available carbon source. For another sRNA, LLMGnc_147, functional analysis revealed that it is involved in carbon uptake and metabolism. L. lactis contains 13% leaderless mRNAs (lmRNAs) that, from an analysis of overrepresentation in GO classes, seem predominantly involved in nucleotide metabolism and DNA/RNA binding. Moreover, an A-rich sequence motif immediately following the start codon was uncovered, which could provide novel insight in the translation of lmRNAs. Altogether, this first experimental genome-wide assessment of the transcriptome landscape of L. lactis and subsequent sRNA studies provide an extensive basis for the investigation of regulatory RNAs in L. lactis and related lactococcal species.
Streptococcus suis in invasive human infections in Poland: clonality and determinants of virulence and antimicrobial resistance.

PubMed

Bojarska, A; Molska, E; Janas, K; Skoczyńska, A; Stefaniuk, E; Hryniewicz, W; Sadowy, E

2016-06-01

The purpose of this study was to perform an analysis of Streptococcus suis human invasive isolates, collected in Poland by the National Reference Centre for Bacterial Meningitis. Isolates obtained from 21 patients during 2000-2013 were investigated by phenotypic tests, multilocus sequence typing (MLST), analysis of the TR9 locus from the multilocus variable number tandem repeat (VNTR) analysis (MLVA) scheme and pulsed-field gel electrophoresis (PFGE) of SmaI-digested DNA. Determinants of virulence and antimicrobial resistance were detected by polymerase chain reaction (PCR) and analysed by sequencing. All isolates represented sequence type 1 (ST1) and were suggested to be serotype 2. PFGE and analysis of the TR9 locus allowed the discrimination of four and 17 types, respectively. Most of the isolates were haemolysis- and DNase-positive, and around half of them formed biofilm. Genes encoding suilysin, extracellular protein factor, fibronectin-binding protein, muramidase-released protein, surface antigen one, enolase, serum opacity factor and pili were ubiquitous in the studied group, while none of the isolates carried sequences characteristic for the 89K pathogenicity island. All isolates were susceptible to penicillin, cefotaxime, imipenem, moxifloxacin, chloramphenicol, rifampicin, gentamicin, linezolid, vancomycin and daptomycin. Five isolates (24 %) were concomitantly non-susceptible to erythromycin, clindamycin and tetracycline, and harboured the tet(O) and erm(B) genes; for one isolate, lsa(E) and lnu(B) were additionally detected. Streptococcus suis isolated in Poland from human invasive infections belongs to a globally distributed clonal complex of this pathogen, enriched in virulence markers. This is the first report of the lsa(E) and lnu(B) resistance genes in S. suis.
Identification of the Regulator Gene Responsible for the Acetone-Responsive Expression of the Binuclear Iron Monooxygenase Gene Cluster in Mycobacteria

PubMed Central

Furuya, Toshiki; Hirose, Satomi; Semba, Hisashi; Kino, Kuniki

2011-01-01

The mimABCD gene cluster encodes the binuclear iron monooxygenase that oxidizes propane and phenol in Mycobacterium smegmatis strain MC2 155 and Mycobacterium goodii strain 12523. Interestingly, expression of the mimABCD gene cluster is induced by acetone. In this study, we investigated the regulator gene responsible for this acetone-responsive expression. In the genome sequence of M. smegmatis strain MC2 155, the mimABCD gene cluster is preceded by a gene designated mimR, which is divergently transcribed. Sequence analysis revealed that MimR exhibits amino acid similarity with the NtrC family of transcriptional activators, including AcxR and AcoR, which are involved in acetone and acetoin metabolism, respectively. Unexpectedly, many homologs of the mimR gene were also found in the sequenced genomes of actinomycetes. A plasmid carrying a transcriptional fusion of the intergenic region between the mimR and mimA genes with a promoterless green fluorescent protein (GFP) gene was constructed and introduced into M. smegmatis strain MC2 155. Using a GFP reporter system, we confirmed by deletion and complementation analyses that the mimR gene product is the positive regulator of the mimABCD gene cluster expression that is responsive to acetone. M. goodii strain 12523 also utilized the same regulatory system as M. smegmatis strain MC2 155. Although transcriptional activators of the NtrC family generally control transcription using the σ54 factor, a gene encoding the σ54 factor was absent from the genome sequence of M. smegmatis strain MC2 155. These results suggest the presence of a novel regulatory system in actinomycetes, including mycobacteria. PMID:21856847
TMTC2 variant associated with sensorineural hearing loss and auditory neuropathy spectrum disorder in a family dyad.

PubMed

Guillen-Ahlers, Hector; Erbe, Christy B; Chevalier, Frédéric D; Montoya, Maria J; Zimmerman, Kip D; Langefeld, Carl D; Olivier, Michael; Runge, Christina L

2018-04-19

Sensorineural hearing loss (SNHL) is a common form of hearing loss that can be inherited or triggered by environmental insults; auditory neuropathy spectrum disorder (ANSD) is a SNHL subtype with unique diagnostic criteria. The genetic factors associated with these impairments are vast and diverse, but causal genetic factors are rarely characterized. A family dyad, both cochlear implant recipients, presented with a hearing history of bilateral, progressive SNHL, and ANSD. Whole-exome sequencing was performed to identify coding sequence variants shared by both family members, and screened against genes relevant to hearing loss and variants known to be associated with SNHL and ANSD. Both family members are successful cochlear implant users, demonstrating effective auditory nerve stimulation with their devices. Genetic analyses revealed a mutation (rs35725509) in the TMTC2 gene, which has been reported previously as a likely genetic cause of SNHL in another family of Northern European descent. This study represents the first confirmation of the rs35725509 variant in an independent family as a likely cause for the complex hearing loss phenotype (SNHL and ANSD) observed in this family dyad. © 2018 The Authors. Molecular Genetics & Genomic Medicine published by Wiley Periodicals, Inc.
Uncovering potential 'herbal probiotics' in Juzen-taiho-to through the study of associated bacterial populations.

PubMed

Montenegro, Diego; Kalpana, Kriti; Chrissian, Christine; Sharma, Ashutosh; Takaoka, Anna; Iacovidou, Maria; Soll, Clifford E; Aminova, Olga; Heguy, Adriana; Cohen, Lisa; Shen, Steven; Kawamura, Akira

2015-02-01

Juzen-taiho-to (JTT) is an immune-boosting formulation of ten medicinal herbs. It is used clinically in East Asia to boost human immune function. The active factors in JTT have not been clarified, but existing evidence suggests that lipopolysaccharide (LPS)-like factors contribute to the activity. To examine this possibility, JTT was subjected to a series of analyses, including high resolution mass spectrometry, which suggested the presence of structural variants of LPS. This finding raised the possibility that JTT contains immune-boosting bacteria. As the first step to characterize the bacteria in JTT, 16S ribosomal RNA sequencing was carried out for Angelica sinensis (dried root), one of the most potent immunostimulatory herbs in JTT. The sequencing revealed a total of 519 bacterial genera in A. sinensis. The most abundant genus was Rahnella, which is widely distributed in water and plants. The abundance of Rahnella appeared to correlate with the immunostimulatory activity of A. sinensis. In conclusion, the current study provided new pieces of evidence supporting the emerging theory of bacterial contribution in immune-boosting herbs. Copyright © 2014 Elsevier Ltd. All rights reserved.
Severe invasive streptococcal infection by Streptococcus pyogenes and Streptococcus dysgalactiae subsp. equisimilis.

PubMed

Watanabe, Shinya; Takemoto, Norihiko; Ogura, Kohei; Miyoshi-Akiyama, Tohru

2016-01-01

Streptococcus pyogenes, a group A Streptococcus (GAS), has been recognized as the causative pathogen in patients with severe invasive streptococcal infection with or without necrotizing fasciitis. In recent epidemiological studies, Streptococcus dysgalactiae subsp. equisimilis (SDSE) has been isolated from severe invasive streptococcal infection. Complete genome sequence showed that SDSE is the closest bacterial species to GAS, with approximately 70% of genome coverage. SDSE, however, lacks several key virulence factors present in GAS, such as SPE-B, the hyaluronan synthesis operon and active superantigen against human immune cells. A key event in the ability of GAS to cause severe invasive streptococcal infection was shown to be the acquisition of novel genetic traits such as phages. Strikingly, however, during severe invasive infection, GAS destroys its own covRS two-component system, which negatively regulates many virulence factor genes, resulting in a hyper-virulent phenotype. In contrast, this phenomenon has not been observed in SDSE. The present review describes the epidemiology of severe invasive streptococcal infection and the detailed pathogenic mechanisms of GAS and SDSE, emphasizing findings from their genome sequences and analyses of gene expression. © 2015 The Societies and John Wiley & Sons Australia, Ltd.
Factors of transforming growth factor beta signalling are co-regulated in human hepatocellular carcinoma.

PubMed

Longerich, Thomas; Breuhahn, Kai; Odenthal, Margarete; Petmecky, Katharina; Schirmacher, Peter

2004-12-01

Transforming growth factor beta (TGFbeta) is a central mitoinhibitory factor for epithelial cells, and alterations of TGFbeta signalling have been demonstrated in many different human cancers. We have analysed human hepatocellular carcinomas (HCCs) for potential pro-tumourigenic alterations in regard to expression of Smad4 and mutations and expression changes of the pro-oncogenic transcriptional co-repressors Ski and SnoN, as well as mRNA levels of matrix metalloproteinase-2 (MMP2), which is transcriptionally regulated by TGFbeta. Smad4 mRNA was detected in all HCCs; while, using immunohistology, loss of Smad4 expression was found in 10% of HCCs. Neither mutations in the transformation-relevant sequences nor significant pro-tumourigenic expression changes of the Ski and SnoN genes were detected. In HCC cell lines, expression of both genes was regulated, potentially involving phosphorylation. Ski showed a distinct nuclear speckled pattern, indicating recruitment to active transcription complexes. MMP2 mRNA levels were increased in 19% of HCCs, whereas MMP2 mRNA was not detectable in HCC cell lines, suggesting that MMP2 was derived only from tumour stroma cells. Transcript levels of Smad4, Ski, SnoN and MMP2 correlated well. These data argue against a significant role of Ski and SnoN in human hepatocarcinogenesis and suggest that, in the majority of HCCs, the analysed factors are co-regulated by an upstream mechanism, potentially by TGFbeta itself.
Enhanced Gene Expression Rather than Natural Polymorphism in Coding Sequence of the OsbZIP23 Determines Drought Tolerance and Yield Improvement in Rice Genotypes

PubMed Central

Dey, Avishek; Samanta, Milan Kumar; Gayen, Srimonta; Sen, Soumitra K.; Maiti, Mrinal K.

2016-01-01

Drought is one of the major limiting factors for productivity of crops including rice (Oryza sativa L.). Understanding the role of allelic variations of key regulatory genes involved in stress-tolerance is essential for developing an effective strategy to combat drought. The bZIP transcription factors play a crucial role in abiotic-stress adaptation in plants via the abscisic acid (ABA) signaling pathway. The present study aimed to search for allelic polymorphism in the OsbZIP23 gene across selected drought-tolerant and drought-sensitive rice genotypes, and to characterize the new allele through overexpression (OE) and gene-silencing (RNAi). Analyses of the coding DNA sequence (CDS) of the cloned OsbZIP23 gene revealed single nucleotide polymorphisms at four positions and a 15-nucleotide deletion at one position. The single-copy OsbZIP23 gene is expressed at a relatively higher level in leaf tissues of drought-tolerant genotypes, and it is more abundant at the reproductive stage. Cloning and sequence analyses of the OsbZIP23 promoter from drought-tolerant O. rufipogon and the drought-sensitive IR20 cultivar showed variation in the number of stress-responsive cis-elements and a 35-nucleotide deletion at the 5'-UTR in IR20. Analysis of the GFP reporter gene function revealed that the promoter activity of O. rufipogon is comparatively higher than that of IR20. The overexpression of either of the two polymorphic forms (1083 bp and 1068 bp CDS) of OsbZIP23 improved drought tolerance and yield-related traits significantly by retaining higher content of cellular water, soluble sugar and proline, and exhibited a decrease in membrane lipid peroxidation in comparison to RNAi lines and non-transgenic plants. The OE lines showed higher expression of target genes (OsRab16B, OsRab21 and OsLEA3-1) and increased ABA sensitivity, indicating that OsbZIP23 is a positive transcriptional regulator of the ABA-signaling pathway. 
Taken together, the present study concludes that the enhanced gene expression rather than natural polymorphism in coding sequence of OsbZIP23 is accountable for improved drought tolerance and yield performance in rice genotypes. PMID:26959651
Fibroblast growth factor 20 (FGF20) polymorphism is a risk factor for Parkinson's disease in Chinese population.

PubMed

Pan, Jing; Li, Hui; Wang, Ying; Ma, Jian-Fang; Zhang, Jin; Wang, Gang; Liu, Jun; Wang, Xi-Jin; Xiao, Qin; Chen, Sheng-Di

2012-06-01

The etiology of Parkinson's disease (PD) is not well established. Genetic variation in fibroblast growth factor 20 (FGF20) might influence the risk of PD occurrence and development. In this study, two DNA polymorphisms in FGF20, rs12720208 (C/T) and rs1721100 (C/G), were genotyped by direct sequencing in a Han Chinese population, including 394 PD patients and 383 healthy controls. Statistical analyses revealed that for the rs1721100 (C/G) polymorphism, there were significant differences in genotype distribution between PD patients and healthy matched controls. For the rs12720208 (C/T) polymorphism, there was no significant difference in genotype distribution, nor any gender- or age-related difference, between the PD and control groups. The results of this study revealed that the rs1721100 (C/G) polymorphism is a risk factor for PD in the Han Chinese population, while the rs12720208 (C/T) polymorphism is not significantly associated with PD. Copyright © 2012 Elsevier Ltd. All rights reserved.
Comparative transcriptome analyses of flower development in four species of Achimenes (Gesneriaceae).

PubMed

Roberts, Wade R; Roalson, Eric H

2017-03-20

Flowers have an amazingly diverse display of colors and shapes, and these characteristics often vary significantly among closely related species. The evolution of diverse floral form can be thought of as an adaptive response to pollination and reproduction, but it can also be seen through the lens of morphological and developmental constraints. To explore these interactions, we use RNA-seq across species and development to investigate gene expression and sequence evolution as they relate to the evolution of the diverse flowers in a group of Neotropical plants native to Mexico: magic flowers (Achimenes, Gesneriaceae). The assembled transcriptomes contain between 29,000 and 42,000 genes expressed during development. We combine sequence orthology and coexpression clustering with analyses of protein evolution to identify candidate genes for roles in floral form evolution. Over 25% of transcripts captured were distinctive to Achimenes and overrepresented by genes involved in transcription factor activity. Using a model-based clustering approach we find dynamic, temporal patterns of gene expression among species. Selection tests provide evidence of positive selection in several genes with roles in pigment production, flowering time, and morphology. Combining these approaches to explore genes related to flower color and flower shape, we find distinct patterns that correspond to transitions of floral form among Achimenes species. The floral transcriptomes developed from four species of Achimenes provide insight into the mechanisms involved in the evolution of diverse floral form among closely related species with different pollinators. We identified several candidate genes that will serve as an important and useful resource for future research. 
High conservation of sequence structure, patterns of gene coexpression, and detection of positive selection acting on few genes suggests that large phenotypic differences in floral form may be caused by genetic differences in a small set of genes. Our characterized floral transcriptomes provided here should facilitate further analyses into the genomics of flower development and the mechanisms underlying the evolution of diverse flowers in Achimenes and other Neotropical Gesneriaceae.
Pre-Test Analysis Predictions for the Shell Buckling Knockdown Factor Checkout Tests - TA01 and TA02

NASA Technical Reports Server (NTRS)

Thornburgh, Robert P.; Hilburger, Mark W.

2011-01-01

This report summarizes the pre-test analysis predictions for the SBKF-P2-CYL-TA01 and SBKF-P2-CYL-TA02 shell buckling tests conducted at the Marshall Space Flight Center (MSFC) in support of the Shell Buckling Knockdown Factor (SBKF) Project, NASA Engineering and Safety Center (NESC) Assessment. The test article (TA) is an 8-foot-diameter aluminum-lithium (Al-Li) orthogrid cylindrical shell with similar design features as that of the proposed Ares-I and Ares-V barrel structures. In support of the testing effort, detailed structural analyses were conducted and the results were used to monitor the behavior of the TA during the testing. A summary of predicted results for each of the five load sequences is presented herein.
On the Role of Aggregation Prone Regions in Protein Evolution, Stability, and Enzymatic Catalysis: Insights from Diverse Analyses

PubMed Central

Buck, Patrick M.; Kumar, Sandeep; Singh, Satish K.

2013-01-01

The various roles that aggregation prone regions (APRs) are capable of playing in proteins are investigated here via comprehensive analyses of multiple non-redundant datasets containing randomly generated amino acid sequences, monomeric proteins, intrinsically disordered proteins (IDPs) and catalytic residues. Results from this study indicate that the aggregation propensities of monomeric protein sequences have been minimized compared to random sequences with uniform and natural amino acid compositions, as observed by a lower average aggregation propensity and fewer APRs that are shorter in length and more often punctuated by gate-keeper residues. However, evidence for evolutionary selective pressure to disrupt these sequence regions among homologous proteins is inconsistent. APRs are less conserved than average sequence identity among closely related homologues (≥80% sequence identity with a parent) but APRs are more conserved than average sequence identity among homologues that have at least 50% sequence identity with a parent. Structural analyses of APRs indicate that APRs are three times more likely to contain ordered versus disordered residues and that APRs frequently contribute more towards stabilizing proteins than equal length segments from the same protein. Catalytic residues and APRs were also found to be in structural contact significantly more often than expected by random chance. Our findings suggest that proteins have evolved by optimizing their risk of aggregation for cellular environments by both minimizing aggregation prone regions and by conserving those that are important for folding and function. In many cases, these sequence optimizations are insufficient to develop recombinant proteins into commercial products. Rational design strategies aimed at improving protein solubility for biotechnological purposes should carefully evaluate the contributions made by candidate APRs, targeted for disruption, towards protein structure and activity. 
PMID:24146608
Alu expression in human cell lines and their retrotranspositional potential.

PubMed

Oler, Andrew J; Traina-Dorge, Stephen; Derbes, Rebecca S; Canella, Donatella; Cairns, Brad R; Roy-Engel, Astrid M

2012-06-20

The vast majority of the 1.1 million Alu elements are retrotranspositionally inactive, where only a few loci referred to as 'source elements' can generate new Alu insertions. The first step in identifying the active Alu sources is to determine the loci transcribed by RNA polymerase III (pol III). Previous genome-wide analyses from normal and transformed cell lines identified multiple Alu loci occupied by pol III factors, making them candidate source elements. Analysis of the data from these genome-wide studies determined that the majority of pol III-bound Alus belonged to the older subfamilies Alu S and Alu J, which varied between cell lines from 62.5% to 98.7% of the identified loci. The pol III-bound Alus were further scored for estimated retrotransposition potential (ERP) based on the absence or presence of selected sequence features associated with Alu retrotransposition capability. Our analyses indicate that most of the pol III-bound Alu loci candidates identified lack the sequence characteristics important for retrotransposition. These data suggest that Alu expression likely varies by cell type, growth conditions and transformation state. This variation could extend to where the same cell lines in different laboratories present different Alu expression patterns. The vast majority of Alu loci potentially transcribed by RNA pol III lack important sequence features for retrotransposition and the majority of potentially active Alu loci in the genome (scored high ERP) belong to young Alu subfamilies. Our observations suggest that in an in vivo scenario, the contribution of Alu activity on somatic genetic damage may significantly vary between individuals and tissues.
Distribution and cluster analysis of predicted intrinsically disordered protein Pfam domains

PubMed Central

Williams, Robert W; Xue, Bin; Uversky, Vladimir N; Dunker, A Keith

2013-01-01

The Pfam database groups regions of proteins by how well hidden Markov models (HMMs) can be trained to recognize similarities among them. Conservation pressure is probably in play here. The Pfam seed training set includes sequence and structure information, being drawn largely from the PDB. A long standing hypothesis among intrinsically disordered protein (IDP) investigators has held that conservation pressures are also at play in the evolution of different kinds of intrinsic disorder, but we find that predicted intrinsic disorder (PID) is not always conserved across Pfam domains. Here we analyze distributions and clusters of PID regions in 193024 members of the version 23.0 Pfam seed database. To include the maximum information available for proteins that remain unfolded in solution, we employ the 10 linearly independent Kidera factors [1–3] for the amino acids, combined with PONDR [4] predictions of disorder tendency, to transform the sequences of these Pfam members into an 11 column matrix where the number of rows is the length of each Pfam region. Cluster analyses of the set of all regions, including those that are folded, show 6 groupings of domains. Cluster analyses of domains with mean VSL2b scores greater than 0.5 (half predicted disorder or more) show at least 3 separated groups. It is hypothesized that grouping sets into shorter sequences with more uniform length will reveal more information about intrinsic disorder and lead to more finely structured and perhaps more accurate predictions. HMMs could be trained to include this information. PMID:28516017
Fine Analysis of Genetic Diversity of the tpr Gene Family among Treponemal Species, Subspecies and Strains

PubMed Central

Centurion-Lara, Arturo; Giacani, Lorenzo; Godornes, Charmie; Molini, Barbara J.; Brinck Reid, Tara; Lukehart, Sheila A.

2013-01-01

Background The pathogenic non-cultivable treponemes include three subspecies of Treponema pallidum (pallidum, pertenue, endemicum), T. carateum, T. paraluiscuniculi, and the unclassified Fribourg-Blanc treponeme (Simian isolate). These treponemes are morphologically indistinguishable and antigenically and genetically highly similar, yet cross-immunity is variable or non-existent. Although all of these organisms cause chronic, multistage skin and systemic disease, they have historically been classified by mode of transmission, clinical presentations and host ranges. Whole genome studies underscore the high degree of sequence identity among species, subspecies and strains, pinpointing a limited number of genomic regions for variation. Many of these “hot spots” include members of the tpr gene family, composed of 12 paralogs encoding candidate virulence factors. We hypothesize that the distinct clinical presentations, host specificity, and variable cross-immunity might reside on virulence factors such as the tpr genes. Methodology/Principal Findings Sequence analysis of 11 tpr loci (excluding tprK) from 12 strains demonstrated an impressive heterogeneity, including SNPs, indels, chimeric genes, truncated gene products and large deletions. Comparative analyses of sequences and 3D models of predicted proteins in Subfamily I highlight the striking co-localization of discrete variable regions with predicted surface-exposed loops. A hallmark of Subfamily II is the presence of chimeric genes in the tprG and J loci. Diversity in Subfamily III is limited to tprA and tprL. Conclusions/Significance An impressive sequence variability was found in tpr sequences among the Treponema isolates examined in this study, with most of the variation being consistent within subspecies or species, or between syphilis vs. non-syphilis strains. Variability was seen in the pallidum subspecies, which can be divided into 5 genogroups. 
These findings support a genetic basis for the classification of these organisms into their respective subspecies and species. Future functional studies will determine whether the identified genetic differences relate to cross-immunity, clinical differences, or host ranges. PMID:23696912
The nuclear matrix protein NMP-1 is the transcription factor YY1.

PubMed Central

Guo, B; Odgren, P R; van Wijnen, A J; Last, T J; Nickerson, J; Penman, S; Lian, J B; Stein, J L; Stein, G S

1995-01-01

NMP-1 was initially identified as a nuclear matrix-associated DNA-binding factor that exhibits sequence-specific recognition for the site IV regulatory element of a histone H4 gene. This distal promoter domain is a nuclear matrix interaction site. In the present study, we show that NMP-1 is the multifunctional transcription factor YY1. Gel-shift and Western blot analyses demonstrate that NMP-1 is immunoreactive with YY1 antibody. Furthermore, purified YY1 protein specifically recognizes site IV and reconstitutes the NMP-1 complex. Western blot and gel-shift analyses indicate that YY1 is present within the nuclear matrix. In situ immunofluorescence studies show that a significant fraction of YY1 is localized in the nuclear matrix, principally but not exclusively associated with residual nucleoli. Our results confirm that NMP-1/YY1 is a ubiquitous protein that is present in both human cells and in rat osteosarcoma ROS 17/2.8 cells. The finding that NMP-1 is identical to YY1 suggests that this transcriptional regulator may mediate gene-matrix interactions. Our results are consistent with the concept that the nuclear matrix may functionally compartmentalize the eukaryotic nucleus to support regulation of gene expression. PMID:7479833

Tumour necrosis factor-alpha (-308G/A) promoter polymorphism is associated with ulcerative colitis in Brazilian patients.

PubMed

Tavares, M; de Lima, C; Fernandes, W; Martinelli, V; de Lucena, M; Lima, F; Telles, A; Brandão, L; de Melo Júnior, M

2016-12-01

Inflammatory bowel disease consists of multifactorial diseases whose common manifestation is inflammation of the gastrointestinal tract and whose pathogenesis remains unknown. This study aimed to analyse gene polymorphisms in Brazilian patients with inflammatory bowel disease. A total of 101 patients diagnosed with inflammatory bowel disease were analysed for the tumour necrosis factor-alpha (-308 G/A; rs1800629) and interleukin-10 (-1082 G/A; rs1800896) gene polymorphisms. Genotyping was performed by polymerase chain reaction with sequence-specific primers; products were then fractionated on 2% agarose gel and visualized after staining with ethidium bromide. The predominant anatomic-clinical form of Crohn's disease (CD) was inflammatory (32.75%), followed by fistulizing (29.31%) and stricturing (27.58%). As a control group, a total of 136 healthy subjects from the same geographical region were enrolled. The statistical analyses were performed using the R program. The frequency of the A allele at tumour necrosis factor-alpha was higher in ulcerative colitis (UC) patients (51%) than in controls (22%; P > 0.01). No statistical difference was found between the genotypic and allelic frequencies of CD patients and those of controls (P = 0.54). The -1082G/A polymorphism of interleukin-10 was not statistically different between the diseases and controls. Tumour necrosis factor-alpha (TNF-α) (-308G/A) is associated with UC onset, suggesting that the presence of the -308A allele could confer a relative risk of 3.62 for developing UC in the general population. Further studies, with a larger number of individuals, should be performed to confirm the role of TNF-α in inflammatory bowel disease pathogenesis. © 2016 John Wiley & Sons Ltd.
Deep Sequencing of the Medicago truncatula Root Transcriptome Reveals a Massive and Early Interaction between Nodulation Factor and Ethylene Signals.

PubMed

Larrainzar, Estíbaliz; Riely, Brendan K; Kim, Sang Cheol; Carrasquilla-Garcia, Noelia; Yu, Hee-Ju; Hwang, Hyun-Ju; Oh, Mijin; Kim, Goon Bo; Surendrarao, Anandkumar K; Chasman, Deborah; Siahpirani, Alireza F; Penmetsa, Ramachandra V; Lee, Gang-Seob; Kim, Namshin; Roy, Sushmita; Mun, Jeong-Hwan; Cook, Douglas R

2015-09-01

The legume-rhizobium symbiosis is initiated through the activation of the Nodulation (Nod) factor-signaling cascade, leading to a rapid reprogramming of host cell developmental pathways. In this work, we combine transcriptome sequencing with molecular genetics and network analysis to quantify and categorize the transcriptional changes occurring in roots of Medicago truncatula from minutes to days after inoculation with Sinorhizobium medicae. To identify the nature of the inductive and regulatory cues, we employed mutants with absent or decreased Nod factor sensitivities (i.e. Nodulation factor perception and Lysine motif domain-containing receptor-like kinase3, respectively) and an ethylene (ET)-insensitive, Nod factor-hypersensitive mutant (sickle). This unique data set encompasses nine time points, allowing observation of the symbiotic regulation of diverse biological processes with high temporal resolution. Among the many outputs of the study is the early Nod factor-induced, ET-regulated expression of ET signaling and biosynthesis genes. Coupled with the observation of massive transcriptional derepression in the ET-insensitive background, these results suggest that Nod factor signaling activates ET production to attenuate its own signal. Promoter:β-glucuronidase fusions report ET biosynthesis both in root hairs responding to rhizobium as well as in meristematic tissue during nodule organogenesis and growth, indicating that ET signaling functions at multiple developmental stages during symbiosis. In addition, we identified thousands of novel candidate genes undergoing Nod factor-dependent, ET-regulated expression. We leveraged the power of this large data set to model Nod factor- and ET-regulated signaling networks using MERLIN, a regulatory network inference algorithm. These analyses predict key nodes regulating the biological process impacted by Nod factor perception. 
We have made these results available to the research community through a searchable online resource. © 2015 American Society of Plant Biologists. All Rights Reserved.
Molecular phylogeny of Rigidoporus microporus isolates associated with white rot disease of rubber trees (Hevea brasiliensis).

PubMed

Oghenekaro, Abbot O; Miettinen, Otto; Omorusi, Victor I; Evueh, Grace A; Farid, Mohd A; Gazis, Romina; Asiegbu, Fred O

2014-01-01

Rigidoporus microporus (Polyporales, Basidiomycota) syn. Rigidoporus lignosus is the most destructive root pathogen of rubber plantations distributed in tropical and sub-tropical regions. Our primary objective was to characterize Nigerian isolates from rubber tree and compare them with other West African, Southeast Asian and American isolates. To characterize the 20 isolates from Nigeria, we used sequence data of the nuclear ribosomal DNA ITS and LSU, β-tubulin and translation elongation factor 1-α (tef1) gene sequences. Altogether, 40 isolates of R. microporus were included in the analyses. Isolates from Africa, Asia and South/Central America formed three distinctive clades corresponding to at least three species. No phylogeographic pattern was detected among R. microporus collected from West and Central African rubber plantations suggesting continuous gene flow among these populations. Our molecular phylogenetic analysis suggests the presence of two distinctive species associated with the white rot disease. Phylogenetic analyses placed R. microporus in the Hymenochaetales in the vicinity of Oxyporus. This is the first study to characterize R. microporus isolates from Nigeria through molecular phylogenetic techniques, and also the first to compare isolates from rubber plantations in Africa and Asia. Copyright © 2014 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.
Association of the gut microbiota mobilome with hospital location and birth weight in preterm infants.

PubMed

Ravi, Anuradha; Estensmo, Eva Lena F; Abée-Lund, Trine M L'; Foley, Steven L; Allgaier, Bernhard; Martin, Camilia R; Claud, Erika C; Rudi, Knut

2017-11-01

Background: The preterm infant gut microbiota is vulnerable to different biotic and abiotic factors. Although the development of this microbiota has been extensively studied, the mobilome, i.e. the mobile genetic elements (MGEs) in the gut microbiota, has not been considered. Therefore, the aim of this study was to investigate the association of the mobilome with birth weight and hospital location in the preterm infant gut microbiota. Methods: The data set consists of fecal samples from 62 preterm infants with and without necrotizing enterocolitis (NEC) from three different hospitals. We analyzed the gut microbiome by using 16S rRNA amplicon sequencing, shot-gun metagenome sequencing, and quantitative PCR. Predictive models and other data analyses were performed using MATLAB and QIIME. Results: The microbiota composition was significantly different between NEC-positive and NEC-negative infants and significantly different between hospitals. An operational taxonomic unit (OTU) showed strong positive and negative correlation with NEC and birth weight, respectively, whereas none showed significance for mode of delivery. Metagenome analyses revealed high levels of conjugative plasmids with MGEs and virulence genes. Results from quantitative PCR showed that the plasmid signature genes were significantly different between hospitals and in NEC-positive infants. Conclusion: Our results point toward an association of the mobilome with hospital location in preterm infants.
Reconsideration of systematic relationships within the order Euplotida (Protista, Ciliophora) using new sequences of the gene coding for small-subunit rRNA and testing the use of combined data sets to construct phylogenies of the Diophrys-complex.

PubMed

Yi, Zhenzhen; Song, Weibo; Clamp, John C; Chen, Zigui; Gao, Shan; Zhang, Qianqian

2009-03-01

Comprehensive molecular analyses of phylogenetic relationships within euplotid ciliates are relatively rare, and the relationships among some families remain questionable. We performed phylogenetic analyses of the order Euplotida based on new sequences of the gene coding for small-subunit rRNA (SSrRNA) from a variety of taxa across the entire order as well as sequences from some of these taxa of other genes (ITS1-5.8S-ITS2 region and histone H4) that have not been included in previous analyses. Phylogenetic trees based on SSrRNA gene sequences constructed with four different methods had a consistent branching pattern that included the following features: (1) the "typical" euplotids comprised a paraphyletic assemblage composed of two divergent clades (family Uronychiidae and families Euplotidae-Certesiidae-Aspidiscidae-Gastrocirrhidae), (2) in the family Uronychiidae, the genera Uronychia and Paradiophrys formed a clearly outlined, well-supported clade that seemed to be rather divergent from Diophrys and Diophryopsis, suggesting that the Diophrys-complex may have had a longer and more separate evolutionary history than previously supposed, (3) inclusion of 12 new SSrRNA sequences in analyses of Euplotidae revealed two new clades of species within the family and cast additional doubt on the present classification of genera within the family, and (4) the intraspecific divergence among five species of Aspidisca was far greater than those of closely related genera. The ITS1-5.8S-ITS2 coding regions and partial histone H4 genes of six morphospecies in the Diophrys-complex were sequenced along with their SSrRNA genes and used to compare phylogenies constructed from single data sets to those constructed from combined sets. Results indicated that combined analyses could be used to construct more reliable, less ambiguous phylogenies of complex groups like the order Euplotida, because they provide a greater amount and diversity of information.
High-throughput sequencing identification and characterization of potentially adhesion-related small RNAs in Streptococcus mutans.

PubMed

Zhu, Wenhui; Liu, Shanshan; Liu, Jia; Zhou, Yan; Lin, Huancai

2018-05-01

Adherence capacity is one of the principal virulence factors of Streptococcus mutans, and adhesion virulence factors are controlled by small RNAs (sRNAs) at the post-transcriptional level in various bacteria. Here, we aimed to identify and decipher putative adhesion-related sRNAs in clinical strains of S. mutans. RNA deep-sequencing was performed to identify potential sRNAs under different adhesion conditions. The expression of sRNAs was analysed by quantitative real-time PCR (qRT-PCR), and bioinformatic methods were used to predict the functional characteristics of sRNAs. A total of 736 differentially expressed candidate sRNAs were predicted, and these included 352 sRNAs located on the antisense to mRNA (AM) and 384 sRNAs in intergenic regions (IGRs). The top 7 differentially expressed sRNAs were successfully validated by qRT-PCR in UA159, and 2 of these were further confirmed in 100 clinical isolates. Moreover, the sequences of two sRNAs were conserved in other Streptococcus species, indicating a conserved role in such closely related species. A good correlation between the expression of sRNAs and the adhesion of 100 clinical strains was observed, which, combined with GO and KEGG, provides a perspective for the comprehension of sRNA function annotation. This study revealed a multitude of novel putative adhesion-related sRNAs in S. mutans and contributed to a better understanding of information concerning the transcriptional regulation of adhesion in S. mutans.
Molecular population dynamics of DNA structures in a bcl-2 promoter sequence is regulated by small molecules and the transcription factor hnRNP LL.

PubMed

Cui, Yunxi; Koirala, Deepak; Kang, HyunJin; Dhakal, Soma; Yangyuoru, Philip; Hurley, Laurence H; Mao, Hanbin

2014-05-01

Minute differences in the free energy change of unfolding among structures in an oligonucleotide sequence can lead to a complex population equilibrium, which is rather challenging for ensemble techniques to decipher. Herein, we introduce a new method, molecular population dynamics (MPD), to describe the intricate equilibrium among non-B deoxyribonucleic acid (DNA) structures. Using mechanical unfolding in laser tweezers, we identified six DNA species in a cytosine (C)-rich bcl-2 promoter sequence. Population patterns of these species with and without a small molecule (IMC-76 or IMC-48) or the transcription factor hnRNP LL are compared to reveal the MPD of different species. With a pattern recognition algorithm, we found that IMC-48 and hnRNP LL share 80% similarity in stabilizing i-motifs with 60 s incubation. In contrast, IMC-76 demonstrates the opposite behavior, preferring flexible DNA hairpins. With 120-180 s incubation, IMC-48 and hnRNP LL destabilize i-motifs, which has been previously proposed to activate bcl-2 transcription. These results provide strong support, from the population equilibrium perspective, that small molecules and hnRNP LL can modulate bcl-2 transcription through interaction with i-motifs. The excellent agreement with biochemical results firmly validates the MPD analyses, which, we expect, can be widely applicable to investigate complex equilibria of biomacromolecules. © 2014 The Author(s). Published by Oxford University Press [on behalf of Nucleic Acids Research].
Comparison of the aggregation of homologous β2-microglobulin variants reveals protein solubility as a key determinant of amyloid formation

PubMed Central

Pashley, Clare L.; Hewitt, Eric W.; Radford, Sheena E.

2016-01-01

The mouse and human β2-microglobulin protein orthologs are 70 % identical in sequence and share 88 % sequence similarity. These proteins are predicted by various algorithms to have similar aggregation and amyloid propensities. However, whilst human β2m (hβ2m) forms amyloid-like fibrils in denaturing conditions (e.g. pH 2.5) in the absence of NaCl, mouse β2m (mβ2m) requires the addition of 0.3 M NaCl to cause fibrillation. Here, the factors which give rise to this difference in amyloid propensity are investigated. We utilise structural and mutational analyses, fibril growth kinetics and solubility measurements under a range of pH and salt conditions, to determine why these two proteins have different amyloid propensities. The results show that, although other factors influence the fibril growth kinetics, a striking difference in the solubility of the proteins is a key determinant of the different amyloidogenicity of hβ2m and mβ2m. The relationship between protein solubility and lag time of amyloid formation is not captured by current aggregation or amyloid prediction algorithms, indicating a need to better understand the role of solubility on the lag time of amyloid formation. The results demonstrate the key contribution of protein solubility in determining amyloid propensity and lag time of amyloid formation, highlighting how small differences in protein sequence can have dramatic effects on amyloid formation. PMID:26780548
The mass spectral density in quantitative time-of-flight mass spectrometry of polymers

NASA Astrophysics Data System (ADS)

Tate, Ranjeet S.; Ebeling, Dan; Smith, Lloyd M.

2001-03-01

Time-of-flight mass spectrometry (TOF-MS) is being increasingly used for the study of polymers, for example to obtain the distribution of molecular masses for polymer samples. Serious efforts have also been underway to use TOF-MS for DNA sequencing. In TOF-MS the data is obtained in the form of a time-series that represents the distribution in arrival times of ions of various m/z ratios. This time-series data is then converted to a "mass-spectrum" via a coordinate transformation from the arrival time (t) to the corresponding mass-to-charge ratio (m/z = const. t^2). In this transformation, it is important to keep in mind that spectra are distributions, or densities of weight +1, and thus do not transform as functions. To obtain the mass-spectral density, it is necessary to include a multiplicative factor of √m/z. Common commercial instruments do not take this factor into account. Dropping this factor has no effect on qualitative analysis (detection) or local quantitative measurements, since S/N or signal-to-baseline ratios are unaffected for peaks with small dispersions. However, there are serious consequences for general quantitative analyses. In DNA sequencing applications, loss of signal intensity is in part attributed to multiple charging; however, since the √m/z factor is not taken into account, this conclusion is based on an overestimate (by a factor of √z) of the relative amount of the multiply charged species. In the study of polymers, the normalized dispersion is underestimated by approximately (M_w/M_n - 1)/2. In terms of M_w/M_n itself, for example, a M_w/M_n = 1.5 calculated without the √m factor corresponds in fact to a M_w/M_n = 1.88.
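As a rough illustration of why a mass-dependent intensity factor matters for M_w/M_n, the sketch below computes the standard number-average and weight-average molar masses from a discrete spectrum and then recomputes them after reweighting the intensities. It is only a sketch under stated assumptions: intensities are treated as number weights, the function names and the two-peak spectrum are hypothetical, and the exponent is left as a free parameter rather than being taken as the authors' exact correction.

```python
def molar_mass_averages(masses, intensities):
    """Return (M_n, M_w, M_w/M_n), treating intensities as number weights."""
    total = sum(intensities)
    first = sum(n * m for m, n in zip(masses, intensities))
    second = sum(n * m * m for m, n in zip(masses, intensities))
    m_n = first / total    # number-average molar mass
    m_w = second / first   # weight-average molar mass
    return m_n, m_w, m_w / m_n


def reweight(masses, intensities, power):
    """Multiply each intensity by mass**power (hypothetical helper)."""
    return [n * m ** power for m, n in zip(masses, intensities)]


# Toy two-peak spectrum (made-up values, not data from the abstract).
masses = [100.0, 300.0]
intensities = [1.0, 1.0]

m_n, m_w, pdi = molar_mass_averages(masses, intensities)
_, _, pdi_reweighted = molar_mass_averages(
    masses, reweight(masses, intensities, power=0.5)
)
print(m_n, m_w, pdi)
print(pdi_reweighted)
```

The point of the comparison is only that M_w/M_n is sensitive to any mass-dependent factor applied to the intensities, so the choice of factor cannot be ignored in quantitative work.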
The (in)complete organelle genome: exploring the use and nonuse of available technologies for characterizing mitochondrial and plastid chromosomes.

PubMed

Sanitá Lima, Matheus; Woods, Laura C; Cartwright, Matthew W; Smith, David Roy

2016-11-01

Not long ago, scientists paid dearly in time, money and skill for every nucleotide that they sequenced. Today, DNA sequencing technologies epitomize the slogan 'faster, easier, cheaper and more', and in many ways, sequencing an entire genome has become routine, even for the smallest laboratory groups. This is especially true for mitochondrial and plastid genomes. Given their relatively small sizes and high copy numbers per cell, organelle DNAs are currently among the most highly sequenced kind of chromosome. But accurately characterizing an organelle genome and the information it encodes can require much more than DNA sequencing and bioinformatics analyses. Organelle genomes can be surprisingly complex and can exhibit convoluted and unconventional modes of gene expression. Unravelling this complexity can demand a wide assortment of experiments, from pulsed-field gel electrophoresis to Southern and Northern blots to RNA analyses. Here, we show that it is exactly these types of 'complementary' analyses that are often lacking from contemporary organelle genome papers, particularly short 'genome announcement' articles. Consequently, crucial and interesting features of organelle chromosomes are going undescribed, which could ultimately lead to a poor understanding and even a misrepresentation of these genomes and the genes they express. High-throughput sequencing and bioinformatics have made it easy to sequence and assemble entire chromosomes, but they should not be used as a substitute for or at the expense of other types of genomic characterization methods. © 2016 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
Exploring New Zealand prescription data using sequence symmetry analyses for predicting adverse drug reactions.

PubMed

Nishtala, P S; Chyou, T-Y

2017-04-01

Prescription sequence symmetry analysis (PSSA) is a ubiquitous tool employed in pharmacoepidemiological research to predict adverse drug reactions (ADRs). Several studies have reported the advantage of PSSA as a method that can be applied to a large prescription database with computational ease. The objective of this study was to validate the New Zealand (NZ) prescription database as a potential source for identifying ADRs using the PSSA method. We analysed de-identified individual-level prescription data for people aged 65 years and above for the period 2005 to 2014 from the pharmaceutical collections supplied by the NZ Ministry of Health. We selected six positive controls that have been previously investigated and reported for causing ADRs. The six positive controls identified were amiodarone (repeated twice), frusemide, simvastatin, lithium and fluticasone. Amiodarone and lithium have been reported to induce thyroid dysfunction. Simvastatin is reported to cause muscle cramps, while fluticasone is well documented to cause oral candidiasis. Thyroxine was identified as a marker drug to treat hypothyroidism associated with amiodarone and lithium. Carbimazole was identified as a marker drug to treat hyperthyroidism associated with amiodarone use. Quinine sulphate was identified as a marker drug to treat muscle cramps associated with statins. In addition, we also analysed six negative controls that are unlikely to be associated with ADRs. The main outcome measure was the association with ADRs, expressed as adjusted sequence ratios (ASR) with 95% confidence intervals. RESULTS AND DISCUSSION: Our analyses confirmed a significant signal for all six positive controls. Significant positive associations were noted for amiodarone [ASR = 3·57, 95% CI (3·17-4·02)], and lithium chloride induced hypothyroidism [ASR = 3·43, 95% CI (2·55-4·70)]. Amiodarone was also strongly associated with hyperthyroidism [ASR = 8·81, 95% CI (5·86-13·77)]. 
Simvastatin was associated with muscle cramps [ASR = 1·69, 95% CI (1·61-1·77)]. Fluticasone was positively associated with oral candidiasis [ASR = 2·34, 95% CI (2·19-2·50)]. Frusemide was associated with hypokalaemia [ASR = 2·94, 95% CI (2·83-3·05)]. No strong associations were noted for the negative pairs. It is important to highlight that PSSA automatically controls for all confounding factors including unknown and unmeasured confounding variables, plus the effect of temporal trend in prescriptions, and hence allows a more robust ADR detection especially when confounding factors are difficult to determine or measure. The New Zealand prescription database can be a potential source to identify ADRs using the PSSA method, and this could complement pharmacovigilance surveillance in NZ. The PSSA can be an important method for post-marketing surveillance and monitoring of ADRs which have relatively short latency. However, the predictive validity of PSSA will be compromised in certain scenarios, particularly when sample size is small, when new drugs are in the market and data are sparse. © 2016 John Wiley & Sons Ltd.
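The sequence-ratio idea behind PSSA can be sketched in a few lines. The example below computes only a crude sequence ratio: the number of patients who started the index drug before the marker drug, divided by the number who started them in the reverse order. The adjusted ratio (ASR) reported in the abstract additionally corrects for prescribing trends and is not implemented here. The function name, the 365-day window and the patient records are all hypothetical illustrations, not details taken from the study.

```python
from datetime import date


def crude_sequence_ratio(first_index, first_marker, window_days=365):
    """Crude sequence ratio for one index/marker drug pair.

    first_index and first_marker map a patient id to the date of that
    patient's first dispensing of the index or marker drug.
    """
    index_first = 0
    marker_first = 0
    for patient, d_index in first_index.items():
        d_marker = first_marker.get(patient)
        if d_marker is None or d_marker == d_index:
            continue  # need both drugs, started on different days
        gap = (d_marker - d_index).days
        if abs(gap) > window_days:
            continue  # outside the assumed observation window
        if gap > 0:
            index_first += 1
        else:
            marker_first += 1
    if marker_first == 0:
        raise ValueError("no marker-first patients; ratio is undefined")
    return index_first / marker_first


# Made-up records for illustration only.
first_index = {
    "a": date(2010, 1, 1),
    "b": date(2010, 2, 1),
    "c": date(2010, 6, 1),
    "d": date(2010, 1, 1),   # never receives the marker drug
    "e": date(2008, 1, 1),   # marker starts far outside the window
}
first_marker = {
    "a": date(2010, 3, 1),
    "b": date(2010, 5, 1),
    "c": date(2010, 4, 1),
    "e": date(2010, 1, 1),
}

ratio = crude_sequence_ratio(first_index, first_marker)
print(ratio)
```

A ratio well above 1 means the marker drug tends to follow the index drug, which is the asymmetry that PSSA reads as a possible ADR signal.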
Characterizing the avian gut microbiota: membership, driving influences, and potential function.

PubMed

Waite, David W; Taylor, Michael W

2014-01-01

Birds represent a diverse and evolutionarily successful lineage, occupying a wide range of niches throughout the world. Like all vertebrates, avians harbor diverse communities of microorganisms within their guts, which collectively fulfill important roles in providing the host with nutrition and protection from pathogens. Although many studies have investigated the role of particular microbes in the guts of avian species, there has been no attempt to unify the results of previous, sequence-based studies to examine the factors that shape the avian gut microbiota as a whole. In this study, we present the first meta-analysis of the avian gut microbiota, using 16S rRNA gene sequences obtained from a range of publicly available clone-library and amplicon pyrosequencing data. We investigate community membership and structure, as well as probe the roles of some of the key biological factors that influence the gut microbiota of other vertebrates, such as host phylogeny, location within the gut, diet, and association with humans. Our results indicate that, across avian studies, the microbiota demonstrates a similar phylum-level composition to that of mammals. Host bird species is the most important factor in determining community composition, although sampling site, diet, and captivity status also contribute. These analyses provide a first integrated look at the composition of the avian microbiota, and serve as a foundation for future studies in this area.
Network analysis reveals seasonal variation of co-occurrence correlations between Cyanobacteria and other bacterioplankton.

PubMed

Zhao, Dayong; Shen, Feng; Zeng, Jin; Huang, Rui; Yu, Zhongbo; Wu, Qinglong L

2016-12-15

Association network approaches have recently been proposed as a means for exploring the associations between bacterial communities. In the present study, high-throughput sequencing was employed to investigate the seasonal variations in the composition of bacterioplankton communities in six eutrophic urban lakes of Nanjing City, China. Over 150,000 16S rRNA sequences were derived from 52 water samples, and correlation-based network analyses were conducted. Our results demonstrated that the architecture of the co-occurrence networks varied in different seasons. Cyanobacteria played various roles in the ecological networks during different seasons. Co-occurrence patterns revealed that members of Cyanobacteria shared a very similar niche and they had weak positive correlations with other phyla in summer. To explore the effect of environmental factors on species-species co-occurrence networks and to determine the most influential environmental factors, the original positive network was simplified by module partitioning and by calculating module eigengenes. Module eigengene analysis indicated that temperature only affected some Cyanobacteria; the rest were mainly affected by nitrogen associated factors throughout the year. Cyanobacteria were dominant in summer which may result from strong co-occurrence patterns and suitable living conditions. Overall, this study has improved our understanding of the roles of Cyanobacteria and other bacterioplankton in ecological networks. Copyright © 2016 Elsevier B.V. All rights reserved.
Large deletions play a minor but essential role in congenital coagulation factor VII and X deficiencies.

PubMed

Rath, M; Najm, J; Sirb, H; Kentouche, K; Dufke, A; Pauli, S; Hackmann, K; Liehr, T; Hübner, C A; Felbor, U

2015-01-01

Congenital factor VII (FVII) and factor X (FX) deficiencies belong to the group of rare bleeding disorders which may occur in separate or combined forms since both the F7 and F10 genes are located in close proximity on the distal long arm of chromosome 13 (13q34). We here present data of 192 consecutive index cases with FVII and/or FX deficiency. 10 novel and 53 recurrent sequence alterations were identified in the F7 gene and 5 novel as well as 11 recurrent in the F10 gene including one homozygous 4.35 kb deletion within F7 (c.64+430_131-6delinsTCGTAA) and three large heterozygous deletions involving both the F7 and F10 genes. One of the latter proved to be cytogenetically visible as a chromosome 13q34 deletion and associated with agenesis of the corpus callosum and psychomotor retardation. Large deletions play a minor but essential role in the mutational spectrum of the F7 and F10 genes. Copy number analyses (e. g. MLPA) should be considered if sequencing cannot clarify the underlying reason of an observed coagulopathy. Of note, in cases of combined FVII/FX deficiency, a deletion of the two contiguous genes might be part of a larger chromosomal rearrangement.
Characterizing the avian gut microbiota: membership, driving influences, and potential function

PubMed Central

Waite, David W.; Taylor, Michael W.

2014-01-01

Birds represent a diverse and evolutionarily successful lineage, occupying a wide range of niches throughout the world. Like all vertebrates, avians harbor diverse communities of microorganisms within their guts, which collectively fulfill important roles in providing the host with nutrition and protection from pathogens. Although many studies have investigated the role of particular microbes in the guts of avian species, there has been no attempt to unify the results of previous, sequence-based studies to examine the factors that shape the avian gut microbiota as a whole. In this study, we present the first meta-analysis of the avian gut microbiota, using 16S rRNA gene sequences obtained from a range of publicly available clone-library and amplicon pyrosequencing data. We investigate community membership and structure, as well as probe the roles of some of the key biological factors that influence the gut microbiota of other vertebrates, such as host phylogeny, location within the gut, diet, and association with humans. Our results indicate that, across avian studies, the microbiota demonstrates a similar phylum-level composition to that of mammals. Host bird species is the most important factor in determining community composition, although sampling site, diet, and captivity status also contribute. These analyses provide a first integrated look at the composition of the avian microbiota, and serve as a foundation for future studies in this area. PMID:24904538
Whipworms in humans and pigs: origins and demography.

PubMed

Hawash, Mohamed B F; Betson, Martha; Al-Jubury, Azmi; Ketzis, Jennifer; LeeWillingham, Arve; Bertelsen, Mads F; Cooper, Philip J; Littlewood, D Tim J; Zhu, Xing-Quan; Nejsum, Peter

2016-01-22

Trichuris suis and T. trichiura are two different whipworm species that infect pigs and humans, respectively. T. suis is found in pigs worldwide while T. trichiura is responsible for nearly 460 million infections in people, mainly in areas of poor sanitation in tropical and subtropical areas. The evolutionary relationship and the historical factors responsible for this worldwide distribution are poorly understood. In this study, we aimed to reconstruct the demographic history of Trichuris in humans and pigs, the evolutionary origin of Trichuris in these hosts and factors responsible for parasite dispersal globally. Parts of the mitochondrial nad1 and rrnL genes were sequenced followed by population genetic and phylogenetic analyses. Populations of Trichuris examined were recovered from humans (n = 31), pigs (n = 58) and non-human primates (n = 49) in different countries on different continents, namely Denmark, USA, Uganda, Ecuador, China and St. Kitts (Caribbean). Additional sequences available from GenBank were incorporated into the analyses. We found no differentiation between human-derived Trichuris in Uganda and the majority of the Trichuris samples from non-human primates suggesting a common African origin of the parasite, which then was transmitted to Asia and further to South America. On the other hand, there was no differentiation between pig-derived Trichuris from Europe and the New World suggesting dispersal relates to human activities by transporting pigs and their parasites through colonisation and trade. Evidence for recent pig transport from China to Ecuador and from Europe to Uganda was also observed from their parasites. In contrast, there was high genetic differentiation between the pig Trichuris in Denmark and China in concordance with the host genetics. We found evidence for an African origin of T. trichiura which were then transmitted with human ancestors to Asia and further to South America. A host shift to pigs may have occurred in Asia from where T. suis seems to have been transmitted globally by a combination of natural host dispersal and anthropogenic factors.
A population genetics analysis in clinical isolates of Sporothrix schenckii based on calmodulin and calcium/calmodulin-dependent kinase partial gene sequences.

PubMed

Rangel-Gamboa, Lucia; Martinez-Hernandez, Fernando; Maravilla, Pablo; Flisser, Ana

2018-02-02

Sporotrichosis is a subcutaneous mycosis that is caused by diverse species of Sporothrix. High levels of genetic diversity in Sporothrix isolates have been reported, but few population genetics analyses have been documented. The aim was to analyse the genetic variability and population genetics relations of Sporothrix schenckii Mexican clinical isolates and to compare them with other reported isolates. We studied the partial sequences of calmodulin and calcium/calmodulin-dependent kinase genes in 24 isolates: 22 from Mexico, one from Colombia, and one ATCC ® 6331™; the latter was used as a positive control. Phylogenetic, haplotype and population genetic analyses were performed with 24 sequences obtained by us and 345 sequences obtained from GenBank. The frequency of S. schenckii sensu stricto was 81% in the 22 Mexican isolates, while the remaining 19% were Sporothrix globosa. Mexican S. schenckii sensu stricto had high genetic diversity and was related to isolates from South America. In contrast, S. globosa showed one haplotype related to isolates from Asia, Brazil, Spain and the USA. In S. schenckii sensu stricto, S. brasiliensis and S. globosa, haplotype polymorphism (θ) values were higher than the nucleotide diversity data (π). In addition, Tajima's D and Fu and Li's tests displayed negative values, suggesting directional selection and arguing against the model of neutral evolution in these populations. In addition, analyses showed that calcium/calmodulin-dependent kinase was a suitable genetic marker to discriminate between common Sporothrix species. © 2018 Blackwell Verlag GmbH.
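The link in the abstract above between θ exceeding π and a negative Tajima's D follows directly from how the statistic is built. The sketch below computes π as the mean number of pairwise differences, θ from the number of segregating sites, and D from the standard Tajima (1989) constants. It assumes that the abstract's θ is the segregating-site (Watterson) estimator, which the abstract does not state. The function name and the toy alignment are hypothetical, and this is not the software the authors used.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Return (pi, theta, D) for equal-length aligned sequences.

    pi and theta are per-locus values (not divided by alignment length).
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences (variance is zero below that)")
    if any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # Number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # Mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    theta = seg_sites / a1
    d = (pi - theta) / math.sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
    return pi, theta, d


# Toy alignment in which every variant is a singleton (made-up data).
toy = ["AAAA", "AAAT", "AATA", "ATAA", "TAAA"]
print(tajimas_d(toy))
```

Because the numerator is π minus θ, an excess of rare variants, which inflates the segregating-site count more than the pairwise differences, pushes D below zero.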
GobyWeb: Simplified Management and Analysis of Gene Expression and DNA Methylation Sequencing Data

PubMed Central

Dorff, Kevin C.; Chambwe, Nyasha; Zeno, Zachary; Simi, Manuele; Shaknovich, Rita; Campagne, Fabien

2013-01-01

We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (i.e., reduced bisulfite sequencing and whole genome methyl-seq), or the detection of pathogens in sequenced data. In contrast to previous analysis pipelines developed for analysis of HTS data, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, scales gracefully to process tens or hundreds of multi-gigabyte samples, yet can be used effectively by researchers who are comfortable using a web browser. We conducted performance evaluations of the software and found it to either outperform or have similar performance to analysis programs developed for specialized analyses of HTS data. We found that most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state of the art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed in source code and licensed under the open source LGPL3 license to facilitate code inspection, reuse and independent extensions http://github.com/CampagneLaboratory/gobyweb2-plugins. PMID:23936070
Molecular Diet Analysis of Two African Free-Tailed Bats (Molossidae) Using High Throughput Sequencing

PubMed Central

Bohmann, Kristine; Monadjem, Ara; Lehmkuhl Noer, Christina; Rasmussen, Morten; Zeale, Matt R. K.; Clare, Elizabeth; Jones, Gareth; Willerslev, Eske; Gilbert, M. Thomas P.

2011-01-01

Given the diversity of prey consumed by insectivorous bats, it is difficult to discern the composition of their diet using morphological or conventional PCR-based analyses of their faeces. We demonstrate the use of a powerful alternate tool, the use of the Roche FLX sequencing platform to deep-sequence uniquely 5′ tagged insect-generic barcode cytochrome c oxidase I (COI) fragments, that were PCR amplified from faecal pellets of two free-tailed bat species Chaerephon pumilus and Mops condylurus (family: Molossidae). Although the analyses were challenged by the paucity of southern African insect COI sequences in the GenBank and BOLD databases, similarity to existing collections allowed the preliminary identification of 25 prey families from six orders of insects within the diet of C. pumilus, and 24 families from seven orders within the diet of M. condylurus. Insects identified to families within the orders Lepidoptera and Diptera were widely present among the faecal samples analysed. The two families that were observed most frequently were Noctuidae and Nymphalidae (Lepidoptera). Species-level analysis of the data was accomplished using novel bioinformatics techniques for the identification of molecular operational taxonomic units (MOTU). Based on these analyses, our data provide little evidence of resource partitioning between sympatric M. condylurus and C. pumilus in the Simunye region of Swaziland at the time of year when the samples were collected, although as more complete databases against which to compare the sequences are generated this may have to be re-evaluated. PMID:21731749
Demonstrating Interactions of Transcription Factors with DNA by Electrophoretic Mobility Shift Assay.

PubMed

Yousaf, Nasim; Gould, David

2017-01-01

Confirming the binding of a transcription factor with a particular DNA sequence may be important in characterizing interactions with a synthetic promoter. Electrophoretic mobility shift assay is a powerful approach to demonstrate the specific DNA sequence that is bound by a transcription factor and also to confirm the specific transcription factor involved in the interaction. In this chapter we describe a method we have successfully used to demonstrate interactions of endogenous transcription factors with sequences derived from endogenous and synthetic promoters.

The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color.

PubMed

Motamayor, Juan C; Mockaitis, Keithanne; Schmutz, Jeremy; Haiminen, Niina; Livingstone, Donald; Cornejo, Omar; Findley, Seth D; Zheng, Ping; Utro, Filippo; Royaert, Stefan; Saski, Christopher; Jenkins, Jerry; Podicheti, Ram; Zhao, Meixia; Scheffler, Brian E; Stack, Joseph C; Feltus, Frank A; Mustiga, Guiliana M; Amores, Freddy; Phillips, Wilbert; Marelli, Jean Philippe; May, Gregory D; Shapiro, Howard; Ma, Jianxin; Bustamante, Carlos D; Schnell, Raymond J; Main, Dorrie; Gilbert, Don; Parida, Laxmi; Kuhn, David N

2013-06-03

Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.
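The contig and scaffold N50 figures quoted in the abstract above are length-weighted medians: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch follows, with a hypothetical function name and made-up lengths; real assembly tools may differ in details such as minimum-length filters.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths (total 20, so the halfway point is 10).
print(n50([2, 3, 4, 5, 6]))  # 6 + 5 = 11 reaches the halfway point, so N50 is 5
```

Because the statistic is weighted by length, a few very long scaffolds raise N50 far more than many short ones, which is why it is reported alongside the scaffold count and total span.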
Transcriptome sequencing and whole genome expression profiling of chrysanthemum under dehydration stress

PubMed Central

2013-01-01

Background Chrysanthemum is one of the most important ornamental crops in the world and drought stress seriously limits its production and distribution. In order to generate a functional genomics resource and obtain a deeper understanding of the molecular mechanisms regarding chrysanthemum responses to dehydration stress, we performed large-scale transcriptome sequencing of chrysanthemum plants under dehydration stress using the Illumina sequencing technology. Results Two cDNA libraries constructed from mRNAs of control and dehydration-treated seedlings were sequenced by Illumina technology. A total of more than 100 million reads were generated and de novo assembled into 98,180 unique transcripts which were further extensively annotated by comparing their sequences to different protein databases. Biochemical pathways were predicted from these transcript sequences. Furthermore, we performed gene expression profiling analysis upon dehydration treatment in chrysanthemum and identified 8,558 dehydration-responsive unique transcripts, including 307 transcription factors and 229 protein kinases and many well-known stress responsive genes. Gene ontology (GO) term enrichment and biochemical pathway analyses showed that dehydration stress caused changes in hormone response, secondary and amino acid metabolism, and light and photoperiod response. These findings suggest that drought tolerance of chrysanthemum plants may be related to the regulation of hormone biosynthesis and signaling, reduction of oxidative damage, stabilization of cell proteins and structures, and maintenance of energy and carbon supply. Conclusions Our transcriptome sequences can provide a valuable resource for chrysanthemum breeding and research and novel insights into chrysanthemum responses to dehydration stress and offer candidate genes or markers that can be used to guide future studies attempting to breed drought tolerant chrysanthemum cultivars. PMID:24074255
What caused the outbreak of ESBL-producing Klebsiella pneumoniae in a neonatal intensive care unit, Germany 2009 to 2012? Reconstructing transmission with epidemiological analysis and whole-genome sequencing

PubMed Central

Haller, Sebastian; Eller, Christoph; Hermes, Julia; Kaase, Martin; Steglich, Matthias; Radonić, Aleksandar; Dabrowski, Piotr Wojtek; Nitsche, Andreas; Pfeifer, Yvonne; Werner, Guido; Wunderle, Werner; Velasco, Edward; Abu Sin, Muna; Eckmanns, Tim; Nübel, Ulrich

2015-01-01

Objective We aimed to retrospectively reconstruct the timing of transmission events and pathways in order to understand why extensive preventive measures and investigations were not sufficient to prevent new cases. Methods We extracted available information from patient charts to describe cases and to compare them to the normal population of the ward. We conducted a cohort study to identify risk factors for pathogen acquisition. We sequenced the available isolates to determine the phylogenetic relatedness of Klebsiella pneumoniae isolates on the basis of their genome sequences. Results The investigation comprises 37 cases and the 10 cases with ESBL (extended-spectrum beta-lactamase)-producing K. pneumoniae bloodstream infection. Descriptive epidemiology indicated that a continuous transmission from person to person was most likely. Results from the cohort study showed that ‘frequent manipulation’ (a proxy for increased exposure to medical procedures) was significantly associated with being a case (RR 1.44, 95% CI 1.02 to 2.19). Genome sequences revealed that all 48 bacterial isolates available for sequencing from 31 cases were closely related (maximum genetic distance, 12 single nucleotide polymorphisms). Based on our calculation of evolutionary rate and sequence diversity, we estimate that the outbreak strain was endemic since 2008. Conclusions Epidemiological and phylogenetic analyses consistently indicated that there were additional, undiscovered cases prior to the onset of microbiological screening and that the spread of the pathogen remained undetected over several years, driven predominantly by person-to-person transmission. Whole-genome sequencing provided valuable information on the onset, course and size of the outbreak, and on possible ways of transmission. PMID:25967999
Global sequence variation in the histidine-rich proteins 2 and 3 of Plasmodium falciparum: implications for the performance of malaria rapid diagnostic tests

PubMed Central

2010-01-01

Background Accurate diagnosis is essential for prompt and appropriate treatment of malaria. While rapid diagnostic tests (RDTs) offer great potential to improve malaria diagnosis, the sensitivity of RDTs has been reported to be highly variable. One possible factor contributing to variable test performance is the diversity of parasite antigens. This is of particular concern for Plasmodium falciparum histidine-rich protein 2 (PfHRP2)-detecting RDTs since PfHRP2 has been reported to be highly variable in isolates of the Asia-Pacific region. Methods The pfhrp2 exon 2 fragment from 458 isolates of P. falciparum collected from 38 countries was amplified and sequenced. For a subset of 80 isolates, the exon 2 fragment of histidine-rich protein 3 (pfhrp3) was also amplified and sequenced. DNA sequence and statistical analysis of the variation observed in these genes was conducted. The potential impact of the pfhrp2 variation on RDT detection rates was examined by analysing the relationship between sequence characteristics of this gene and the results of the WHO product testing of malaria RDTs: Round 1 (2008), for 34 PfHRP2-detecting RDTs. Results Sequence analysis revealed extensive variations in the number and arrangement of various repeats encoded by the genes in parasite populations world-wide. However, no statistically robust correlation between gene structure and RDT detection rate for P. falciparum parasites at 200 parasites per microlitre was identified. Conclusions The results suggest that despite extreme sequence variation, diversity of PfHRP2 does not appear to be a major cause of RDT sensitivity variation. PMID:20470441
The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color

PubMed Central

2013-01-01

Background Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 as Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. Conclusions We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits. PMID:23731509
Monitoring and Surveillance of Marine Invasive Species in Californian Waters by DNA Barcoding: Methodological and Analytical Solutions

NASA Astrophysics Data System (ADS)

Campbell, T. L.; Geller, J. B.; Heller, P.; Ruiz, G.; Chang, A.; McCann, L.; Ceballos, L.; Marraffini, M.; Ashton, G.; Larson, K.; Havard, S.; Meagher, K.; Wheelock, M.; Drake, C.; Rhett, G.

2016-02-01

The Ballast Water Management Act, the Marine Invasive Species Act, and the Coastal Ecosystem Protection Act require the California Department of Fish and Wildlife to monitor and evaluate the extent of biological invasions in the state's marine and estuarine waters. This has been performed statewide, using a variety of methodologies. Conventional sample collection and processing is laborious, slow and costly, and may require considerable taxonomic expertise requiring detailed time-consuming microscopic study of multiple specimens. These factors limit the volume of biomass that can be searched for introduced species. New technologies continue to reduce the cost and increase the throughput of genetic analyses, which become efficient alternatives to traditional morphological analysis for identification, monitoring and surveillance of marine invasive species. Using next-generation sequencing of mitochondrial Cytochrome c oxidase subunit I (COI) and nuclear large subunit ribosomal RNA (LSU), we analyzed over 15,000 individual marine invertebrates collected in Californian waters. We have created sequence databases of California native and non-native species to assist in molecular identification and surveillance in North American waters. Metagenetics, the next-generation sequencing of environmental samples with comparison to DNA sequence databases, is a faster and cost-effective alternative to individual sample analysis. We have sequenced from biomass collected from whole settlement plates and plankton in California harbors, and used our introduced species database to create species lists. We can combine these species lists for individual marinas with collected environmental data, such as temperature, salinity, and dissolved oxygen to understand the ecology of marine invasions. Here we discuss high throughput sampling, sequencing, and COASTLINE, our data analysis answer to challenges working with hundreds of millions of sequencing reads from tens of thousands of specimens.
Comparing the loss of functional independence of older adults in the U.S. and China.

PubMed

Fong, Joelle H; Feng, Jun

2018-01-01

Functional loss among older adults is known to follow a hierarchical sequence, but little is known about whether such sequences differ across socio-cultural contexts. The aim of this study is to construct activities of daily living (ADL) scales for oldest-old adults in the United States and China so as to compare their functional loss sequences. We use data from the Asset and Health Dynamics of the Oldest Old (n=1607) and Chinese Longitudinal Healthy Longevity Survey (n=5570) for years 1998-2008. ADL items are calibrated within a scale using the Rasch measurement model. Rasch scores are averaged across survey waves to identify the ADL loss sequence for each study population. We also assess scale stability over measurement periods. Factor analyses confirm that the ADL items in each study population can be combined meaningfully to form a hierarchical sequence. Internal consistency assessed by Cronbach's alpha is high (0.81 to 0.95). We find that bathing is the first activity that both older Americans and Chinese have difficulty with, while eating is the last activity. There are, however, differences in the rank order for toileting (ranked more challenging in the Chinese sample) and dressing (ranked more challenging in the U.S. sample). Item orderings are stable over time. The results highlight the relative importance of bathing in the functional loss sequence for older adults, regardless of socio-cultural context. Health interventions are needed to address deficits in the bathroom environment, especially in developing countries like China. Copyright © 2017 Elsevier B.V. All rights reserved.
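The internal-consistency figure in the abstract above, Cronbach's alpha, compares the summed variance of the individual items with the variance of the total score. A minimal sketch using only the Python standard library is given below. The function name and the example scores are hypothetical, and the survey data themselves are not reproduced here.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items, each a list of scores from the same n respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(scores) != len(items[0]) for scores in items):
        raise ValueError("every item needs one score per respondent")
    # Total score per respondent across all items.
    totals = [sum(respondent) for respondent in zip(*items)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Made-up difficulty scores for three ADL items from five respondents.
bathing = [2, 3, 3, 1, 2]
dressing = [1, 3, 2, 1, 2]
eating = [1, 2, 2, 0, 1]
print(cronbach_alpha([bathing, dressing, eating]))
```

Alpha approaches 1 when the items rise and fall together across respondents, which is the pattern expected when they measure a single underlying loss of function.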
Soil Parameters Drive the Structure, Diversity and Metabolic Potentials of the Bacterial Communities Across Temperate Beech Forest Soil Sequences.

PubMed

Jeanbille, M; Buée, M; Bach, C; Cébron, A; Frey-Klett, P; Turpault, M P; Uroz, S

2016-02-01

Soil and climatic conditions as well as land cover and land management have been shown to strongly impact the structure and diversity of the soil bacterial communities. Here, we addressed the potential effect of the edaphic parameters on the soil bacterial communities under the same land cover, excluding potential confounding factors such as climate. To do this, we characterized two natural soil sequences occurring in the Montiers experimental site. Spatially distant soil samples were collected below Fagus sylvatica tree stands to assess the effect of soil sequences on the edaphic parameters, as well as the structure and diversity of the bacterial communities. Soil analyses revealed that the two soil sequences were characterized by higher pH and calcium and magnesium contents in the lower plots. Metabolic assays based on Biolog Ecoplates highlighted higher intensity and richness in usable carbon substrates in the lower plots than in the middle and upper plots, although no significant differences occurred in the abundance of bacterial and fungal communities along the soil sequences as assessed using quantitative PCR. Pyrosequencing analysis of 16S ribosomal RNA (rRNA) gene amplicons revealed that Proteobacteria, Acidobacteria and Bacteroidetes were the most abundantly represented phyla. Acidobacteria, Proteobacteria and Chlamydiae were significantly enriched in the most acidic and nutrient-poor soils compared to the Bacteroidetes, which were significantly enriched in the soils presenting the higher pH and nutrient contents. Interestingly, aluminium, nitrogen, calcium, nutrient availability and pH appeared to be the best predictors of the bacterial community structures along the soil sequences.
Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions.

PubMed

Nishizawa, M; Nishizawa, K

2000-10-01

The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the 'between gene' GC content heterogeneity, which is linked to 'isochores', is a principal factor associated with the bias in substitution patterns in human, 'within gene' heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed.
Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions

PubMed Central

Nishizawa, Manami; Nishizawa, Kazuhisa

2000-01-01

The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the ‘between gene’ GC content heterogeneity, which is linked to ‘isochores’, is a principal factor associated with the bias in substitution patterns in human, ‘within gene’ heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed. PMID:11000273
Characterization of occult hepatitis B virus infection among HIV positive patients in Cameroon.

PubMed

Gachara, George; Magoro, Tshifhiwa; Mavhandu, Lufuno; Lum, Emmaculate; Kimbi, Helen K; Ndip, Roland N; Bessong, Pascal O

2017-03-08

Occult hepatitis B infection (OBI) among HIV positive patients varies widely in different geographic regions. We undertook a study to determine the prevalence of occult hepatitis B infection among HIV infected individuals visiting a health facility in South West Cameroon and characterized occult HBV strains based on sequence analyses. Plasma samples (n = 337), which previously tested negative for hepatitis B surface antigen (HBsAg), were screened for antibodies against hepatitis B core (anti-HBc) and surface (anti-HBs) antigens followed by DNA extraction. A 366 bp region covering the overlapping surface/polymerase gene of HBV was then amplified in a nested PCR and the amplicons sequenced using Sanger sequencing. The resulting sequences were then analyzed for genotypes and for escape and drug resistance mutations. Twenty samples were HBV DNA positive and were classified as OBI giving a prevalence of 5.9%. Out of these, 9 (45%) were anti-HBs positive, while 10 (52.6%) were anti-HBc positive. Additionally, 2 had dual anti-HBs and anti-HBc reactivity, while 6 had no detectable HBV antibodies. Out of the ten samples that were successfully sequenced, nine were classified as genotype E and one as genotype A. Three sequences possessed mutations associated with lamivudine resistance. We detected a number of mutations within the major hydrophilic region of the surface gene where most immune escape mutations occur. Findings from this study show the presence of hepatitis B in patients without any of the HBV serological markers. Further prospective studies are required to determine the risk factors and markers of OBI.
Sequencing and comparative analyses of the genomes of zoysiagrasses

PubMed Central

Tanaka, Hidenori; Hirakawa, Hideki; Kosugi, Shunichi; Nakayama, Shinobu; Ono, Akiko; Watanabe, Akiko; Hashiguchi, Masatsugu; Gondo, Takahiro; Ishigaki, Genki; Muguerza, Melody; Shimizu, Katsuya; Sawamura, Noriko; Inoue, Takayasu; Shigeki, Yuichi; Ohno, Naoki; Tabata, Satoshi; Akashi, Ryo; Sato, Shusei

2016-01-01

Zoysia is a warm-season turfgrass, which comprises 11 allotetraploid species (2n = 4x = 40), each possessing different morphological and physiological traits. To characterize the genetic systems of Zoysia plants and to analyse their structural and functional differences in individual species and accessions, we sequenced the genomes of Zoysia species using HiSeq and MiSeq platforms. As a reference sequence of Zoysia species, we generated a high-quality draft sequence of the genome of Z. japonica accession ‘Nagirizaki’ (334 Mb) in which 59,271 protein-coding genes were predicted. In parallel, draft genome sequences of Z. matrella ‘Wakaba’ and Z. pacifica ‘Zanpa’ were also generated for comparative analyses. To investigate the genetic diversity among the Zoysia species, genome sequence reads of three additional accessions, Z. japonica ‘Kyoto’, Z. japonica ‘Miyagi’ and Z. matrella ‘Chiba Fair Green’, were accumulated, and aligned against the reference genome of ‘Nagirizaki’ along with those from ‘Wakaba’ and ‘Zanpa’. As a result, we detected 7,424,163 single-nucleotide polymorphisms and 852,488 short indels among these species. The information obtained in this study will be valuable for basic studies on zoysiagrass evolution and genetics as well as for the breeding of zoysiagrasses, and is made available in the ‘Zoysia Genome Database’ at http://zoysia.kazusa.or.jp. PMID:26975196
Analyzing the relationship between sequence divergence and nodal support using Bayesian phylogenetic analyses.

PubMed

Makowsky, Robert; Cox, Christian L; Roelke, Corey; Chippindale, Paul T

2010-11-01

Determining the appropriate gene for phylogeny reconstruction can be a difficult process. Rapidly evolving genes tend to resolve recent relationships, but suffer from alignment issues and increased homoplasy among distantly related species. Conversely, slowly evolving genes generally perform best for deeper relationships, but lack sufficient variation to resolve recent relationships. We determine the relationship between sequence divergence and Bayesian phylogenetic reconstruction ability using both natural and simulated datasets. The natural data are based on 28 well-supported relationships within the subphylum Vertebrata. Sequences of 12 genes were acquired and Bayesian analyses were used to determine phylogenetic support for correct relationships. Simulated datasets were designed to determine whether an optimal range of sequence divergence exists across extreme phylogenetic conditions. Across all genes we found that an optimal range of divergence for resolving the correct relationships does exist, although this level of divergence expectedly depends on the distance metric. Simulated datasets show that an optimal range of sequence divergence exists across diverse topologies and models of evolution. We determine that a simple-to-measure property of genetic sequences (genetic distance) is related to phylogenetic reconstruction ability in Bayesian analyses. This information should be useful for selecting the most informative gene to resolve any relationships, especially those that are difficult to resolve, as well as minimizing both cost and confounding information during project design. Copyright © 2010. Published by Elsevier Inc.
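The abstract above relates genetic distance to reconstruction ability but does not say which distance metrics were used. As a minimal sketch, assuming the simplest uncorrected metric (the p-distance, i.e. the proportion of differing sites between two aligned sequences), the measurement could look like this. The function name and the choice to skip gap columns are illustrative assumptions, not the authors' method.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped
    (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable (gap-free) sites")
    return differing / compared


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(p_distance("ACGTACGT", "ACGAACGT"))
```

Model-corrected distances would give different values for the same pair, which is consistent with the abstract's remark that the optimal divergence range depends on the distance metric.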
Assessing the diversity of AM fungi in arid gypsophilous plant communities.

PubMed

Alguacil, M M; Roldán, A; Torres, M P

2009-10-01

In the present study, we used PCR-Single-Stranded Conformation Polymorphism (SSCP) techniques to analyse arbuscular mycorrhizal fungi (AMF) communities in four sites within a 10 km² gypsum area in Southern Spain. Four common plant species from these ecosystems were selected. The AM fungal small-subunit (SSU) rRNA genes were subjected to PCR, cloning, SSCP analysis, sequencing and phylogenetic analyses. A total of 1443 SSU rRNA sequences were analysed, for 21 AM fungal types: 19 belonged to the genus Glomus, 1 to the genus Diversispora and 1 to the genus Scutellospora. Four sequence groups were identified, which showed high similarity to sequences of known glomalean species or isolates: Glo G18 to Glomus constrictum, Glo G1 to Glomus intraradices, Glo G16 to Glomus clarum, Scut to Scutellospora dipurpurescens and Div to one new genus in the family Diversisporaceae identified recently as Otospora bareai. There were three sequence groups that received strong support in the phylogenetic analysis, and did not seem to be related to any sequences of AM fungi in culture or previously found in the database; thus, they could be novel taxa within the genus Glomus: Glo G4, Glo G2 and Glo G14. We have detected the presence of both generalist and potential specialist AMF in gypsum ecosystems. The AMF communities differed among the plant species studied, suggesting some degree of preference in the interactions between these symbionts.
Sequencing and comparative analyses of the genomes of zoysiagrasses.

PubMed

Tanaka, Hidenori; Hirakawa, Hideki; Kosugi, Shunichi; Nakayama, Shinobu; Ono, Akiko; Watanabe, Akiko; Hashiguchi, Masatsugu; Gondo, Takahiro; Ishigaki, Genki; Muguerza, Melody; Shimizu, Katsuya; Sawamura, Noriko; Inoue, Takayasu; Shigeki, Yuichi; Ohno, Naoki; Tabata, Satoshi; Akashi, Ryo; Sato, Shusei

2016-04-01

Zoysia is a warm-season turfgrass, which comprises 11 allotetraploid species (2n = 4x = 40), each possessing different morphological and physiological traits. To characterize the genetic systems of Zoysia plants and to analyse their structural and functional differences in individual species and accessions, we sequenced the genomes of Zoysia species using HiSeq and MiSeq platforms. As a reference sequence of Zoysia species, we generated a high-quality draft sequence of the genome of Z. japonica accession 'Nagirizaki' (334 Mb) in which 59,271 protein-coding genes were predicted. In parallel, draft genome sequences of Z. matrella 'Wakaba' and Z. pacifica 'Zanpa' were also generated for comparative analyses. To investigate the genetic diversity among the Zoysia species, genome sequence reads of three additional accessions, Z. japonica 'Kyoto', Z. japonica 'Miyagi' and Z. matrella 'Chiba Fair Green', were accumulated, and aligned against the reference genome of 'Nagirizaki' along with those from 'Wakaba' and 'Zanpa'. As a result, we detected 7,424,163 single-nucleotide polymorphisms and 852,488 short indels among these species. The information obtained in this study will be valuable for basic studies on zoysiagrass evolution and genetics as well as for the breeding of zoysiagrasses, and is made available in the 'Zoysia Genome Database' at http://zoysia.kazusa.or.jp. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
MaxAlign: maximizing usable data in an alignment.

PubMed

Gouveia-Oliveira, Rodrigo; Sackett, Peter W; Pedersen, Anders G

2007-08-28

The presence of gaps in an alignment of nucleotide or protein sequences is often an inconvenience for bioinformatical studies. In phylogenetic and other analyses, for instance, gapped columns are often discarded entirely from the alignment. MaxAlign is a program that optimizes the alignment prior to such analyses. Specifically, it maximizes the number of nucleotide (or amino acid) symbols that are present in gap-free columns - the alignment area - by selecting the optimal subset of sequences to exclude from the alignment. MaxAlign can be used prior to phylogenetic and bioinformatical analyses as well as in other situations where this form of alignment improvement is useful. In this work we test MaxAlign's performance in these tasks and compare the accuracy of phylogenetic estimates including and excluding gapped columns from the analysis, with and without processing with MaxAlign. In this paper we also introduce a new simple measure of tree similarity, Normalized Symmetric Similarity (NSS) that we consider useful for comparing tree topologies. We demonstrate how MaxAlign is helpful in detecting misaligned or defective sequences without requiring manual inspection. We also show that it is not advisable to exclude gapped columns from phylogenetic analyses unless MaxAlign is used first. Finally, we find that the sequences removed by MaxAlign from an alignment tend to be those that would otherwise be associated with low phylogenetic accuracy, and that the presence of gaps in any given sequence does not seem to disturb the phylogenetic estimates of other sequences. The MaxAlign web-server is freely available online at http://www.cbs.dtu.dk/services/MaxAlign where supplementary information can also be found. The program is also freely available as a Perl stand-alone package.
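The quantity described above, the "alignment area", can be illustrated with a small sketch. It assumes the area is the number of gap-free columns multiplied by the number of retained sequences, and it uses a simple greedy removal loop. That loop is a stand-in chosen for brevity, not MaxAlign's actual search procedure, and all function names are hypothetical.

```python
def alignment_area(seqs):
    """Number of symbols in gap-free columns: gap-free columns x sequences."""
    if not seqs:
        return 0
    gap_free_columns = sum(1 for column in zip(*seqs) if "-" not in column)
    return gap_free_columns * len(seqs)


def greedy_exclude(alignment):
    """Greedily drop the sequence whose removal most increases the area.

    `alignment` maps sequence names to equal-length aligned strings.
    Returns (kept alignment, removed names in order, final area).
    This is an illustrative heuristic and may miss the optimal subset.
    """
    kept = dict(alignment)
    removed = []
    best = alignment_area(list(kept.values()))
    while len(kept) > 1:
        candidate, candidate_area = None, best
        for name in kept:
            rest = [s for n, s in kept.items() if n != name]
            area = alignment_area(rest)
            if area > candidate_area:
                candidate, candidate_area = name, area
        if candidate is None:
            break  # no single removal improves the area
        del kept[candidate]
        removed.append(candidate)
        best = candidate_area
    return kept, removed, best


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    toy = {
        "s1": "ACGTACGT",
        "s2": "ACGTACGA",
        "s3": "AC----GT",
        "s4": "ACGTACGT",
    }
    kept, removed, area = greedy_exclude(toy)
    print(removed, area)
```

In the toy alignment, the full set has 4 gap-free columns across 4 sequences (area 16), while dropping the gapped sequence s3 leaves 8 gap-free columns across 3 sequences (area 24), which is the trade-off the abstract describes.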
DNA Sequences from Formalin-Fixed Nematodes: Integrating Molecular and Morphological Approaches to Taxonomy

PubMed Central

Thomas, W. Kelley; Vida, J. T.; Frisse, Linda M.; Mundo, Manuel; Baldwin, James G.

1997-01-01

To effectively integrate DNA sequence analysis and classical nematode taxonomy, we must be able to obtain DNA sequences from formalin-fixed specimens. Microdissected sections of nematodes were removed from specimens fixed in formalin, using standard protocols and without destroying morphological features. The fixed sections provided sufficient template for multiple polymerase chain reaction-based DNA sequence analyses. PMID:19274156
Cloning, sequencing and characterization of lipase genes from a polyhydroxyalkanoate- (PHA-) synthesizing Pseudomonas resinovorans

USDA-ARS's Scientific Manuscript database

Lipase (lip) and lipase-specific foldase (lif) genes of a biodegradable polyhydroxyalkanoate- (PHA-) synthesizing Pseudomonas resinovorans NRRL B-2649 were cloned using primers based on consensus sequences, followed by PCR-based genome walking. Sequence analyses showed a putative Lip gene-product (...
Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

PubMed Central

Laehnemann, David; Borkhardt, Arndt

2016-01-01

Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159
Gene expression analysis of flax seed development

PubMed Central

2011-01-01

Background Flax, Linum usitatissimum L., is an important crop whose seed oil and stem fiber have multiple industrial applications. Flax seeds are also well-known for their nutritional attributes, viz., omega-3 fatty acids in the oil and lignans and mucilage from the seed coat. In spite of the importance of this crop, there are few molecular resources that can be utilized toward improving seed traits. Here, we describe flax embryo and seed development and generation of comprehensive genomic resources for the flax seed. Results We describe a large-scale generation and analysis of expressed sequences in various tissues. Collectively, the 13 libraries we have used provide a broad representation of genes active in developing embryos (globular, heart, torpedo, cotyledon and mature stages) seed coats (globular and torpedo stages) and endosperm (pooled globular to torpedo stages) and genes expressed in flowers, etiolated seedlings, leaves, and stem tissue. A total of 261,272 expressed sequence tags (EST) (GenBank accessions LIBEST_026995 to LIBEST_027011) were generated. These EST libraries included transcription factor genes that are typically expressed at low levels, indicating that the depth is adequate for in silico expression analysis. Assembly of the ESTs resulted in 30,640 unigenes and 82% of these could be identified on the basis of homology to known and hypothetical genes from other plants. When compared with fully sequenced plant genomes, the flax unigenes resembled poplar and castor bean more than grape, sorghum, rice or Arabidopsis. Nearly one-fifth of these (5,152) had no homologs in sequences reported for any organism, suggesting that this category represents genes that are likely unique to flax. Digital analyses revealed gene expression dynamics for the biosynthesis of a number of important seed constituents during seed development. 
Conclusions We have developed a foundational database of expressed sequences and collection of plasmid clones that comprise even low-expressed genes such as those encoding transcription factors. This has allowed us to delineate the spatio-temporal aspects of gene expression underlying the biosynthesis of a number of important seed constituents in flax. Flax belongs to a taxonomic group of diverse plants and the large sequence database will allow for evolutionary studies as well. PMID:21529361

Discovery and information-theoretic characterization of transcription factor binding sites that act cooperatively.

PubMed

Clifford, Jacob; Adami, Christoph

2015-09-02

Transcription factor binding to the surface of DNA regulatory regions is one of the primary causes of regulating gene expression levels. A probabilistic approach to model protein-DNA interactions at the sequence level is through position weight matrices (PWMs) that estimate the joint probability of a DNA binding site sequence by assuming positional independence within the DNA sequence. Here we construct conditional PWMs that depend on the motif signatures in the flanking DNA sequence, by conditioning known binding site loci on the presence or absence of additional binding sites in the flanking sequence of each site's locus. Pooling known sites with similar flanking sequence patterns allows for the estimation of the conditional distribution function over the binding site sequences. We apply our model to the Dorsal transcription factor binding sites active in patterning the Dorsal-Ventral axis of Drosophila development. We find that those binding sites that cooperate with nearby Twist sites on average contain about 0.5 bits of information about the presence of Twist transcription factor binding sites in the flanking sequence. We also find that Dorsal binding site detectors conditioned on flanking sequence information make better predictions about what is a Dorsal site relative to background DNA than detection without information about flanking sequence features.
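The position weight matrix idea in this abstract can be sketched as follows: per-position base frequencies are estimated from aligned sites, and the probability of a site is the product of its per-position probabilities (the positional-independence assumption). A "conditional" PWM in the abstract's sense would then be one built only from sites pooled by flanking context. The pseudocount, the uniform background, and all names below are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Per-position base probabilities from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm


def site_probability(pwm, site):
    """Joint probability of a site under positional independence."""
    probability = 1.0
    for column, base in zip(pwm, site):
        probability *= column[base]
    return probability


def information_content(pwm, background=0.25):
    """Total information (bits) relative to a uniform background."""
    return sum(
        p * math.log2(p / background)
        for column in pwm
        for p in column.values()
        if p > 0
    )


if __name__ == "__main__":
    # Hypothetical toy sites, pooled by whether a second motif is nearby.
    sites_with_partner = ["GGGAAA", "GGGAAT", "GGGTAA"]
    sites_without_partner = ["GGAAAA", "GGCAAT", "GGTTAC"]
    for label, sites in [("with", sites_with_partner),
                         ("without", sites_without_partner)]:
        pwm = build_pwm(sites)
        print(label, round(information_content(pwm), 3))
```

Comparing the two pooled matrices is only a rough analogue of the abstract's analysis, which quantifies how much information a site carries about its flanking sequence.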
How to Choose the Suitable Template for Homology Modelling of GPCRs: 5-HT7 Receptor as a Test Case.

PubMed

Shahaf, Nir; Pappalardo, Matteo; Basile, Livia; Guccione, Salvatore; Rayan, Anwar

2016-09-01

G protein-coupled receptors (GPCRs) are a super-family of membrane proteins that attract great pharmaceutical interest due to their involvement in almost every physiological activity, including extracellular stimuli, neurotransmission, and hormone regulation. Currently, structural information on many GPCRs is mainly obtained by the techniques of computer modelling in general and by homology modelling in particular. Based on a quantitative analysis of eighteen antagonist-bound, resolved structures of rhodopsin family "A" receptors - also used as templates to build 153 homology models - it was concluded that a higher sequence identity between two receptors does not guarantee a lower RMSD between their structures, especially when their pair-wise sequence identity (within trans-membrane domain and/or in binding pocket) lies between 25 % and 40 %. This study suggests that we should consider all template receptors having a sequence identity ≤50 % with the query receptor. In fact, most of the GPCRs, compared to the currently available resolved structures of GPCRs, fall within this range and lack a correlation between structure and sequence. When testing suitability for structure-based drug design, it was found that choosing as a template the most similar resolved protein, based on sequence resemblance only, led to unsound results in many cases. Molecular docking analyses were carried out, and enrichment factors as well as attrition rates were utilized as criteria for assessing suitability for structure-based drug design. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Systematics of Hypocrea citrina and related taxa

PubMed Central

Overton, Barrie E.; Stewart, Elwin L.; Geiser, David M.; Jaklitsch, Walter M.

2006-01-01

Morphological studies and phylogenetic analyses of DNA sequences from three genomic regions – the internal transcribed spacer (ITS) regions of the nuclear ribosomal gene repeat, a partial sequence of RNA polymerase II subunit (rpb2), and a partial sequence of translation elongation factor (tef1) – were used to investigate the systematics of Hypocrea citrina and related species. A neotype specimen is designated for H. citrina that conforms to Persoon's description of a yellow effuse fungus occurring on leaf litter. Historical information and results obtained in this study provide the foundation for selection of a lectotype specimen from Fries's herbarium for H. lactea. The results indicate that (1) Hypocrea citrina and H. pulvinata are distinct species; (2) H. lactea sensu Fries is a synonym of the older name H. citrina; (3) H. pulvinata, H. protopulvinata, and H. americana are phylogenetically distinct species that form a well-supported polyporicolous clade; (4) H. citrina is situated in a clade closely related to H. pulvinata; and (5) H. microcitrina and H. pseudostraminea reside in a highly supported clade phylogenetically distinct from H. citrina. Hypocrea protopulvinata, H. microcitrina, H. megalocitrina, H. pseudostraminea, and a new species, H. aurantiistroma, are reported and described from North America. Variation in rpb2 and tef1 gene sequences suggests geographical subgroupings between European and North American isolates of H. pulvinata. The phylogenies inferred from ITS, rpb2, and tef1 gene sequences are concordant. Hypocrea citrina var. americana is elevated to species status, Hypocrea americana. PMID:18490988
De novo transcriptome analysis of an imminent biofuel crop, Camelina sativa L. using Illumina GAIIX sequencing platform and identification of SSR markers.

PubMed

Mudalkar, Shalini; Golla, Ramesh; Ghatty, Sreenivas; Reddy, Attipalli Ramachandra

2014-01-01

Camelina sativa L. is an emerging biofuel crop with potential applications in industry, medicine, cosmetics and human nutrition. The crop is unexploited owing to very limited availability of transcriptome and genomic data. In order to analyse the various metabolic pathways, we performed de novo assembly of the transcriptome on Illumina GAIIX platform with paired end sequencing for obtaining short reads. The sequencing output generated a FastQ file size of 2.97 GB with 10.83 million reads having a maximum read length of 101 nucleotides. The number of contigs generated was 53,854 with maximum and minimum lengths of 10,086 and 200 nucleotides respectively. These transcripts were annotated using BLAST search against the Aracyc, Swiss-Prot, TrEMBL, gene ontology and clusters of orthologous groups (KOG) databases. The genes involved in lipid metabolism were studied and the transcription factors were identified. Sequence similarity studies of Camelina with the other related organisms indicated the close relatedness of Camelina with Arabidopsis. In addition, bioinformatics analysis revealed the presence of a total of 19,379 simple sequence repeats. This is the first report on Camelina sativa L., where the transcriptome of the entire plant, including seedlings, seed, root, leaves and stem was done. Our data established an excellent resource for gene discovery and provide useful information for functional and comparative genomic studies in this promising biofuel crop.
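The abstract does not name the tool or thresholds used to find simple sequence repeats. As a hedged illustration of what SSR detection involves, the sketch below scans a sequence with a regular expression for motifs of 1 to 6 bases repeated in tandem. The motif-length range and the minimum repeat count are assumptions chosen for this example.

```python
import re


def find_ssrs(sequence, min_repeats=5):
    """Return (start, motif, copies) for tandem repeats of 1-6 base motifs.

    `min_repeats` is the minimum number of tandem copies (assumed threshold).
    The lazy quantifier makes the regex prefer the shortest repeating motif.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


if __name__ == "__main__":
    # Toy contig, invented for illustration only.
    print(find_ssrs("GGCATATATATGGC", min_repeats=4))
```

Dedicated SSR tools typically apply a different minimum copy number per motif length; a single threshold is used here only to keep the sketch short.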
Metabolism and Genetics of Helicobacter pylori: the Genome Era

PubMed Central

Marais, Armelle; Mendz, George L.; Hazell, Stuart L.; Mégraud, Francis

1999-01-01

The publication of the complete sequence of Helicobacter pylori 26695 in 1997 and more recently that of strain J99 has provided new insight into the biology of this organism. In this review, we attempt to analyze and interpret the information provided by sequence annotations and to compare these data with those provided by experimental analyses. After a brief description of the general features of the genomes of the two sequenced strains, the principal metabolic pathways are analyzed. In particular, the enzymes encoded by H. pylori involved in fermentative and oxidative metabolism, lipopolysaccharide biosynthesis, nucleotide biosynthesis, aerobic and anaerobic respiration, and iron and nitrogen assimilation are described, and the areas of controversy between the experimental data and those provided by the sequence annotation are discussed. The role of urease, particularly in pH homeostasis, and other specialized mechanisms developed by the bacterium to maintain its internal pH are also considered. The replicational, transcriptional, and translational apparatuses are reviewed, as is the regulatory network. The numerous findings on the metabolism of the bacteria and the paucity of gene expression regulation systems are indicative of the high level of adaptation to the human gastric environment. Arguments in favor of the diversity of H. pylori and molecular data reflecting possible mechanisms involved in this diversity are presented. Finally, we compare the numerous experimental data on the colonization factors and those provided from the genome sequence annotation, in particular for genes involved in motility and adherence of the bacterium to the gastric tissue. PMID:10477311
Many human accelerated regions are developmental enhancers

PubMed Central

Capra, John A.; Erwin, Genevieve D.; McKinsey, Gabriel; Rubenstein, John L. R.; Pollard, Katherine S.

2013-01-01

The genetic changes underlying the dramatic differences in form and function between humans and other primates are largely unknown, although it is clear that gene regulatory changes play an important role. To identify regulatory sequences with potentially human-specific functions, we and others used comparative genomics to find non-coding regions conserved across mammals that have acquired many sequence changes in humans since divergence from chimpanzees. These regions are good candidates for performing human-specific regulatory functions. Here, we analysed the DNA sequence, evolutionary history, histone modifications, chromatin state and transcription factor (TF) binding sites of a combined set of 2649 non-coding human accelerated regions (ncHARs) and predicted that at least 30% of them function as developmental enhancers. We prioritized the predicted ncHAR enhancers using analysis of TF binding site gain and loss, along with the functional annotations and expression patterns of nearby genes. We then tested both the human and chimpanzee sequence for 29 ncHARs in transgenic mice, and found 24 novel developmental enhancers active in both species, 17 of which had very consistent patterns of activity in specific embryonic tissues. Of these ncHAR enhancers, five drove expression patterns suggestive of different activity for the human and chimpanzee sequence at embryonic day 11.5. The changes to human non-coding DNA in these ncHAR enhancers may modify the complex patterns of gene expression necessary for proper development in a human-specific manner and are thus promising candidates for understanding the genetic basis of human-specific biology. PMID:24218637
Codon usage in Chlamydia trachomatis is the result of strand-specific mutational biases and a complex pattern of selective forces

PubMed Central

Romero, Héctor; Zavala, Alejandro; Musto, Héctor

2000-01-01

The patterns of synonymous codon choices of the completely sequenced genome of the bacterium Chlamydia trachomatis were analysed. We found that the most important source of variation among the genes results from whether the sequence is located on the leading or lagging strand of replication, resulting in an overrepresentation of G or C, respectively. This can be explained by different mutational biases associated with the different enzymes that replicate each strand. Next we found that most highly expressed sequences are located on the leading strand of replication. From this result, replicational-transcriptional selection can be invoked. Then, when the genes located on the leading strand are studied separately, the correspondence analysis detects a principal trend which discriminates between lowly and highly expressed sequences, the latter displaying a different codon usage pattern than the former, suggesting selection for translation, which is reinforced by the fact that Ks values between orthologous sequences from C. trachomatis and Chlamydia pneumoniae are much smaller in highly expressed genes. Finally, synonymous codon choices appear to be influenced by the hydropathy of each encoded protein and by the degree of amino acid conservation. Therefore, synonymous codon usage in C. trachomatis seems to be the result of a very complex balance among different factors, which raises the problem of whether the forces driving codon usage patterns among microorganisms are rather more complex than generally accepted. PMID:10773076
Codon usage in Chlamydia trachomatis is the result of strand-specific mutational biases and a complex pattern of selective forces.

PubMed

Romero, H; Zavala, A; Musto, H

2000-05-15

The patterns of synonymous codon choices of the completely sequenced genome of the bacterium Chlamydia trachomatis were analysed. We found that the most important source of variation among the genes results from whether the sequence is located on the leading or lagging strand of replication, resulting in an overrepresentation of G or C, respectively. This can be explained by different mutational biases associated with the different enzymes that replicate each strand. Next we found that most highly expressed sequences are located on the leading strand of replication. From this result, replicational-transcriptional selection can be invoked. Then, when the genes located on the leading strand are studied separately, the correspondence analysis detects a principal trend which discriminates between lowly and highly expressed sequences, the latter displaying a different codon usage pattern than the former, suggesting selection for translation, which is reinforced by the fact that Ks values between orthologous sequences from C. trachomatis and Chlamydia pneumoniae are much smaller in highly expressed genes. Finally, synonymous codon choices appear to be influenced by the hydropathy of each encoded protein and by the degree of amino acid conservation. Therefore, synonymous codon usage in C. trachomatis seems to be the result of a very complex balance among different factors, which raises the problem of whether the forces driving codon usage patterns among microorganisms are rather more complex than generally accepted.
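The strand-dependent excess of G or C reported above is commonly summarised as a GC skew, (G - C) / (G + C). The abstract itself relies on correspondence analysis of codon usage, so the following is only a simplified sketch of the underlying signal, computed at third codon positions where synonymous choices mostly occur. The function names and the convention of returning 0.0 when no G or C is present are assumptions.

```python
def third_positions(cds):
    """Bases at codon position 3 of an in-frame coding sequence."""
    return cds[2::3]


def gc_skew(sequence):
    """(G - C) / (G + C); returns 0.0 if the sequence has no G or C."""
    seq = sequence.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


if __name__ == "__main__":
    # Toy coding sequence, invented for illustration only.
    cds = "ATGGCGAAGCTGTAA"
    print(gc_skew(third_positions(cds)))
```

Under the pattern the abstract describes, genes on the leading strand would tend towards positive skew and genes on the lagging strand towards negative skew.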
Comparative chloroplast genomics: Analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus

DOE Office of Scientific and Technical Information (OSTI.GOV)

Raubeson, Linda A.; Peery, Rhiannon; Chumley, Timothy W.

2007-03-01

The number of completely sequenced plastid genomes available is growing rapidly. This new array of sequences presents new opportunities to perform comparative analyses. In comparative studies, it is most useful to compare across wide phylogenetic spans and, within angiosperms, to include representatives from basally diverging lineages such as the new genomes reported here: Nuphar advena (from a basal-most lineage) and Ranunculus macranthus (from the basal group of eudicots). We report these two new plastid genome sequences and make comparisons (within angiosperms, seed plants, or all photosynthetic lineages) to evaluate features such as the status of ycf15 and ycf68 as protein coding genes, the distribution of simple sequence repeats (SSRs) and longer dispersed repeats (SDR), and patterns of nucleotide composition.
Species identification of mutans streptococci by groESL gene sequence.

PubMed

Hung, Wei-Chung; Tsai, Jui-Chang; Hsueh, Po-Ren; Chia, Jean-San; Teng, Lee-Jene

2005-09-01

The near full-length sequences of the groESL genes were determined and analysed among eight reference strains (serotypes a to h) representing five species of mutans group streptococci. The groES sequences from these reference strains revealed that there are two lengths (285 and 288 bp) in the five species. The intergenic spacer between groES and groEL appears to be a unique marker for species, with a variable size (ranging from 111 to 310 bp) and sequence. Phylogenetic analysis of groES and groEL separated the eight serotypes into two major clusters. Strains of serotypes b, c, e and f were highly related and had groES gene sequences of the same length, 288 bp, while strains of serotypes a, d, g and h were also closely related and their groES gene sequence lengths were 285 bp. The groESL sequences in clinical isolates of three serotypes of S. mutans were analysed for intraspecies polymorphism. The results showed that the groESL sequences could provide information for differentiation among species, but were unable to distinguish serotypes of the same species. Based on the determined sequences, a PCR assay was developed that could differentiate members of the mutans streptococci by amplicon size and provide an alternative way for distinguishing mutans streptococci from other viridans streptococci.
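
The length pattern reported in this abstract can be written down as a simple lookup. This is only an illustrative sketch: the two lengths and the serotype groupings come from the abstract, while the dictionary and function names are hypothetical. As the abstract notes, groESL sequences do not distinguish serotypes within a species, so the lookup returns a cluster of serotypes rather than a single one.

```python
from typing import Optional, Tuple

# groES gene lengths (bp) and the serotype clusters reported for the
# eight reference strains in the abstract.
GROES_LENGTH_TO_SEROTYPES = {
    288: ("b", "c", "e", "f"),
    285: ("a", "d", "g", "h"),
}


def serotype_cluster(groes_length_bp: int) -> Optional[Tuple[str, ...]]:
    """Return the serotype cluster sharing this groES length, or None.

    Hypothetical helper: a length outside the two reported values
    yields None rather than a guess.
    """
    return GROES_LENGTH_TO_SEROTYPES.get(groes_length_bp)


print(serotype_cluster(288))
print(serotype_cluster(300))
```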
Lability of High Molecular Weight Dissolved Organic Matter Polysaccharides Increases with Mild Acid or Base Treatment.

NASA Astrophysics Data System (ADS)

Pedler Sherwood, B.; Sosa, O.; Nelson, C. E.; Repeta, D.; DeLong, E.

2016-02-01

Approximately 662 Pg of dissolved organic carbon (DOC) has accumulated in the global ocean, yet the biological and chemical constraints on DOC turnover remain poorly understood. High molecular weight dissolved organic matter (HMWDOM) is composed largely of semi-labile polysaccharides. These polysaccharides resist degradation even in the presence of nutrient amendments, suggesting that unknown factors of polysaccharide composition affect microbial degradation. In a series of microcosm incubations conducted at station ALOHA in the North Pacific Subtropical Gyre, we tested the effect of mild base (KOH-DOM) and acid (HCl-DOM) treatments on polysaccharide lability. KOH-DOM, HCl-DOM, and untreated HMWDOM were added to seawater from the deep chlorophyll maximum and 200 m. Microcosms amended with KOH-DOM and HCl-DOM yielded higher bacterial abundance and greater carbon drawdown relative to untreated HMWDOM and unamended controls. Microcosms amended with KOH-DOM and HCl-DOM also showed significant production of fluorescent DOM (fDOM), whereas untreated HMWDOM and unamended controls showed a net decrease in fDOM as measured by parallel factor analysis of DOM excitation-emission spectra. Metagenomic analyses revealed that microcosms amended with untreated HMWDOM and controls became dominated by Alteromonas genera (60% of total sequence reads). In contrast, KOH-DOM and HCl-DOM amended microcosms yielded greater bacterial diversity; Alteromonas genera comprised 25% of sequence reads, with differences primarily accounted for by proportional increases in Vibrio, Roseobacter, Ruegeria and Marinomonas clades. Transcriptomic analyses identified differential gene expression during growth on each DOM fraction. This study provides new insight into specific chemical moieties that may limit the bacterial degradation rate of semi-labile HMWDOM in the ocean.
A Novel Alternative Splicing Isoform of Human T-Cell Leukemia Virus Type 1 bZIP Factor (HBZ-SI) Targets Distinct Subnuclear Localization

PubMed Central

Murata, Ken; Hayashibara, Toshihisa; Sugahara, Kazuyuki; Uemura, Akiko; Yamaguchi, Taku; Harasawa, Hitomi; Hasegawa, Hiroo; Tsuruda, Kazuto; Okazaki, Toshiro; Koji, Takehiko; Miyanishi, Takayuki; Yamada, Yasuaki; Kamihira, Shimeru

2006-01-01

Adult T-cell leukemia (ATL) is associated with prior infection with human T-cell leukemia virus type 1 (HTLV-1); however, the mechanism by which HTLV-1 causes adult T-cell leukemia has not been fully elucidated. Recently, a functional basic leucine zipper (bZIP) protein coded in the minus strand of HTLV-1 genome (HBZ) was identified. We report here a novel isoform of the HTLV-1 bZIP factor (HBZ), HBZ-SI, identified by means of reverse transcription-PCR (RT-PCR) in conjunction with 5′ and 3′ rapid amplification of cDNA ends (RACE). HBZ-SI is a 206-amino-acid-long protein and is generated by alternative splicing between part of the HBZ gene and a novel exon located in the 3′ long terminal repeat of the HTLV-1 genome. Consequently, these isoforms share >95% amino acid sequence identity, and differ only at their N termini, indicating that HBZ-SI is also a functional protein. Duplex RT-PCR and real-time quantitative RT-PCR analyses showed that the mRNAs of these isoforms were expressed at equivalent levels in all ATL cell samples examined. Nonetheless, we found by Western blotting that the HBZ-SI protein was preferentially expressed in some ATL cell lines examined. A key finding was obtained from the subcellular localization analyses of these isoforms. Despite their high sequence similarity, each isoform was targeted to distinguishable subnuclear structures. These data show the presence of a novel isoform of HBZ in ATL cells, and in addition, shed new light on the possibility that each isoform may play a unique role in distinct regions in the cell nucleus. PMID:16474156
Utilization of sequence on relatives to improve analysis of individuals' low-coverage NGS data

USDA-ARS's Scientific Manuscript database

Low-coverage sequence data is expected to have low call rates under the prevailing paradigm that genotypes are first “called” from sequence data of each individual independently and subsequent analyses (including determination of haplotypes) are dependent on those called genotypes. However, provide...
Exploring Evolutionary Patterns in Genetic Sequence: A Computer Exercise

ERIC Educational Resources Information Center

Shumate, Alice M.; Windsor, Aaron J.

2010-01-01

The increase in publications presenting molecular evolutionary analyses and the availability of comparative sequence data through resources such as NCBI's GenBank underscore the necessity of providing undergraduates with hands-on sequence analysis skills in an evolutionary context. This need is particularly acute given that students have been…
Fluorescence in situ hybridization and optical mapping to correct scaffold arrangement in the tomato genome

USDA-ARS's Scientific Manuscript database

Modern biological analyses are often assisted by recent technologies making the sequencing of complex genomes both technically possible and feasible. We recently sequenced the tomato genome that, like many eukaryotic genomes, is large and complex. Current sequencing technologies allow the developmen...
Internet-accessible DNA sequence database for identifying fusaria from human and animal infections.

PubMed

O'Donnell, Kerry; Sutton, Deanna A; Rinaldi, Michael G; Sarver, Brice A J; Balajee, S Arunmozhi; Schroers, Hans-Josef; Summerbell, Richard C; Robert, Vincent A R G; Crous, Pedro W; Zhang, Ning; Aoki, Takayuki; Jung, Kyongyong; Park, Jongsun; Lee, Yong-Hwan; Kang, Seogchan; Park, Bongsoo; Geiser, David M

2010-10-01

Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated with human or animal mycoses encountered in clinical microbiology laboratories. The database comprises partial sequences from three nuclear genes: translation elongation factor 1α (EF-1α), the largest subunit of RNA polymerase (RPB1), and the second largest subunit of RNA polymerase (RPB2). These three gene fragments can be amplified by PCR and sequenced using primers that are conserved across the phylogenetic breadth of Fusarium. Phylogenetic analyses of the combined data set reveal that, with the exception of two monotypic lineages, all clinically relevant fusaria are nested in one of eight variously sized and strongly supported species complexes. The monophyletic lineages have been named informally to facilitate communication of an isolate's clade membership and genetic diversity. To identify isolates to the species included within the database, partial DNA sequence data from one or more of the three genes can be used as a BLAST query against the database which is Web accessible at FUSARIUM-ID (http://isolate.fusariumdb.org) and the Centraalbureau voor Schimmelcultures (CBS-KNAW) Fungal Biodiversity Center (http://www.cbs.knaw.nl/fusarium). Alternatively, isolates can be identified via phylogenetic analysis by adding sequences of unknowns to the DNA sequence alignment, which can be downloaded from the two aforementioned websites. 
The utility of this database should increase significantly as members of the clinical microbiology community deposit in internationally accessible culture collections (e.g., CBS-KNAW or the Fusarium Research Center) cultures of novel mycosis-associated fusaria, along with associated, corrected sequence chromatograms and data, so that the sequence results can be verified and isolates are made available for future study.
Mammoth and Mastodon collagen sequences; survival and utility

NASA Astrophysics Data System (ADS)

Buckley, M.; Larkin, N.; Collins, M.

2011-04-01

Near-complete collagen (I) sequences are proposed for elephantid and mammutid taxa, based upon available African elephant genomic data and supported with LC-MALDI-MS/MS and LC-ESI-MS/MS analyses of collagen digests from proboscidean bone. Collagen sequence coverage was investigated from several specimens of two extinct mammoths (Mammuthus trogontherii and Mammuthus primigenius), the extinct American mastodon (Mammut americanum), the extinct straight-tusked elephant (Elephas (Palaeoloxodon) antiquus) and extant Asian (Elephas maximus) and African (Loxodonta africana) elephants and compared between the two ionization techniques used. Two suspected mammoth fossils from the British Middle Pleistocene (Cromerian) deposits of the West Runton Forest Bed were analysed to investigate the potential use of peptide mass spectrometry for fossil identification. Despite the age of the fossils, sufficient peptides were obtained to identify these as elephantid, and sufficient sequence variation to discriminate elephantid and mammutid collagen (I). In-depth LC-MS analyses further failed to identify a peptide that could be used to reliably distinguish between the three genera of elephantids (Elephas, Loxodonta and Mammuthus), an observation consistent with predicted amino acid substitution rates between these species.
Whole genome sequencing data and de novo draft assemblies for 66 teleost species

PubMed Central

Malmstrøm, Martin; Matschiner, Michael; Tørresen, Ole K.; Jakobsen, Kjetill S.; Jentoft, Sissel

2017-01-01

Teleost fishes comprise more than half of all vertebrate species, yet genomic data are only available for 0.2% of their diversity. Here, we present whole genome sequencing data for 66 new species of teleosts, vastly expanding the availability of genomic data for this important vertebrate group. We report on de novo assemblies based on low-coverage (9–39×) sequencing and present detailed methodology for all analyses. To facilitate further utilization of this data set, we present statistical analyses of the gene space completeness and verify the expected phylogenetic position of the sequenced genomes in a large mitogenomic context. We further present a nuclear marker set used for phylogenetic inference and evaluate each gene tree in relation to the species tree to test for homogeneity in the phylogenetic signal. Collectively, these analyses illustrate the robustness of this highly diverse data set and enable extensive reuse of the selected phylogenetic markers and the genomic data in general. This data set covers all major teleost lineages and provides unprecedented opportunities for comparative studies of teleosts. PMID:28094797
An in-silico insight into the characteristics of β-propeller phytase.

PubMed

Mathew, Akash; Verma, Anukriti; Gaur, Smriti

2014-06-01

Phytase is an enzyme that is found extensively in the plant kingdom and in some species of bacteria and fungi. This paper identifies and analyses the available full-length sequences of β-propeller phytases (BPP). BPP was chosen due to its potential applicability in the field of aquaculture. The sequences were obtained from the UniProt database and subjected to various online bioinformatics tools to elucidate the physicochemical characteristics, secondary structures and active site compositions of BPP. ProtParam and SOPMA were used to analyse the physicochemical and secondary structure characteristics, while the ExPASy online modelling tool and CASTp were used to model the 3-D structure and identify the active sites of the BPP sequences. The amino acid compositions of the four sequences were compared and presented in a graphical format to identify similarities and highlight the potentially important amino acids that form the active site of BPP. This study aims to analyse BPP, to help clarify the molecular mechanism involved in its enzyme activity, and to contribute in part to the possibility of constructing a synthetic version of BPP.
Chromatin and RNAi factors protect the C. elegans germline against repetitive sequences

PubMed Central

Robert, Valérie J.P.; Sijen, Titia; van Wolfswinkel, Josien; Plasterk, Ronald H.A.

2005-01-01

Protection of genomes against invasion by repetitive sequences, such as transposons, viruses, and repetitive transgenes, involves strong and selective silencing of these sequences. During silencing of repetitive transgenes, a trans effect (“cosuppression”) occurs that results in silencing of cognate endogenous genes. Here we report RNA interference (RNAi) screens performed to catalog genes required for cosuppression in the Caenorhabditis elegans germline. We find factors with a putative role in chromatin remodeling and factors involved in RNAi. Together with molecular data also presented in this study, these results suggest that in C. elegans repetitive sequences trigger transcriptional gene silencing using RNAi and chromatin factors. PMID:15774721

Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing.

PubMed

Zhang, Jin; Ruhlman, Tracey A; Mower, Jeffrey P; Jansen, Robert K

2013-12-29

Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. 
In addition, results indicated no major improvements in breadth of coverage with data sets larger than six billion nucleotides or when sampling RNA from four tissue types rather than from a single tissue. Finally, this work demonstrates the power of cross-compartmental genomic analyses to deepen our understanding of the correlated evolution of the nuclear, plastid, and mitochondrial genomes in plants.
Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing

PubMed Central

2013-01-01

Background Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Results Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. 
Conclusions The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. In addition, results indicated no major improvements in breadth of coverage with data sets larger than six billion nucleotides or when sampling RNA from four tissue types rather than from a single tissue. Finally, this work demonstrates the power of cross-compartmental genomic analyses to deepen our understanding of the correlated evolution of the nuclear, plastid, and mitochondrial genomes in plants. PMID:24373163
Integrative View of the Diversity and Evolution of SWEET and SemiSWEET Sugar Transporters

PubMed Central

Jia, Baolei; Zhu, Xiao Feng; Pu, Zhong Ji; Duan, Yu Xi; Hao, Lu Jiang; Zhang, Jie; Chen, Li-Qing; Jeon, Che Ok; Xuan, Yuan Hu

2017-01-01

Sugars Will Eventually be Exported Transporter (SWEET) and SemiSWEET are recently characterized families of sugar transporters in eukaryotes and prokaryotes, respectively. SemiSWEETs contain 3 transmembrane helices (TMHs), while SWEETs contain 7. Here, we performed sequence-based comprehensive analyses for SWEETs and SemiSWEETs across the biosphere. In total, 3,249 proteins were identified and ≈60% of the proteins were found in green plants and Oomycota, which include a number of important plant pathogens. Protein sequence similarity networks indicate that proteins from different organisms are significantly clustered. Of note, SemiSWEETs with 3 or 4 TMHs that may fuse to SWEET were identified in plant genomes. 7-TMH SWEETs were found in bacteria, implying that SemiSWEET can be fused directly in prokaryotes. 15-TMH extraSWEET and 25-TMH superSWEET were also observed in wild rice and oomycetes, respectively. The transporters can be classified into 4, 2, 2, and 2 clades in plants, Metazoa, unicellular eukaryotes, and prokaryotes, respectively. The consensus and coevolution of amino acids in SWEETs were identified by multiple sequence alignments. The functions of the highly conserved residues were analyzed by molecular dynamics analysis. The 19 most highly conserved residues in the SWEETs were further confirmed by point mutagenesis using SWEET1 from Arabidopsis thaliana. The results proved that the conserved residues located in the extrafacial gate (Y57, G58, G131, and P191), the substrate binding pocket (N73, N192, and W176), and the intrafacial gate (P43, Y83, F87, P145, M161, P162, and Q202) play important roles for substrate recognition and transport processes. Taken together, our analyses provide a foundation for understanding the diversity, classification, and evolution of SWEETs and SemiSWEETs using large-scale sequence analysis and further show that gene duplication and gene fusion are important factors driving the evolution of SWEETs. PMID:29326750
Chromatin analyses of Zymoseptoria tritici: Methods for chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq).

PubMed

Soyer, Jessica L; Möller, Mareike; Schotanus, Klaas; Connolly, Lanelle R; Galazka, Jonathan M; Freitag, Michael; Stukenbrock, Eva H

2015-06-01

The presence or absence of specific transcription factors, chromatin remodeling machineries, chromatin modification enzymes, post-translational histone modifications and histone variants all play crucial roles in the regulation of pathogenicity genes. Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) provides an important tool to study genome-wide protein-DNA interactions to help understand gene regulation in the context of native chromatin. ChIP-seq is a convenient in vivo technique to identify, map and characterize occupancy of specific DNA fragments with proteins against which specific antibodies exist or which can be epitope-tagged in vivo. We optimized existing ChIP protocols for use in the wheat pathogen Zymoseptoria tritici and closely related sister species. Here, we provide a detailed method, underscoring which aspects of the technique are organism-specific. Library preparation for Illumina sequencing is described, as this is currently the most widely used ChIP-seq method. One approach for the analysis and visualization of representative sequence is described; improved tools for these analyses are constantly being developed. Using ChIP-seq with antibodies against H3K4me2, which is considered a mark for euchromatin, or H3K9me3 and H3K27me3, which are considered marks for heterochromatin, the overall distribution of euchromatin and heterochromatin in the genome of Z. tritici can be determined. Our ChIP-seq protocol was also successfully applied to Z. tritici strains with high levels of melanization or aberrant colony morphology, and to different species of the genus (Z. ardabiliae and Z. pseudotritici), suggesting that our technique is robust. The methods described here provide a powerful framework to study new aspects of chromatin biology and gene regulation in this prominent wheat pathogen.
Integrative View of the Diversity and Evolution of SWEET and SemiSWEET Sugar Transporters.

PubMed

Jia, Baolei; Zhu, Xiao Feng; Pu, Zhong Ji; Duan, Yu Xi; Hao, Lu Jiang; Zhang, Jie; Chen, Li-Qing; Jeon, Che Ok; Xuan, Yuan Hu

2017-01-01

Sugars Will Eventually be Exported Transporter (SWEET) and SemiSWEET are recently characterized families of sugar transporters in eukaryotes and prokaryotes, respectively. SemiSWEETs contain 3 transmembrane helices (TMHs), while SWEETs contain 7. Here, we performed sequence-based comprehensive analyses for SWEETs and SemiSWEETs across the biosphere. In total, 3,249 proteins were identified and ≈60% of the proteins were found in green plants and Oomycota, which include a number of important plant pathogens. Protein sequence similarity networks indicate that proteins from different organisms are significantly clustered. Of note, SemiSWEETs with 3 or 4 TMHs that may fuse to SWEET were identified in plant genomes. 7-TMH SWEETs were found in bacteria, implying that SemiSWEET can be fused directly in prokaryotes. 15-TMH extraSWEET and 25-TMH superSWEET were also observed in wild rice and oomycetes, respectively. The transporters can be classified into 4, 2, 2, and 2 clades in plants, Metazoa, unicellular eukaryotes, and prokaryotes, respectively. The consensus and coevolution of amino acids in SWEETs were identified by multiple sequence alignments. The functions of the highly conserved residues were analyzed by molecular dynamics analysis. The 19 most highly conserved residues in the SWEETs were further confirmed by point mutagenesis using SWEET1 from Arabidopsis thaliana. The results proved that the conserved residues located in the extrafacial gate (Y57, G58, G131, and P191), the substrate binding pocket (N73, N192, and W176), and the intrafacial gate (P43, Y83, F87, P145, M161, P162, and Q202) play important roles for substrate recognition and transport processes. Taken together, our analyses provide a foundation for understanding the diversity, classification, and evolution of SWEETs and SemiSWEETs using large-scale sequence analysis and further show that gene duplication and gene fusion are important factors driving the evolution of SWEETs.
Analysis of complete mitochondrial genomes from extinct and extant rhinoceroses reveals lack of phylogenetic resolution

PubMed Central

Willerslev, Eske; Gilbert, M Thomas P; Binladen, Jonas; Ho, Simon YW; Campos, Paula F; Ratan, Aakrosh; Tomsho, Lynn P; da Fonseca, Rute R; Sher, Andrei; Kuznetsova, Tatanya V; Nowak-Kemp, Malgosia; Roth, Terri L; Miller, Webb; Schuster, Stephan C

2009-01-01

Background The scientific literature contains many examples where DNA sequence analyses have been used to provide definitive answers to phylogenetic problems that traditional (non-DNA based) approaches alone have failed to resolve. One notable example concerns the rhinoceroses, a group for which several contradictory phylogenies were proposed on the basis of morphology, then apparently resolved using mitochondrial DNA fragments. Results In this study we report the first complete mitochondrial genome sequences of the extinct ice-age woolly rhinoceros (Coelodonta antiquitatis), and the threatened Javan (Rhinoceros sondaicus), Sumatran (Dicerorhinus sumatrensis), and black (Diceros bicornis) rhinoceroses. In combination with the previously published mitochondrial genomes of the white (Ceratotherium simum) and Indian (Rhinoceros unicornis) rhinoceroses, this data set putatively enables reconstruction of the rhinoceros phylogeny. While the six species cluster into three strongly supported sister-pairings: (i) The black/white, (ii) the woolly/Sumatran, and (iii) the Javan/Indian, resolution of the higher-level relationships has no statistical support. The phylogenetic signal from individual genes is highly diffuse, with mixed topological support from different genes. Furthermore, the choice of outgroup (horse vs tapir) has considerable effect on reconstruction of the phylogeny. The lack of resolution is suggestive of a hard polytomy at the base of crown-group Rhinocerotidae, and this is supported by an investigation of the relative branch lengths. Conclusion Satisfactory resolution of the rhinoceros phylogeny may not be achievable without additional analyses of substantial amounts of nuclear DNA. This study provides a compelling demonstration that, in spite of substantial sequence length, there are significant limitations with single-locus phylogenetics. 
We expect further examples of this to appear as next-generation, large-scale sequencing of complete mitochondrial genomes becomes commonplace in evolutionary studies. "The human factor in classification is nowhere more evident than in dealing with this superfamily (Rhinocerotoidea)." G. G. Simpson (1945) PMID:19432984
Frameshift mutations of TAF1C gene, a core component for transcription by RNA polymerase I, and its regional heterogeneity in gastric and colorectal cancers.

PubMed

Oh, Hye Rim; An, Chang Hyeok; Yoo, Nam Jin; Lee, Sug Hyung

2015-02-01

Initiation of transcription for ribosomal RNA (rRNA) by RNA polymerase I requires TATA-binding protein (TBP) and TBP-associated factors (TAF1A, TAF1B and TAF1C). p53 tumour suppressor inhibits rRNA transcription by blocking TAF1C-UBF interaction, but alterations of TAF1C itself in tumourigenesis remain unknown. The aim of this study was to explore whether TAF1C gene was mutated in gastric (GC) and colorectal cancers (CRC). In a public database, we found that TAF1C gene had a mononucleotide repeat (C8) in the coding sequences that might be a mutation target in the cancers with microsatellite instability (MSI). We analysed 79 GC and 124 CRC by single-strand conformation polymorphism and DNA sequencing analyses. In this study, we found TAF1C frameshift mutations (8.8% of GC and 10.1% of CRC with MSI-H), which were not found in stable MSI/low MSI (MSS/MSI-L) (0/90). In addition, we analysed intratumoural heterogeneity (ITH) of TAF1C frameshift mutations in 16 CRC and found that three CRC (18.8%) harboured regional ITH of the TAF1C frameshift mutations. Our results indicate that TAF1C gene harboured not only somatic frameshift mutations but also the mutational ITH, which together might play a role in tumourigenesis of GC and CRC. Our data also suggest that multi-regional mutation analysis is needed for a better evaluation of the mutation status in CRC.
Developmentally programmed DNA splicing in Paramecium reveals short-distance crosstalk between DNA cleavage sites

PubMed Central

Gratias, Ariane; Lepère, Gersende; Garnier, Olivier; Rosa, Sarah; Duharcourt, Sandra; Malinsky, Sophie; Meyer, Eric; Bétermier, Mireille

2008-01-01

Somatic genome assembly in the ciliate Paramecium involves the precise excision of thousands of short internal eliminated sequences (IESs) that are scattered throughout the germline genome and often interrupt open reading frames. Excision is initiated by double-strand breaks centered on the TA dinucleotides that are conserved at each IES boundary, but the factors that drive cleavage site recognition remain unknown. A degenerate consensus was identified previously at IES ends and genetic analyses confirmed the participation of their nucleotide sequence in efficient excision. Even for wild-type IESs, however, variant excision patterns (excised or nonexcised) may be inherited maternally through sexual events, in a homology-dependent manner. We show here that this maternal epigenetic control interferes with the targeting of DNA breaks at IES ends. Furthermore, we demonstrate that a mutation in the TA at one end of an IES impairs DNA cleavage not only at the mutant end but also at the wild-type end. We conclude that crosstalk between both ends takes place prior to their cleavage and propose that the ability of an IES to adopt an excision-prone conformation depends on the combination of its nucleotide sequence and of additional determinants. PMID:18420657
Exome sequencing in an admixed isolated population indicates NFXL1 variants confer a risk for specific language impairment.

PubMed

Villanueva, Pía; Nudel, Ron; Hoischen, Alexander; Fernández, María Angélica; Simpson, Nuala H; Gilissen, Christian; Reader, Rose H; Jara, Lillian; Echeverry, María Magdalena; Francks, Clyde; Baird, Gillian; Conti-Ramsden, Gina; O'Hare, Anne; Bolton, Patrick F; Hennessy, Elizabeth R; Palomino, Hernán; Carvajal-Carmona, Luis; Veltman, Joris A; Cazier, Jean-Baptiste; De Barbieri, Zulema; Fisher, Simon E; Newbury, Dianne F

2015-03-01

Children affected by Specific Language Impairment (SLI) fail to acquire age appropriate language skills despite adequate intelligence and opportunity. SLI is highly heritable, but the understanding of underlying genetic mechanisms has proved challenging. In this study, we use molecular genetic techniques to investigate an admixed isolated founder population from the Robinson Crusoe Island (Chile), who are affected by a high incidence of SLI, increasing the power to discover contributory genetic factors. We utilize exome sequencing in selected individuals from this population to identify eight coding variants that are of putative significance. We then apply association analyses across the wider population to highlight a single rare coding variant (rs144169475, Minor Allele Frequency of 4.1% in admixed South American populations) in the NFXL1 gene that confers a nonsynonymous change (N150K) and is significantly associated with language impairment in the Robinson Crusoe population (p = 2.04 × 10^-4, 8 variants tested). Subsequent sequencing of NFXL1 in 117 UK SLI cases identified four individuals with heterozygous variants predicted to be of functional consequence. We conclude that coding variants within NFXL1 confer an increased risk of SLI within a complex genetic model.
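
The abstract reports the association p-value together with the number of variants tested, which invites a multiple-testing check. The sketch below assumes a Bonferroni correction at a family-wise alpha of 0.05; the abstract does not state which correction the authors applied, so both choices are assumptions made for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

p_value = 2.04e-4                          # reported in the abstract
threshold = bonferroni_threshold(0.05, 8)  # alpha = 0.05 is an assumption
passes = p_value < threshold
print(threshold, passes)
```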
Passing faces: sequence-dependent variations in the perceptual processing of emotional faces.

PubMed

Karl, Christian; Hewig, Johannes; Osinsky, Roman

2016-10-01

There is broad evidence that contextual factors influence the processing of emotional facial expressions. Yet temporal-dynamic aspects, inter alia how face processing is influenced by the specific order of neutral and emotional facial expressions, have been largely neglected. To shed light on this topic, we recorded electroencephalogram from 168 healthy participants while they performed a gender-discrimination task with angry and neutral faces. Our event-related potential (ERP) analyses revealed a strong emotional modulation of the N170 component, indicating that the basic visual encoding and emotional analysis of a facial stimulus happen, at least partially, in parallel. While the N170 and the late positive potential (LPP; 400-600 ms) were only modestly affected by the sequence of preceding faces, we observed a strong influence of face sequences on the early posterior negativity (EPN; 200-300 ms). Finally, the differing response patterns of the EPN and LPP indicate that these two ERPs represent distinct processes during face analysis: while the former seems to represent the integration of contextual information in the perception of a current face, the latter appears to represent the net emotional interpretation of a current face.
Mitochondrial DNA Evidence Supports the Hypothesis that Triodontophorus Species Belong to Cyathostominae

PubMed Central

Gao, Yuan; Zhang, Yan; Yang, Xin; Qiu, Jian-Hua; Duan, Hong; Xu, Wen-Wen; Chang, Qiao-Cheng; Wang, Chun-Ren

2017-01-01

Equine strongyles, significant nematode pathogens of horses, are characterized by high abundance and species richness, but the classification of this group of parasitic nematodes is debated. Mitochondrial (mt) genome DNA data are often used to address classification controversies. Thus, the objectives of this study were to determine the complete mt genomes of three Cyathostominae nematode species (Cyathostomum catinatum, Cylicostephanus minutus, and Poteriostomum imparidentatum) of horses and to reconstruct the phylogenetic relationship of Strongylidae with other nematodes in Strongyloidea, in order to test the hypothesis that Triodontophorus spp. belong to Cyathostominae. The mt genomes of Cy. catinatum, Cs. minutus, and P. imparidentatum were 13,838, 13,826, and 13,817 bp in length, respectively. Complete mt nucleotide sequence comparison of all Strongylidae nematodes revealed that sequence identity ranged from 77.8 to 91.6%. The mt genome sequences of Triodontophorus species had higher identity with Cyathostominae nematodes than with Strongylus species of the same subfamily (Strongylinae). Comparative analyses of mt genome organization for Strongyloidea nematodes sequenced to date revealed that members of this superfamily possess identical gene arrangements. Phylogenetic analyses using mtDNA data indicated that the Triodontophorus species clustered with Cyathostominae species instead of Strongylus species. The present study is the first to determine the complete mt genome sequences of Cy. catinatum, Cs. minutus, and P. imparidentatum, which will provide novel genetic markers for further studies of Strongylidae taxonomy, population genetics, and systematics. Importantly, sequence comparison and phylogenetic analyses based on mtDNA sequences supported the hypothesis that Triodontophorus belongs to Cyathostominae. PMID:28824575
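
Pairwise sequence identity of the kind reported in this abstract is typically computed from aligned sequences. The following is a minimal sketch that assumes the two sequences are already aligned and that gap columns are excluded from the denominator; the abstract does not say how identity was calculated, so that convention and the function name `percent_identity` are illustrative assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gap columns (an assumed convention)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

print(percent_identity("ACGT", "ACGA"))
```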
Complete Genome Sequence of Acinetobacter baumannii CIP 70.10, a Susceptible Reference Strain for Comparative Genome Analyses.

PubMed

Krahn, Thomas; Wibberg, Daniel; Maus, Irena; Winkler, Anika; Pühler, Alfred; Poirel, Laurent; Schlüter, Andreas

2015-07-30

The complete genome sequence for the reference strain Acinetobacter baumannii CIP 70.10 (ATCC 15151) was established. The strain was isolated in France in 1970, is susceptible to most antimicrobial compounds, and is therefore of importance for comparative genome analyses with clinical multidrug-resistant (MDR) A. baumannii strains to study resistance development and acquisition in this emerging human pathogen. Copyright © 2015 Krahn et al.
Inhaled corticosteroids and the occurrence of oral candidiasis: a prescription sequence symmetry analysis.

PubMed

van Boven, Job F M; de Jong-van den Berg, Lolkje T W; Vegter, Stefan

2013-04-01

The primary aim of the study was to gain insight into the relative risk of clinically relevant oral candidiasis following inhaled corticosteroid (ICS) initiation over time. A secondary aim was to analyse the influence of patient characteristics and co-medication on the occurrence of this adverse effect. Drug prescription data from 1994 to 2011 were retrieved from the IADB.nl database. To study the influence of ICS use on occurrence of oral candidiasis, a prescription symmetry analysis was used, including patients using medication for oral candidiasis up to 1 year before or after ICS initiation. The relative risk was calculated by dividing the number of patients receiving medication for oral candidiasis after ICS initiation by the number of patients receiving the same medication before ICS initiation. Sub-analyses were conducted to compare the relative risks at several time points after ICS initiation and to account for therapy persistence by only including chronic users of ICS. A multivariate logistic regression model was used to identify predictive factors. A total of 52,279 incident users of ICS therapy were identified, of which 1,081 received medication for oral candidiasis up to 1 year before or after ICS initiation. A total of 701 patients received medication for oral candidiasis after ICS initiation, while 361 received these medications in the reversed sequence, resulting in a sequence ratio (SR) of 1.94 (95 % CI 1.71-2.21). In the first 3 months after ICS initiation, the SR was 2.72 (95 % CI 2.19-3.38) and then decreased to 1.47 (95 % CI 1.11-1.95) 9-12 months after ICS initiation. Predictive factors were higher daily dose of ICS and concomitant use of oral corticosteroids. This study found a significant and clinically relevant increased number of patients receiving medication for oral candidiasis in the first year after therapy initiation with ICS. 
Relative risk is highest in the first 3 months, but remains increased up to at least 1 year after ICS initiation. This study stresses the need for patient education and inhalation instruction.
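
The sequence ratio reported in this abstract follows directly from the two patient counts. The sketch below reproduces the crude ratio and adds a Wald-type interval on the log scale; the interval method is an assumption, since the abstract does not state how the confidence interval was derived, so the bounds may differ from the published ones in the last digit.

```python
import math

def sequence_ratio(n_after, n_before, z=1.96):
    """Crude sequence ratio with an assumed Wald-type 95% interval on the log scale."""
    sr = n_after / n_before
    se_log = math.sqrt(1 / n_after + 1 / n_before)
    lower = math.exp(math.log(sr) - z * se_log)
    upper = math.exp(math.log(sr) + z * se_log)
    return sr, lower, upper

# Counts taken from the abstract: 701 patients treated after ICS initiation, 361 before.
sr, lower, upper = sequence_ratio(701, 361)
print(round(sr, 2), round(lower, 2), round(upper, 2))
```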
GREAM: A Web Server to Short-List Potentially Important Genomic Repeat Elements Based on Over-/Under-Representation in Specific Chromosomal Locations, Such as the Gene Neighborhoods, within or across 17 Mammalian Species

PubMed Central

Chandrashekar, Darshan Shimoga; Dey, Poulami; Acharya, Kshitish K.

2015-01-01

Background Genome-wide repeat sequences, such as LINEs, SINEs and LTRs share a considerable part of the mammalian nuclear genomes. These repeat elements seem to be important for multiple functions including the regulation of transcription initiation, alternative splicing and DNA methylation. But it is not possible to study all repeats and, hence, it would help to short-list before exploring their potential functional significance via experimental studies and/or detailed in silico analyses. Result We developed the ‘Genomic Repeat Element Analyzer for Mammals’ (GREAM) for analysis, screening and selection of potentially important mammalian genomic repeats. This web-server offers many novel utilities. For example, this is the only tool that can reveal a categorized list of specific types of transposons, retro-transposons and other genome-wide repetitive elements that are statistically over-/under-represented in regions around a set of genes, such as those expressed differentially in a disease condition. The output displays the position and frequency of identified elements within the specified regions. In addition, GREAM offers two other types of analyses of genomic repeat sequences: a) enrichment within chromosomal region(s) of interest, and b) comparative distribution across the neighborhood of orthologous genes. GREAM successfully short-listed a repeat element (MER20) known to contain functional motifs. In other case studies, we could use GREAM to short-list repetitive elements in the azoospermia factor a (AZFa) region of the human Y chromosome and those around the genes associated with rat liver injury. GREAM could also identify five over-represented repeats around some of the human and mouse transcription factor coding genes that had conserved expression patterns across the two species. 
Conclusion GREAM has been developed to provide an impetus to research on the role of repetitive sequences in mammalian genomes by offering easy selection of more interesting repeats in various contexts/regions. GREAM is freely available at http://resource.ibab.ac.in/GREAM/. PMID:26208093
Identification and Functional Characterization of Hypoxia-Induced Endoplasmic Reticulum Stress Regulating lncRNA (HypERlnc) in Pericytes.

PubMed

Bischoff, Florian C; Werner, Astrid; John, David; Boeckel, Jes-Niels; Melissari, Maria-Theodora; Grote, Phillip; Glaser, Simone F; Demolli, Shemsi; Uchida, Shizuka; Michalik, Katharina M; Meder, Benjamin; Katus, Hugo A; Haas, Jan; Chen, Wei; Pullamsetti, Soni S; Seeger, Werner; Zeiher, Andreas M; Dimmeler, Stefanie; Zehendner, Christoph M

2017-08-04

Pericytes are essential for vessel maturation and endothelial barrier function. Long noncoding RNAs regulate many cellular functions, but their role in pericyte biology remains unexplored. Here, we investigate the effect of hypoxia-induced endoplasmic reticulum stress regulating long noncoding RNAs (HypERlnc, also known as ENSG00000262454) on pericyte function in vitro and its regulation in human heart failure and idiopathic pulmonary arterial hypertension. RNA sequencing in human primary pericytes identified hypoxia-regulated long noncoding RNAs, including HypERlnc. Silencing of HypERlnc decreased cell viability and proliferation and resulted in pericyte dedifferentiation, which went along with increased endothelial permeability in cocultures consisting of human primary pericytes and human coronary microvascular endothelial cells. Consistently, Cas9-based transcriptional activation of HypERlnc was associated with increased expression of pericyte marker genes. Moreover, HypERlnc knockdown reduced endothelial-pericyte recruitment in Matrigel assays (P<0.05). Mechanistically, transcription factor reporter arrays demonstrated that endoplasmic reticulum stress-related transcription factors were prominently activated by HypERlnc knockdown, which was confirmed via immunoblotting for the endoplasmic reticulum stress markers IRE1α (P<0.001), ATF6 (P<0.01), and soluble BiP (P<0.001). Kyoto Encyclopedia of Genes and Genomes and Gene Ontology pathway analyses of RNA sequencing experiments after HypERlnc knockdown indicate a role in cardiovascular disease states. Indeed, HypERlnc expression was significantly reduced in human cardiac tissue from patients with heart failure (P<0.05; n=19) compared with controls. In addition, HypERlnc expression significantly correlated with pericyte markers in human lungs derived from patients diagnosed with idiopathic pulmonary arterial hypertension and from donor lungs (n=14). 
Here, we show that HypERlnc regulates human pericyte function and the endoplasmic reticulum stress response. In addition, RNA sequencing analyses in conjunction with reduced expression of HypERlnc in heart failure and correlation with pericyte markers in idiopathic pulmonary arterial hypertension indicate a role of HypERlnc in human cardiopulmonary disease. © 2017 American Heart Association, Inc.
Comprehensive analysis of gene expression patterns in Friedreich's ataxia fibroblasts by RNA sequencing reveals altered levels of protein synthesis factors and solute carriers

PubMed Central

Li, Yanjie; Lu, Yue; Lin, Kevin; Hauser, Lauren A.; Lynch, David R.

2017-01-01

ABSTRACT Friedreich's ataxia (FRDA) is an autosomal recessive neurodegenerative disease usually caused by large homozygous expansions of GAA repeat sequences in intron 1 of the frataxin (FXN) gene. FRDA patients homozygous for GAA expansions have low FXN mRNA and protein levels when compared with heterozygous carriers or healthy controls. Frataxin is a mitochondrial protein involved in iron–sulfur cluster synthesis, and many FRDA phenotypes result from deficiencies in cellular metabolism due to lowered expression of FXN. Presently, there is no effective treatment for FRDA, and biomarkers to measure therapeutic trial outcomes and/or to gauge disease progression are lacking. Peripheral tissues, including blood cells, buccal cells and skin fibroblasts, can readily be isolated from FRDA patients and used to define molecular hallmarks of disease pathogenesis. For instance, FXN mRNA and protein levels as well as FXN GAA-repeat tract lengths are routinely determined using all of these cell types. However, because these tissues are not directly involved in disease pathogenesis, their relevance as models of the molecular aspects of the disease is yet to be decided. Herein, we conducted unbiased RNA sequencing to profile the transcriptomes of fibroblast cell lines derived from 18 FRDA patients and 17 unaffected control individuals. Bioinformatic analyses revealed significantly upregulated expression of genes encoding plasma membrane solute carrier proteins in FRDA fibroblasts. Conversely, the expression of genes encoding accessory factors and enzymes involved in cytoplasmic and mitochondrial protein synthesis was consistently decreased in FRDA fibroblasts. Finally, comparison of genes differentially expressed in FRDA fibroblasts to three previously published gene expression signatures defined for FRDA blood cells showed substantial overlap between the independent datasets, including correspondingly deficient expression of antioxidant defense genes. 
Together, these results indicate that gene expression profiling of cells derived from peripheral tissues can, in fact, consistently reveal novel molecular pathways of the disease. When performed on statistically meaningful sample group sizes, unbiased global profiling analyses utilizing peripheral tissues are critical for the discovery and validation of FRDA disease biomarkers. PMID:29125828
Comparative analysis of chloroplast genomes of the genus Citrus and its close relatives.

PubMed

Liu, Xiaogang; Wu, Hongkun; Luo, Yan; Xi, Wanpeng; Zhou, Zhiqin

2017-01-01

The genus Citrus and its close relatives are economically and nutritionally important fruit trees. However, the controversy over the phylogeny of key wild species, as well as the genetic relationship between the cultivated species and their putative wild progenitors, remains unresolved. Comparative analyses of chloroplast (cp) genomes have been useful in resolving various phylogenetic issues. Thus far, the cp genomes of only two Citrus species have been sequenced. In this study, we sequenced six complete cp genomes, four belonging to the genus Citrus, and two belonging to the genera Fortunella and Poncirus, respectively. These newly sequenced genomes, together with the two publicly available genomes, were used for comparative analyses of the genus Citrus and its close relatives. All eight cp genomes share similar basic structure, gene order and gene content. Phylogenetic analyses supported the monophyly of the three genera in the order Sapindales within the major clade Malvidae.
Final Report for LDRD Project 02-ERD-069: Discovering the Unknown Mechanism(s) of Virulence in a BW, Class A Select Agent

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chain, P; Garcia, E

2003-02-06

The goal of this proposed effort was to assess the difficulty in identifying and characterizing virulence candidate genes in an organism for which very limited data exist. This was accomplished by first addressing the finishing phase of draft-sequenced F. tularensis genomes and conducting comparative analyses to determine the coding potential of each genome, to discover the differences in genome structure and content, and to identify potential genes whose products may be involved in the F. tularensis virulence process. The project was divided into three parts: (1) Genome finishing: This part involves determining the order and orientation of the consensus sequences of contigs obtained from Phrap assemblies of random draft genomic sequences. This tedious process consists of linking contig ends using information embedded in each sequence file that relates the sequence to the original cloned insert. Since inserts are sequenced from both ends, we can establish a link between these paired-ends in different contigs and thus order and orient contigs. Since these genomes carry numerous copies of insertion sequences, these repeated elements "confuse" the Phrap assembly program. It is thus necessary to break these contigs apart at the repeated sequences and individually join the proper flanking regions using paired-end information, or using results of comparisons against a similar genome. Larger repeated elements such as the small subunit ribosomal RNA operon require verification with PCR. Tandem repeats require manual intervention and typically rely on single nucleotide polymorphisms to be resolved. Remaining gaps require PCR reactions and sequencing. Once the genomes have been "closed", low quality regions are addressed by resequencing reactions. (2) Genome analysis: The final consensus sequences are processed by combining the results of three gene modelers: Glimmer, Critica and Generation. 
The final gene models are submitted to a battery of homology searches and domain prediction programs in order to annotate them (e.g. BLAST, Pfam, TIGRfam, COG, KEGG, InterPro, TMHMM, SignalP). The genome structure is also assessed in terms of G+C content, GC bias (GC skew), and locations of repeated regions (e.g. IS elements) and phage-like genes. (3) Comparative genomics: The results of the various genome analyses are compared between the finished (or almost finished) genomes. Here, we have compared the F. tularensis genomes from the extremely lethal strain Schu4 (subsp. tularensis), the vaccine strain LVS (subsp. holarctica), and strain UT01-4992 of the less virulent, opportunistic subsp. novicida. Regions present in the highly virulent strain that are absent from the other less virulent strains may provide insight into what factors are required for the high level of virulence.
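
The G+C content and GC skew assessment mentioned in this report can be illustrated with a sliding-window scan. This is a minimal sketch, assuming the usual definition of GC skew as (G - C)/(G + C); the window size, step and toy sequence are hypothetical and do not come from the report.

```python
def gc_content(seq):
    """Fraction of A/C/G/T bases that are G or C."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / total

def gc_skew(seq):
    """GC skew, (G - C) / (G + C); 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)

def sliding_windows(seq, window, step):
    """Yield (start, subsequence) for each full-length window."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]

# Hypothetical toy sequence; a real genome would use windows of kilobases.
toy = "GGGCATATCCCG"
profile = [(start, gc_content(w), gc_skew(w))
           for start, w in sliding_windows(toy, window=6, step=3)]
for row in profile:
    print(row)
```

A sign change in the skew profile along a bacterial chromosome is commonly used as a hint to the replication origin or terminus, which is one reason the measure is reported alongside G+C content.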
Functional Assays and Metagenomic Analyses Reveals Differences between the Microbial Communities Inhabiting the Soil Horizons of a Norway Spruce Plantation

PubMed Central

Uroz, Stéphane; Ioannidis, Panos; Lengelle, Juliette; Cébron, Aurélie; Morin, Emmanuelle; Buée, Marc; Martin, Francis

2013-01-01

In temperate ecosystems, acidic forest soils are among the most nutrient-poor terrestrial environments. In this context, the long-term differentiation of the forest soils into horizons may impact the assembly and the functions of the soil microbial communities. To gain a more comprehensive understanding of the ecology and functional potentials of these microbial communities, a suite of analyses including comparative metagenomics was applied to independent soil samples from a spruce plantation (Breuil-Chenue, France). The objective was to assess whether the decreasing nutrient bioavailability and pH variations that naturally occur between the organic and mineral horizons affect the soil microbial functional biodiversity. The 14 Gbp of pyrosequencing and Illumina sequences generated in this study revealed complex microbial communities dominated by bacteria. Detailed analyses showed that the organic soil horizon was significantly enriched in sequences related to Bacteria, Chordata, Arthropoda and Ascomycota. In contrast, the mineral horizon was significantly enriched in sequences related to Archaea. Our analyses also highlighted that the microbial communities inhabiting the two soil horizons differed significantly in their functional potentials according to functional assays and MG-RAST analyses, suggesting a functional specialisation of these microbial communities. Consistent with this specialisation, our shotgun metagenomic approach revealed a significant increase in the relative abundance of sequences related to glycoside hydrolases in the organic horizon compared to the mineral horizon, which was significantly enriched in glycoside transferases. This functional stratification according to the soil horizon was also confirmed by a significant correlation between the functional assays performed in this study and the functional metagenomic analyses. 
Together, our results suggest that the soil stratification and particularly the soil resource availability impact the functional diversity and to a lesser extent the taxonomic diversity of the bacterial communities. PMID:23418476
Evolution of poor reporting and inadequate methods over time in 20 920 randomised controlled trials included in Cochrane reviews: research on research study.

PubMed

Dechartres, Agnes; Trinquart, Ludovic; Atal, Ignacio; Moher, David; Dickersin, Kay; Boutron, Isabelle; Perrodeau, Elodie; Altman, Douglas G; Ravaud, Philippe

2017-06-08

Objective To examine how poor reporting and inadequate methods for key methodological features in randomised controlled trials (RCTs) have changed over the past three decades. Design Mapping of trials included in Cochrane reviews. Data sources Data from RCTs included in all Cochrane reviews published between March 2011 and September 2014 reporting an evaluation of the Cochrane risk of bias items: sequence generation, allocation concealment, blinding, and incomplete outcome data. Data extraction For each RCT, we extracted consensus on risk of bias made by the review authors and identified the primary reference to extract publication year and journal. We matched journal names with Journal Citation Reports to get 2014 impact factors. Main outcomes measures We considered the proportions of trials rated by review authors at unclear and high risk of bias as surrogates for poor reporting and inadequate methods, respectively. Results We analysed 20 920 RCTs (from 2001 reviews) published in 3136 journals. The proportion of trials with unclear risk of bias was 48.7% for sequence generation and 57.5% for allocation concealment; the proportion of those with high risk of bias was 4.0% and 7.2%, respectively. For blinding and incomplete outcome data, 30.6% and 24.7% of trials were at unclear risk and 33.1% and 17.1% were at high risk, respectively. Higher journal impact factor was associated with a lower proportion of trials at unclear or high risk of bias. The proportion of trials at unclear risk of bias decreased over time, especially for sequence generation, which fell from 69.1% in 1986-1990 to 31.2% in 2011-14 and for allocation concealment (70.1% to 44.6%). After excluding trials at unclear risk of bias, use of inadequate methods also decreased over time: from 14.8% to 4.6% for sequence generation and from 32.7% to 11.6% for allocation concealment. 
Conclusions Poor reporting and inadequate methods have decreased over time, especially for sequence generation and allocation concealment. But more could be done, especially in lower impact factor journals. Published by the BMJ Publishing Group Limited.

Whole-genome sequence-based genomic prediction in laying chickens with different genomic relationship matrices to account for genetic architecture.

PubMed

Ni, Guiyan; Cavero, David; Fangmann, Anna; Erbe, Malena; Simianer, Henner

2017-01-16

With the availability of next-generation sequencing technologies, genomic prediction based on whole-genome sequencing (WGS) data is now feasible in animal breeding schemes and was expected to lead to higher predictive ability, since such data may contain all genomic variants including causal mutations. Our objective was to compare prediction ability with high-density (HD) array data and WGS data in a commercial brown layer line with genomic best linear unbiased prediction (GBLUP) models using various approaches to weight single nucleotide polymorphisms (SNPs). A total of 892 chickens from a commercial brown layer line were genotyped with 336 K segregating SNPs (array data) that included 157 K genic SNPs (i.e. SNPs in or around a gene). For these individuals, genome-wide sequence information was imputed based on data from re-sequencing runs of 25 individuals, leading to 5.2 million (M) imputed SNPs (WGS data), including 2.6 M genic SNPs. De-regressed proofs (DRP) for eggshell strength, feed intake and laying rate were used as quasi-phenotypic data in genomic prediction analyses. Four weighting factors for building a trait-specific genomic relationship matrix were investigated: identical weights, -log10(P) from genome-wide association study results, squares of SNP effects from random regression BLUP, and variable selection based weights (known as BLUP|GA). Predictive ability was measured as the correlation between DRP and direct genomic breeding values in five replications of a fivefold cross-validation. Averaged over the three traits, the highest predictive ability (0.366 ± 0.075) was obtained when only genic SNPs from WGS data were used. Predictive abilities with genic SNPs and all SNPs from HD array data were 0.361 ± 0.072 and 0.353 ± 0.074, respectively. 
Prediction with -log10(P) or squares of SNP effects as weighting factors for building a genomic relationship matrix or BLUP|GA did not increase accuracy, compared to that with identical weights, regardless of the SNP set used. Our results show that little or no benefit was gained when using all imputed WGS data to perform genomic prediction compared to using HD array data regardless of the weighting factors tested. However, using only genic SNPs from WGS data had a positive effect on prediction ability.
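
The idea of a trait-specific genomic relationship matrix built from per-SNP weights can be sketched in a few lines. The form used here, G = Z D Z' / sum(2 p (1 - p) d) with genotypes coded 0/1/2, is one common VanRaden-style construction and is an assumption; the abstract does not give the exact formula the authors used. The genotypes and GWAS p-values are hypothetical toy data.

```python
import math

def weighted_grm(genotypes, weights):
    """Weighted genomic relationship matrix (assumed VanRaden-style form).

    genotypes: one list of 0/1/2 allele counts per individual.
    weights:   one non-negative weight per SNP (the diagonal of D).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Centre each SNP column by twice its allele frequency.
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    scale = sum(2 * p * (1 - p) * w for p, w in zip(freqs, weights))
    if scale <= 0:
        raise ValueError("need at least one segregating SNP with positive weight")
    return [[sum(z[a][j] * weights[j] * z[b][j] for j in range(m)) / scale
             for b in range(n)]
            for a in range(n)]

# Hypothetical toy data: 4 individuals, 3 SNPs.
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
gwas_p_values = [0.5, 0.01, 0.2]

identical_weights = [1.0] * 3
log_p_weights = [-math.log10(p) for p in gwas_p_values]

g_identical = weighted_grm(genotypes, identical_weights)
g_log_p = weighted_grm(genotypes, log_p_weights)
```

Because the weights also enter the scaling term, multiplying every weight by the same constant leaves the matrix unchanged in this form; only the relative weighting of SNPs matters.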
Chloroplast phylogenomic analyses resolve deep-level relationships of an intractable bamboo tribe Arundinarieae (Poaceae).

PubMed

Ma, Peng-Fei; Zhang, Yu-Xiao; Zeng, Chun-Xia; Guo, Zhen-Hua; Li, De-Zhu

2014-11-01

The temperate woody bamboos constitute a distinct tribe Arundinarieae (Poaceae: Bambusoideae) with high species diversity. Estimating phylogenetic relationships among the 11 major lineages of Arundinarieae has been particularly difficult, owing to a possible rapid radiation and the extremely low rate of sequence divergence. Here, we explore the use of chloroplast genome sequencing for phylogenetic inference. We sampled 25 species (22 temperate bamboos and 3 outgroups) for the complete genome representing eight major lineages of Arundinarieae in an attempt to resolve backbone relationships. Phylogenetic analyses of coding versus noncoding sequences, and of different regions of the genome (large single copy and small single copy, and inverted repeat regions) yielded no well-supported contradicting topologies but potential incongruence was found between the coding and noncoding sequences. The use of various data partitioning schemes in analysis of the complete sequences resulted in nearly identical topologies and node support values, although the partitioning schemes were decisively different from each other as to the fit to the data. Our full genomic data set substantially increased resolution along the backbone and provided strong support for most relationships despite the very short internodes and long branches in the tree. The inferred relationships were also robust to potential confounding factors (e.g., long-branch attraction) and received support from independent indels in the genome. We then added taxa from the three Arundinarieae lineages that were not included in the full-genome data set; each of these were sampled for more than 50% genome sequences. The resulting trees not only corroborated the reconstructed deep-level relationships but also largely resolved the phylogenetic placements of these three additional lineages. 
Furthermore, adding 129 additional taxa sampled for only eight chloroplast loci to the combined data set yielded almost identical relationships, albeit with low support values. We believe that the inferred phylogeny is robust to taxon sampling. Having resolved the deep-level relationships of Arundinarieae, we illuminate how chloroplast phylogenomics can be used for elucidating difficult phylogeny at low taxonomic levels in intractable plant groups. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
Structural Basis for Sequence-specific DNA Recognition by an Arabidopsis WRKY Transcription Factor

PubMed Central

Yamasaki, Kazuhiko; Kigawa, Takanori; Watanabe, Satoru; Inoue, Makoto; Yamasaki, Tomoko; Seki, Motoaki; Shinozaki, Kazuo; Yokoyama, Shigeyuki

2012-01-01

The WRKY family transcription factors regulate plant-specific reactions that are mostly related to biotic and abiotic stresses. They share the WRKY domain, which recognizes a DNA element (TTGAC(C/T)) termed the W-box, in target genes. Here, we determined the solution structure of the C-terminal WRKY domain of Arabidopsis WRKY4 in complex with the W-box DNA by NMR. A four-stranded β-sheet enters the major groove of DNA in an atypical mode termed the β-wedge, where the sheet is nearly perpendicular to the DNA helical axis. Residues in the conserved WRKYGQK motif contact DNA bases mainly through extensive apolar contacts with thymine methyl groups. The importance of these contacts was verified by substituting the relevant T bases with U and by surface plasmon resonance analyses of DNA binding. PMID:22219184
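
The W-box consensus TTGAC(C/T) given in this abstract translates directly into a pattern search. The following is a minimal sketch that scans both strands by also matching the reverse complement, (A/G)GTCAA; the function name `find_w_boxes` and the toy promoter fragment are hypothetical.

```python
import re

W_BOX_FORWARD = re.compile(r"TTGAC[CT]")   # W-box on the given strand
W_BOX_REVERSE = re.compile(r"[AG]GTCAA")   # reverse complement of TTGAC[CT]

def find_w_boxes(seq):
    """Return sorted (start, strand, matched_text) tuples for W-box sites."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group()) for m in W_BOX_FORWARD.finditer(seq)]
    hits += [(m.start(), "-", m.group()) for m in W_BOX_REVERSE.finditer(seq)]
    return sorted(hits)

# Hypothetical promoter fragment with one site on each strand.
toy_promoter = "AATTGACCGGAGTCAATT"
sites = find_w_boxes(toy_promoter)
print(sites)
```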
The Chapel Hill hemophilia A dog colony exhibits a factor VIII gene inversion

PubMed Central

Lozier, Jay N.; Dutra, Amalia; Pak, Evgenia; Zhou, Nan; Zheng, Zhili; Nichols, Timothy C.; Bellinger, Dwight A.; Read, Marjorie; Morgan, Richard A.

2002-01-01

In the Chapel Hill colony of factor VIII-deficient dogs, abnormal sequence (ch8, for canine hemophilia 8, GenBank no. AF361485) follows exons 1–22 in the factor VIII transcript in place of exons 23–26. The canine hemophilia 8 locus (ch8) sequence was found in a 140-kb normal dog genomic DNA bacterial artificial chromosome (BAC) clone that was completely outside the factor VIII gene, but not in BAC clones containing the factor VIII gene. The BAC clone that contained ch8 also contained a homologue of F8A (factor 8 associated) sequence, which participates in a common inversion that causes severe hemophilia A in humans. Fluorescence in situ hybridization analysis indicated that exons 1–26 normally proceed sequentially from telomere to centromere at Xq28, and ch8 is telomeric to the factor VIII gene. The appearance of an “upstream” genomic sequence element (ch8) at the end of the aberrant factor VIII transcript suggested that an inversion of genomic DNA replaced factor VIII exons 22–26 with ch8. The F8A sequence appeared also in overlapping normal BAC clones containing factor VIII sequence. We hypothesized that homologous recombination between copies of canine F8A inside and outside the factor VIII gene had occurred, as in human hemophilia A. High-resolution fluorescent in situ hybridization on hemophilia A dog DNA revealed a pattern consistent with this inversion mechanism. We also identified a HindIII restriction fragment length polymorphism of F8A fragments that distinguished hemophilia A, carrier, and normal dogs' DNA. The Chapel Hill hemophilia A dog colony therefore replicates the factor VIII gene inversion commonly seen in humans with severe hemophilia A. PMID:12242334
Risk factors and molecular epidemiology of extended-spectrum β-lactamase-producing Klebsiella pneumoniae in Xiamen, China.

PubMed

Deng, Jie; Li, Yan-Ting; Shen, Xu; Yu, Yi-Wen; Lin, Hui-Ling; Zhao, Qi-Feng; Yang, Tian-Ci; Li, Shu-Lian; Niu, Jian-Jun

2017-12-01

The aim of this study was to evaluate the risk factors for pneumonia due to extended-spectrum β-lactamase-producing Klebsiella pneumoniae (ESBL-KP) and to analyse the molecular epidemiology of ESBL-KP in Xiamen, China. A case-control study was conducted at Zhongshan Hospital from January 2014 to August 2015. Medical records of patients with nosocomial pneumonia caused by K. pneumoniae were collected. A total of 40 cases with ESBL-KP infection and 90 controls with non-ESBL-KP infection were included. The sequence types (STs) of the 40 ESBL-KP strains were determined by multilocus sequence typing (MLST). Univariate analysis primarily revealed an association between the following seven risk factors and ESBL-KP infection (P<0.10): length of hospitalisation; use of cephalosporins; use of quinolones; presence of a nasogastric tube; presence of an intravenous catheter; mechanical ventilation; and cerebrospinal fluid drainage. Furthermore, multivariate analysis revealed that use of cephalosporins and presence of a nasogastric tube were independent risk factors for ESBL-KP infection (P<0.05), with adjusted odds ratios of 3.473 [95% confidence interval (CI) 1.105-10.911; P=0.033] and 2.488 (95% CI 1.083-5.715; P=0.032), respectively. MLST identified 28 STs. The main STs were ST23 (10.0%) and ST37 (10.0%); three novel STs were identified. Use of cephalosporins and presence of a nasogastric tube are independent risk factors for ESBL-KP infection. In addition, the discovery of three novel STs serves as a reminder to continuously monitor outbreaks of ESBL-KP infection. Copyright © 2017. Published by Elsevier Ltd.
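The adjusted odds ratios above can be sanity-checked against their confidence intervals. The sketch below assumes the intervals are Wald intervals computed on the log-odds scale with a critical value of 1.96; the abstract does not state this, so it is an assumption. Under that assumption the geometric mean of the bounds should reproduce the point estimate, and the interval width gives the standard error and an approximate p-value.

```python
import math


def wald_check(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover implied OR, SE(log OR) and two-sided p from a reported OR and 95% CI.

    Assumes a Wald interval on the log scale (an assumption, not stated in the source).
    """
    log_low, log_high = math.log(ci_low), math.log(ci_high)
    implied_or = math.exp((log_low + log_high) / 2)   # midpoint on the log scale
    se = (log_high - log_low) / (2 * z_crit)          # half-width divided by 1.96
    z = math.log(odds_ratio) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))    # normal approximation
    return implied_or, se, p_two_sided


# Values as reported in the abstract.
reported = [
    ("cephalosporin use", 3.473, 1.105, 10.911),
    ("nasogastric tube", 2.488, 1.083, 5.715),
]
for label, or_, low, high in reported:
    implied, se, p = wald_check(or_, low, high)
    print(f"{label}: reported OR={or_}, implied OR={implied:.3f}, "
          f"SE(log OR)={se:.3f}, p={p:.3f}")
```

If the implied values agree with the reported odds ratios and p-values to within rounding, the figures in the abstract are internally consistent with a log-scale Wald interval.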
Genome-Wide Analysis of the RAV Family in Soybean and Functional Identification of GmRAV-03 Involvement in Salt and Drought Stresses and Exogenous ABA Treatment

PubMed Central

Zhao, Shu-Ping; Xu, Zhao-Shi; Zheng, Wei-Jun; Zhao, Wan; Wang, Yan-Xia; Yu, Tai-Fei; Chen, Ming; Zhou, Yong-Bin; Min, Dong-Hong; Ma, You-Zhi; Chai, Shou-Cheng; Zhang, Xiao-Hong

2017-01-01

Transcription factors play vital roles in plant growth and in plant responses to abiotic stresses. The RAV transcription factors contain a B3 DNA binding domain and/or an APETALA2 (AP2) DNA binding domain. Although genome-wide analyses of RAV family genes have been performed in several species, little is known about the family in soybean (Glycine max L.). In this study, a total of 13 RAV genes, named GmRAVs, were identified in the soybean genome. We predicted and analyzed the amino acid compositions, phylogenetic relationships, and folding states of conserved domain sequences of soybean RAV transcription factors. These soybean RAV transcription factors were phylogenetically clustered into three classes based on their amino acid sequences. Subcellular localization analysis revealed that the soybean RAV proteins were located in the nucleus. The expression patterns of the 13 RAV genes were analyzed by quantitative real-time PCR. Under drought stress, the RAV genes showed diverse expression patterns, being either up- or down-regulated. Following NaCl treatment, all RAV genes were down-regulated except GmRAV-03, which was up-regulated. Under abscisic acid (ABA) treatment, the expression of all of the soybean RAV genes increased dramatically. These results suggested that the soybean RAV genes may be involved in diverse signaling pathways and may be responsive to abiotic stresses and exogenous ABA. Further analysis indicated that GmRAV-03 could increase the resistance of transgenic lines to high salt and drought and render the transgenic plants insensitive to exogenous ABA. The present study provides valuable information for understanding the classification and putative functions of the RAV transcription factors in soybean. PMID:28634481
Relationships between physical properties and sequence in silkworm silks

PubMed Central

Malay, Ali D.; Sato, Ryota; Yazawa, Kenjiro; Watanabe, Hiroe; Ifuku, Nao; Masunaga, Hiroyasu; Hikima, Takaaki; Guan, Juan; Mandal, Biman B.; Damrongsakkul, Siriporn; Numata, Keiji

2016-01-01

Silk has attracted widespread attention due to its superlative material properties and promising applications. However, the determinants behind the variations in material properties among different types of silk are not well understood. We analysed the physical properties of silk samples from a variety of silkmoth cocoons, including domesticated Bombyx mori varieties and several species from Saturniidae. Tensile deformation tests, thermal analyses, and investigations on crystalline structure and orientation of the fibres were performed. The results showed that saturniid silks produce more highly-defined structural transitions compared to B. mori, as seen in the yielding and strain hardening events during tensile deformation and in the changes observed during thermal analyses. These observations were analysed in terms of the constituent fibroin sequences, which in B. mori are predicted to produce heterogeneous structures, whereas the strictly modular repeats of the saturniid sequences are hypothesized to produce structures that respond in a concerted manner. Within saturniid fibroins, thermal stability was found to correlate with the abundance of poly-alanine residues, whereas differences in fibre extensibility can be related to varying ratios of GGX motifs versus bulky hydrophobic residues in the amorphous phase. PMID:27279149
Molecular phylogenetic analysis of non-sexually transmitted strains of Haemophilus ducreyi.

PubMed

Gaston, Jordan R; Roberts, Sally A; Humphreys, Tricia L

2015-01-01

Haemophilus ducreyi, the etiologic agent of chancroid, has been previously reported to show genetic variance in several key virulence factors, placing strains of the bacterium into two genetically distinct classes. Recent studies done in yaws-endemic areas of the South Pacific have shown that H. ducreyi is also a major cause of cutaneous limb ulcers (CLU) that are not sexually transmitted. To genetically assess CLU strains relative to the previously described class I, class II phylogenetic hierarchy, we examined nucleotide sequence diversity at 11 H. ducreyi loci, including virulence and housekeeping genes, which encompass approximately 1% of the H. ducreyi genome. Sequences for all 11 loci indicated that strains collected from leg ulcers exhibit DNA sequences homologous to class I strains of H. ducreyi. However, sequences for 3 loci, including a hemoglobin receptor (hgbA), serum resistance protein (dsrA), and a collagen adhesin (ncaA) contained informative amounts of variation. Phylogenetic analyses suggest that these non-sexually transmitted strains of H. ducreyi comprise a sub-clonal population within class I strains of H. ducreyi. Molecular dating suggests that CLU strains are the most recently developed, having diverged approximately 0.355 million years ago, fourteen times more recently than the class I/class II divergence. The CLU strains' divergence falls after the divergence of humans from chimpanzees, making it the first known H. ducreyi divergence event directly influenced by the selective pressures accompanying human hosts.
Genome sequencing of the extinct Eurasian wild aurochs illuminates the phylogeography and evolution of cattle

USDA-ARS's Scientific Manuscript database

Interrogation of modern and ancient bovine genome sequences provides a valuable model to study the evolution of cattle. Here, we analyse the first complete wild aurochs (Bos primigenius) genome sequence using DNA extracted from a ~ 6,750 year-old humerus bone retrieved from a cave site in Derbyshire...
Repair Sequences in Dysarthric Conversational Speech: A Study in Interactional Phonetics

ERIC Educational Resources Information Center

Rutter, Ben

2009-01-01

This paper presents some findings from a case study of repair sequences in conversations between a dysarthric speaker, Chris, and her interactional partners. It adopts the methodology of interactional phonetics, where turn design, sequence organization, and variation in phonetic parameters are analysed in unison. The analysis focused on the use of…
Taxonomic evaluation of Streptomyces albus and related species using multilocus sequence analysis

USDA-ARS's Scientific Manuscript database

In phylogenetic analyses of the genus Streptomyces using 16S rRNA gene sequences, Streptomyces albus subsp. albus NRRL B-1811T formed a cluster with 5 other species having identical or nearly identical 16S rRNA gene sequences. Moreover, the morphological and physiological characteristics of these ot...
PET-Tool: a software suite for comprehensive processing and managing of Paired-End diTag (PET) sequence data.

PubMed

Chiu, Kuo Ping; Wong, Chee-Hong; Chen, Qiongyu; Ariyaratne, Pramila; Ooi, Hong Sain; Wei, Chia-Lin; Sung, Wing-Kin Ken; Ruan, Yijun

2006-08-25

We recently developed the Paired End diTag (PET) strategy for efficient characterization of mammalian transcriptomes and genomes. The paired-end nature of short PET sequences derived from long DNA fragments raised a new set of bioinformatics challenges, including how to extract PETs from raw sequence reads, and how to map PETs to reference genome sequences correctly yet efficiently. To accommodate and streamline data analysis of the large volume of PET sequences generated from each PET experiment, an automated PET data-processing pipeline is desirable. We designed an integrated computation program package, PET-Tool, to automatically process PET sequences and map them to the genome sequences. The Tool was implemented as a web-based application composed of four modules: the Extractor module for PET extraction; the Examiner module for analytic evaluation of PET sequence quality; the Mapper module for locating PET sequences in the genome sequences; and the Project Manager module for data organization. The performance of PET-Tool was evaluated through the analyses of 2.7 million PET sequences. It was demonstrated that PET-Tool is accurate and efficient in extracting PET sequences and removing artifacts from large-volume datasets. Using optimized mapping criteria, over 70% of quality PET sequences were mapped specifically to the genome sequences. With a 2.4 GHz Linux machine, it takes approximately six hours to process one million PETs from extraction to mapping. The speed, accuracy, and comprehensiveness have proved that PET-Tool is an important and useful component in PET experiments, and can be extended to accommodate other related analyses of paired-end sequences. The Tool also provides user-friendly functions for data quality checks and a system for multi-layer data management.
Using SQL Databases for Sequence Similarity Searching and Analysis.

PubMed

Pearson, William R; Mackey, Aaron J

2017-09-13

Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
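The idea of loading similarity search results into a relational database and summarizing them by kingdom can be sketched with Python's built-in sqlite3 module. The schema below (tables `protein` and `hit`), the accession names and the E-value cutoff are hypothetical illustrations; they are not the seqdb_demo or search_demo schemas, which the abstract does not describe.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a toy protein table and search hits."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE protein (acc TEXT PRIMARY KEY, organism TEXT, kingdom TEXT);
        CREATE TABLE hit (query_acc TEXT, subject_acc TEXT, evalue REAL);
        """
    )
    # Hypothetical accessions; real data would come from a sequence library.
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?)",
        [
            ("ec1", "E. coli", "Bacteria"),
            ("ec2", "E. coli", "Bacteria"),
            ("ec3", "E. coli", "Bacteria"),
            ("hs1", "H. sapiens", "Eukaryota"),
            ("sc1", "S. cerevisiae", "Eukaryota"),
            ("mj1", "M. jannaschii", "Archaea"),
        ],
    )
    # Hypothetical search results: (query, subject, E-value).
    conn.executemany(
        "INSERT INTO hit VALUES (?, ?, ?)",
        [
            ("ec1", "hs1", 1e-30),
            ("ec1", "sc1", 1e-20),
            ("ec1", "mj1", 1e-10),
            ("ec2", "hs1", 1e-8),
            ("ec2", "mj1", 0.5),
            ("ec3", "sc1", 2.0),
        ],
    )
    return conn


def kingdom_summary(conn: sqlite3.Connection, max_evalue: float):
    """Count distinct query proteins with a significant hit in each kingdom."""
    return conn.execute(
        """
        SELECT p.kingdom, COUNT(DISTINCT h.query_acc) AS n_queries
        FROM hit AS h
        JOIN protein AS p ON p.acc = h.subject_acc
        WHERE h.evalue <= ?
        GROUP BY p.kingdom
        ORDER BY p.kingdom
        """,
        (max_evalue,),
    ).fetchall()


conn = build_demo_db()
for kingdom, n_queries in kingdom_summary(conn, max_evalue=1e-6):
    print(kingdom, n_queries)
```

Keeping hits in a table means the significance cutoff becomes a query parameter, so the same stored results can be re-summarized without rerunning the search.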
Novel Primer Sets for Next Generation Sequencing-Based Analyses of Water Quality

PubMed Central

Lee, Elvina; Khurana, Maninder S.; Whiteley, Andrew S.; Monis, Paul T.; Bath, Andrew; Gordon, Cameron; Ryan, Una M.; Paparini, Andrea

2017-01-01

Next generation sequencing (NGS) has rapidly become an invaluable tool for the detection, identification and relative quantification of environmental microorganisms. Here, we demonstrate two new 16S rDNA primer sets, which are compatible with NGS approaches and are primarily for use in water quality studies. Compared to 16S rRNA gene based universal primers, in silico and experimental analyses demonstrated that the new primers showed increased specificity for the Cyanobacteria and Proteobacteria phyla, allowing increased sensitivity for the detection, identification and relative quantification of toxic bloom-forming microalgae, microbial water quality bioindicators and common pathogens. Significantly, Cyanobacterial and Proteobacterial sequences accounted for ca. 95% of all sequences obtained within NGS runs (when compared to ca. 50% with standard universal NGS primers), providing higher sensitivity and greater phylogenetic resolution of key water quality microbial groups. The increased selectivity of the new primers allows the parallel sequencing of more samples through reduced sequence retrieval levels required to detect target groups, potentially reducing NGS costs by 50% but still guaranteeing optimal coverage and species discrimination. PMID:28118368
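The link between primer selectivity and sequencing cost can be made concrete with a short calculation. The sketch below assumes that cost scales linearly with the number of reads per sample and that only reads from the target phyla are useful; the figure of 10,000 target reads per sample is a hypothetical requirement, while the on-target fractions (ca. 50% and ca. 95%) come from the abstract.

```python
import math


def reads_needed(target_reads: int, on_target_fraction: float) -> int:
    """Total reads to sequence so that the expected on-target count is reached."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return math.ceil(target_reads / on_target_fraction)


TARGET_READS = 10_000                                   # hypothetical per-sample requirement
universal = reads_needed(TARGET_READS, 0.50)            # ca. 50% on target (abstract)
selective = reads_needed(TARGET_READS, 0.95)            # ca. 95% on target (abstract)
saving = 1 - selective / universal

print(f"universal primers: {universal} reads per sample")
print(f"new primers:       {selective} reads per sample")
print(f"reduction in reads per sample: {saving:.1%}")
```

Under these assumptions the reduction in reads per sample comes out a little under one half, which is in line with the cost reduction of about 50% suggested in the abstract.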
Computational analyses of mammalian lactate dehydrogenases: human, mouse, opossum and platypus LDHs

PubMed Central

Holmes, Roger S; Goldberg, Erwin

2009-01-01

Computational methods were used to predict the amino acid sequences and gene locations for mammalian lactate dehydrogenase (LDH) genes and proteins using genome sequence databanks. Human LDHA, LDHC and LDH6A genes were located in tandem on chromosome 11, while LDH6B and LDH6C genes were on chromosomes 15 and 12, respectively. Opossum LDHC and LDH6B genes were located in tandem with the opossum LDHA gene on chromosome 5 and contained 7 (LDHA and LDHC) or 8 (LDH6B) exons. An amino acid sequence prediction for the opossum LDH6B subunit gave an extended N-terminal sequence, similar to the human and mouse LDH6B sequences, which may support the export of this enzyme into mitochondria. The platypus genome contained at least 3 LDH genes encoding LDHA, LDHB and LDH6B subunits. Phylogenetic studies and sequence analyses indicated that LDHA, LDHB and LDH6B genes are present in all mammalian genomes examined, including a monotreme species (platypus), whereas the LDHC gene may have arisen more recently in marsupial mammals. PMID:19679512
A comparative analysis of exome capture.

PubMed

Parla, Jennifer S; Iossifov, Ivan; Grabill, Ian; Spector, Mona S; Kramer, Melissa; McCombie, W Richard

2011-09-29

Human exome resequencing using commercial target capture kits has been and is being used for sequencing large numbers of individuals to search for variants associated with various human diseases. We rigorously evaluated the capabilities of two solution exome capture kits. These analyses help clarify the strengths and limitations of those data as well as systematically identify variables that should be considered in the use of those data. Each exome kit performed well at capturing the targets it was designed to capture, which mainly correspond to the consensus coding sequences (CCDS) annotations of the human genome. In addition, based on their respective targets, each capture kit coupled with high coverage Illumina sequencing produced highly accurate nucleotide calls. However, other databases, such as the Reference Sequence collection (RefSeq), define the exome more broadly, and so not surprisingly, the exome kits did not capture these additional regions. Commercial exome capture kits provide a very efficient way to sequence select areas of the genome at very high accuracy. Here we provide the data to help guide critical analyses of sequencing data derived from these products.
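The point that a kit can cover its own design targets well while missing regions in a broader annotation can be illustrated by computing the fraction of annotated bases that fall inside capture targets. This is a minimal sketch with hypothetical half-open coordinates on a single chromosome; it does not reflect the actual target designs of the kits evaluated.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_fraction(annotation, targets):
    """Fraction of annotated bases that lie within any capture target."""
    ann = merge(annotation)
    tgt = merge(targets)
    total = sum(end - start for start, end in ann)
    covered = 0
    first = 0
    for start, end in ann:
        # Skip targets that end before this annotated interval begins.
        while first < len(tgt) and tgt[first][1] <= start:
            first += 1
        k = first
        while k < len(tgt) and tgt[k][0] < end:
            covered += min(end, tgt[k][1]) - max(start, tgt[k][0])
            k += 1
    return covered / total if total else 0.0


# Hypothetical coordinates: a narrow annotation, a broader one, and kit targets.
narrow_annotation = [(100, 200), (300, 400)]
broad_annotation = [(100, 200), (300, 400), (500, 600)]
kit_targets = [(90, 210), (290, 410)]

print(covered_fraction(narrow_annotation, kit_targets))
print(covered_fraction(broad_annotation, kit_targets))
```

In this toy case the kit covers all of the narrow annotation but only two thirds of the broader one, mirroring the CCDS versus RefSeq contrast described above.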
Towards decoding the conifer giga-genome.

PubMed

Mackay, John; Dean, Jeffrey F D; Plomion, Christophe; Peterson, Daniel G; Cánovas, Francisco M; Pavy, Nathalie; Ingvarsson, Pär K; Savolainen, Outi; Guevara, M Ángeles; Fluch, Silvia; Vinceti, Barbara; Abarca, Dolores; Díaz-Sala, Carmen; Cervera, María-Teresa

2012-12-01

Several new initiatives have been launched recently to sequence conifer genomes including pines, spruces and Douglas-fir. Owing to the very large genome sizes ranging from 18 to 35 gigabases, sequencing even a single conifer genome had been considered unattainable until the recent throughput increases and cost reductions afforded by next generation sequencers. The purpose of this review is to describe the context for these new initiatives. A knowledge foundation has been acquired in several conifers of commercial and ecological interest through large-scale cDNA analyses, construction of genetic maps and gene mapping studies aiming to link phenotype and genotype. Exploratory sequencing in pines and spruces has pointed out some of the unique properties of these giga-genomes and suggested strategies that may be needed to extract value from their sequencing. The hope is that recent and pending developments in sequencing technology will contribute to rapidly filling the knowledge vacuum surrounding their structure, contents and evolution. Researchers are also making plans to use comparative analyses that will help to turn the data into a valuable resource for enhancing and protecting the world's conifer forests.
Analysis of Ribosome Inactivating Protein (RIP): A Bioinformatics Approach

NASA Astrophysics Data System (ADS)

Jothi, G. Edward Gnana; Majilla, G. Sahaya Jose; Subhashini, D.; Deivasigamani, B.

2012-10-01

In spite of the medical advances in recent years, the world is in need of different sources to counter certain health issues. Ribosome Inactivating Proteins (RIPs) were found to be one among them. To provide easy access to information about RIPs, there is a need to analyse RIPs towards constructing a database on RIPs. Also, multiple sequence alignment was done towards screening for homologues of significant RIPs from rare sources against RIPs from easily available sources in terms of similarity. Protein sequences were retrieved from SWISS-PROT and were further analysed using pairwise and multiple sequence alignment. Analysis shows that 151 RIPs have been characterized to date. Amongst them, there are 87 type I, 37 type II, 1 type III and 25 unknown RIPs. Information on sequence length, indicating whether the full or only a partial sequence is available for each RIP, was also collected. The multiple sequence alignment of 37 type I RIPs using the online server Multalin indicates the presence of 20 conserved residues. Pairwise alignment and multiple sequence alignment of certain selected RIPs in two groups, namely Group I and Group II, were carried out and the consensus level was found to be 98%, 98% and 90% respectively.
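Counting conserved residues in a multiple sequence alignment, as reported above for the type I RIPs, can be sketched as follows. The toy alignment is hypothetical, and the criterion used here (every sequence carries the same non-gap residue in a column) is an assumption; Multalin applies its own consensus rules, which this sketch does not reproduce.

```python
def conserved_columns(alignment: list[str]) -> list[int]:
    """Return 0-based indices of columns where all rows share one non-gap residue."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must be non-empty with rows of equal length")
    return [
        index
        for index, column in enumerate(zip(*alignment))
        if column[0] != "-" and len(set(column)) == 1
    ]


# Hypothetical aligned fragments ('-' marks a gap); not real RIP sequences.
toy_alignment = [
    "MKV-LA",
    "MRVQLA",
    "MKVELA",
]
columns = conserved_columns(toy_alignment)
print(columns, len(columns))
```

A stricter or looser definition of conservation (for example, allowing similar residues) would change the count, so a number such as 20 conserved residues is only meaningful together with the rule used to obtain it.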

Prevalence of transcription factors in ascomycete and basidiomycete fungi

PubMed Central

2014-01-01

Background Gene regulation underlies fungal physiology and therefore is a major factor in fungal biodiversity. Analysis of genome sequences has revealed a large number of putative transcription factors in most fungal genomes. The presence of fungal orthologs for individual regulators has been analysed and appears to be highly variable with some regulators widely conserved and others showing narrow distribution. Although genome-scale transcription factor surveys have been performed before, no global study into the prevalence of specific regulators across the fungal kingdom has been presented. Results In this study we have analysed the number of members for 37 regulator classes in 77 ascomycete and 31 basidiomycete fungal genomes and revealed significant differences between ascomycetes and basidiomycetes. In addition, we determined the presence of 64 regulators characterised in ascomycetes across these 108 genomes. This demonstrated that overall the highest presence of orthologs is in the filamentous ascomycetes. A significant number of regulators lacked orthologs in the ascomycete yeasts and the basidiomycetes. Conversely, of seven basidiomycete regulators included in the study, only one had orthologs in ascomycetes. Conclusions This study demonstrates a significant difference in the regulatory repertoire of ascomycete and basidiomycete fungi, at the level of both regulator class and individual regulator. This suggests that the current regulatory systems of these fungi have been mainly developed after the two phyla diverged. Most regulators detected in both phyla are involved in central functions of fungal physiology and therefore were likely already present in the ancestor of the two phyla. PMID:24650355
Next Generation Sequencing Technology and Genomewide Data Analysis: Perspectives for Retinal Research

PubMed Central

Chaitankar, Vijender; Karakülah, Gökhan; Ratnapriya, Rinki; Giuste, Felipe O.; Brooks, Matthew J.; Swaroop, Anand

2016-01-01

The advent of high throughput next generation sequencing (NGS) has accelerated the pace of discovery of disease-associated genetic variants and genomewide profiling of expressed sequences and epigenetic marks, thereby permitting systems-based analyses of ocular development and disease. Rapid evolution of NGS and associated methodologies presents significant challenges in acquisition, management, and analysis of large data sets and for extracting biologically or clinically relevant information. Here we illustrate the basic design of commonly used NGS-based methods, specifically whole exome sequencing, transcriptome, and epigenome profiling, and provide recommendations for data analyses. We briefly discuss systems biology approaches for integrating multiple data sets to elucidate gene regulatory or disease networks. While we provide examples from the retina, the NGS guidelines reviewed here are applicable to other tissues/cell types as well. PMID:27297499
Genotypic distribution of a specialist model microorganism, Methanosaeta, along an estuarine gradient: does metabolic restriction limit niche differentiation potential?

PubMed

Carbonero, Franck; Oakley, Brian B; Hawkins, Robert J; Purdy, Kevin J

2012-05-01

A reductionist ecological approach of using a model genus was adopted in order to understand how microbial community structure is driven by metabolic properties. The distribution along an estuarine gradient of the highly specialised genus Methanosaeta was investigated and compared to the previously determined distribution of the more metabolically flexible Desulfobulbus. Methanosaeta genotypic distribution along the Colne estuary (Essex, UK) was determined by DNA- and RNA-based denaturing gradient gel electrophoresis and 16S rRNA gene sequence analyses. Methanosaeta distribution was monotonic, with a consistently diverse community and no apparent niche partitioning either in DNA or RNA analyses. This distribution pattern contrasts markedly with the previously described niche partitioning and sympatric differentiation of the model generalist, Desulfobulbus. To explain this difference, it is hypothesised that Methanosaeta's strict metabolic needs limit its adaptation potential, thus populations do not partition into spatially distinct groups and so do not appear to be constrained by gross environmental factors such as salinity. Thus, at least for these two model genera, it appears that metabolic flexibility may be an important factor in spatial distribution and this may be applicable to other microbes.
Short RNA indicator sequences are not completely degraded by autoclaving

PubMed Central

Unnithan, Veena V.; Unc, Adrian; Joe, Valerisa; Smith, Geoffrey B.

2014-01-01

Short indicator RNA sequences (<100 bp) persist after autoclaving and are recovered intact by molecular amplification. Primers targeting longer sequences are most likely to produce false positives due to amplification errors easily verified by melting curve analyses. If short indicator RNA sequences are used for virus identification and quantification, then post-autoclave RNA degradation methodology should be employed, which may include further autoclaving. PMID:24518856
A weighted U-statistic for genetic association analyses of sequencing data.

PubMed

Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J; Lu, Qing

2014-12-01

With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very low density lipoprotein cholesterol. © 2014 WILEY PERIODICALS, INC.
Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic

PubMed Central

Yebra, Gonzalo; Hodcroft, Emma B.; Ragonnet-Cronin, Manon L.; Pillay, Deenan; Brown, Andrew J. Leigh; Fraser, Christophe; Kellam, Paul; de Oliveira, Tulio; Dennis, Ann; Hoppe, Anne; Kityo, Cissy; Frampton, Dan; Ssemwanga, Deogratius; Tanser, Frank; Keshani, Jagoda; Lingappa, Jairam; Herbeck, Joshua; Wawer, Maria; Essex, Max; Cohen, Myron S.; Paton, Nicholas; Ratmann, Oliver; Kaleebu, Pontiano; Hayes, Richard; Fidler, Sarah; Quinn, Thomas; Novitsky, Vladimir; Haywards, Andrew; Nastouli, Eleni; Morris, Steven; Clark, Duncan; Kozlakidis, Zisis

2016-01-01

HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing allows the use of other genes. We aimed to determine which gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to that of the corresponding true tree using CompareTree. The accuracy of the trees increased significantly with the length of the sequences used, with the gag-pol-env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences. PMID:28008945
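The sub-sampling design described above (six gene combinations, four sampling depths, 100 replicates each, drawn from 4662 sequences) can be laid out as in the sketch below. How the authors actually drew their samples is not stated in the abstract; uniform sampling without replacement, independent draws per gene set, and the use of integer indices as sequence identifiers are all assumptions. Tree building with RAxML and comparison with CompareTree are outside the scope of the sketch.

```python
import random

GENE_SETS = ["gag-pol-env", "gag-pol", "gag", "pol", "env", "partial pol"]
DEPTHS = [1.00, 0.60, 0.20, 0.05]   # sampling depths from the abstract
N_SEQUENCES = 4662                  # size of the simulated dataset
N_REPLICATES = 100


def subsample_ids(ids, depth, rng):
    """Draw round(len(ids) * depth) identifiers uniformly without replacement."""
    size = round(len(ids) * depth)
    return sorted(rng.sample(ids, size))


def build_design(seed=0):
    """Map (gene set, depth) to a list of replicate samples of sequence indices."""
    rng = random.Random(seed)        # fixed seed so the design is reproducible
    ids = list(range(N_SEQUENCES))   # hypothetical stand-ins for sequence names
    design = {}
    for genes in GENE_SETS:
        for depth in DEPTHS:
            design[(genes, depth)] = [
                subsample_ids(ids, depth, rng) for _ in range(N_REPLICATES)
            ]
    return design


design = build_design()
for depth in DEPTHS:
    print(depth, len(design[("pol", depth)][0]))
```

At 100% depth every replicate contains the full dataset, so in this sketch those replicates are identical by construction and any variation among them would have to come from the tree search itself.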
Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic.

PubMed

Yebra, Gonzalo; Hodcroft, Emma B; Ragonnet-Cronin, Manon L; Pillay, Deenan; Brown, Andrew J Leigh

2016-12-23

HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing allows other genes to be used. We aimed to determine what gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to the corresponding true tree's using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag-pol-env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences.
NG6: Integrated next generation sequencing storage and processing environment.

PubMed

Mariette, Jérôme; Escudié, Frédéric; Allias, Nicolas; Salin, Gérald; Noirot, Céline; Thomas, Sylvain; Klopp, Christophe

2012-09-09

Next generation sequencing platforms are now well established in sequencing centres and some laboratories. Upcoming smaller scale machines such as the 454 junior from Roche or the MiSeq from Illumina will increase the number of laboratories hosting a sequencer. In such a context, it is important to provide these teams with an easily manageable environment to store and process the produced reads. We describe a user-friendly information system able to manage large sets of sequencing data. It includes, on the one hand, a workflow environment already containing pipelines adapted to different input formats (sff, fasta, fastq and qseq), different sequencers (Roche 454, Illumina HiSeq) and various analyses (quality control, assembly, alignment, diversity studies,…) and, on the other hand, a secured web site giving access to the results. The connected user will be able to download raw and processed data and browse through the analysis result statistics. The provided workflows can easily be modified or extended and new ones can be added. Ergatis is used as a workflow building, running and monitoring system. The analyses can be run locally or in a cluster environment using Sun Grid Engine. NG6 is a complete information system designed to answer the needs of a sequencing platform. It provides a user-friendly interface to process, store and download high-throughput sequencing data.
Genome-wide diversity and selective pressure in the human rhinovirus

PubMed Central

Kistler, Amy L; Webster, Dale R; Rouskin, Silvi; Magrini, Vince; Credle, Joel J; Schnurr, David P; Boushey, Homer A; Mardis, Elaine R; Li, Hao; DeRisi, Joseph L

2007-01-01

Background The human rhinoviruses (HRV) are one of the most common and diverse respiratory pathogens of humans. Over 100 distinct HRV serotypes are known, yet only 6 genomes are available. Due to the paucity of HRV genome sequence, little is known about the genetic diversity within HRV or the forces driving this diversity. Previous comparative genome sequence analyses indicate that recombination drives diversification in multiple genera of the picornavirus family, yet it remains unclear if this holds for HRV. Results To resolve this and gain insight into the forces driving diversification in HRV, we generated a representative set of 34 fully sequenced HRVs. Analysis of these genomes shows consistent phylogenies across the genome, conserved non-coding elements, and only limited recombination. However, spikes of genetic diversity at both the nucleotide and amino acid level are detectable within every locus of the genome. Despite this, the HRV genome as a whole is under purifying selective pressure, with islands of diversifying pressure in the VP1, VP2, and VP3 structural genes and two non-structural genes, the 3C protease and 3D polymerase. Mapping diversifying residues in these factors onto available 3-dimensional structures revealed the diversifying capsid residues partition to the external surface of the viral particle in statistically significant proximity to antigenic sites. Diversifying pressure in the pleconaril binding site is confined to a single residue known to confer drug resistance (VP1 191). In contrast, diversifying pressure in the non-structural genes is less clear, mapping both nearby and beyond characterized functional domains of these factors. Conclusion This work provides a foundation for understanding HRV genetic diversity and insight into the underlying biology driving evolution in HRV. It expands our knowledge of the genome sequence space that HRV reference serotypes occupy and how the pattern of genetic diversity across HRV genomes differs from other picornaviruses. It also reveals evidence of diversifying selective pressure in both structural genes known to interact with the host immune system and in domains of unassigned function in the non-structural 3C and 3D genes, raising the possibility that diversification of undiscovered functions in these essential factors may influence HRV fitness and evolution. PMID:17477878
Sequence selective capture, release and analysis of DNA using a magnetic microbead-assisted toehold-mediated DNA strand displacement reaction.

PubMed

Khodakov, Dmitriy A; Khodakova, Anastasia S; Linacre, Adrian; Ellis, Amanda V

2014-07-21

This paper reports on the modification of magnetic beads with oligonucleotide capture probes with a specially designed pendant toehold (overhang) aimed specifically to capture double-stranded PCR products. After capture, the PCR products were selectively released from the magnetic beads by means of a toehold-mediated strand displacement reaction using short artificial oligonucleotide triggers and analysed using capillary electrophoresis. The approach was successfully shown on two genes widely used in human DNA genotyping, namely human c-fms (macrophage colony-stimulating factor) proto-oncogene for the CSF-1 receptor (CSF1PO) and amelogenin.
Examining regional variability in work ethic within Mexico: Individual difference or shared value.

PubMed

Arciniega, Luis M; Woehr, David J; Del Rincón, Germán A

2018-02-19

Despite the acceptance of work ethic as an important individual difference, little research has examined the extent to which work ethic may reflect shared environmental or socio-economic factors. This research addresses this concern by examining the influence of geographic proximity on the work ethic experienced by 254 employees from Mexico, working in 11 different cities in the Northern, Central and Southern regions of the country. Using a sequence of complementary analyses to assess the main source of variance on seven dimensions of work ethic, our results indicate that work ethic is most appropriately considered at the individual level. © 2018 International Union of Psychological Science.
The PLAID graphics analysis impact on the space program

NASA Technical Reports Server (NTRS)

Nguyen, Jennifer P.; Wheaton, Aneice L.; Maida, James C.

1994-01-01

An ongoing project design often requires visual verification at various stages. These requirements are critically important because the subsequent phases of that project might depend on the complete verification of a particular stage. Currently, there are several software packages at JSC that provide such simulation capabilities. We present the simulation capabilities of the PLAID modeling system used in the Flight Crew Support Division for human factors analyses. We summarize some ongoing studies in kinematics, lighting, EVA activities, and discuss various applications in the mission planning of the current Space Shuttle flights and the assembly sequence of the Space Station Freedom with emphasis on the redesign effort.
Initial Reduction of CO2 on Pd-, Ru-, and Cu-Doped CeO2(111) Surfaces: Effects of Surface Modification on Catalytic Activity and Selectivity.

PubMed

Guo, Chen; Wei, Shuxian; Zhou, Sainan; Zhang, Tian; Wang, Zhaojie; Ng, Siu-Pang; Lu, Xiaoqing; Wu, Chi-Man Lawrence; Guo, Wenyue

2017-08-09

Surface modification by metal doping is an effective treatment technique for improving surface properties for CO2 reduction. Herein, the effects of doped Pd, Ru, and Cu on the adsorption, activation, and reduction selectivity of CO2 on CeO2(111) were investigated by periodic density functional theory. The doped metals distorted the configuration of a perfect CeO2(111) by weakening the adjacent Ce-O bond strength, and Pd doping was beneficial for generating a highly active O vacancy. The analyses of adsorption energy, charge density difference, and density of states confirmed that the doped metals were conducive for enhancing CO2 adsorption, especially for Cu/CeO2(111). The initial reductive dissociation CO2 → CO* + O* on metal-doped CeO2(111) followed the sequence of Cu- > perfect > Pd- > Ru-doped CeO2(111); the reductive hydrogenation CO2 + H → COOH* followed the sequence of Cu- > perfect > Ru- > Pd-doped CeO2(111), in which the most competitive route on Cu/CeO2(111) was exothermic by 0.52 eV with an energy barrier of 0.16 eV; the reductive hydrogenation CO2 + H → HCOO* followed the sequence of Ru- > perfect > Pd-doped CeO2(111). Energy barrier decomposition analyses were performed to identify the governing factors of bond activation and scission along the initial CO2 reduction routes. Results of this study provided deep insights into the effect of surface modification on the initial reduction mechanisms of CO2 on metal-doped CeO2(111) surfaces.
Metschnikowia drakensbergensis sp. nov. and Metschnikowia caudata sp. nov., endemic yeasts associated with Protea flowers in South Africa.

PubMed

de Vega, Clara; Guzmán, Beatriz; Steenhuisen, Sandy-Lynn; Johnson, Steven D; Herrera, Carlos M; Lachance, Marc-André

2014-11-01

In a taxonomic study of yeasts recovered from nectar of flowers and associated insects in South Africa, 11 strains were found to represent two novel species. Morphological and physiological characteristics and sequence analyses of the large-subunit rRNA gene D1/D2 region, as well as the actin, RNA polymerase II and elongation factor 2 genes, showed that the two novel species belonged to the genus Metschnikowia. Metschnikowia drakensbergensis sp. nov. (type strain EBD-CdVSA09-2(T) =CBS 13649(T) =NRRL Y-63721(T); MycoBank no. MB809688; allotype EBD-CdVSA10-2(A) =CBS13650(A) =NRRL Y-63720(A)) was recovered from nectar of Protea roupelliae and the beetle Heterochelus sp. This species belongs to the large-spored Metschnikowia clade and is closely related to Metschnikowia proteae, with which mating reactions and single-spored asci were observed. Metschnikowia caudata sp. nov. (type strain EBD-CdVSA08-1(T) =CBS 13651(T) =NRRL Y-63722(T); MycoBank no. MB809689; allotype EBD-CdVSA57-2(A) =CBS 13729(A) =NRRL Y-63723(A)) was isolated from nectar of Protea dracomontana, P. roupelliae and P. subvestita and a honeybee, and is a sister species to Candida hainanensis and Metschnikowia lopburiensis. Analyses of the four sequences demonstrated the existence of three separate phylotypes. Intraspecies matings led to the production of mature asci of unprecedented morphology, with a long, flexuous tail. A single ascospore was produced in all compatible crosses, regardless of sequence phylotype. The two species appear to be endemic to South Africa. The ecology and habitat specificity of these novel species are discussed in terms of host plant and insect host species. © 2014 IUMS.
A genomic survey of the fish parasite Spironucleus salmonicida indicates genomic plasticity among diplomonads and significant lateral gene transfer in eukaryote genome evolution

PubMed Central

Andersson, Jan O; Sjögren, Åsa M; Horner, David S; Murphy, Colleen A; Dyal, Patricia L; Svärd, Staffan G; Logsdon, John M; Ragan, Mark A; Hirt, Robert P; Roger, Andrew J

2007-01-01

Background Comparative genomic studies of the mitochondrion-lacking protist group Diplomonadida (diplomonads) have been lacking, although Giardia lamblia has been intensively studied. We have performed a sequence survey project resulting in 2341 expressed sequence tags (EST) corresponding to 853 unique clones, 5275 genome survey sequences (GSS), and eleven finished contigs from the diplomonad fish parasite Spironucleus salmonicida (previously described as S. barkhanus). Results The analyses revealed a compact genome with few, if any, introns and very short 3' untranslated regions. Strikingly different patterns of codon usage were observed in genes corresponding to frequently sampled ESTs versus genes poorly sampled, indicating that translational selection is influencing the codon usage of highly expressed genes. Rigorous phylogenomic analyses identified 84 genes – mostly encoding metabolic proteins – that have been acquired by diplomonads or their relatively close ancestors via lateral gene transfer (LGT). Although most acquisitions were from prokaryotes, more than a dozen represent likely transfers of genes between eukaryotic lineages. Many genes that provide novel insights into the genetic basis of the biology and pathogenicity of this parasitic protist were identified including 149 that putatively encode variant-surface cysteine-rich proteins which are candidate virulence factors. A number of genomic properties that distinguish S. salmonicida from its human parasitic relative G. lamblia were identified such as nineteen putative lineage-specific gene acquisitions, distinct mutational biases and codon usage and distinct polyadenylation signals. Conclusion Our results highlight the power of comparative genomic studies to yield insights into the biology of parasitic protists and the evolution of their genomes, and suggest that genetic exchange between distantly-related protist lineages may be occurring at an appreciable rate in eukaryote genome evolution. PMID:17298675
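The codon-usage comparison mentioned in this abstract (frequently sampled versus poorly sampled genes) can be pictured with a small Python sketch that pools codon counts over a set of coding sequences and converts them to fractions. This is only an illustration under stated assumptions: the function names are hypothetical, sequences are assumed to be in frame from their first base, and the abstract does not describe the statistic the authors actually used.

```python
from collections import Counter


def codon_counts(cds):
    """Count codons in a coding sequence read in frame from position 0."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(sequences):
    """Pool codon counts over several sequences and return fractions."""
    total = Counter()
    for seq in sequences:
        total.update(codon_counts(seq))
    n = sum(total.values())
    if n == 0:
        return {}
    return {codon: count / n for codon, count in total.items()}


# Toy sequences (not real data) standing in for two groups of genes.
highly_sampled = ["ATGAAGAAGTAA", "ATGAAGGCC"]
poorly_sampled = ["ATGAAAAAATAA"]
print(codon_frequencies(highly_sampled))
print(codon_frequencies(poorly_sampled))
```

Comparing the two dictionaries codon by codon (for example AAG versus AAA for lysine) gives a first impression of whether the two groups prefer different synonymous codons.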
Exploring Pandora's Box: Potential and Pitfalls of Low Coverage Genome Surveys for Evolutionary Biology

PubMed Central

Leese, Florian; Mayer, Christoph; Agrawal, Shobhit; Dambach, Johannes; Dietz, Lars; Doemel, Jana S.; Goodall-Copstake, William P.; Held, Christoph; Jackson, Jennifer A.; Lampert, Kathrin P.; Linse, Katrin; Macher, Jan N.; Nolzen, Jennifer; Raupach, Michael J.; Rivera, Nicole T.; Schubart, Christoph D.; Striewski, Sebastian; Tollrian, Ralph; Sands, Chester J.

2012-01-01

High throughput sequencing technologies are revolutionizing genetic research. With this “rise of the machines”, genomic sequences can be obtained even for unknown genomes within a short time and for reasonable costs. This has enabled evolutionary biologists studying genetically unexplored species to identify molecular markers or genomic regions of interest (e.g. micro- and minisatellites, mitochondrial and nuclear genes) by sequencing only a fraction of the genome. However, when using such datasets from non-model species, it is possible that DNA from non-target contaminant species such as bacteria, viruses, fungi, or other eukaryotic organisms may complicate the interpretation of the results. In this study we analysed 14 genomic pyrosequencing libraries of aquatic non-model taxa from four major evolutionary lineages. We quantified the amount of suitable micro- and minisatellites, mitochondrial genomes, known nuclear genes and transposable elements and searched for contamination from various sources using bioinformatic approaches. Our results show that in all sequence libraries with estimated coverage of about 0.02–25%, many appropriate micro- and minisatellites, mitochondrial gene sequences and nuclear genes from different KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways could be identified and characterized. These can serve as markers for phylogenetic and population genetic analyses. A central finding of our study is that several genomic libraries suffered from different biases owing to non-target DNA or mobile elements. In particular, viruses, bacteria or eukaryote endosymbionts contributed significantly (up to 10%) to some of the libraries analysed. If not identified as such, genetic markers developed from high-throughput sequencing data for non-model organisms may bias evolutionary studies or fail completely in experimental tests. In conclusion, our study demonstrates the enormous potential of low-coverage genome survey sequences and suggests bioinformatic analysis workflows. The results also advise a more sophisticated filtering for problematic sequences and non-target genome sequences prior to developing markers. PMID:23185309
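To make the idea of screening survey reads for microsatellite candidates more concrete, here is a minimal Python sketch based on a regular expression with a backreference. It is not the workflow used in the study, which the abstract does not detail; the motif length range (2 to 6 bases) and the minimum of five tandem copies are illustrative thresholds chosen for this example.

```python
import re

# A motif of 2-6 bases followed by at least four further copies of itself,
# i.e. five or more tandem copies in total. Both thresholds are assumptions.
MICROSAT = re.compile(r"([ACGT]{2,6}?)\1{4,}")


def find_microsatellites(seq):
    """Return (start, motif, copies) for each tandem repeat found.

    Start positions are 0-based. Homopolymer runs are reported with a
    two-base motif (e.g. "AA"), so they may need separate filtering.
    """
    hits = []
    for match in MICROSAT.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# Toy read (not real data) containing a (CA)6 repeat.
read = "TTG" + "CA" * 6 + "GGT"
print(find_microsatellites(read))
```

A real screen would additionally need to check flanking sequence for primer design and, as the abstract stresses, filter out reads that derive from contaminant or non-target genomes before any marker is developed.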
Genetic analysis of duck circovirus in Pekin ducks from South Korea.

PubMed

Cha, S-Y; Kang, M; Cho, J-G; Jang, H-K

2013-11-01

The genetic organization of the 24 duck circovirus (DuCV) strains detected in commercial Pekin ducks from South Korea between 2011 and 2012 is described in this study. Multiple sequence alignment and phylogenetic analyses were performed on the 24 viral genome sequences as well as on 45 genome sequences available from the GenBank database. Phylogenetic analyses based on the genomic and open reading frame 2/cap sequences demonstrated that all DuCV strains belonged to genotype 1 and were designated in a subcluster under genotype 1. Analysis of the capsid protein amino acid sequences of the 24 Korean DuCV strains showed 10 substitutions compared with that of other genotype 1 strains. Our analysis showed that genotype 1 is predominant and circulating in South Korea. These present results serve as incentive to add more data to the DuCV database and provide insight to conduct further intensive study on the geographic relationships among these virus strains.
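Counting amino acid substitutions between aligned capsid sequences, as reported in this abstract, can be sketched in a few lines of Python. The sketch assumes the two sequences are already aligned to equal length and that gap positions should be ignored; the function name and the toy sequences are hypothetical and are not taken from the study.

```python
def substitutions(reference, query, gap="-"):
    """List (position, reference residue, query residue) differences.

    Positions are 1-based alignment columns. Columns where either
    sequence has a gap are skipped rather than counted as substitutions.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, r, q)
        for i, (r, q) in enumerate(zip(reference, query))
        if r != q and gap not in (r, q)
    ]


# Toy aligned fragments (not real DuCV capsid sequences).
print(substitutions("MKTAY-L", "MRTAYQL"))
```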
Genome sequencing and transcriptome analysis of Trichoderma reesei QM9978 strain reveals a distal chromosome translocation to be responsible for loss of vib1 expression and loss of cellulase induction.

PubMed

Ivanova, Christa; Ramoni, Jonas; Aouam, Thiziri; Frischmann, Alexa; Seiboth, Bernhard; Baker, Scott E; Le Crom, Stéphane; Lemoine, Sophie; Margeot, Antoine; Bidard, Frédérique

2017-01-01

The hydrolysis of biomass to simple sugars used for the production of biofuels in biorefineries requires the action of cellulolytic enzyme mixtures. During the last 50 years, the ascomycete Trichoderma reesei, the main source of industrial cellulase and hemicellulase cocktails, has been subjected to several rounds of classical mutagenesis with the aim to obtain higher production levels. During these random genetic events, strains unable to produce cellulases were generated. Here, whole genome sequencing and transcriptomic analyses of the cellulase-negative strain QM9978 were used for the identification of mutations underlying this cellulase-negative phenotype. Sequence comparison of the cellulase-negative strain QM9978 to the reference strain QM6a identified a total of 43 mutations, of which 33 were located either close to or in coding regions. From those, we identified 23 single-nucleotide variants, nine InDels, and one translocation. The translocation occurred between chromosomes V and VII, is located upstream of the putative transcription factor vib1, and abolishes its expression in QM9978 as detected during the transcriptomic analyses. Ectopic expression of vib1 under the control of its native promoter as well as overexpression of vib1 under the control of a strong constitutive promoter restored cellulase expression in QM9978, thus confirming that the translocation event is the reason for the cellulase-negative phenotype. Gene deletion of vib1 in the moderate producer strain QM9414 and in the high producer strain Rut-C30 reduced cellulase expression in both cases. Overexpression of vib1 in QM9414 and Rut-C30 had no effect on cellulase production, most likely because vib1 is already expressed at an optimal level under normal conditions. We were able to establish a link between a chromosomal translocation in QM9978 and the cellulase-negative phenotype of the strain. We identified the transcription factor vib1 as a key regulator of cellulases in T. reesei whose expression is absent in QM9978. We propose that in T. reesei, as in Neurospora crassa, vib1 is involved in cellulase induction, although the exact mechanism remains to be elucidated. The data presented here show an example of a combined genome sequencing and transcriptomic approach to explain a specific trait, in this case the QM9978 cellulase-negative phenotype, and how it helps to better understand the mechanisms during cellulase gene regulation. When focusing on mutations on the single base-pair level, changes on the chromosome level can be easily overlooked and through this work we provide an example that stresses the importance of the big picture of the genomic landscape during analysis of sequencing data.
mySyntenyPortal: an application package to construct websites for synteny block analysis.

PubMed

Lee, Jongin; Lee, Daehwan; Sim, Mikang; Kwon, Daehong; Kim, Juyeon; Ko, Younhee; Kim, Jaebum

2018-06-05

Advances in sequencing technologies have facilitated large-scale comparative genomics based on whole genome sequencing. Constructing and investigating conserved genomic regions among multiple species (called synteny blocks) are essential in comparative genomics. However, they require significant amounts of computational resources and time in addition to bioinformatics skills. Many web interfaces have been developed to make such tasks easier. However, these web interfaces cannot be customized for users who want to use their own set of genome sequences or definition of synteny blocks. To resolve this limitation, we present mySyntenyPortal, a stand-alone application package to construct websites for synteny block analyses by using users' own genome data. mySyntenyPortal provides both command line and web-based interfaces to build and manage websites for large-scale comparative genomic analyses. The websites can also be easily published and accessed by other users. To demonstrate the usability of mySyntenyPortal, we present an example study for building websites to compare genomes of three mammalian species (human, mouse, and cow) and show how they can be easily utilized to identify potential genes affected by genome rearrangements. mySyntenyPortal will contribute to extended comparative genomic analyses based on large-scale whole genome sequences by providing unique functionality to support the easy creation of interactive websites for synteny block analyses from users' own genome data.
MicroRNA-21 promotes proliferation of rat hepatocyte BRL-3A by targeting FASLG.

PubMed

Li, J J; Chan, W H; Leung, W Y; Wang, Y; Xu, C S

2015-04-27

Rat liver regeneration (RLR) induced by partial hepatectomy involves cell proliferation regulated by numerous factors, including microRNAs (miRNAs). miRNA high-throughput sequencing has been established and used to analyze miRNA expression profiles. This study showed that 39 miRNAs were related to RLR through the analysis of miRNA high-throughput sequencing. Their roles in the normal rat hepatocyte line BRL-3A were studied by gain- and loss-of-function analyses, and one of them, microRNA-21 (miR-21), was markedly upregulated and promoted BRL-3A cell proliferation. Using bioinformatics to search for miR-21 targets revealed that Fas ligand (FASLG) is one of miR-21's target genes. A dual-luciferase reporter assay and Western blot assay showed that miR-21 directly targeted the 3'-untranslated region of FASLG and inhibited the expression of FASLG, which suggests that miR-21 promoted BRL-3A cell proliferation by reducing FASLG expression.

Polyclonal emergence of vanA vancomycin-resistant Enterococcus faecium in Australia.

PubMed

van Hal, Sebastiaan J; Espedido, Björn A; Coombs, Geoffrey W; Howden, Benjamin P; Korman, Tony M; Nimmo, Graeme R; Gosbell, Iain B; Jensen, Slade O

2017-04-01

To investigate the genetic context associated with the emergence of vanA VRE in Australia. The whole genomes of 18 randomly selected vanA-positive Enterococcus faecium patient isolates, collected between 2011 and 2013 from hospitals in four Australian capitals, were sequenced and analysed. In silico typing and transposon/plasmid assembly revealed that the sequenced isolates represented (in most cases) different hospital-adapted STs and were associated with a variety of different Tn1546 variants and plasmid backbone structures. The recent emergence of vanA VRE in Australia was polyclonal and not associated with the dissemination of a single 'dominant' ST or vanA-encoding plasmid. Interestingly, the factors contributing to this epidemiological change are not known and future studies may need to consider investigation of potential community sources. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved.
Two new hyaline-ascospored species of Trichoderma and their phylogenetic positions.

PubMed

Qin, W T; Zhuang, W Y

2016-01-01

Collections of hypocrealean fungi found on decaying wood in subtropical regions of China were examined. Two new species, Trichoderma confluens and T. hubeiense, were discovered and are described. Trichoderma confluens is characterized by its widely effuse to rarely pulvinate, yellow stromata with densely disposed yellowish brown ostioles, simple acremonium- to verticillium-like conidiophores, hyaline conidia and multiform chlamydospores. Trichoderma hubeiense has pulvinate, grayish yellow stromata with brownish ostioles, trichoderma- to verticillium-like conidiophores and hyaline conidia. The phylogenetic positions of the two fungi were investigated based on sequence analyses of RNA polymerase II subunit b and translation elongation factor 1-α genes. The results indicate that T. confluens belongs to the Hypocreanum clade and is associated with but clearly separated from T. applanatum and T. decipiens. Trichoderma hubeiense belongs to the Polysporum clade and related to T. bavaricum but obviously differs from other members of the clade in sequence data. Morphological distinctions between the new species and their close relatives are noted and discussed. © 2016 by The Mycological Society of America.
Pseudomonas aeruginosa Type III Secretory Toxin ExoU and Its Predicted Homologs.

PubMed

Sawa, Teiji; Hamaoka, Saeko; Kinoshita, Mao; Kainuma, Atsushi; Naito, Yoshifumi; Akiyama, Koichi; Kato, Hideya

2016-10-26

Pseudomonas aeruginosa ExoU, a type III secretory toxin and major virulence factor with patatin-like phospholipase activity, is responsible for acute lung injury and sepsis in immunocompromised patients. Through use of a recently updated bacterial genome database, protein sequences predicted to be homologous to Ps. aeruginosa ExoU were identified in 17 other Pseudomonas species (Ps. fluorescens, Ps. lundensis, Ps. weihenstephanensis, Ps. marginalis, Ps. rhodesiae, Ps. synxantha, Ps. libanensis, Ps. extremaustralis, Ps. veronii, Ps. simiae, Ps. trivialis, Ps. tolaasii, Ps. orientalis, Ps. taetrolens, Ps. syringae, Ps. viridiflava, and Ps. cannabina) and 8 Gram-negative bacteria from three other genera (Photorhabdus, Aeromonas, and Paludibacterium). In the alignment of the predicted primary amino acid sequences used for the phylogenetic analyses, both highly conserved and nonconserved parts of the toxin were discovered among the various species. Further comparative studies of the predicted ExoU homologs should provide us with more detailed information about the unique characteristics of the Ps. aeruginosa ExoU toxin.
Structure and function of homodomain-leucine zipper (HD-Zip) proteins.

PubMed

Elhiti, Mohamed; Stasolla, Claudio

2009-02-01

Homeodomain-leucine zipper (HD-Zip) proteins are transcription factors unique to plants and are encoded by more than 25 genes in Arabidopsis thaliana. Based on sequence analyses, these proteins have been classified into four distinct groups: HD-Zip I-IV. HD-Zip proteins are characterized by the presence of two functional domains: a homeodomain (HD) responsible for DNA binding and a leucine zipper domain (Zip) located immediately C-terminal to the homeodomain and involved in protein-protein interaction. Despite sequence similarities, HD-Zip proteins participate in a variety of processes during plant growth and development. HD-Zip I proteins are generally involved in responses related to abiotic stress, abscisic acid (ABA), blue light, de-etiolation and embryogenesis. HD-Zip II proteins participate in light response, shade avoidance and auxin signalling. Members of the third group (HD-Zip III) control embryogenesis, leaf polarity, lateral organ initiation and meristem function. HD-Zip IV proteins play significant roles during anthocyanin accumulation, differentiation of epidermal cells, trichome formation and root development.
Bacterial community compositions of coking wastewater treatment plants in steel industry revealed by Illumina high-throughput sequencing.

PubMed

Ma, Qiao; Qu, Yuanyuan; Shen, Wenli; Zhang, Zhaojing; Wang, Jingwei; Liu, Ziyan; Li, Duanxing; Li, Huijie; Zhou, Jiti

2015-03-01

In this study, Illumina high-throughput sequencing was used to reveal the community structures of nine coking wastewater treatment plants (CWWTPs) in China for the first time. The sludge systems exhibited a similar community composition at each taxonomic level. Compared to previous studies, some of the core genera in municipal wastewater treatment plants such as Zoogloea, Prosthecobacter and Gp6 were detected as minor species. Thiobacillus (20.83%), Comamonas (6.58%), Thauera (4.02%), Azoarcus (7.78%) and Rhodoplanes (1.42%) were the dominant genera shared by at least six CWWTPs. The percentages of autotrophic ammonia-oxidizing bacteria and nitrite-oxidizing bacteria were unexpectedly low, which were verified by both real-time PCR and fluorescence in situ hybridization analyses. Hierarchical clustering and canonical correspondence analysis indicated that operation mode, flow rate and temperature might be the key factors in community formation. This study provides new insights into our understanding of microbial community compositions and structures of CWWTPs. Copyright © 2014 Elsevier Ltd. All rights reserved.
Cantharellus violaceovinosus, a new species from tropical Quercus forests in eastern Mexico

PubMed Central

Herrera, Mariana; Bandala, Victor M.; Montoya, Leticia

2018-01-01

During explorations of tropical oak forests in central Veracruz (eastern Mexico), the authors discovered a Cantharellus species that produces basidiomes with strikingly violet pileus and a hymenium with yellow, raised gill-like folds. It is harvested locally and valued as a prized edible wild mushroom. Systematic multiyear sampling of basidiomes allowed the recording of the morphological variation exhibited by fresh fruit bodies in different growth stages, which supports the recognition of this Cantharellus species from others in the genus. Two molecular phylogenetic analyses based on a set of sequences of species of all major clades in Cantharellus, one including sequences of the transcription elongation factor 1-alpha (tef-1α) and a combined tef-1α and nLSU region (the large subunit of the ribosome), confirm the isolated position of the new species in a clade close to C. lewisii from USA, in the subgenus Cantharellus. Detailed macroscopic and microscopic descriptions, accompanied by illustrations and a taxonomic discussion are presented. PMID:29681739
The PAZAR database of gene regulatory information coupled to the ORCA toolkit for the study of regulatory sequences

PubMed Central

Portales-Casamar, Elodie; Arenillas, David; Lim, Jonathan; Swanson, Magdalena I.; Jiang, Steven; McCallum, Anthony; Kirov, Stefan; Wasserman, Wyeth W.

2009-01-01

The PAZAR database unites independently created and maintained data collections of transcription factor and regulatory sequence annotation. The flexible PAZAR schema permits the representation of diverse information derived from experiments ranging from biochemical protein–DNA binding to cellular reporter gene assays. Data collections can be made available to the public, or restricted to specific system users. The data ‘boutiques’ within the shopping-mall-inspired system facilitate the analysis of genomics data and the creation of predictive models of gene regulation. Since its initial release, PAZAR has grown in terms of data, features and through the addition of an associated package of software tools called the ORCA toolkit (ORCAtk). ORCAtk allows users to rapidly develop analyses based on the information stored in the PAZAR system. PAZAR is available at http://www.pazar.info. ORCAtk can be accessed through convenient buttons located in the PAZAR pages or via our website at http://www.cisreg.ca/ORCAtk. PMID:18971253
Analysis, Characterization, and Loci of the tuf Genes in Lactobacillus and Bifidobacterium Species and Their Direct Application for Species Identification

PubMed Central

Ventura, Marco; Canchaya, Carlos; Meylan, Valèrie; Klaenhammer, Todd R.; Zink, Ralf

2003-01-01

We analyzed the tuf gene, encoding elongation factor Tu, from 33 strains representing 17 Lactobacillus species and 8 Bifidobacterium species. The tuf sequences were aligned and used to infer the phylogeny of species of lactobacilli and bifidobacteria. We demonstrated that the synonymous substitution affecting this gene renders elongation factor Tu a reliable molecular clock for investigating evolutionary distances of lactobacilli and bifidobacteria. In fact, the phylogeny generated by these tuf sequences is consistent with that derived from 16S rRNA analysis. The investigation of a multiple alignment of tuf sequences revealed regions conserved among strains belonging to the same species but distinct from those of other species. PCR primers complementary to these regions allowed species-specific identification of closely related species, such as Lactobacillus casei group members. The tuf gene-based assays developed in this study provide an alternative to present methods for the identification of lactic acid bacterial species. Since a variable number of tuf genes have been described for bacteria, the presence of multiple genes was examined. Southern analysis revealed one tuf gene in the genomes of lactobacilli and bifidobacteria, but the tuf gene was arranged differently in the genomes of these two taxa. Our results revealed that the tuf gene in bifidobacteria is flanked by the same gene constellation as the str operon, as originally reported for Escherichia coli. In contrast, bioinformatic and transcriptional analyses of the DNA region flanking the tuf gene in four Lactobacillus species indicated the same four-gene unit and suggested a novel tuf operon specific for the genus Lactobacillus. PMID:14602655
Deep sequencing leads to the identification of eukaryotic translation initiation factor 5A as a key element in Rsv1-mediated lethal systemic hypersensitive response to Soybean mosaic virus infection in soybean.

PubMed

Chen, Hui; Adam Arsovski, Andrej; Yu, Kangfu; Wang, Aiming

2017-04-01

Rsv1, a single dominant resistance locus in soybean, confers extreme resistance to the majority of Soybean mosaic virus (SMV) strains, but not to the G7 strain. In Rsv1-genotype soybean, G7 infection provokes a lethal systemic hypersensitive response (LSHR), a delayed host defence response. The Rsv1-mediated LSHR signalling pathway remains largely unknown. In this study, we employed a genome-wide investigation to gain an insight into the molecular interplay between SMV G7 and Rsv1-genotype soybean. Small RNA (sRNA), degradome and transcriptome sequencing analyses were used to identify differentially expressed genes (DEGs) and microRNAs (DEMs) in response to G7 infection. A number of DEGs, DEMs and microRNA targets, and the interaction network of DEMs and their target mRNAs responsive to G7 infection, were identified. Knock-down of one of the identified DEGs, the eukaryotic translation initiation factor 5A (eIF5A), diminished the LSHR and enhanced viral accumulation, suggesting the essential role of eIF5A in the G7-induced, Rsv1-mediated LSHR signalling pathway. This work provides an in-depth genome-wide analysis of high-throughput sequencing data, and identifies multiple genes and microRNA signatures that are associated with the Rsv1-mediated LSHR. © 2016 HER MAJESTY THE QUEEN IN RIGHT OF CANADA MOLECULAR PLANT PATHOLOGY © 2016 BSPP AND JOHN WILEY & SONS LTD.
Genomic Epidemiology of Hypervirulent Serogroup W, ST-11 Neisseria meningitidis

PubMed Central

Mustapha, Mustapha M.; Marsh, Jane W.; Krauland, Mary G.; Fernandez, Jorge O.; de Lemos, Ana Paula S.; Dunning Hotopp, Julie C.; Wang, Xin; Mayer, Leonard W.; Lawrence, Jeffrey G.; Hiller, N. Luisa; Harrison, Lee H.

2015-01-01

Neisseria meningitidis is a leading bacterial cause of sepsis and meningitis globally with dynamic strain distribution over time. Beginning with an epidemic among Hajj pilgrims in 2000, serogroup W (W) sequence type (ST) 11 emerged as a leading cause of epidemic meningitis in the African ‘meningitis belt’ and endemic cases in South America, Europe, Middle East and China. Previous genotyping studies were unable to reliably discriminate sporadic W ST-11 strains in circulation since 1970 from the Hajj outbreak strain (Hajj clone). It is also unclear what proportion of more recent W ST-11 disease clusters are caused by direct descendants of the Hajj clone. Whole genome sequences of 270 meningococcal strains isolated from patients with invasive meningococcal disease globally from 1970 to 2013 were compared using whole genome phylogenetic and major antigen-encoding gene sequence analyses. We found that all W ST-11 strains were descendants of an ancestral strain that had undergone unique capsular switching events. The Hajj clone and its descendants were distinct from other W ST-11 strains in that they shared a common antigen gene profile and had undergone recombination involving virulence genes encoding factor H binding protein, nitric oxide reductase, and nitrite reductase. These data demonstrate that recent acquisition of a distinct antigen-encoding gene profile and variations in meningococcal virulence genes was associated with the emergence of the Hajj clone. Importantly, W ST-11 strains unrelated to the Hajj outbreak contribute a significant proportion of W ST-11 cases globally. This study helps illuminate genomic factors associated with meningococcal strain emergence and evolution. PMID:26629539
Comparison of the aggregation of homologous β2-microglobulin variants reveals protein solubility as a key determinant of amyloid formation.

PubMed

Pashley, Clare L; Hewitt, Eric W; Radford, Sheena E

2016-02-13

The mouse and human β2-microglobulin protein orthologs are 70% identical in sequence and share 88% sequence similarity. These proteins are predicted by various algorithms to have similar aggregation and amyloid propensities. However, whilst human β2m (hβ2m) forms amyloid-like fibrils in denaturing conditions (e.g. pH 2.5) in the absence of NaCl, mouse β2m (mβ2m) requires the addition of 0.3 M NaCl to cause fibrillation. Here, the factors which give rise to this difference in amyloid propensity are investigated. We utilise structural and mutational analyses, fibril growth kinetics and solubility measurements under a range of pH and salt conditions, to determine why these two proteins have different amyloid propensities. The results show that, although other factors influence the fibril growth kinetics, a striking difference in the solubility of the proteins is a key determinant of the different amyloidogenicity of hβ2m and mβ2m. The relationship between protein solubility and lag time of amyloid formation is not captured by current aggregation or amyloid prediction algorithms, indicating a need to better understand the role of solubility on the lag time of amyloid formation. The results demonstrate the key contribution of protein solubility in determining amyloid propensity and lag time of amyloid formation, highlighting how small differences in protein sequence can have dramatic effects on amyloid formation. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
Time Correlations of Lightning Flash Sequences in Thunderstorms Revealed by Fractal Analysis

NASA Astrophysics Data System (ADS)

Gou, Xueqiang; Chen, Mingli; Zhang, Guangshu

2018-01-01

Using data from the Lightning Detection and Ranging system at the Kennedy Space Center, the temporal fractal structure and correlation of interevent time series of lightning flash sequences in thunderstorms have been investigated with Allan factor (AF), Fano factor (FF), and detrended fluctuation analysis (DFA) methods. AF, FF, and DFA methods are powerful tools to detect the time-scaling structures and correlations in point processes. In total, 40 thunderstorms with distinguishing features of a single-cell storm and an apparent increase and decrease in the total flash rate were selected for the analysis. It is found that the time-scaling exponents for AF (αAF) and FF (αFF) analyses are 1.62 and 0.95 on average, respectively, indicating a strong time correlation of the lightning flash sequences. DFA shows that there is a crossover phenomenon: a crossover timescale (τc) ranging from 54 to 195 s with an average of 114 s. The occurrence of a lightning flash in a thunderstorm behaves randomly at timescales <τc but shows strong time correlation at scales >τc. Physically, these may imply that the establishment of an extensive strong electric field necessary for the occurrence of a lightning flash needs a timescale >τc, which behaves strongly time correlated. But the initiation of a lightning flash within a well-established extensive strong electric field may involve the heterogeneities of the electric field at a timescale <τc, which behave randomly.
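As a rough illustration of the two count-based statistics named in this abstract, the sketch below computes the Fano factor (variance of window counts divided by their mean) and the Allan factor (mean squared difference of adjacent window counts divided by twice the mean count) for a list of event times. The function names, the window length and the synthetic uniformly scattered event times are assumptions made for illustration only; they are not the authors' data or code. For uncorrelated events both statistics should be close to 1, while correlated sequences give values that grow with the window length.

```python
import random


def window_counts(event_times, window, t_end):
    """Count events in consecutive, non-overlapping windows of equal length."""
    n_windows = int(t_end // window)
    counts = [0] * n_windows
    for t in event_times:
        k = int(t // window)
        if 0 <= k < n_windows:
            counts[k] += 1
    return counts


def fano_factor(counts):
    """Variance of the window counts divided by their mean."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / len(counts)
    return var / mean


def allan_factor(counts):
    """Mean squared difference of adjacent counts over twice the mean count."""
    mean = sum(counts) / len(counts)
    sq_diffs = [(b - a) ** 2 for a, b in zip(counts, counts[1:])]
    return (sum(sq_diffs) / len(sq_diffs)) / (2.0 * mean)


# Synthetic, uncorrelated event times (hypothetical stand-in for flash times).
random.seed(0)
t_end = 10000.0
events = [random.uniform(0.0, t_end) for _ in range(5000)]
counts = window_counts(events, 20.0, t_end)
print("Fano factor:", fano_factor(counts))
print("Allan factor:", allan_factor(counts))
```

Repeating the calculation over a range of window lengths and fitting the slope on log-log axes would give scaling exponents of the kind reported above; that step is omitted here.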
Detecting differential DNA methylation from sequencing of bisulfite converted DNA of diverse species.

PubMed

Huh, Iksoo; Wu, Xin; Park, Taesung; Yi, Soojin V

2017-07-21

DNA methylation is one of the most extensively studied epigenetic modifications of genomic DNA. In recent years, sequencing of bisulfite-converted DNA, particularly via next-generation sequencing technologies, has become a widely popular method to study DNA methylation. This method can be readily applied to a variety of species, dramatically expanding the scope of DNA methylation studies beyond the traditionally studied human and mouse systems. In parallel to the increasing wealth of genomic methylation profiles, many statistical tools have been developed to detect differentially methylated loci (DMLs) or differentially methylated regions (DMRs) between biological conditions. We discuss and summarize several key properties of currently available tools to detect DMLs and DMRs from sequencing of bisulfite-converted DNA. However, the majority of the statistical tools developed for DML/DMR analyses have been validated using only mammalian data sets, and less priority has been placed on the analyses of invertebrate or plant DNA methylation data. We demonstrate that genomic methylation profiles of non-mammalian species are often highly distinct from those of mammalian species using examples of honey bees and humans. We then discuss how such differences in data properties may affect statistical analyses. Based on these differences, we provide three specific recommendations to improve the power and accuracy of DML and DMR analyses of invertebrate data when using currently available statistical tools. These considerations should facilitate systematic and robust analyses of DNA methylation from diverse species, thus advancing our understanding of DNA methylation. © The Author 2017. Published by Oxford University Press.
Phylogenetic Relationship of Necoclí Virus to Other South American Hantaviruses (Bunyaviridae: Hantavirus).

PubMed

Montoya-Ruiz, Carolina; Cajimat, Maria N B; Milazzo, Mary Louise; Diaz, Francisco J; Rodas, Juan David; Valbuena, Gustavo; Fulhorst, Charles F

2015-07-01

The results of a previous study suggested that Cherrie's cane rat (Zygodontomys cherriei) is the principal host of Necoclí virus (family Bunyaviridae, genus Hantavirus) in Colombia. Bayesian analyses of complete nucleocapsid protein gene sequences and complete glycoprotein precursor gene sequences in this study confirmed that Necoclí virus is phylogenetically closely related to Maporal virus, which is principally associated with the delicate pygmy rice rat (Oligoryzomys delicatus) in western Venezuela. In pairwise comparisons, nonidentities between the complete amino acid sequence of the nucleocapsid protein of Necoclí virus and the complete amino acid sequences of the nucleocapsid proteins of other hantaviruses were ≥8.7%. Likewise, nonidentities between the complete amino acid sequence of the glycoprotein precursor of Necoclí virus and the complete amino acid sequences of the glycoprotein precursors of other hantaviruses were ≥11.7%. Collectively, the unique association of Necoclí virus with Z. cherriei in Colombia, results of the Bayesian analyses of complete nucleocapsid protein gene sequences and complete glycoprotein precursor gene sequences, and results of the pairwise comparisons of amino acid sequences strongly support the notion that Necoclí virus represents a novel species in the genus Hantavirus. Further work is needed to determine whether Calabazo virus (a hantavirus associated with Z. brevicauda cherriei in Panama) and Necoclí virus are conspecific.
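The pairwise nonidentity figures quoted in this abstract are percentages of differing positions between two aligned amino acid sequences. A minimal sketch of that calculation follows, assuming the two sequences are already aligned to equal length and that a gap character counts as a difference; the abstract does not state how gaps were treated, and the short sequences are made up for illustration.

```python
def percent_nonidentity(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100.0 * mismatches / len(seq_a)


# Hypothetical aligned fragments, not real hantavirus sequences.
fragment_1 = "MSTLKEVQDN"
fragment_2 = "MSNLKEVQEN"
print(percent_nonidentity(fragment_1, fragment_2))
```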
Uncommonly isolated clinical Pseudomonas: identification and phylogenetic assignation.

PubMed

Mulet, M; Gomila, M; Ramírez, A; Cardew, S; Moore, E R B; Lalucat, J; García-Valdés, E

2017-02-01

Fifty-two Pseudomonas strains that were difficult to identify at the species level in the phenotypic routine characterizations employed by clinical microbiology laboratories were selected for genotypic-based analysis. Species-level identifications were done initially by partial sequencing of the DNA-dependent RNA polymerase sub-unit D gene (rpoD). Two other gene sequences, for the small sub-unit ribosomal RNA (16S rRNA) and for DNA gyrase sub-unit B (gyrB), were added in a multilocus sequence analysis (MLSA) study to confirm the species identifications. These sequences were analyzed with a collection of reference sequences from the type strains of 161 Pseudomonas species within an in-house multilocus sequence analysis database. Whole-cell matrix-assisted laser-desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) analyses of these strains complemented the DNA sequence-based phylogenetic analyses and were observed to be in accordance with the results of the sequence data. Twenty-three out of 52 strains were assigned to 12 recognized species not commonly detected in clinical specimens and 29 (56%) were considered representatives of at least ten putative new species. Most strains were distributed within the P. fluorescens and P. aeruginosa lineages. The value of rpoD sequences in species-level identifications for Pseudomonas is emphasized. Correct species identification of clinical strains is essential for establishing the intrinsic antibiotic resistance patterns and improved treatment plans.
LINE-1 retrotransposons: from 'parasite' sequences to functional elements.

PubMed

Paço, Ana; Adega, Filomena; Chaves, Raquel

2015-02-01

Long interspersed nuclear elements-1 (LINE-1) are the most abundant and active retrotransposons in the mammalian genomes. Traditionally, the occurrence of LINE-1 sequences in the genome of mammals has been explained by the selfish DNA hypothesis. Nevertheless, recently, it has also been argued that these sequences could play important roles in these genomes, such as the regulation of gene expression, genome modelling and X-chromosome inactivation. The non-random chromosomal distribution is a striking feature of these retroelements that somehow reflects their functionality. In the present study, we have isolated and analysed a fraction of the open reading frame 2 (ORF2) LINE-1 sequence from three rodent species, Cricetus cricetus, Peromyscus eremicus and Praomys tullbergi. Physical mapping of the isolated sequences revealed an interspersed longitudinal AT pattern of distribution along all the chromosomes of the complement in the three genomes. A detailed analysis shows that these sequences are preferentially located in the euchromatic regions, although some signals could be detected in the heterochromatin. In addition, a coincidence between the location of imprinted gene regions (such as the Xist and Tsix gene regions) and the LINE-1 retroelements was also observed. According to these results, we propose an involvement of LINE-1 sequences in different genomic events such as gene imprinting, X-chromosome inactivation and evolution of repetitive sequences located at the heterochromatic regions (e.g. satellite DNA sequences) of the rodents' genomes analysed.
Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing.

PubMed

Kröber, Magdalena; Bekel, Thomas; Diaz, Naryttza N; Goesmann, Alexander; Jaenicke, Sebastian; Krause, Lutz; Miller, Dimitri; Runte, Kai J; Viehöver, Prisca; Pühler, Alfred; Schlüter, Andreas

2009-06-01

The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result was derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene encodes the alpha-subunit of methyl-coenzyme-M reductase involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences, thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales. Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown.
The WRKY Transcription Factor Genes in Eggplant (Solanum melongena L.) and Turkey Berry (Solanum torvum Sw.)

PubMed Central

Yang, Xu; Deng, Cao; Zhang, Yu; Cheng, Yufu; Huo, Qiuyue; Xue, Linbao

2015-01-01

WRKY transcription factors, which play critical roles in stress responses, have not been characterized in eggplant or its wild relative, turkey berry. The recent availability of RNA-sequencing data provides the opportunity to examine WRKY genes from a global perspective. We identified 50 and 62 WRKY genes in eggplant (SmelWRKYs) and turkey berry (StorWRKYs), respectively, all of which could be classified into three groups (I–III) based on the WRKY protein structure. The SmelWRKYs and StorWRKYs contain ~76% and ~95% of the number of WRKYs found in other sequenced asterid species, respectively. Positive selection analysis revealed that different selection constraints could have affected the evolution of these groups. Positively selected sites were found in Groups IIc and III. Branch-specific selection pressure analysis indicated that most WRKY domains from SmelWRKYs and StorWRKYs are conserved and have evolved at low rates since their divergence. Comparison to homologous WRKY genes in Arabidopsis revealed several potential pathogen resistance-related SmelWRKYs and StorWRKYs, providing possible candidate genetic resources for improving stress tolerance in eggplant and probably other Solanaceae plants. To our knowledge, this is the first report of a genome-wide analysis of the SmelWRKYs and StorWRKYs. PMID:25853261
Discovery of Transcription Factors Novel to Mouse Cerebellar Granule Cell Development Through Laser-Capture Microdissection.

PubMed

Zhang, Peter G Y; Yeung, Joanna; Gupta, Ishita; Ramirez, Miguel; Ha, Thomas; Swanson, Douglas J; Nagao-Sato, Sayaka; Itoh, Masayoshi; Kawaji, Hideya; Lassmann, Timo; Daub, Carsten O; Arner, Erik; de Hoon, Michiel; Carninci, Piero; Forrest, Alistair R R; Hayashizaki, Yoshihide; Goldowitz, Dan

2018-06-01

Laser-capture microdissection was used to isolate external germinal layer tissue from three developmental periods of mouse cerebellar development: embryonic days 13, 15, and 18. The cerebellar granule cell-enriched mRNA library was generated with next-generation sequencing using the Helicos technology. Our objective was to discover transcriptional regulators that could be important for the development of cerebellar granule cells, the most numerous neurons in the central nervous system. Through differential expression analysis, we have identified 82 differentially expressed transcription factors (TFs) from a total of 1311 differentially expressed genes. In addition, with TF-binding sequence analysis, we have identified 46 TF candidates that could be key regulators responsible for the variation in the granule cell transcriptome between developmental stages. Altogether, we identified 125 potential TFs (82 from differential expression analysis, 46 from motif analysis, with 3 overlaps between the two sets). From this gene set, 37 TFs are considered novel due to the lack of previous knowledge about their roles in cerebellar development. The results from transcriptome-wide analyses were validated with existing online databases, qRT-PCR, and in situ hybridization. This study provides an initial insight into the TFs of cerebellar granule cells that might be important for development and provides valuable information for further functional studies on these transcriptional regulators.
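The total of 125 candidate TFs in this abstract follows from combining two overlapping lists: 82 + 46 - 3 = 125. The short sketch below reproduces that count with set operations. The identifiers are placeholders invented for illustration, not the gene names from the study.

```python
# Placeholder identifiers: 82 TFs from differential expression analysis and
# 46 from motif analysis, constructed so that exactly 3 appear in both lists.
de_tfs = {f"TF{i}" for i in range(82)}
motif_tfs = {f"TF{i}" for i in range(79, 125)}

overlap = de_tfs & motif_tfs      # TFs found by both approaches
candidates = de_tfs | motif_tfs   # union, counting shared TFs only once

print(len(de_tfs), len(motif_tfs), len(overlap), len(candidates))
```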
Evolutionary mechanisms shaping the genetic population structure of coastal fish: insight from populations of Coilia nasus in Northwestern Pacific.

PubMed

Gao, Tianxiang; Wan, Zhenzhen; Song, Na; Zhang, Xiumei; Han, Zhiqiang

2014-12-01

A number of evolutionary mechanisms have been suggested for generating significant genetic structuring among marine fish populations in Northwestern Pacific. We used mtDNA control region to assess the factors in shaping the genetic structure of Japanese grenadier anchovy, Coilia nasus, an anadromous and estuarine coastal species, in Northwestern Pacific. Sixty seven individuals from four locations in Northwestern Pacific were sequenced for mitochondrial control region, detecting 61 haplotypes. The length of amplified control region varied from 677 to 754 bp. This length variability was due to the presence of varying numbers of a 38-bp tandemly repeated sequence. Two distinct lineages were detected, which might have diverged during Pleistocene low sea levels. There were strong differences in the geographical distribution of the two lineages. Analyses of molecular variance and the population statistic ΦST revealed significant genetic structure between China and Ariake Bay populations. Based on the frequency distribution of tandem repeat units, significant genetic differentiation was also detected between China and Ariake Bay populations. Isolation by distance seems to be the main factor driving present genetic structuring of C. nasus populations, indicating coastal dispersal pattern in this coastal species. Such an evolutionary process agrees well with some of the biological features characterizing this species.
Isolation and expression profiling of GhNAC transcription factor genes in cotton (Gossypium hirsutum L.) during leaf senescence and in response to stresses.

PubMed

Shah, Syed Tariq; Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Arain, Saima; Yu, Shuxun

2013-12-01

NAC (NAM, ATAF, and CUC) is a plant-specific transcription factor family with diverse roles in plant development and stress regulation. In this report, stress-responsive NAC genes (GhNAC8-GhNAC17) isolated from cotton (Gossypium hirsutum L.) were characterised in the context of leaf senescence and stress tolerance. The characterisation of NAC genes during leaf senescence has not yet been reported for cotton. Based on the sequence characterisation, these GhNACs could be classified into three groups belonging to three known NAC sub-families. Their predicted amino acid sequences exhibited similarities to NAC genes from other plant species. Senescent leaves were the sites of maximum expression for all GhNAC genes except GhNAC10 and GhNAC13, which showed maximum expression in fibres, collected from 25 days post anthesis (DPA) plants. The ten GhNAC genes displayed differential expression patterns and levels during natural and induced leaf senescence. Quantitative RT-PCR and promoter analyses suggest that these genes are induced by ABA, ethylene, drought, salinity, cold, heat, and other hormonal treatments. These results support a role for cotton GhNAC genes in transcriptional regulation of leaf senescence, stress tolerance and other developmental stages of cotton. © 2013.
The effects of cytosine methylation on general transcription factors

NASA Astrophysics Data System (ADS)

Jin, Jianshi; Lian, Tengfei; Gu, Chan; Yu, Kai; Gao, Yi Qin; Su, Xiao-Dong

2016-07-01

DNA methylation on CpG sites is the most common epigenetic modification. Recently, methylation in a non-CpG context was found to occur widely on genomic DNA. Moreover, methylation of non-CpG sites is a highly controlled process, and its level may vary during cellular development. To study non-CpG methylation effects on DNA/protein interactions, we have chosen three human transcription factors (TFs): glucocorticoid receptor (GR), brain and muscle ARNT-like 1 (BMAL1) - circadian locomotor output cycles kaput (CLOCK) and estrogen receptor (ER) with methylated or unmethylated DNA binding sequences, using single-molecule and isothermal titration calorimetry assays. The results demonstrated that these TFs interact with methylated DNA with different effects compared with their cognate DNA sequences. The effects of non-CpG methylation on transcriptional regulation were validated by cell-based luciferase assay at protein level. The mechanisms of non-CpG methylation influencing DNA-protein interactions were investigated by crystallographic analyses and molecular dynamics simulation. With BisChIP-seq assays in HEK-293T cells, we found that GR can recognize highly methylated sites within chromatin in cells. Therefore, we conclude that non-CpG methylation of DNA can provide a mechanism for regulating gene expression through directly affecting the binding of TFs.
P2RP: a Web-based framework for the identification and analysis of regulatory proteins in prokaryotic genomes.

PubMed

Barakat, Mohamed; Ortet, Philippe; Whitworth, David E

2013-04-20

Regulatory proteins (RPs) such as transcription factors (TFs) and two-component system (TCS) proteins control how prokaryotic cells respond to changes in their external and/or internal state. Identification and annotation of TFs and TCSs is non-trivial, and between-genome comparisons are often confounded by different standards in annotation. There is a need for user-friendly, fast and convenient tools to allow researchers to overcome the inherent variability in annotation between genome sequences. We have developed the web-server P2RP (Predicted Prokaryotic Regulatory Proteins), which enables users to identify and annotate TFs and TCS proteins within their sequences of interest. Users can input amino acid or genomic DNA sequences, and predicted proteins therein are scanned for the possession of DNA-binding domains and/or TCS domains. RPs identified in this manner are categorised into families, unambiguously annotated, and a detailed description of their features generated, using an integrated software pipeline. P2RP results can then be outputted in user-specified formats. Biologists have an increasing need for fast and intuitively usable tools, which is why P2RP has been developed as an interactive system. As well as assisting experimental biologists to interrogate novel sequence data, it is hoped that P2RP will be built into genome annotation pipelines and re-annotation processes, to increase the consistency of RP annotation in public genomic sequences. P2RP is the first publicly available tool for predicting and analysing RP proteins in users' sequences. The server is freely available and can be accessed along with documentation at http://www.p2rp.org.
Does order matter? Investigating the effect of sequence on glance duration during on-road driving

PubMed Central

Roberts, Shannon C.; Reimer, Bryan; Mehler, Bruce

2017-01-01

Previous literature has shown that vehicle crash risk increases as drivers’ off-road glance duration increases. Many factors influence drivers’ glance duration, such as individual differences, driving environment, or task characteristics. Theories and past studies suggest that glance duration increases as the task progresses, but the exact relationship between glance sequence and glance durations is not fully understood. The purpose of this study was to examine the effect of glance sequence on glance duration among drivers completing a visual-manual radio tuning task and an auditory-vocal based multi-modal navigation entry task. Eighty participants drove a vehicle on urban highways while completing radio tuning and navigation entry tasks. Forty participants drove under an experimental protocol that required three button presses followed by rotation of a tuning knob to complete the radio tuning task while the other forty participants completed the task with one fewer button press. Multiple statistical analyses were conducted to measure the effect of glance sequence on glance duration. Results showed that across both tasks and a variety of statistical tests, glance sequence had inconsistent effects on glance duration: the effects varied according to the number of glances, task type, and data set that was being evaluated. Results suggest that other aspects of the task as well as interface design affect glance duration and should be considered in the context of examining driver attention or lack thereof. All in all, interface design and task characteristics have a more influential impact on glance duration than glance sequence, suggesting that classical design considerations impacting driver attention, such as the size and location of buttons, remain fundamental in designing in-vehicle interfaces. PMID:28158301
Expressed sequence tag (EST) analysis of two subspecies of Metarhizium anisopliae reveals a plethora of secreted proteins with potential activity in insect hosts.

PubMed

Freimoser, Florian M; Screen, Steven; Bagga, Savita; Hu, Gang; St Leger, Raymond J

2003-01-01

Expressed sequence tag (EST) libraries for Metarhizium anisopliae, the causative agent of green muscardine disease, were developed from the broad host-range pathogen Metarhizium anisopliae sf. anisopliae and the specific grasshopper pathogen, M. anisopliae sf. acridum. Approximately 1,700 5' end sequences from each subspecies were generated from cDNA libraries representing fungi grown under conditions that maximize secretion of cuticle-degrading enzymes. Both subspecies had ESTs for virtually all pathogenicity-related genes cloned to date from M. anisopliae, but many novel genes encoding potential virulence factors were also tagged. Enzymes with potential targets in the insect host included proteases, chitinases, phospholipases, lipases, esterases, phosphatases and enzymes producing toxic secondary metabolites. A diverse array of proteases composed 36% of all M. anisopliae sf. anisopliae ESTs. Eighty percent of the ESTs that could be clustered into functional groups had significant matches (E < 10^-5) in other ascomycete fungi. These included genes reported to have specific roles in pathogens with plant or vertebrate hosts. Many of the remaining ESTs had their best BLAST match among animal, plant and bacterial sequences. These include genes with plant and microbial counterparts that produce potent antimicrobials. The abundance of transcripts discovered for different functional groups varied between the two subspecies of M. anisopliae in a manner consistent with ecological adaptations of the two pathogens. By hastening gene discovery this project has enhanced development of improved mycoinsecticides. In addition, the M. anisopliae ESTs represent a significant contribution to the extensive database of sequences from ascomycetes that are saprophytes or plant and vertebrate pathogens. Comparative analyses of these sequences are providing important information about the biology and evolutionary history of this clade.
Spatial Variability of Cyanobacteria and Heterotrophic Bacteria in Lake Taihu (China).

PubMed

Qian, Haifeng; Lu, Tao; Song, Hao; Lavoie, Michel; Xu, Jiahui; Fan, Xiaoji; Pan, Xiangliang

2017-09-01

Cyanobacterial blooms frequently occur in Lake Taihu (China), but the intertwined relationships between biotic and abiotic factors modulating the frequency and duration of the blooms remain enigmatic. To better understand the relationships between the key abiotic and biotic factors and cyanobacterial blooms, we measured the abundance and diversity of prokaryotic organisms by high-throughput sequencing, the abundance of key genes involved in microcystin production and nitrogen fixation or loss, as well as several physicochemical parameters at several stations in Lake Taihu during a cyanobacterial bloom of Microcystis sp. Measurements of the copy number of denitrification-related genes and 16S rRNA analyses show that denitrification potential and denitrifying bacteria abundance increased in concert with non-diazotrophic cyanobacteria (Microcystis sp.), suggesting limited competition between cyanobacteria and heterotrophic denitrifiers for nutrients, although potential bacteria-mediated N loss may hamper Microcystis growth. The present study provides insight into the importance of different abiotic and biotic factors in controlling cyanobacteria and heterotrophic bacteria spatial variability in Lake Taihu.
Multiple transcription factor codes activate epidermal wound–response genes in Drosophila

PubMed Central

Pearson, Joseph C.; Juarez, Michelle T.; Kim, Myungjin; Drivenes, Øyvind; McGinnis, William

2009-01-01

Wounds in Drosophila and mouse embryos induce similar genetic pathways to repair epidermal barriers. However, the transcription factors that transduce wound signals to repair epidermal barriers are largely unknown. We characterize the transcriptional regulatory enhancers of 4 genes—Ddc, ple, msn, and kkv—that are rapidly activated in epidermal cells surrounding wounds in late Drosophila embryos and early larvae. These epidermal wound enhancers all contain evolutionarily conserved sequences matching binding sites for JUN/FOS and GRH transcription factors, but vary widely in trans- and cis-requirements for these inputs and their binding sites. We propose that the combination of GRH and FOS is part of an ancient wound–response pathway still used in vertebrates and invertebrates, but that other mechanisms have evolved that result in similar transcriptional output. A common, but largely untested assumption of bioinformatic analyses of gene regulatory networks is that transcription units activated in the same spatial and temporal patterns will require the same cis-regulatory codes. Our results indicate that this is an overly simplistic view. PMID:19168633
Numerical aerodynamic simulation facility. [for flows about three-dimensional configurations]

NASA Technical Reports Server (NTRS)

Bailey, F. R.; Hathaway, A. W.

1978-01-01

Critical to the advancement of computational aerodynamics capability is the ability to simulate flows about three-dimensional configurations that contain both compressible and viscous effects, including turbulence and flow separation at high Reynolds numbers. Analyses were conducted of two solution techniques for solving the Reynolds averaged Navier-Stokes equations describing the mean motion of a turbulent flow with certain terms involving the transport of turbulent momentum and energy modeled by auxiliary equations. The first solution technique is an implicit approximate factorization finite-difference scheme applied to three-dimensional flows that avoids the restrictive stability conditions when small grid spacing is used. The approximate factorization reduces the solution process to a sequence of three one-dimensional problems with easily inverted matrices. The second technique is a hybrid explicit/implicit finite-difference scheme which is also factored and applied to three-dimensional flows. Both methods are applicable to problems with highly distorted grids and a variety of boundary conditions and turbulence models.
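The idea behind approximate factorization can be written out in a generic form. This is a sketch under stated assumptions, not the scheme of the report itself: the symbols $L_x$, $L_y$, $L_z$ (one-dimensional spatial difference operators), $\Delta q$ (the solution update), $R$ (the explicit right-hand side) and $\Delta t$ (the time step) are introduced here purely for illustration.

```latex
% Unfactored implicit operator (couples all three directions):
\left[ I - \Delta t \,(L_x + L_y + L_z) \right] \Delta q = R

% Approximately factored form:
(I - \Delta t\, L_x)\,(I - \Delta t\, L_y)\,(I - \Delta t\, L_z)\, \Delta q = R

% Expanding the product shows the two operators differ only at
% second and higher order in \Delta t:
(I - \Delta t\, L_x)(I - \Delta t\, L_y)(I - \Delta t\, L_z)
  = I - \Delta t\,(L_x + L_y + L_z)
  + \Delta t^{2}\,(L_x L_y + L_x L_z + L_y L_z)
  - \Delta t^{3}\, L_x L_y L_z

% The factored system is solved as three one-dimensional sweeps:
(I - \Delta t\, L_x)\, \Delta q^{*}  = R, \qquad
(I - \Delta t\, L_y)\, \Delta q^{**} = \Delta q^{*}, \qquad
(I - \Delta t\, L_z)\, \Delta q      = \Delta q^{**}
```

Each sweep involves only one coordinate direction, which is why the matrices are easy to invert; with three-point central differences (an assumption here) each one is banded along a single grid line.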
Metagenomics and Bioinformatics in Microbial Ecology: Current Status and Beyond.

PubMed

Hiraoka, Satoshi; Yang, Ching-Chia; Iwasaki, Wataru

2016-09-29

Metagenomic approaches are now commonly used in microbial ecology to study microbial communities in more detail, including many strains that cannot be cultivated in the laboratory. Bioinformatic analyses make it possible to mine huge metagenomic datasets and discover general patterns that govern microbial ecosystems. However, the findings of typical metagenomic and bioinformatic analyses still do not completely describe the ecology and evolution of microbes in their environments. Most analyses still depend on straightforward sequence similarity searches against reference databases. We herein review the current state of metagenomics and bioinformatics in microbial ecology and discuss future directions for the field. New techniques will allow us to go beyond routine analyses and broaden our knowledge of microbial ecosystems. We need to enrich reference databases, promote platforms that enable meta- or comprehensive analyses of diverse metagenomic datasets, devise methods that utilize long-read sequence information, and develop more powerful bioinformatic methods to analyze data from diverse perspectives.
External quality assessment for EGFR mutations in Italy: improvements in performances over the time.

PubMed

Normanno, Nicola; Fenizia, Francesca; Castiglione, Francesca; Barberis, Massimo; Taddei, Gian Luigi; Truini, Mauro; De Rosa, Gaetano; Pinto, Carmine; Marchetti, Antonio

2017-01-01

External quality assessment (EQA) schemes are essential procedures to assess the quality level of laboratories performing molecular testing of the epidermal growth factor receptor (EGFR) gene in non-small cell lung cancer. The Italian Association of Medical Oncology (AIOM) and the Italian Society of Pathology (SIAPEC-IAP) organise EGFR EQA programmes to ensure that the Italian laboratories achieve the quality standard levels required. Comparing the 2011, 2013 and 2015 EGFR EQA schemes, it was possible to observe improvements in the methodologies used and the outcomes. The use of direct sequencing was reduced from 78.7% in 2011 to only 14.1% in 2015, whereas the use of pyrosequencing and real-time PCR increased. The number of rounds in which centres using direct sequencing failed was significantly higher than the number of rounds that failed using other methods, both when analysing each single scheme and when combining the three EQAs together. In 2011 and 2013, about 29% of the participants failed the first phase of the programmes, compared with 13% of centres in 2015, suggesting that the switch to more sensitive and robust methods could increase the percentage of good performers. Although the molecular analyses are performed with good quality in Italy, the continuous education carried out by AIOM and SIAPEC-IAP remains a fundamental tool to maintain this quality level.
External quality assessment for EGFR mutations in Italy: improvements in performances over the time

PubMed Central

Normanno, Nicola; Fenizia, Francesca; Castiglione, Francesca; Barberis, Massimo; Taddei, Gian Luigi; Truini, Mauro; De Rosa, Gaetano; Pinto, Carmine; Marchetti, Antonio

2017-01-01

External quality assessment (EQA) schemes are essential procedures to assess the quality level of laboratories performing molecular testing of the epidermal growth factor receptor (EGFR) gene in non-small cell lung cancer. The Italian Association of Medical Oncology (AIOM) and the Italian Society of Pathology (SIAPEC-IAP) organise EGFR EQA programmes to ensure that the Italian laboratories achieve the quality standard levels required. Comparing the 2011, 2013 and 2015 EGFR EQA schemes, it was possible to observe improvements in the methodologies used and the outcomes. The use of direct sequencing was reduced from 78.7% in 2011 to only 14.1% in 2015, whereas the use of pyrosequencing and real-time PCR increased. The number of rounds in which centres using direct sequencing failed was significantly higher than the number of rounds that failed using other methods, both when analysing each single scheme and when combining the three EQAs together. In 2011 and 2013, about 29% of the participants failed the first phase of the programmes, compared with 13% of centres in 2015, suggesting that the switch to more sensitive and robust methods could increase the percentage of good performers. Although the molecular analyses are performed with good quality in Italy, the continuous education carried out by AIOM and SIAPEC-IAP remains a fundamental tool to maintain this quality level. PMID:29181190
A practical guide to environmental association analysis in landscape genomics.

PubMed

Rellstab, Christian; Gugerli, Felix; Eckert, Andrew J; Hancock, Angela M; Holderegger, Rolf

2015-09-01

Landscape genomics is an emerging research field that aims to identify the environmental factors that shape adaptive genetic variation and the gene variants that drive local adaptation. Its development has been facilitated by next-generation sequencing, which allows for screening thousands to millions of single nucleotide polymorphisms in many individuals and populations at reasonable costs. In parallel, data sets describing environmental factors have greatly improved and increasingly become publicly accessible. Accordingly, numerous analytical methods for environmental association studies have been developed. Environmental association analysis identifies genetic variants associated with particular environmental factors and has the potential to uncover adaptive patterns that are not discovered by traditional tests for the detection of outlier loci based on population genetic differentiation. We review methods for conducting environmental association analysis including categorical tests, logistic regressions, matrix correlations, general linear models and mixed effects models. We discuss the advantages and disadvantages of different approaches, provide a list of dedicated software packages and their specific properties, and stress the importance of incorporating neutral genetic structure in the analysis. We also touch on additional important aspects such as sampling design, environmental data preparation, pooled and reduced-representation sequencing, candidate-gene approaches, linearity of allele-environment associations and the combination of environmental association analyses with traditional outlier detection tests. We conclude by summarizing expected future directions in the field, such as the extension of statistical approaches, environmental association analysis for ecological gene annotation, and the need for replication and post hoc validation studies. © 2015 John Wiley & Sons Ltd.
Comparative genomics of the bacterial genus Streptococcus illuminates evolutionary implications of species groups.

PubMed

Gao, Xiao-Yang; Zhi, Xiao-Yang; Li, Hong-Wei; Klenk, Hans-Peter; Li, Wen-Jun

2014-01-01

Members of the genus Streptococcus within the phylum Firmicutes are among the most diverse and significant zoonotic pathogens. This genus has gone through considerable taxonomic revision due to increasing improvements of chemotaxonomic approaches, DNA hybridization and 16S rRNA gene sequencing. It is proposed to place the majority of streptococci into "species groups". However, the evolutionary implications of species groups are not clear presently. We use comparative genomic approaches to yield a better understanding of the evolution of Streptococcus through genome dynamics, population structure, phylogenies and virulence factor distribution of species groups. Genome dynamics analyses indicate that the pan-genome size increases with the addition of newly sequenced strains, while the core genome size decreases with sequential addition at the genus level and species group level. Population structure analysis reveals two distinct lineages, one including Pyogenic, Bovis, Mutans and Salivarius groups, and the other including Mitis, Anginosus and Unknown groups. Phylogenetic dendrograms show that species within the same species group cluster together, and infer two main clades in accordance with population structure analysis. Distribution of streptococcal virulence factors has no obvious patterns among the species groups; however, the evolution of some common virulence factors is congruous with the evolution of species groups, according to phylogenetic inference. We suggest that the proposed streptococcal species groups are reasonable from the viewpoints of comparative genomics; evolution of the genus is congruent with the individual evolutionary trajectories of different species groups.
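The pan-genome and core-genome trend described in this abstract (pan-genome size grows and core-genome size shrinks as strains are added) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `pan_core_curve`, the strain names and the gene-family labels are all hypothetical toy data, and real analyses would first cluster genes into orthologous families.

```python
from typing import Dict, List, Set, Tuple


def pan_core_curve(
    genomes: Dict[str, Set[str]], order: List[str]
) -> List[Tuple[int, int, int]]:
    """Return (n_genomes, pan_size, core_size) after each sequential addition.

    pan  = union of gene families seen so far
    core = intersection of gene families shared by every genome so far
    """
    pan: Set[str] = set()
    core: Set[str] = set()
    curve: List[Tuple[int, int, int]] = []
    for i, name in enumerate(order, start=1):
        genes = genomes[name]
        pan |= genes
        # The first genome defines the initial core; later ones can only shrink it.
        core = set(genes) if i == 1 else core & genes
        curve.append((i, len(pan), len(core)))
    return curve


# Hypothetical toy data: gene-family labels per strain (assumed, not from the study).
toy = {
    "strain_A": {"gyrB", "rpoB", "recA", "toxin1"},
    "strain_B": {"gyrB", "rpoB", "recA", "phage1"},
    "strain_C": {"gyrB", "rpoB", "capsule1"},
}

curve = pan_core_curve(toy, ["strain_A", "strain_B", "strain_C"])
for n, pan_size, core_size in curve:
    print(n, pan_size, core_size)
```

With this toy input the pan-genome count rises at each step while the core count falls. Because the result depends on the order in which genomes are added, published curves are typically averaged over many random orderings.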
Risks Posed by Reston, the Forgotten Ebolavirus

PubMed Central

Cantoni, Diego; Hamlet, Arran; Michaelis, Martin; Wass, Mark N.

2016-01-01

ABSTRACT Out of the five members of the Ebolavirus family, four cause life-threatening disease, whereas the fifth, Reston virus (RESTV), is nonpathogenic in humans. The reasons for this discrepancy remain unclear. In this review, we analyze the currently available information to provide a state-of-the-art summary of the factors that determine the human pathogenicity of Ebolaviruses. RESTV causes sporadic infections in cynomolgus monkeys and is found in domestic pigs throughout the Philippines and China. Phylogenetic analyses revealed that RESTV is most closely related to the Sudan virus, which causes a high mortality rate in humans. Amino acid sequence differences between RESTV and the other Ebolaviruses are found in all nine Ebolavirus proteins, though no one residue appears sufficient to confer pathogenicity. Changes in the glycoprotein contribute to differences in Ebolavirus pathogenicity but are not sufficient to confer pathogenicity on their own. Similarly, differences in VP24 and VP35 affect viral immune evasion and are associated with changes in human pathogenicity. A recent in silico analysis systematically determined the functional consequences of sequence variations between RESTV and human-pathogenic Ebolaviruses. Multiple positions in VP24 were differently conserved between RESTV and the other Ebolaviruses and may alter human pathogenicity. In conclusion, the factors that determine the pathogenicity of Ebolaviruses in humans remain insufficiently understood. An improved understanding of these pathogenicity-determining factors is of crucial importance for disease prevention and for the early detection of emergent and potentially human-pathogenic RESTVs. PMID:28066813
Alteration of the SETBP1 gene and splicing pathway genes SF3B1, U2AF1, and SRSF2 in childhood acute myeloid leukemia.

PubMed

Choi, Hyun-Woo; Kim, Hye-Ran; Baek, Hee-Jo; Kook, Hoon; Cho, Duck; Shin, Jong-Hee; Suh, Soon-Pal; Ryang, Dong-Wook; Shin, Myung-Geun

2015-01-01

Recurrent somatic SET-binding protein 1 (SETBP1) and splicing pathway gene mutations have recently been found in atypical chronic myeloid leukemia and other hematologic malignancies. These mutations have been comprehensively analyzed in adult AML, but not in childhood AML. We investigated possible alteration of the SETBP1, splicing factor 3B subunit 1 (SF3B1), U2 small nuclear RNA auxiliary factor 1 (U2AF1), and serine/arginine-rich splicing factor 2 (SRSF2) genes in childhood AML. Cytogenetic and molecular analyses were performed to reveal chromosomal and genetic alterations. Sequence alterations in the SETBP1, SF3B1, U2AF1, and SRSF2 genes were examined by using direct sequencing in a cohort of 53 childhood AML patients. Childhood AML patients did not harbor any recurrent SETBP1 gene mutations, although our study did identify a synonymous mutation in one patient. None of the previously reported aberrations in the mutational hotspot of SF3B1, U2AF1, and SRSF2 were identified in any of the 53 patients. Alterations of the SETBP1 gene or SF3B1, U2AF1, and SRSF2 genes are not common genetic events in childhood AML, implying that the mutations are unlikely to exert a driver effect in myeloid leukemogenesis during childhood.
Comparative Genomics of the Bacterial Genus Streptococcus Illuminates Evolutionary Implications of Species Groups

PubMed Central

Gao, Xiao-Yang; Zhi, Xiao-Yang; Li, Hong-Wei; Klenk, Hans-Peter; Li, Wen-Jun

2014-01-01

Members of the genus Streptococcus within the phylum Firmicutes are among the most diverse and significant zoonotic pathogens. This genus has gone through considerable taxonomic revision due to increasing improvements of chemotaxonomic approaches, DNA hybridization and 16S rRNA gene sequencing. It is proposed to place the majority of streptococci into “species groups”. However, the evolutionary implications of species groups are not clear presently. We use comparative genomic approaches to yield a better understanding of the evolution of Streptococcus through genome dynamics, population structure, phylogenies and virulence factor distribution of species groups. Genome dynamics analyses indicate that the pan-genome size increases with the addition of newly sequenced strains, while the core genome size decreases with sequential addition at the genus level and species group level. Population structure analysis reveals two distinct lineages, one including Pyogenic, Bovis, Mutans and Salivarius groups, and the other including Mitis, Anginosus and Unknown groups. Phylogenetic dendrograms show that species within the same species group cluster together, and infer two main clades in accordance with population structure analysis. Distribution of streptococcal virulence factors has no obvious patterns among the species groups; however, the evolution of some common virulence factors is congruous with the evolution of species groups, according to phylogenetic inference. We suggest that the proposed streptococcal species groups are reasonable from the viewpoints of comparative genomics; evolution of the genus is congruent with the individual evolutionary trajectories of different species groups. PMID:24977706
Complete genome sequencing and analysis of a Lancefield group G Streptococcus dysgalactiae subsp. equisimilis strain causing streptococcal toxic shock syndrome (STSS)

PubMed Central

2011-01-01

Background Streptococcus dysgalactiae subsp. equisimilis (SDSE) causes invasive streptococcal infections, including streptococcal toxic shock syndrome (STSS), as does Lancefield group A Streptococcus pyogenes (GAS). We sequenced the entire genome of SDSE strain GGS_124 isolated from a patient with STSS. Results We found that GGS_124 consisted of a circular genome of 2,106,340 bp. Comparative analyses among bacterial genomes indicated that GGS_124 was most closely related to GAS. GGS_124 and GAS, but not other streptococci, shared a number of virulence factor genes, including genes encoding streptolysin O, NADase, and streptokinase A, distantly related to SIC (DRS), suggesting the importance of these factors in the development of invasive disease. GGS_124 contained 3 prophages, with one containing a virulence factor gene for streptodornase. All 3 prophages were significantly similar to GAS prophages that carry virulence factor genes, indicating that these prophages had transferred these genes between pathogens. SDSE was found to contain a gene encoding a superantigen, streptococcal exotoxin type G, but lacked several genes present in GAS that encode virulence factors, such as other superantigens, cysteine protease speB, and hyaluronan synthase operon hasABC. Similar to GGS_124, the SDSE strains contained larger numbers of clustered, regularly interspaced, short palindromic repeats (CRISPR) spacers than did GAS, suggesting that horizontal gene transfer via streptococcal phages between SDSE and GAS is somewhat restricted, although they share phage species. Conclusion Genome-wide comparisons of SDSE with GAS indicate that SDSE is closely and quantitatively related to GAS. SDSE, however, lacks several virulence factors of GAS, including superantigens, SPE-B and the hasABC operon. CRISPR spacers may limit the horizontal transfer of phage-encoded GAS virulence genes into SDSE. These findings may provide clues for dissecting the pathological roles of the virulence factors in SDSE and GAS that cause STSS. PMID:21223537
Complete genome sequencing and analysis of a Lancefield group G Streptococcus dysgalactiae subsp. equisimilis strain causing streptococcal toxic shock syndrome (STSS).

PubMed

Shimomura, Yumi; Okumura, Kayo; Murayama, Somay Yamagata; Yagi, Junji; Ubukata, Kimiko; Kirikae, Teruo; Miyoshi-Akiyama, Tohru

2011-01-11

Streptococcus dysgalactiae subsp. equisimilis (SDSE) causes invasive streptococcal infections, including streptococcal toxic shock syndrome (STSS), as does Lancefield group A Streptococcus pyogenes (GAS). We sequenced the entire genome of SDSE strain GGS_124 isolated from a patient with STSS. We found that GGS_124 consisted of a circular genome of 2,106,340 bp. Comparative analyses among bacterial genomes indicated that GGS_124 was most closely related to GAS. GGS_124 and GAS, but not other streptococci, shared a number of virulence factor genes, including genes encoding streptolysin O, NADase, and streptokinase A, distantly related to SIC (DRS), suggesting the importance of these factors in the development of invasive disease. GGS_124 contained 3 prophages, with one containing a virulence factor gene for streptodornase. All 3 prophages were significantly similar to GAS prophages that carry virulence factor genes, indicating that these prophages had transferred these genes between pathogens. SDSE was found to contain a gene encoding a superantigen, streptococcal exotoxin type G, but lacked several genes present in GAS that encode virulence factors, such as other superantigens, cysteine protease speB, and hyaluronan synthase operon hasABC. Similar to GGS_124, the SDSE strains contained larger numbers of clustered, regularly interspaced, short palindromic repeats (CRISPR) spacers than did GAS, suggesting that horizontal gene transfer via streptococcal phages between SDSE and GAS is somewhat restricted, although they share phage species. Genome-wide comparisons of SDSE with GAS indicate that SDSE is closely and quantitatively related to GAS. SDSE, however, lacks several virulence factors of GAS, including superantigens, SPE-B and the hasABC operon. CRISPR spacers may limit the horizontal transfer of phage-encoded GAS virulence genes into SDSE. These findings may provide clues for dissecting the pathological roles of the virulence factors in SDSE and GAS that cause STSS.
