bacterioplankton genomes inferred: Topics by Science.gov

Sample records for bacterioplankton genomes inferred

Diversity of bacterioplankton in coastal seawaters of Fildes Peninsula, King George Island, Antarctica.

PubMed

Zeng, Yin-Xin; Yu, Yong; Qiao, Zong-Yun; Jin, Hai-Yan; Li, Hui-Rong

2014-02-01

The bacterioplankton not only serves critical functions in marine nutrient cycles, but can also serve as indicators of the marine environment. The compositions of bacterial communities in the surface seawater of Ardley Cove and Great Wall Cove were analyzed using a 16S rRNA multiplex 454 pyrosequencing approach. Similar patterns of bacterial composition were found between the two coves, in which Bacteroidetes, Alphaproteobacteria, and Gammaproteobacteria were the dominant members of the bacterioplankton communities. In addition, a large fraction of the bacterial sequence reads (on average 5.3 % per station) could not be assigned below the domain level. Compared with Ardley Cove, Great Wall Cove showed higher chlorophyll and particulate organic carbon concentrations and exhibited relatively lower bacterial richness and diversity. Inferred metabolisms of summer bacterioplankton in the two coves were characterized by chemoheterotrophy and photoheterotrophy. Results suggest that some cosmopolitan species (e.g., Polaribacter and Sulfitobacter) belonging to a few bacterial groups that usually dominate in marine bacterioplankton communities may have similar ecological functions in similar marine environments but at different geographic locations.
Occurrence and expression of gene transfer agent genes in marine bacterioplankton.

PubMed

Biers, Erin J; Wang, Kui; Pennington, Catherine; Belas, Robert; Chen, Feng; Moran, Mary Ann

2008-05-01

Genes with homology to the transduction-like gene transfer agent (GTA) were observed in genome sequences of three cultured members of the marine Roseobacter clade. A broader search for homologs for this host-controlled virus-like gene transfer system identified likely GTA systems in cultured Alphaproteobacteria, and particularly in marine bacterioplankton representatives. Expression of GTA genes and extracellular release of GTA particles (approximately 50 to 70 nm) were demonstrated experimentally for the Roseobacter clade member Silicibacter pomeroyi DSS-3, and intraspecific gene transfer was documented. GTA homologs are surprisingly infrequent in marine metagenomic sequence data, however, and the role of this lateral gene transfer mechanism in ocean bacterioplankton communities remains unclear.
Elevated pCO2 enhances bacterioplankton removal of organic carbon

PubMed Central

James, Anna K.; Passow, Uta; Brzezinski, Mark A.; Parsons, Rachel J.; Trapani, Jennifer N.; Carlson, Craig A.

2017-01-01

Factors that affect the removal of organic carbon by heterotrophic bacterioplankton can impact the rate and magnitude of organic carbon loss in the ocean through the conversion of a portion of consumed organic carbon to CO2. Through enhanced rates of consumption, surface bacterioplankton communities can also reduce the amount of dissolved organic carbon (DOC) available for export from the surface ocean. The present study investigated the direct effects of elevated pCO2 on bacterioplankton removal of several forms of DOC ranging from glucose to complex phytoplankton exudate and lysate, and naturally occurring DOC. Elevated pCO2 (1000–1500 ppm) enhanced both the rate and magnitude of organic carbon removal by bacterioplankton communities compared to low (pre-industrial and ambient) pCO2 (250 to ~400 ppm). The increased removal was largely due to enhanced respiration, rather than enhanced production of bacterioplankton biomass. The results suggest that elevated pCO2 can increase DOC consumption and decrease bacterioplankton growth efficiency, ultimately decreasing the amount of DOC available for vertical export and increasing the production of CO2 in the surface ocean. PMID:28257422
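The link between enhanced respiration and lower growth efficiency in the record above can be illustrated with the standard definition of bacterial growth efficiency, BGE = BP / (BP + BR), where BP is bacterial production and BR is bacterial respiration. The following is a minimal sketch under that definition; the function name and the rate values are hypothetical and are not taken from the study.

```python
def growth_efficiency(production, respiration):
    """Return BGE = BP / (BP + BR).

    Both rates must be in the same units (for example, umol C per litre per day).
    """
    if production < 0 or respiration < 0:
        raise ValueError("rates must be non-negative")
    demand = production + respiration  # total bacterial carbon demand
    if demand == 0:
        raise ValueError("carbon demand is zero; BGE is undefined")
    return production / demand


# Hypothetical rates chosen only for illustration (not data from the study):
# production is held constant while respiration rises.
low_pco2 = growth_efficiency(production=2.0, respiration=6.0)
high_pco2 = growth_efficiency(production=2.0, respiration=8.0)
print(f"BGE at low pCO2:  {low_pco2:.2f}")
print(f"BGE at high pCO2: {high_pco2:.2f}")
```

With production unchanged, any increase in respiration raises total carbon removal and lowers BGE, which is the pattern the abstract describes.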
Minimal-assumption inference from population-genomic data

NASA Astrophysics Data System (ADS)

Weissman, Daniel; Hallatschek, Oskar

Samples of multiple complete genome sequences contain vast amounts of information about the evolutionary history of populations, much of it in the associations among polymorphisms at different loci. Current methods that take advantage of this linkage information rely on models of recombination and coalescence, limiting the sample sizes and populations that they can analyze. We introduce a method, Minimal-Assumption Genomic Inference of Coalescence (MAGIC), that reconstructs key features of the evolutionary history, including the distribution of coalescence times, by integrating information across genomic length scales without using an explicit model of recombination, demography or selection. Using simulated data, we show that MAGIC's performance is comparable to PSMC' on single diploid samples generated with standard coalescent and recombination models. More importantly, MAGIC can also analyze arbitrarily large samples and is robust to changes in the coalescent and recombination processes. Using MAGIC, we show that the inferred coalescence time histories of samples of multiple human genomes exhibit inconsistencies with a description in terms of an effective population size based on single-genome data.
Bacterioplankton carbon cycling along the Subtropical Frontal Zone off New Zealand

NASA Astrophysics Data System (ADS)

Baltar, Federico; Stuck, Esther; Morales, Sergio; Currie, Kim

2015-06-01

Marine heterotrophic bacterioplankton (Bacteria and Archaea) play a central role in ocean carbon cycling. As such, identifying the factors controlling these microbial populations is crucial to fully understanding carbon fluxes. We studied bacterioplankton activities along a transect crossing three water masses (i.e., Subtropical waters [STW], Sub-Antarctic waters [SAW] and neritic waters [NW]) with contrasting nutrient regimes across the Subtropical Frontal Zone. In contrast to bacterioplankton production and community respiration, bacterioplankton respiration increased in the offshore SAW, causing a seaward increase in the contribution of bacteria to community respiration (from 7% to 100%). Cell-specific bacterioplankton respiration also increased in SAW, but cell-specific production did not, suggesting that prokaryotic cells in SAW were investing more energy towards respiration than growth. This was reflected in a 5-fold decline in bacterioplankton growth efficiency (BGE) towards SAW. One explanation for this decrease in BGE could be the observed reduction in phytoplankton biomass (and presumably organic matter concentration) towards SAW. However, this would not explain why bacterioplankton respiration was highest in SAW, where phytoplankton biomass was lowest. Another factor affecting BGE could be the iron limitation characteristic of high-nutrient low-chlorophyll (HNLC) regions like SAW. Our field-based evidence agrees with previous laboratory experiments in which iron stress provoked a decrease in BGE of marine bacterial isolates. Our results suggest that there is a strong gradient in bacterioplankton carbon cycling rates along the Subtropical Frontal Zone, mainly due to the HNLC conditions of SAW. We suggest that Fe-induced reduction of BGE in HNLC regions like SAW could be relevant in marine carbon cycling, inducing bacterioplankton to act as a link or a sink of organic carbon by impacting on the quantity of organic carbon they incorporate
Linking Compositional and Functional Predictions to Decipher the Biogeochemical Significance in DFAA Turnover of Abundant Bacterioplankton Lineages in the North Sea.

PubMed

Wemheuer, Bernd; Wemheuer, Franziska; Meier, Dimitri; Billerbeck, Sara; Giebel, Helge-Ansgar; Simon, Meinhard; Scherber, Christoph; Daniel, Rolf

2017-11-05

Deciphering the ecological traits of abundant marine bacteria is a major challenge in marine microbial ecology. In the current study, we linked compositional and functional predictions to elucidate such traits for abundant bacterioplankton lineages in the North Sea. For this purpose, we investigated entire and active bacterioplankton composition along a transect ranging from the German Bight to the northern North Sea by pyrotag sequencing of bacterial 16S rRNA genes and transcripts. Functional profiles were inferred from 16S rRNA data using Tax4Fun. Bacterioplankton communities were dominated by well-known marine lineages including clusters/genera that are affiliated with the Roseobacter group and the Flavobacteria. Variations in community composition and function were significantly explained by measured environmental and microbial properties. Turnover of dissolved free amino acids (DFAA) showed the strongest correlation to community composition and function. We applied multinomial models, which enabled us to identify bacterial lineages involved in DFAA turnover. For instance, the genus Planktomarina was more abundant at higher DFAA turnover rates, suggesting its vital role in amino acid degradation. Functional predictions further indicated that Planktomarina is involved in leucine and isoleucine degradation. Overall, our results provide novel insights into the biogeochemical significance of abundant bacterioplankton lineages in the North Sea.
The green impact: bacterioplankton response toward a phytoplankton spring bloom in the southern North Sea assessed by comparative metagenomic and metatranscriptomic approaches

PubMed Central

Wemheuer, Bernd; Wemheuer, Franziska; Hollensteiner, Jacqueline; Meyer, Frauke-Dorothee; Voget, Sonja; Daniel, Rolf

2015-01-01

Phytoplankton blooms exhibit a severe impact on bacterioplankton communities as they change nutrient availabilities and other environmental factors. In the current study, the response of a bacterioplankton community to a Phaeocystis globosa spring bloom was investigated in the southern North Sea. For this purpose, water samples were taken inside and reference samples outside of an algal spring bloom. Structural changes of the bacterioplankton community were assessed by amplicon-based analysis of 16S rRNA genes and transcripts generated from environmental DNA and RNA, respectively. Several marine groups responded to bloom presence. The abundance of the Roseobacter RCA cluster and the SAR92 clade significantly increased in bloom presence in the total and active fraction of the bacterial community. Functional changes were investigated by direct sequencing of environmental DNA and mRNA. The corresponding datasets comprised more than 500 million sequences across all samples. Metatranscriptomic data sets were mapped on representative genomes of abundant marine groups present in the samples and on assembled metagenomic and metatranscriptomic datasets. Differences in gene expression profiles between non-bloom and bloom samples were recorded. The genome-wide gene expression level of Planktomarina temperata, an abundant member of the Roseobacter RCA cluster, was higher inside the bloom. Genes that were differently expressed included transposases, which showed increased expression levels inside the bloom. This might contribute to the adaptation of this organism toward environmental stresses through genome reorganization. In addition, several genes affiliated to the SAR92 clade were significantly upregulated inside the bloom including genes encoding for proteins involved in isoleucine and leucine incorporation. Obtained results provide novel insights into compositional and functional variations of marine bacterioplankton communities as response to a phytoplankton bloom. PMID
Bacterioplankton Populations within the Oxygen Minimum Zone of the Sargasso Sea

NASA Astrophysics Data System (ADS)

Schuler, G.; Parsons, R. J.; Johnson, R. J.

2016-02-01

Oxygen minimum zones are present throughout the world's oceans, and occur at depths between 200 and 1000 m. Heterotrophic bacteria reduce the dissolved oxygen within this layer through respiration, while metabolizing falling particles. This report studied the bacterioplankton in the oxygen minimum zone at the BATS (Bermuda Atlantic Time-series Study) site from July 2014 until November 2014. Total bacterioplankton populations were enumerated through direct counts. In the transitional zone (400-800 m) of the oxygen minimum zone, a secondary bacterioplankton peak formed. This study used FISH (Fluorescence in situ hybridization) and CARD-FISH (Catalyzed Reporter Deposition-Fluorescence in situ hybridization) to enumerate specific bacterial and archaeal taxa. Crenarchaeota (including Thaumarchaeota) increased in abundance within the upper oxycline. Thaumarchaeota have the ammonia monooxygenase gene that oxidizes ammonium into nitrite in low oxygen conditions. Amplification of the amoA gene confirmed that ammonia oxidizing archaea (AOA) were present within the OMZ. Using Terminal Restriction Fragment Length Polymorphism (T-RFLP), the bacterial community structure showed high similarity based on depth zones (0-80 m, 160-600 m, and 800-4500 m). Niskin experiments determined that water collected at 800 m had an exponential increase in bacterioplankton over time. While experimental design did not allow for oxygen levels to be maintained, the bacterioplankton community was predominantly bacteria, with eubacteria-positive cells making up 89.3% of the total bacterioplankton community by day 34. Improvements to the experimental design are required to determine which specific bacterial taxa caused this increase at 800 m. This study suggests that there are factors other than oxygen influencing bacterioplankton populations at the BATS site, and more analysis is needed once the BATS data is available to determine the key drivers of bacterioplankton dynamics within the BATS OMZ.
AD-LIBS: inferring ancestry across hybrid genomes using low-coverage sequence data.

PubMed

Schaefer, Nathan K; Shapiro, Beth; Green, Richard E

2017-04-04

Inferring the ancestry of each region of admixed individuals' genomes is useful in studies ranging from disease gene mapping to speciation genetics. Current methods require high-coverage genotype data and phased reference panels, and are therefore inappropriate for many data sets. We present a software application, AD-LIBS, that uses a hidden Markov model to infer ancestry across hybrid genomes without requiring variant calling or phasing. This approach is useful for non-model organisms and in cases of low-coverage data, such as ancient DNA. We demonstrate the utility of AD-LIBS with synthetic data. We then use AD-LIBS to infer ancestry in two published data sets: European human genomes with Neanderthal ancestry and brown bear genomes with polar bear ancestry. AD-LIBS correctly infers 87-91% of ancestry in simulations and produces ancestry maps that agree with published results and global ancestry estimates in humans. In brown bears, we find more polar bear ancestry than has been published previously, using both AD-LIBS and an existing software application for local ancestry inference, HAPMIX. We validate AD-LIBS polar bear ancestry maps by recovering a geographic signal within bears that mirrors what is seen in SNP data. Finally, we demonstrate that AD-LIBS is more effective than HAPMIX at inferring ancestry when preexisting phased reference data are unavailable and genomes are sequenced to low coverage. AD-LIBS is an effective tool for ancestry inference that can be used even when few individuals are available for comparison or when genomes are sequenced to low coverage. AD-LIBS is therefore likely to be useful in studies of non-model or ancient organisms that lack large amounts of genomic DNA. AD-LIBS can therefore expand the range of studies in which admixture mapping is a viable tool.
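To make the hidden Markov model idea in the record above concrete, here is a minimal sketch of Viterbi decoding over genomic windows. It is a generic two-state toy model, not the AD-LIBS model: the state names ("A", "B"), the per-window observation symbols ("a", "b"), and all probabilities are assumptions chosen for illustration.

```python
from math import log


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a discrete HMM (log-space)."""
    # score[t][s]: best log-probability of any path ending in state s at window t
    score = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s]: best predecessor of state s at window t
    for t in range(1, len(obs)):
        score.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: score[t - 1][p] + log(trans_p[p][s]))
            score[t][s] = (score[t - 1][prev] + log(trans_p[prev][s])
                           + log(emit_p[s][obs[t]]))
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical parameters: ancestry switches are rare between adjacent windows,
# and each window's observation resembles its true ancestry 80% of the time.
states = ["A", "B"]
start_p = {"A": 0.5, "B": 0.5}
trans_p = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
emit_p = {"A": {"a": 0.8, "b": 0.2}, "B": {"a": 0.2, "b": 0.8}}

block_path = viterbi(list("aaaabbbb"), states, start_p, trans_p, emit_p)
noisy_path = viterbi(list("aabaa"), states, start_p, trans_p, emit_p)
print("".join(block_path))
print("".join(noisy_path))
```

In this toy setting a sustained run of "b" windows is decoded as a switch in ancestry, while a single discordant window is absorbed as noise, because two state switches cost more than one mismatched observation.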
Phylogeny Inference of Closely Related Bacterial Genomes: Combining the Features of Both Overlapping Genes and Collinear Genomic Regions

PubMed Central

Zhang, Yan-Cong; Lin, Kui

2015-01-01

Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, the inference may experience a decrease in performance for phylogenomic analysis of too closely or too distantly related genomes. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriale genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828
Temporal patterns of phyto- and bacterioplankton and their relationships with environmental factors in Lake Taihu, China.

PubMed

Su, Xiaomei; Steinman, Alan D; Xue, Qingju; Zhao, Yanyan; Tang, Xiangming; Xie, Liqiang

2017-10-01

Phytoplankton and bacterioplankton are integral components of aquatic food webs and play essential roles in the structure and function of freshwater ecosystems. However, little is known about how phyto- and bacterioplankton may respond synchronously to changing environmental conditions. Thus, we analyzed simultaneously the composition and structure of phyto- and bacterioplankton on a monthly basis over 12 months in cyanobacteria-dominated areas of Lake Taihu and compared their responses to changes in environmental factors. Metric multi-dimensional scaling (mMDS) revealed that the temporal variations of phyto- and bacterioplankton were significant. Time lag analysis (TLA) indicated that the temporal pattern of phytoplankton tended to exhibit convergent dynamics while bacterioplankton showed highly stable or stochastic variation. A significant directional change was found for bacterioplankton at the genus level and the slopes (rate of change) and regression R² (low stochasticity or stability) were greater if Cyanobacteria were included, suggesting a higher level of instability in the bacterial community at lower taxonomy level. Consequently, phytoplankton responded more rapidly to the change in environmental conditions than bacterioplankton when analyzed at the phylum level, while bacterioplankton were more sensitive at the finer taxonomic resolution in Lake Taihu. Redundancy analysis (RDA) results showed that environmental variables collectively explained 51.0% variance of phytoplankton and 46.7% variance of bacterioplankton, suggesting that environmental conditions have a significant influence on the temporal variations of phyto- and bacterioplankton. Furthermore, variance partitioning indicated that the bacterial community structure was largely explained by water temperature and nitrogen, suggesting that these factors were the primary drivers shaping bacterioplankton. Copyright © 2017. Published by Elsevier Ltd.
Diazotrophic bacterioplankton in a coral reef lagoon: phylogeny, diel nitrogenase expression and response to phosphate enrichment.

PubMed

Hewson, Ian; Moisander, Pia H; Morrison, Amanda E; Zehr, Jonathan P

2007-05-01

We investigated diazotrophic bacterioplankton assemblage composition in the Heron Reef lagoon (Great Barrier Reef, Australia) using culture-independent techniques targeting the nifH fragment of the nitrogenase gene. Seawater was collected at 3 h intervals over a period of 72 h (i.e. over diel as well as tidal cycles). An incubation experiment was also conducted to assess the impact of phosphate (PO₄³⁻) availability on nifH expression patterns. DNA-based nifH libraries contained primarily sequences that were most similar to nifH from sediment, microbial mat and surface-associated microorganisms, with a few sequences that clustered with typical open ocean phylotypes. In contrast to genomic DNA sequences, libraries prepared from gene transcripts (mRNA amplified by reverse transcription-polymerase chain reaction) were entirely cyanobacterial and contained phylotypes similar to those observed in open ocean plankton. The abundance of Trichodesmium and two uncultured cyanobacterial phylotypes from previous studies (group A and group B) were studied by quantitative-polymerase chain reaction in the lagoon samples. These were detected as transcripts, but were not detected in genomic DNA. The gene transcript abundance of these phylotypes demonstrated variability over several diel cycles. The PO₄³⁻ enrichment experiment had a clearer pattern of gene expression over diel cycles than the lagoon sampling; however, PO₄³⁻ additions did not result in enhanced transcript abundance relative to control incubations. The results suggest that a number of diazotrophs in bacterioplankton of the reef lagoon may originate from sediment, coral or beachrock surfaces, sloughing into plankton with the flooding tide. The presence of typical open ocean phylotype transcripts in lagoon bacterioplankton may indicate that they are an important component of the N cycle of the coral reef.
Alignment-free genome tree inference by learning group-specific distance metrics.

PubMed

Patil, Kaustubh R; McHardy, Alice C

2013-01-01

Understanding the evolutionary relationships between organisms is vital for their in-depth study. Gene-based methods are often used to infer such relationships, which are not without drawbacks. One can now attempt to use genome-scale information, because of the ever increasing number of genomes available. This opportunity also presents a challenge in terms of computational efficiency. Two fundamentally different methods are often employed for sequence comparisons, namely alignment-based and alignment-free methods. Alignment-free methods rely on the genome signature concept and provide a computationally efficient way that is also applicable to nonhomologous sequences. The genome signature contains evolutionary signal as it is more similar for closely related organisms than for distantly related ones. We used genome-scale sequence information to infer taxonomic distances between organisms without additional information such as gene annotations. We propose a method to improve genome tree inference by learning specific distance metrics over the genome signature for groups of organisms with similar phylogenetic, genomic, or ecological properties. Specifically, our method learns a Mahalanobis metric for a set of genomes and a reference taxonomy to guide the learning process. By applying this method to more than a thousand prokaryotic genomes, we showed that, indeed, better distance metrics could be learned for most of the 18 groups of organisms tested here. Once a group-specific metric is available, it can be used to estimate the taxonomic distances for other sequenced organisms from the group. This study also presents a large scale comparison between 10 methods--9 alignment-free and 1 alignment-based.
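The distance at the core of the record above can be sketched as follows, assuming the genome signature is represented as a vector of k-mer (oligonucleotide) frequencies. A Mahalanobis metric has the form d(x, y) = sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M. The sketch only evaluates the distance for a given M; it does not implement the authors' procedure for learning M from a reference taxonomy. The sequences, the choice k = 2, and the function names are hypothetical.

```python
from itertools import product
from math import sqrt


def kmer_signature(seq, k=2):
    """Relative frequency of every DNA k-mer, in fixed lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other symbols
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


def mahalanobis(x, y, M):
    """sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(m * dj for m, dj in zip(row, d)) for row in M]
    return sqrt(max(0.0, sum(di * v for di, v in zip(d, Md))))


def identity(n):
    """n x n identity matrix; with M = I the metric reduces to Euclidean distance."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Toy sequences for illustration only (not real genomes).
sig_a = kmer_signature("ACGTACGTACGGTTAACC")
sig_b = kmer_signature("GGGCGCGCCGGGCGCGCC")
euclid = mahalanobis(sig_a, sig_b, identity(len(sig_a)))
print(f"distance with M = I: {euclid:.4f}")
```

A learned, group-specific M would replace the identity matrix here, reweighting the k-mer dimensions that best separate taxa within that group.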
Bacterioplankton: A Sink for Carbon in a Coastal Marine Plankton Community

NASA Astrophysics Data System (ADS)

Ducklow, Hugh W.; Purdie, Duncan A.; Leb. Williams, Peter J.; Davies, John M.

1986-05-01

Recent determinations of high production rates (up to 30 percent of primary production in surface waters) implicate free-living marine bacterioplankton as a link in a "microbial loop" that supplements phytoplankton as food for herbivores. An enclosed water column of 300 cubic meters was used to test the microbial loop hypothesis by following the fate of carbon-14-labeled bacterioplankton for over 50 days. Only 2 percent of the label initially fixed from carbon-14-labeled glucose by bacteria was present in larger organisms after 13 days, at which time about 20 percent of the total label added remained in the particulate fraction. Most of the label appeared to pass directly from particles smaller than 1 micrometer (heterotrophic bacterioplankton and some bacteriovores) to respired labeled carbon dioxide or to regenerated dissolved organic carbon-14. Secondary (and, by implication, primary) production by organisms smaller than 1 micrometer may not be an important food source in marine food chains. Bacterioplankton can be a sink for carbon in planktonic food webs and may serve principally as agents of nutrient regeneration rather than as food.
Siderophore production by bacterioplankton in enriched seawater incubations

NASA Astrophysics Data System (ADS)

Gledhill, M.; McCormack, P.; Worsfold, P. J.

2003-04-01

Iron is known to limit primary productivity in about 40% of the world's oceans. However, the role of Fe in controlling bacterioplankton productivity is still a subject of debate, as carbon is also likely to be a significant limiting factor. Furthermore, bacterioplankton are thought to have evolved a high affinity Fe transport mechanism utilising siderophores, which would enable acquisition even in the most Fe limited regions of the ocean. However, it is not yet certain if or how such a mechanism is employed in the oceans. Progress in this research area has been hindered by the lack of sufficiently sensitive analytical techniques for the determination of siderophores. We have recently developed a novel, highly sensitive technique for the detection of siderophore type compounds using electrospray ionisation - mass spectrometry (ESI-MS). Coupling of the technique with high performance liquid chromatography (HPLC) has allowed us to separate and identify siderophore type compounds present in complex mixtures at low concentrations (pM), thus allowing us to work with natural assemblages of bacteria in seawater. In this presentation we report on results obtained from incubations of natural bacterioplankton assemblages using coastal seawater from the English Channel. Known and unknown siderophores were identified in incubations carried out with additions of carbon, nitrogen and phosphorus. Iron speciation in the incubations was modified through the presence or absence of the chelating agent ethylenediamine-N,N-diacetic acid. Results show that different siderophores are produced under different conditions, probably a reflection of the type of bacterioplankton best able to exploit the incubation conditions. The results will be discussed with respect to their relevance to the marine environment.
Inferring genome-wide interplay landscape between DNA methylation and transcriptional regulation.

PubMed

Tang, Binhua; Wang, Xin

2015-01-01

DNA methylation and transcriptional regulation play important roles in cancer cell development and differentiation processes. Based on the currently available cell line profiling information from the ENCODE Consortium, we propose a Bayesian inference model to infer and construct genome-wide interaction landscape between DNA methylation and transcriptional regulation, which sheds light on the underlying complex functional mechanisms important within the human cancer and disease context. For the first time, we select all the currently available cell lines (>=20) and transcription factors (>=80) profiling information from the ENCODE Consortium portal. Through the integration of those genome-wide profiling sources, our genome-wide analysis detects multiple functional loci of interest, and indicates that DNA methylation is cell- and region-specific, due to the interplay mechanisms with transcription regulatory activities. We validate our analysis results with the corresponding RNA-sequencing technique for those detected genomic loci. Our results provide novel and meaningful insights for the interplay mechanisms of transcriptional regulation and gene expression for the human cancer and disease studies.
Drug target inference through pathway analysis of genomics data

PubMed Central

Ma, Haisu; Zhao, Hongyu

2013-01-01

Statistical modeling coupled with bioinformatics is commonly used for drug discovery. Although there exist many approaches for single target based drug design and target inference, recent years have seen a paradigm shift to system-level pharmacological research. Pathway analysis of genomics data represents one promising direction for computational inference of drug targets. This article aims at providing a comprehensive review on the evolving issues in this field, covering methodological developments, their pros and cons, as well as future research directions. PMID:23369829
How to infer relative fitness from a sample of genomic sequences.

PubMed

Dayarian, Adel; Shraiman, Boris I

2014-07-01

Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman's coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured, asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test in silico, using simulations of a Wright-Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator that identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with actual fitness, with a genome in the top 10% ranked being in the top 20% fittest with false discovery rate of 0.1-0.3, depending on the mutation/selection parameters. The ranking also enables us to predict the genotypes that future populations inherit from the present one. While the inference accuracy increases monotonically with sample size, samples of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks. Copyright © 2014 by the Genetics Society of America.
Fish-mediated changes in bacterioplankton community composition: an in situ mesocosm experiment

NASA Astrophysics Data System (ADS)

Luo, Congqiang; Yi, Chunlong; Ni, Leyi; Guo, Longgen

2017-06-01

We characterized variations in bacterioplankton community composition (BCC) in mesocosms subject to three different treatments. Two groups contained fish (group one: Cyprinus carpio; group two: Hypophthalmichthys molitrix); and group three, the untreated mesocosm, was the control. Samples were taken seven times over a 49-day period, and BCC was analyzed by PCR-denaturing gradient gel electrophoresis (DGGE) and real-time quantitative PCR (qPCR). Results revealed that introduction of C. carpio and H. molitrix had a remarkable impact on the composition of bacterioplankton communities, and the BCC was significantly different between each treatment. Sequencing of DGGE bands revealed that the bacterioplankton community in the different treatment groups was consistent at a taxonomic level, but differed in its abundance. H. molitrix promoted the richness of Alphaproteobacteria and Actinobacteria, while more bands affiliated to Cyanobacteria were detected in C. carpio mesocosms. The redundancy analysis (RDA) result demonstrated that the BCC was closely related to the bottom-up (total phosphorus, chlorophyll a, phytoplankton biomass) and top-down forces (biomass of copepods and cladocera) in C. carpio and control mesocosms, respectively. We found no evidence for top-down regulation of BCC by zooplankton in H. molitrix mesocosms, while grazing by protozoa (heterotrophic nanoflagellates, ciliates) became the major way to regulate BCC. Total bacterioplankton abundances were significantly higher in C. carpio mesocosms because of high nutrient concentration and suspended solids. Our study provided insights into the relationship between fish and bacterioplankton at species level, leading to a deep understanding of the function of the microbial loop and the aquatic ecosystem.
Fish-mediated changes in bacterioplankton community composition: an in situ mesocosm experiment

NASA Astrophysics Data System (ADS)

Luo, Congqiang; Yi, Chunlong; Ni, Leyi; Guo, Longgen

2018-03-01

We characterized variations in bacterioplankton community composition (BCC) in mesocosms subject to three different treatments. Two groups contained fish (group one: Cyprinus carpio; group two: Hypophthalmichthys molitrix); and group three, the untreated mesocosm, was the control. Samples were taken seven times over a 49-d period, and BCC was analyzed by PCR-denaturing gradient gel electrophoresis (DGGE) and real-time quantitative PCR (qPCR). Results revealed that introduction of C. carpio and H. molitrix had a remarkable impact on the composition of bacterioplankton communities, and the BCC was significantly different between each treatment. Sequencing of DGGE bands revealed that the bacterioplankton community in the different treatment groups was consistent at a taxonomic level, but differed in its abundance. H. molitrix promoted the richness of Alphaproteobacteria and Actinobacteria, while more bands affiliated to Cyanobacteria were detected in C. carpio mesocosms. The redundancy analysis (RDA) result demonstrated that the BCC was closely related to the bottom-up (total phosphorus, chlorophyll a, phytoplankton biomass) and top-down forces (biomass of copepods and cladocera) in C. carpio and control mesocosms, respectively. We found no evidence for top-down regulation of BCC by zooplankton in H. molitrix mesocosms, while grazing by protozoa (heterotrophic nanoflagellates, ciliates) became the major way to regulate BCC. Total bacterioplankton abundances were significantly higher in C. carpio mesocosms because of high nutrient concentration and suspended solids. Our study provided insights into the relationship between fish and bacterioplankton at species level, leading to a deep understanding of the function of the microbial loop and the aquatic ecosystem.

Co-occurrence Analysis of Microbial Taxa in the Atlantic Ocean Reveals High Connectivity in the Free-Living Bacterioplankton

PubMed Central

Milici, Mathias; Deng, Zhi-Luo; Tomasch, Jürgen; Decelle, Johan; Wos-Oxley, Melissa L.; Wang, Hui; Jáuregui, Ruy; Plumeier, Iris; Giebel, Helge-Ansgar; Badewien, Thomas H.; Wurst, Mascha; Pieper, Dietmar H.; Simon, Meinhard; Wagner-Döbler, Irene

2016-01-01

We determined the taxonomic composition of the bacterioplankton of the epipelagic zone of the Atlantic Ocean along a latitudinal transect (51°S–47°N) using Illumina sequencing of the V5-V6 region of the 16S rRNA gene and inferred co-occurrence networks. Bacterioplankton community composition was distinct for Longhurstian provinces and water depth. Free-living microbial communities (between 0.22 and 3 μm) were dominated by highly abundant and ubiquitous taxa with streamlined genomes (e.g., SAR11, SAR86, OM1, Prochlorococcus) and could clearly be separated from particle-associated communities, which were dominated by Bacteroidetes, Planctomycetes, Verrucomicrobia, and Roseobacters. From a total of 369 different communities we then inferred co-occurrence networks for each size fraction and depth layer of the plankton between bacteria and between bacteria and phototrophic micro-eukaryotes. The inferred networks showed a reduction of edges in the deepest layer of the photic zone. Networks composed of free-living bacteria had more connections per OTU than those of the particle-associated communities throughout the water column. Negative correlations accounted for roughly one third of the total edges in the free-living communities at all depths, while they decreased with depth in the particle-associated communities, where they accounted for roughly 10% of the total in the last part of the epipelagic zone. Co-occurrence networks of bacteria with phototrophic micro-eukaryotes were not taxon-specific, and dominated by mutual exclusion (~60%). The data show a high degree of specialization to micro-environments in the water column and highlight the importance of interdependencies particularly between free-living bacteria in the upper layers of the epipelagic zone. PMID:27199970
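As a rough illustration of how a co-occurrence network with positive and negative edges can be derived from an abundance table, the sketch below links two OTUs when the rank correlation of their abundances across samples is strong. The abstract does not state which correlation measure or cutoff was used, so the Spearman correlation, the 0.8 threshold, the function names, and the toy abundances are all assumptions made for illustration only.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """Ordinal ranks (0 = smallest). Assumes no tied values in this toy example."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = float(rank)
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks (valid without ties)."""
    return pearson(ranks(x), ranks(y))


def cooccurrence_edges(abundance, threshold=0.8):
    """Return (otu_a, otu_b, sign) for every OTU pair with |rho| >= threshold."""
    edges = []
    for (name_a, a), (name_b, b) in combinations(abundance.items(), 2):
        rho = spearman(a, b)
        if abs(rho) >= threshold:
            edges.append((name_a, name_b, "positive" if rho > 0 else "negative"))
    return edges


# Hypothetical abundances of four OTUs across five samples (not data from the study).
toy_table = {
    "OTU_A": [1, 2, 3, 4, 5],
    "OTU_B": [2, 4, 5, 8, 9],
    "OTU_C": [9, 7, 6, 3, 1],
    "OTU_D": [3, 1, 4, 2, 5],
}

edges = cooccurrence_edges(toy_table)
negative_fraction = sum(1 for e in edges if e[2] == "negative") / len(edges)
mean_degree = 2 * len(edges) / len(toy_table)  # average connections per OTU
```

Summary statistics such as `negative_fraction` and `mean_degree` correspond to the kinds of quantities the abstract reports (share of negative correlations, connections per OTU); a real analysis would also need tie handling and multiple-testing control.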
Population genetic inference from personal genome data: impact of ancestry and admixture on human genomic variation.

PubMed

Kidd, Jeffrey M; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F; Peckham, Heather E; Omberg, Larsson; Bormann Chung, Christina A; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G; Russell, Archie; Reynolds, Andy; Clark, Andrew G; Reese, Martin G; Lincoln, Stephen E; Butte, Atul J; De La Vega, Francisco M; Bustamante, Carlos D

2012-10-05

Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas-70% of the European ancestry in today's African Americans dates back to European gene flow happening only 7-8 generations ago. Copyright © 2012 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
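The reported 1.75-fold range follows directly from the two heterozygosity extremes quoted in the abstract (1 and 0.57 heterozygous sites per kilobase). The short sketch below reproduces that arithmetic; the helper name and the example site counts are hypothetical and only the two per-kilobase values come from the abstract.

```python
def het_sites_per_kb(het_sites: int, callable_bp: int) -> float:
    """Heterozygous sites per kilobase of callable sequence."""
    if callable_bp <= 0:
        raise ValueError("callable_bp must be positive")
    return het_sites / (callable_bp / 1000.0)


# Extremes quoted in the abstract (heterozygous sites per kilobase).
highest = 1.0   # west African genomes
lowest = 0.57   # segments with diploid Native American ancestry

fold_range = highest / lowest  # about 1.75

# Hypothetical counts: 570 heterozygous sites in 1 Mb of callable sequence.
example_rate = het_sites_per_kb(570, 1_000_000)
```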
Population Genetic Inference from Personal Genome Data: Impact of Ancestry and Admixture on Human Genomic Variation

PubMed Central

Kidd, Jeffrey M.; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D.; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F.; Peckham, Heather E.; Omberg, Larsson; Bormann Chung, Christina A.; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G.; Russell, Archie; Reynolds, Andy; Clark, Andrew G.; Reese, Martin G.; Lincoln, Stephen E.; Butte, Atul J.; De La Vega, Francisco M.; Bustamante, Carlos D.

2012-01-01

Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas—70% of the European ancestry in today’s African Americans dates back to European gene flow happening only 7–8 generations ago. PMID:23040495
An association network analysis among microeukaryotes and bacterioplankton reveals algal bloom dynamics.

PubMed

Tan, Shangjin; Zhou, Jin; Zhu, Xiaoshan; Yu, Shichen; Zhan, Wugen; Wang, Bo; Cai, Zhonghua

2015-02-01

Algal blooms are a worldwide phenomenon and the biological interactions that underlie their regulation are only just beginning to be understood. It is established that algal microorganisms associate with many other ubiquitous, oceanic organisms, but the interactions that lead to the dynamics of bloom formation are currently unknown. To address this gap, we used network approaches to investigate the association patterns among microeukaryotes and bacterioplankton in response to a natural Scrippsiella trochoidea bloom. This is the first study to apply network approaches to bloom dynamics. To this end, terminal restriction fragment (T-RF) length polymorphism analysis showed dramatic changes in community compositions of microeukaryotes and bacterioplankton over the blooming period. A variance ratio test revealed significant positive overall associations both within and between microeukaryotic and bacterioplankton communities. An association network generated from significant correlations between T-RFs revealed that S. trochoidea had few connections to other microeukaryotes and bacterioplankton and was placed on the edge. This lack of connectivity allowed for the S. trochoidea sub-network to break off from the overall network. These results allowed us to propose a conceptual model for explaining how changes in microbial associations regulate the dynamics of an algal bloom. In addition, key T-RFs were screened by principal components analysis, correlation coefficients, and network analysis. Dominant T-RFs were then identified through 18S and 16S rRNA gene clone libraries. Results showed that microeukaryotes clustered predominantly with Dinophyceae and Perkinsea while the majority of bacterioplankton identified were Alphaproteobacteria, Gammaproteobacteria, and Bacteroidetes. The ecological roles of both were discussed in the context of these findings. © 2014 Phycological Society of America.
Higher-level phylogeny of paraneopteran insects inferred from mitochondrial genome sequences

PubMed Central

Li, Hu; Shao, Renfu; Song, Nan; Song, Fan; Jiang, Pei; Li, Zhihong; Cai, Wanzhi

2015-01-01

Mitochondrial (mt) genome data have been proven to be informative for animal phylogenetic studies but may also suffer from systematic errors, due to the effects of accelerated substitution rate and compositional heterogeneity. We analyzed the mt genomes of 25 insect species from the four paraneopteran orders, aiming to better understand how accelerated substitution rate and compositional heterogeneity affect the inferences of the higher-level phylogeny of this diverse group of hemimetabolous insects. We found substantial heterogeneity in base composition and contrasting rates in nucleotide substitution among these paraneopteran insects, which complicate the inference of higher-level phylogeny. The phylogenies inferred with concatenated sequences of mt genes using maximum likelihood and Bayesian methods and homogeneous models failed to recover Psocodea and Hemiptera as monophyletic groups but grouped, instead, the taxa that had accelerated substitution rates together, including Sternorrhyncha (a suborder of Hemiptera), Thysanoptera, Phthiraptera and Liposcelididae (a family of Psocoptera). Bayesian inference with nucleotide sequences and heterogeneous models (CAT and CAT + GTR), however, recovered Psocodea, Thysanoptera and Hemiptera each as a monophyletic group. Within Psocodea, Liposcelididae is more closely related to Phthiraptera than to other species of Psocoptera. Furthermore, Thysanoptera was recovered as the sister group to Hemiptera. PMID:25704094
Sigma: Strain-level inference of genomes from metagenomic analysis for biosurveillance

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ahn, Tae-Hyuk; Chai, Juanjuan; Pan, Chongle

Motivation: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis. Results: Sigma provides not only accurate strain-level inferences, but also three unique capabilities: (i) Sigma quantifies the statistical uncertainty of its inferences, which includes hypothesis testing of identified genomes and confidence interval estimation of their relative abundances; (ii) Sigma enables strain variant calling by assigning metagenomic reads to their most likely reference genomes; and (iii) Sigma supports parallel computing for fast analysis of large datasets. In conclusion, the algorithm performance was evaluated using simulated mock communities and fecal samples with spike-in pathogen strains. Availability and Implementation: Sigma was implemented in C++ with source codes and binaries freely available at http://sigma.omicsbio.org.
Sigma: Strain-level inference of genomes from metagenomic analysis for biosurveillance

DOE PAGES

Ahn, Tae-Hyuk; Chai, Juanjuan; Pan, Chongle

2014-09-29

Motivation: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis. Results: Sigma provides not only accurate strain-level inferences, but also three unique capabilities: (i) Sigma quantifies the statistical uncertainty of its inferences, which includes hypothesis testing of identified genomes and confidence interval estimation of their relative abundances; (ii) Sigma enables strain variant calling by assigning metagenomic reads to their most likely reference genomes; and (iii) Sigma supports parallel computing for fast analysis of large datasets. In conclusion, the algorithm performance was evaluated using simulated mock communities and fecal samples with spike-in pathogen strains. Availability and Implementation: Sigma was implemented in C++ with source codes and binaries freely available at http://sigma.omicsbio.org.
TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data

PubMed Central

Roth, Andrew; Khattra, Jaswinder; Ho, Julie; Yap, Damian; Prentice, Leah M.; Melnyk, Nataliya; McPherson, Andrew; Bashashati, Ali; Laks, Emma; Biele, Justina; Ding, Jiarui; Le, Alan; Rosner, Jamie; Shumansky, Karey; Marra, Marco A.; Gilks, C. Blake; Huntsman, David G.; McAlpine, Jessica N.; Aparicio, Samuel

2014-01-01

The evolution of cancer genomes within a single tumor creates mixed cell populations with divergent somatic mutational landscapes. Inference of tumor subpopulations has been disproportionately focused on the assessment of somatic point mutations, whereas computational methods targeting evolutionary dynamics of copy number alterations (CNA) and loss of heterozygosity (LOH) in whole-genome sequencing data remain underdeveloped. We present a novel probabilistic model, TITAN, to infer CNA and LOH events while accounting for mixtures of cell populations, thereby estimating the proportion of cells harboring each event. We evaluate TITAN on idealized mixtures, simulating clonal populations from whole-genome sequences taken from genomically heterogeneous ovarian tumor sites collected from the same patient. In addition, we show in 23 whole genomes of breast tumors that the inference of CNA and LOH using TITAN critically informs population structure and the nature of the evolving cancer genome. Finally, we experimentally validated subclonal predictions using fluorescence in situ hybridization (FISH) and single-cell sequencing from an ovarian cancer patient sample, thereby recapitulating the key modeling assumptions of TITAN. PMID:25060187
Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

PubMed

Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

2016-09-19

Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.
Bacterioplankton Community Dynamics and Nutrient Availability in a Shallow Well Mixed Estuary of the Northern Gulf of Mexico.

NASA Astrophysics Data System (ADS)

Hoch, M. P.

2016-02-01

Sabine Lake Estuary is a shallow, well mixed, tidal lagoon of the Northern Gulf of Mexico. This study defines the bacterioplankton community composition and factors that may influence its variation in Sabine Lake Estuary. Twenty physicochemical parameters, phytoplankton photopigments, and bacterial 16S rDNA sequences were analyzed seasonally from twelve sites ranging from the inflows of Sabine and Neches Rivers to the Sabine Pass outflow. Photopigments were used to estimate phytoplankton groups via CHEMTAX, and bacterioplankton 16S rDNA sequences of 97% similarity were quantified and taxa identified. Nutrient availability experiments were conducted on bacterioplankton. Notable seasonal differences were seen in six of the ten most common (>3% of total sequences) classes of bacterioplankton. Canonical correspondence analysis (CCA) of common classes was used to explore physicochemical parameters and phytoplankton groups influencing variation in the bacterioplankton. Alphaproteobacteria were most abundant throughout the year. Opitutae, Actinobacteria, Sphingobacteria, and Betaproteobacteria were strongly influenced by conditions with higher TDN, DOC, turbidity, and Chlorophytes during winter when high river discharges reduced salinity. Planctomycetacia were most prevalent during spring and coincided with predominance of Cryptophytes. In summer and fall the aforementioned classes declined, and there was an increase in Synechococcophycideae. Nitrogen was least available to bacterioplankton during summer and fall. Clearer, warmer and more saline conditions with lower DOC reflect tidal movement of seawater into the estuary when river discharges were low, conditions favorable for Synechococcophycideae. Seasonal fluctuations in physicochemical conditions and certain phytoplankton groups influence the variation in the bacterioplankton community in Sabine Lake Estuary.
Effects of nutrients on specific growth rate of bacterioplankton in oligotrophic lake water cultures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Coveney, M.F.; Wetzel, R.G.

The effects of organic and inorganic nutrient additions on the specific growth rates of bacterioplankton in oligotrophic lake water cultures were investigated. Lake water was first passed through 0.8-μm-pore-size filters (prescreening) to remove bacterivores and to minimize confounding effects of algae. Specific growth rates were calculated from changes in both bacterial cell numbers and biovolumes over 36 h. Gross specific growth rates in unmanipulated control samples were estimated through separate measurements of grazing losses by use of penicillin. The addition of mixed organic substrates alone to prescreened water did not significantly increase bacterioplankton specific growth rates. The addition of inorganic phosphorus alone significantly increased one or both specific growth rates in three of four experiments, and one experiment showed a secondary stimulation by organic substrates. The stimulatory effects of phosphorus addition were greatest concurrently with the highest alkaline phosphatase activity in the lake water. Because bacteria have been shown to dominate inorganic phosphorus uptake in other P-deficient systems, the demonstration that phosphorus, rather than organic carbon, can limit bacterioplankton growth suggests direct competition between phytoplankton and bacterioplankton for inorganic phosphorus.
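A specific growth rate derived from a change in cell numbers over a fixed interval is commonly computed under an exponential-growth assumption. The abstract does not give the formula the authors used, so the sketch below is only that standard calculation; the function names and the cell counts are hypothetical, and only the 36 h interval comes from the abstract.

```python
import math


def specific_growth_rate(n_start: float, n_end: float, hours: float) -> float:
    """Specific growth rate (per hour), assuming exponential growth:
    mu = ln(N_end / N_start) / t."""
    if n_start <= 0 or n_end <= 0 or hours <= 0:
        raise ValueError("counts and elapsed time must be positive")
    return math.log(n_end / n_start) / hours


def doubling_time(mu: float) -> float:
    """Doubling time in hours for a positive specific growth rate."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    return math.log(2) / mu


# Hypothetical counts (cells per mL) at the start and end of a 36 h incubation.
mu = specific_growth_rate(1.0e6, 2.0e6, 36.0)  # roughly 0.019 per hour
t_double = doubling_time(mu)                   # 36 h, since the count doubled
```

The same function applies to biovolume instead of cell number, which is why the two rates reported in the abstract can differ when mean cell size changes during the incubation.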
Interactive network configuration maintains bacterioplankton community structure under elevated CO2 in a eutrophic coastal mesocosm experiment

NASA Astrophysics Data System (ADS)

Lin, Xin; Huang, Ruiping; Li, Yan; Li, Futian; Wu, Yaping; Hutchins, David A.; Dai, Minhan; Gao, Kunshan

2018-01-01

There is increasing concern about the effects of ocean acidification on marine biogeochemical and ecological processes and the organisms that drive them, including marine bacteria. Here, we examine the effects of elevated CO2 on the bacterioplankton community during a mesocosm experiment using an artificial phytoplankton community in subtropical, eutrophic coastal waters of Xiamen, southern China. Through sequencing the bacterial 16S rRNA gene V3-V4 region, we found that the bacterioplankton community in this high-nutrient coastal environment was relatively resilient to changes in seawater carbonate chemistry. Based on comparative ecological network analysis, we found that elevated CO2 hardly altered the network structure of high-abundance bacterioplankton taxa but appeared to reassemble the community network of low abundance taxa. This led to relatively high resilience of the whole bacterioplankton community to the elevated CO2 level and associated chemical changes. We also observed that the Flavobacteria group, which plays an important role in the microbial carbon pump, showed higher relative abundance under the elevated CO2 condition during the early stage of the phytoplankton bloom in the mesocosms. Our results provide new insights into how elevated CO2 may influence bacterioplankton community structure.
TITAN: inference of copy number architectures in clonal cell populations from tumor whole-genome sequence data.

PubMed

Ha, Gavin; Roth, Andrew; Khattra, Jaswinder; Ho, Julie; Yap, Damian; Prentice, Leah M; Melnyk, Nataliya; McPherson, Andrew; Bashashati, Ali; Laks, Emma; Biele, Justina; Ding, Jiarui; Le, Alan; Rosner, Jamie; Shumansky, Karey; Marra, Marco A; Gilks, C Blake; Huntsman, David G; McAlpine, Jessica N; Aparicio, Samuel; Shah, Sohrab P

2014-11-01

The evolution of cancer genomes within a single tumor creates mixed cell populations with divergent somatic mutational landscapes. Inference of tumor subpopulations has been disproportionately focused on the assessment of somatic point mutations, whereas computational methods targeting evolutionary dynamics of copy number alterations (CNA) and loss of heterozygosity (LOH) in whole-genome sequencing data remain underdeveloped. We present a novel probabilistic model, TITAN, to infer CNA and LOH events while accounting for mixtures of cell populations, thereby estimating the proportion of cells harboring each event. We evaluate TITAN on idealized mixtures, simulating clonal populations from whole-genome sequences taken from genomically heterogeneous ovarian tumor sites collected from the same patient. In addition, we show in 23 whole genomes of breast tumors that the inference of CNA and LOH using TITAN critically informs population structure and the nature of the evolving cancer genome. Finally, we experimentally validated subclonal predictions using fluorescence in situ hybridization (FISH) and single-cell sequencing from an ovarian cancer patient sample, thereby recapitulating the key modeling assumptions of TITAN. © 2014 Ha et al.; Published by Cold Spring Harbor Laboratory Press.
Magnitude and regulation of bacterioplankton respiratory quotient across freshwater environmental gradients

PubMed Central

Berggren, Martin; Lapierre, Jean-François; del Giorgio, Paul A

2012-01-01

Bacterioplankton respiration (BR) may represent the largest single sink of organic carbon in the biosphere and constitutes an important driver of atmospheric carbon dioxide (CO2) emissions from freshwaters. Complete understanding of BR is precluded by the fact that most studies need to assume a respiratory quotient (RQ; mole of CO2 produced per mole of O2 consumed) to calculate rates of BR. Many studies have, without clear support, assumed a fixed RQ around 1. Here we present 72 direct measurements of bacterioplankton RQ that we carried out in epilimnetic samples of 52 freshwater sites in Québec (Canada), using O2 and CO2 optic sensors. The RQs tended to converge around 1.2, but showed large variability (s.d.=0.45) and significant correlations with major gradients of ecosystem-level, substrate-level and bacterial community-level characteristics. Experiments with natural bacterioplankton using different single substrates suggested that RQ is intimately linked to the elemental composition of the respired compounds. RQs were on average low in net autotrophic systems, where bacteria likely were utilizing mainly reduced substrates, whereas we found evidence that the dominance of highly oxidized substrates, for example, organic acids formed by photo-chemical processes, led to high RQ in the more heterotrophic systems. Further, we suggest that BR contributes to a substantially larger share of freshwater CO2 emissions than presently believed based on the assumption that RQ is ∼1. Our study demonstrates that bacterioplankton RQ is not only a practical aspect of BR determination, but also a major ecosystem state variable that provides unique information about aquatic ecosystem functioning. PMID:22094347
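Because RQ is defined in the abstract as moles of CO2 produced per mole of O2 consumed, the effect of assuming RQ = 1 on an O2-based respiration estimate is simple arithmetic. The sketch below works through it; the function names and the O2 consumption value are hypothetical, while the RQ values 1 and 1.2 are the ones mentioned in the abstract.

```python
def respiratory_quotient(co2_produced_mol: float, o2_consumed_mol: float) -> float:
    """RQ = moles of CO2 produced per mole of O2 consumed."""
    if o2_consumed_mol <= 0:
        raise ValueError("O2 consumption must be positive")
    return co2_produced_mol / o2_consumed_mol


def co2_from_o2(o2_consumed_mol: float, rq: float) -> float:
    """Convert measured O2 consumption into CO2 production for a given RQ."""
    return o2_consumed_mol * rq


# Hypothetical O2 consumption (arbitrary molar units).
o2_consumed = 10.0

co2_assumed = co2_from_o2(o2_consumed, rq=1.0)    # conventional fixed RQ
co2_observed = co2_from_o2(o2_consumed, rq=1.2)   # RQ near the reported central value

# Relative amount by which the RQ = 1 assumption understates CO2 production.
relative_gap = (co2_observed - co2_assumed) / co2_assumed
```

With these inputs the gap is about 20%, which illustrates why a typical RQ above 1 implies a larger bacterial contribution to CO2 emissions than estimates based on RQ = 1.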
A statistical approach for inferring the 3D structure of the genome.

PubMed

Varoquaux, Nelle; Ay, Ferhat; Noble, William Stafford; Vert, Jean-Philippe

2014-06-15

Recent technological advances allow the measurement, in a single Hi-C experiment, of the frequencies of physical contacts among pairs of genomic loci at a genome-wide scale. The next challenge is to infer, from the resulting DNA-DNA contact maps, accurate 3D models of how chromosomes fold and fit into the nucleus. Many existing inference methods rely on multidimensional scaling (MDS), in which the pairwise distances of the inferred model are optimized to resemble pairwise distances derived directly from the contact counts. These approaches, however, often optimize a heuristic objective function and require strong assumptions about the biophysics of DNA to transform interaction frequencies to spatial distance, and thereby may lead to incorrect structure reconstruction. We propose a novel approach to infer a consensus 3D structure of a genome from Hi-C data. The method incorporates a statistical model of the contact counts, assuming that the counts between two loci follow a Poisson distribution whose intensity decreases with the physical distances between the loci. The method can automatically adjust the transfer function relating the spatial distance to the Poisson intensity and infer a genome structure that best explains the observed data. We compare two variants of our Poisson method, with or without optimization of the transfer function, to four different MDS-based algorithms-two metric MDS methods using different stress functions, a non-metric version of MDS and ChromSDE, a recently described, advanced MDS method-on a wide range of simulated datasets. We demonstrate that the Poisson models reconstruct better structures than all MDS-based methods, particularly at low coverage and high resolution, and we highlight the importance of optimizing the transfer function. On publicly available Hi-C data from mouse embryonic stem cells, we show that the Poisson methods lead to more reproducible structures than MDS-based methods when we use data generated using different
Discordance Between Resident and Active Bacterioplankton in Free-Living and Particle-Associated Communities in Estuary Ecosystem.

PubMed

Li, Jia-Ling; Salam, Nimaichand; Wang, Pan-Deng; Chen, Lin-Xing; Jiao, Jian-Yu; Li, Xin; Xian, Wen-Dong; Han, Ming-Xian; Fang, Bao-Zhu; Mou, Xiao-Zhen; Li, Wen-Jun

2018-03-16

Bacterioplankton are the major driving force for biogeochemical cycles in estuarine ecosystems, but the communities that mediate these processes are largely unexplored. We sampled in the Pearl River Estuary (PRE) to examine potential differences in the taxonomic composition of resident (DNA-based) and active (RNA-based) bacterioplankton communities in free-living and particle-associated fractions. MiSeq sequencing data showed that the overall bacterial diversity in particle-associated fractions was higher than in free-living communities. Further in-depth analyses of the sequences revealed a positive correlation between resident and active bacterioplankton communities for the particle-associated fraction but not for the free-living fraction. However, a large overlap of OTUs between free-living and particle-associated communities in the PRE suggested that members of the two fractions may be actively exchanged. We also observed that the positive correlation between resident and active communities was more prominent among the abundant OTUs (relative abundance > 0.2%). Further, the results from the present study indicated that low-abundance bacterioplankton make an important contribution towards the metabolic activity in the PRE.
Coral and macroalgal exudates vary in neutral sugar composition and differentially enrich reef bacterioplankton lineages

PubMed Central

Nelson, Craig E; Goldberg, Stuart J; Wegley Kelly, Linda; Haas, Andreas F; Smith, Jennifer E; Rohwer, Forest; Carlson, Craig A

2013-01-01

Increasing algal cover on tropical reefs worldwide may be maintained through feedbacks whereby algae outcompete coral by altering microbial activity. We hypothesized that algae and coral release compositionally distinct exudates that differentially alter bacterioplankton growth and community structure. We collected exudates from the dominant hermatypic coral holobiont Porites spp. and three dominant macroalgae (one each Ochrophyta, Rhodophyta and Chlorophyta) from reefs of Mo'orea, French Polynesia. We characterized exudates by measuring dissolved organic carbon (DOC) and fractional dissolved combined neutral sugars (DCNSs) and subsequently tracked bacterioplankton responses to each exudate over 48 h, assessing cellular growth, DOC/DCNS utilization and changes in taxonomic composition (via 16S rRNA amplicon pyrosequencing). Fleshy macroalgal exudates were enriched in the DCNS components fucose (Ochrophyta) and galactose (Rhodophyta); coral and calcareous algal exudates were enriched in total DCNS but in the same component proportions as ambient seawater. Rates of bacterioplankton growth and DOC utilization were significantly higher in algal exudate treatments than in coral exudate and control incubations with each community selectively removing different DCNS components. Coral exudates engendered the smallest shift in overall bacterioplankton community structure, maintained high diversity and enriched taxa from Alphaproteobacteria lineages containing cultured representatives with relatively few virulence factors (VFs) (Hyphomonadaceae and Erythrobacteraceae). In contrast, macroalgal exudates selected for less diverse communities heavily enriched in copiotrophic Gammaproteobacteria lineages containing cultured pathogens with increased VFs (Vibrionaceae and Pseudoalteromonadaceae). Our results demonstrate that algal exudates are enriched in DCNS components, foster rapid growth of bacterioplankton and select for bacterial populations with more potential VFs than
Inference of Gorilla Demographic and Selective History from Whole-Genome Sequence Data

PubMed Central

McManus, Kimberly F.; Kelley, Joanna L.; Song, Shiya; Veeramah, Krishna R.; Woerner, August E.; Stevison, Laurie S.; Ryder, Oliver A.; Great Ape Genome Project; Kidd, Jeffrey M.; Wall, Jeffrey D.; Bustamante, Carlos D.; Hammer, Michael F.

2015-01-01

Although population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole-genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor approximately 261 ka, and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage approximately 68 ka. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of 1.4-fold around 970 ka and a recent 5.6-fold contraction in population size 23 ka. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection. PMID:25534031
Phylogenetic comparisons of a coastal bacterioplankton community with its counterparts in open ocean and freshwater systems.

PubMed

Rappé; Vergin; Giovannoni

2000-09-01

In order to extend previous comparisons between coastal marine bacterioplankton communities and their open ocean and freshwater counterparts, here we summarize and provide new data on a clone library of 105 SSU rRNA genes recovered from seawater collected over the western continental shelf of the USA in the Pacific Ocean. Comparisons to previously published data revealed that this coastal bacterioplankton clone library was dominated by SSU rRNA gene phylotypes originally described from surface waters of the open ocean, but also revealed unique SSU rRNA gene lineages of beta Proteobacteria related to those found in clone libraries from freshwater habitats. beta Proteobacteria lineages common to coastal and freshwater samples included members of a clade of obligately methylotrophic bacteria, SSU rRNA genes affiliated with Xylophilus ampelinus, and a clade related to the genus Duganella. In addition, SSU rRNA genes were recovered from such previously recognized marine bacterioplankton SSU rRNA gene clone clusters as the SAR86, SAR11, and SAR116 clusters within the class Proteobacteria, the Roseobacter clade of the alpha subclass of the Proteobacteria, the marine group A/SAR406 cluster, and the marine Actinobacteria clade. Overall, these results support and extend previous observations concerning the global distribution of several marine planktonic prokaryote SSU rRNA gene phylotypes, but also show that coastal bacterioplankton communities contain SSU rRNA gene lineages (and presumably bacterioplankton) shown previously to be prevalent in freshwater habitats.
Interactions between hydrology and water chemistry shape bacterioplankton biogeography across boreal freshwater networks

PubMed Central

Niño-García, Juan Pablo; Ruiz-González, Clara; del Giorgio, Paul A

2016-01-01

Disentangling the mechanisms shaping bacterioplankton communities across freshwater ecosystems requires considering a hydrologic dimension that can influence both dispersal and local sorting, but how the environment and hydrology interact to shape the biogeography of freshwater bacterioplankton over large spatial scales remains unexplored. Using Illumina sequencing of the 16S ribosomal RNA gene, we investigate the large-scale spatial patterns of bacterioplankton across 386 freshwater systems from seven distinct regions in boreal Québec. We show that both hydrology and local water chemistry (mostly pH) interact to shape a sequential structuring of communities from highly diverse assemblages in headwater streams toward larger rivers and lakes dominated by fewer taxa. Increases in water residence time along the hydrologic continuum were accompanied by major losses of bacterial richness and by an increased differentiation of communities driven by local conditions (pH and other related variables). This suggests that hydrology and network position modulate the relative role of environmental sorting and mass effects on community assembly by determining both the time frame for bacterial growth and the composition of the immigrant pool. The apparent low dispersal limitation (that is, the lack of influence of geographic distance on the spatial patterns observed at the taxonomic resolution used) suggests that these boreal bacterioplankton communities derive from a shared bacterial pool that enters the networks through the smallest streams, largely dominated by mass effects, and that is increasingly subjected to local sorting of species during transit along the hydrologic continuum. PMID:26849312

Interactions between hydrology and water chemistry shape bacterioplankton biogeography across boreal freshwater networks.

PubMed

Niño-García, Juan Pablo; Ruiz-González, Clara; Del Giorgio, Paul A

2016-07-01

Disentangling the mechanisms shaping bacterioplankton communities across freshwater ecosystems requires considering a hydrologic dimension that can influence both dispersal and local sorting, but how the environment and hydrology interact to shape the biogeography of freshwater bacterioplankton over large spatial scales remains unexplored. Using Illumina sequencing of the 16S ribosomal RNA gene, we investigate the large-scale spatial patterns of bacterioplankton across 386 freshwater systems from seven distinct regions in boreal Québec. We show that both hydrology and local water chemistry (mostly pH) interact to shape a sequential structuring of communities from highly diverse assemblages in headwater streams toward larger rivers and lakes dominated by fewer taxa. Increases in water residence time along the hydrologic continuum were accompanied by major losses of bacterial richness and by an increased differentiation of communities driven by local conditions (pH and other related variables). This suggests that hydrology and network position modulate the relative role of environmental sorting and mass effects on community assembly by determining both the time frame for bacterial growth and the composition of the immigrant pool. The apparent low dispersal limitation (that is, the lack of influence of geographic distance on the spatial patterns observed at the taxonomic resolution used) suggests that these boreal bacterioplankton communities derive from a shared bacterial pool that enters the networks through the smallest streams, largely dominated by mass effects, and that is increasingly subjected to local sorting of species during transit along the hydrologic continuum.
Unusual bacterioplankton community structure in ultra-oligotrophic Crater Lake

USGS Publications Warehouse

Urbach, Ena; Vergin, Kevin L.; Morse, Ariel

2001-01-01

The bacterioplankton assemblage in Crater Lake, Oregon (U.S.A.), is different from communities found in other oxygenated lakes, as demonstrated by four small subunit ribosomal ribonucleic acid (SSU rRNA) gene clone libraries and oligonucleotide probe hybridization to RNA from lake water. Populations in the euphotic zone of this deep (589 m), oligotrophic caldera lake are dominated by two phylogenetic clusters of currently uncultivated bacteria: CL120-10, a newly identified cluster in the verrucomicrobiales, and ACK4 actinomycetes, known as a minor constituent of bacterioplankton in other lakes. Deep-water populations at 300 and 500 m are dominated by a different pair of uncultivated taxa: CL500-11, a novel cluster in the green nonsulfur bacteria, and group I marine crenarchaeota. β-Proteobacteria, dominant in most other freshwater environments, are relatively rare in Crater Lake (≤16% of nonchloroplast bacterial rRNA at all depths). Other taxa identified in Crater Lake libraries include a newly identified candidate bacterial division, ABY1, and a newly identified subcluster, CL0-1, within candidate division OP10. Probe analyses confirmed vertical stratification of several microbial groups, similar to patterns observed in open-ocean systems. Additional similarities between Crater Lake and ocean microbial populations include aphotic zone dominance of group I marine crenarchaeota and green nonsulfur bacteria. Comparison of Crater Lake to other lakes studied by rRNA methods suggests that selective factors structuring Crater Lake bacterioplankton populations may include low concentrations of available trace metals and dissolved organic matter, chemistry of infiltrating hydrothermal waters, and irradiation by high levels of ultraviolet light.
Inference of gorilla demographic and selective history from whole-genome sequence data.

PubMed

McManus, Kimberly F; Kelley, Joanna L; Song, Shiya; Veeramah, Krishna R; Woerner, August E; Stevison, Laurie S; Ryder, Oliver A; Great Ape Genome Project; Kidd, Jeffrey M; Wall, Jeffrey D; Bustamante, Carlos D; Hammer, Michael F

2015-03-01

Although population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole-genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor approximately 261 ka, and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage approximately 68 ka. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of 1.4-fold around 970 ka and a recent 5.6-fold contraction in population size 23 ka. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
BACTERIOPLANKTON DYNAMICS IN A SUBTROPICAL ESTUARY: EVIDENCE FOR SUBSTRATE LIMITATION

EPA Science Inventory

Bacterioplankton abundance and metabolic characteristics were measured along a transect in Pensacola Bay, Florida, USA, to examine the factors that control microbial water column processes in this subtropical estuary. The microbial measures included ³H-L-leucine incorporation, e...
Missing data imputation and haplotype phase inference for genome-wide association studies

PubMed Central

Browning, Sharon R.

2009-01-01

Imputation of missing data and the use of haplotype-based association tests can improve the power of genome-wide association studies (GWAS). In this article, I review methods for haplotype inference and missing data imputation, and discuss their application to GWAS. I discuss common features of the best algorithms for haplotype phase inference and missing data imputation in large-scale data sets, as well as some important differences between classes of methods, and highlight the methods that provide the highest accuracy and fastest computational performance. PMID:18850115
Response of Bacterioplankton Communities to Cadmium Exposure in Coastal Water Microcosms with High Temporal Variability

PubMed Central

Wang, Kai; Xiong, Jinbo; Chen, Xinxin; Zheng, Jialai; Hu, Changju; Yang, Yina; Zhu, Jianlin

2014-01-01

Multiple anthropogenic disturbances to bacterial diversity have been investigated in coastal ecosystems, in which temporal variability in the bacterioplankton community has been considered a ubiquitous process. However, far less is known about the temporal dynamics of a bacterioplankton community responding to pollution disturbances such as toxic metals. We used coastal water microcosms perturbed with 0, 10, 100, and 1,000 μg liter⁻¹ of cadmium (Cd) for 2 weeks to investigate temporal variability, Cd-induced patterns, and their interaction in the coastal bacterioplankton community and to reveal whether the bacterial community structure would reflect the Cd gradient in a temporally varying system. Our results showed that the bacterioplankton community structure shifted along the Cd gradient consistently after a 4-day incubation, although it exhibited some resistance to Cd at low concentration (10 μg liter⁻¹). A process akin to an arms race between temporal variability and Cd exposure was observed, and the temporal variability overwhelmed Cd-induced patterns in the bacterial community. The temporal succession of the bacterial community was correlated with pH, dissolved oxygen, NO₃⁻-N, NO₂⁻-N, PO₄³⁻-P, dissolved organic carbon, and chlorophyll a, and each of these parameters contributed more to community variance than Cd did. However, elevated Cd levels did decrease the temporal turnover rate of the community. Furthermore, key taxa, affiliated to the families Flavobacteriaceae, Rhodobacteraceae, Erythrobacteraceae, Piscirickettsiaceae, and Alteromonadaceae, showed a high frequency of being associated with Cd levels during 2 weeks. This study provides direct evidence that specific Cd-induced patterns in bacterioplankton communities exist in highly varying manipulated coastal systems. Future investigations on an ecosystem scale across longer temporal scales are needed to validate the observed pattern. PMID:25326310
Covariance of bacterioplankton composition and environmental variables in a temperate delta system

USGS Publications Warehouse

Stepanauskas, R.; Moran, M.A.; Bergamaschi, B.A.; Hollibaugh, J.T.

2003-01-01

We examined seasonal and spatial variation in bacterioplankton composition in the Sacramento-San Joaquin River Delta (CA) using terminal restriction fragment length polymorphism (T-RFLP) analysis. Cloned 16S rRNA genes from this system were used for putative identification of taxa dominating the T-RFLP profiles. Both cloning and T-RFLP analysis indicated that Actinobacteria, Verrucomicrobia, Cytophaga-Flavobacterium and Proteobacteria were the most abundant bacterioplankton groups in the Delta. Despite the broad variety of sampled habitats (deep water channels, lakes, marshes, agricultural drains, freshwater and brackish areas), and the spatial and temporal differences in hydrology, temperature and water chemistry among the sampling campaigns, T-RFLP electropherograms from all samples were similar, indicating that the same bacterioplankton phylotypes dominated in the various habitats of the Delta throughout the year. However, principal component analysis (PCA) and partial least-squares regression (PLS) of T-RFLP profiles revealed consistent grouping of samples on a seasonal, but not a spatial, basis. β-Proteobacteria related to Ralstonia, Actinobacteria related to Microthrix, and α-Proteobacteria identical to the environmental Clone LD12 had the highest relative abundance in summer/fall T-RFLP profiles and were associated with low river flow, high pH, and a number of optical and chemical characteristics of dissolved organic carbon (DOC) indicative of an increased proportion of phytoplankton-produced organic material as opposed to allochthonous, terrestrially derived organic material. On the other hand, Geobacter-related δ-Proteobacteria showed a relative increase in abundance in T-RFLP analysis during winter/spring, and probably were washed out from watershed soils or sediment. Various phylotypes associated with the same phylogenetic division, based on tentative identification of T-RFLP fragments, exhibited diverse seasonal patterns, suggesting that ecological
Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF

PubMed Central

Cong, Yingnan; Chan, Yao-ban; Phillips, Charles A.; Langston, Michael A.; Ragan, Mark A.

2017-01-01

Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has been problematic to construct networks in which edges solely represent LGT. Here we apply term frequency-inverse document frequency (TF-IDF), an alignment-free method originating from document analysis, to infer regions of lateral origin in bacterial genomes. We examine four empirical datasets of different size (number of genomes) and phyletic breadth, varying a key parameter (word length k) within bounds established in previous work. We map the inferred lateral regions to genes in recipient genomes, and construct networks in which the nodes are groups of genomes, and the edges natively represent LGT. We then extract maximum and maximal cliques (i.e., GECs) from these graphs, and identify nodes that belong to GECs across a wide range of k. Most surviving lateral transfer has happened within these GECs. Using Gene Ontology enrichment tests we demonstrate that biological processes associated with metabolism, regulation and transport are often over-represented among the genes affected by LGT within these communities. These enrichments are largely robust to change of k. PMID:28154557
Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF.

PubMed

Cong, Yingnan; Chan, Yao-Ban; Phillips, Charles A; Langston, Michael A; Ragan, Mark A

2017-01-01

Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has been problematic to construct networks in which edges solely represent LGT. Here we apply term frequency-inverse document frequency (TF-IDF), an alignment-free method originating from document analysis, to infer regions of lateral origin in bacterial genomes. We examine four empirical datasets of different size (number of genomes) and phyletic breadth, varying a key parameter (word length k) within bounds established in previous work. We map the inferred lateral regions to genes in recipient genomes, and construct networks in which the nodes are groups of genomes, and the edges natively represent LGT. We then extract maximum and maximal cliques (i.e., GECs) from these graphs, and identify nodes that belong to GECs across a wide range of k. Most surviving lateral transfer has happened within these GECs. Using Gene Ontology enrichment tests we demonstrate that biological processes associated with metabolism, regulation and transport are often over-represented among the genes affected by LGT within these communities. These enrichments are largely robust to change of k.
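To illustrate the TF-IDF weighting mentioned in this abstract, here is a minimal sketch that scores words of length k (k-mers) across a few toy sequences. It uses the textbook TF-IDF definition only; the function names, the toy sequences, and the choice k = 4 are assumptions for illustration, and the abstract does not describe how the authors turn such scores into inferred lateral regions.

```python
import math
from collections import Counter

def kmers(seq, k):
    """Return all overlapping words of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def tfidf(genomes, k):
    """Score every k-mer in every genome with textbook TF-IDF.

    genomes: dict mapping a genome name to its sequence (hypothetical input).
    Returns a dict mapping genome name -> {k-mer: score}.
    """
    counts = {name: Counter(kmers(seq, k)) for name, seq in genomes.items()}
    n_docs = len(genomes)
    # Document frequency: in how many genomes does each k-mer occur?
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            w: (n / total) * math.log(n_docs / doc_freq[w])
            for w, n in c.items()
        }
    return scores

# Toy sequences, not real genomes.
genomes = {"A": "ACGTACGT", "B": "ACGTTTTT", "C": "GGGGACGT"}
scores = tfidf(genomes, k=4)
```

A word shared by every sequence (here `ACGT`) scores zero, while a word confined to one sequence (here `TTTT` in B) scores highest. This is the property that makes the weighting useful for flagging words that are unusual for a given set of genomes.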
Coastal Bacterioplankton Community Dynamics in Response to a Natural Disturbance

PubMed Central

Rappé, Michael S.

2013-01-01

In order to characterize how disturbances to microbial communities are propagated over temporal and spatial scales in aquatic environments, the dynamics of bacterial assemblages throughout a subtropical coastal embayment were investigated via SSU rRNA gene analyses over an 8-month period, which encompassed a large storm event. During non-perturbed conditions, sampling sites clustered into three groups based on their microbial community composition: an offshore oceanic group, a freshwater group, and a distinct and persistent coastal group. Significant differences in measured environmental parameters or in the bacterial community due to the storm event were found only within the coastal cluster of sampling sites, and only at 5 of 12 locations; three of these sites showed a significant response in both environmental and bacterial community characteristics. These responses were most pronounced at sites close to the shoreline. During the storm event, otherwise common bacterioplankton community members such as marine Synechococcus sp. and members of the SAR11 clade of Alphaproteobacteria decreased in relative abundance in the affected coastal zone, whereas several lineages of Gammaproteobacteria, Betaproteobacteria, and members of the Roseobacter clade of Alphaproteobacteria increased. The complex spatial patterns in both environmental conditions and microbial community structure related to freshwater runoff and wind convection during the perturbation event lead us to conclude that spatial heterogeneity was an important factor influencing both the dynamics and the resistance of the bacterioplankton communities to disturbances throughout this complex subtropical coastal system. This heterogeneity may play a role in facilitating a rapid rebound of regions harboring distinctly coastal bacterioplankton communities to their pre-disturbed taxonomic composition. PMID:23409156
Snowmelt-driven changes in dissolved organic matter and bacterioplankton communities in the Heilongjiang watershed of China.

PubMed

Qiu, Linlin; Cui, Hongyang; Wu, Junqiu; Wang, Baijie; Zhao, Yue; Li, Jiming; Jia, Liming; Wei, Zimin

2016-06-15

Bacterioplankton plays a significant role in the circulation of materials and ecosystem function in the biosphere. Dissolved organic matter (DOM) from dead plant material and surface soil leaches into water bodies when snow melts. In our study, water samples from nine sampling sites along the Heilongjiang watershed were collected in February and June 2014 during which period snowmelt occurred. The goal of this study was to characterize changes in DOM and bacterioplankton community composition (BCC) associated with snowmelt, the effects of DOM, environmental and geographical factors on the distribution of BCC and interactions of aquatic bacterioplankton populations with different sources of DOM in the Heilongjiang watershed. BCC was measured by denaturing gradient gel electrophoresis (DGGE). DOM was measured by excitation-emission matrix (EEM) fluorescence spectroscopy. Bacterioplankton exhibited a distinct seasonal change in community composition due to snowmelt at all sampling points except for EG. Redundancy analysis (RDA) indicated that BCC was more closely related to DOM (Components 1 and 4, dissolved organic carbon, biochemical oxygen demand and chlorophyll a) and environmental factors (water temperature and nitrate nitrogen) than geographical factors. Furthermore, DOM had a greater impact on BCC than environmental factors (29.80 vs. 15.90% of the variation). Overall, spring snowmelt played an important role in altering the quality and quantity of DOM and BCC in the Heilongjiang watershed. Copyright © 2016 Elsevier B.V. All rights reserved.
The aggregate site frequency spectrum for comparative population genomic inference.

PubMed

Xue, Alexander T; Hickerson, Michael J

2015-12-01

Understanding how assemblages of species responded to past climate change is a central goal of comparative phylogeography and comparative population genomics, an endeavour that has increasing potential to integrate with community ecology. New sequencing technology now provides the potential to perform complex demographic inference at unprecedented resolution across assemblages of nonmodel species. To this end, we introduce the aggregate site frequency spectrum (aSFS), an expansion of the site frequency spectrum to use single nucleotide polymorphism (SNP) data sets collected from multiple, co-distributed species for assemblage-level demographic inference. We describe how the aSFS is constructed over an arbitrary number of independent population samples and then demonstrate how the aSFS can differentiate various multispecies demographic histories under a wide range of sampling configurations while allowing effective population sizes and expansion magnitudes to vary independently. We subsequently couple the aSFS with a hierarchical approximate Bayesian computation (hABC) framework to estimate degree of temporal synchronicity in expansion times across taxa, including an empirical demonstration with a data set consisting of five populations of the threespine stickleback (Gasterosteus aculeatus). Corroborating what is generally understood about the recent postglacial origins of these populations, the joint aSFS/hABC analysis strongly suggests that the stickleback data are most consistent with synchronous expansion after the Last Glacial Maximum (posterior probability = 0.99). The aSFS will have general application for multilevel statistical frameworks to test models involving assemblages and/or communities, and as large-scale SNP data from nonmodel species become routine, the aSFS expands the potential for powerful next-generation comparative population genomic inference. © 2015 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
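As a rough illustration of the site frequency spectrum that the aSFS builds on, the sketch below tabulates an unfolded SFS from per-SNP derived-allele counts and then combines several populations. The aggregation rule used here (convert each SFS to proportions, then sort the proportions across populations within each frequency class) is an assumption made for illustration, as are the function names, the toy counts, and the requirement of equal sample sizes; the abstract itself does not spell out the construction.

```python
from collections import Counter

def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Unfolded SFS for one population.

    derived_counts: for each SNP, how many sampled chromosomes carry the
    derived allele. Entry i-1 of the result is the number of SNPs with
    derived count i, for 1 <= i <= n_chromosomes - 1.
    """
    tally = Counter(derived_counts)
    return [tally.get(i, 0) for i in range(1, n_chromosomes)]

def aggregate_sfs(spectra):
    """Combine several single-population SFS (equal sample sizes assumed).

    Assumed rule: normalise each SFS to proportions, then, within each
    frequency class, sort the proportions across populations in
    descending order so that population labels no longer matter.
    """
    props = [[x / sum(s) for x in s] for s in spectra]
    n_classes = len(props[0])
    return [sorted((p[j] for p in props), reverse=True)
            for j in range(n_classes)]

# Toy data: two populations, 4 sampled chromosomes each.
sfs1 = site_frequency_spectrum([1, 1, 2, 3, 1], n_chromosomes=4)
sfs2 = site_frequency_spectrum([1, 2, 2, 3], n_chromosomes=4)
agg = aggregate_sfs([sfs1, sfs2])
```

Sorting within each frequency class makes the summary independent of the order in which populations are listed, which is one way to pool co-distributed taxa without assigning them identities.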
Inferring network structure in non-normal and mixed discrete-continuous genomic data

PubMed Central

Bhadra, Anindya; Rao, Arvind; Baladandayuthapani, Veerabhadran

2017-01-01

Inferring dependence structure through undirected graphs is crucial for uncovering the major modes of multivariate interaction among high-dimensional genomic markers that are potentially associated with cancer. Traditionally, conditional independence has been studied using sparse Gaussian graphical models for continuous data and sparse Ising models for discrete data. However, there are two clear situations when these approaches are inadequate. The first occurs when the data are continuous but display non-normal marginal behavior such as heavy tails or skewness, rendering an assumption of normality inappropriate. The second occurs when a part of the data is ordinal or discrete (e.g., presence or absence of a mutation) and the other part is continuous (e.g., expression levels of genes or proteins). In this case, the existing Bayesian approaches typically employ a latent variable framework for the discrete part that precludes inferring conditional independence among the data that are actually observed. The current article overcomes these two challenges in a unified framework using Gaussian scale mixtures. Our framework is able to handle continuous data that are not normal and data that are of mixed continuous and discrete nature, while still being able to infer a sparse conditional sign independence structure among the observed data. Extensive performance comparison in simulations with alternative techniques and an analysis of a real cancer genomics data set demonstrate the effectiveness of the proposed approach. PMID:28437848
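For readers unfamiliar with the term, the general form of a Gaussian scale mixture can be written as below. This is the standard textbook definition, shown for a single variable; the symbols \lambda, \sigma, \pi and \nu are notation chosen here, and the abstract does not state which mixing distributions the authors use.

```latex
x \mid \lambda \sim \mathcal{N}\!\left(0, \lambda\sigma^{2}\right), \qquad
\lambda \sim \pi(\lambda)
\quad\Longrightarrow\quad
p(x) = \int_{0}^{\infty} \mathcal{N}\!\left(x \mid 0, \lambda\sigma^{2}\right)\,\pi(\lambda)\,d\lambda .
% Well-known special case: \lambda \sim \text{Inv-Gamma}(\nu/2, \nu/2)
% yields a Student-t marginal with \nu degrees of freedom and scale \sigma.
```

Different choices of the mixing density give marginals with heavier tails than the normal, which is why such mixtures suit continuous data with non-normal marginal behavior.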
Habitat filtering of bacterioplankton communities above polymetallic nodule fields and sediments in the Clarion-Clipperton zone of the Pacific Ocean.

PubMed

Lindh, Markus V; Maillot, Brianne M; Smith, Craig R; Church, Matthew J

2018-04-01

Deep-sea mining of commercially valuable polymetallic nodule fields will generate a seabed sediment plume into the water column. Yet, the response of bacterioplankton communities, critical in regulating energy and matter fluxes in marine ecosystems, to such disturbances is unknown. Metacommunity theory, traditionally used in general ecology for macroorganisms, offers mechanistic understanding on the relative role of spatial differences compared with local environmental conditions (habitat filtering) for community assembly. We examined bacterioplankton metacommunities using 16S rRNA amplicons from the Clarion-Clipperton Zone (CCZ) in the eastern Pacific Ocean and in global ocean transect samples to determine sensitivity of these assemblages to environmental perturbations. Habitat filtering was the main assembly mechanism of bacterioplankton community composition in the epi- and mesopelagic waters of the CCZ and the Tara Oceans transect. Bathy- and abyssopelagic bacterioplankton assemblages were mainly assembled by undetermined metacommunity types or neutral and dispersal-driven patch-dynamics for the CCZ and the Malaspina transect. Environmental disturbances may alter the structure of upper-ocean microbial assemblages, with potentially even more substantial, yet unknown, impact on deep-sea communities. Predicting such responses in bacterioplankton assemblage dynamics can improve our understanding of microbially-mediated regulation of ecosystem services in the abyssal seabed likely to be exploited by future deep-sea mining operations. © 2018 Society for Applied Microbiology and John Wiley & Sons Ltd.
Genomic inferences of domestication events are corroborated by written records in Brassica rapa.

PubMed

Qi, Xinshuai; An, Hong; Ragsdale, Aaron P; Hall, Tara E; Gutenkunst, Ryan N; Chris Pires, J; Barker, Michael S

2017-07-01

Demographic modelling is often used with population genomic data to infer the relationships and ages among populations. However, relatively few analyses are able to validate these inferences with independent data. Here, we leverage written records that describe distinct Brassica rapa crops to corroborate demographic models of domestication. Brassica rapa crops are renowned for their outstanding morphological diversity, but the relationships and order of domestication remain unclear. We generated genomewide SNPs from 126 accessions collected globally using high-throughput transcriptome data. Analyses of more than 31,000 SNPs across the B. rapa genome revealed evidence for five distinct genetic groups and supported a European-Central Asian origin of B. rapa crops. Our results supported the traditionally recognized South Asian and East Asian B. rapa groups with evidence that pak choi, Chinese cabbage and yellow sarson are likely monophyletic groups. In contrast, the oil-type B. rapa subsp. oleifera and brown sarson were polyphyletic. We also found no evidence to support the contention that rapini is the wild type or the earliest domesticated subspecies of B. rapa. Demographic analyses suggested that B. rapa was introduced to Asia 2,400-4,100 years ago, and that Chinese cabbage originated 1,200-2,100 years ago via admixture of pak choi and European-Central Asian B. rapa. We also inferred significantly different levels of founder effect among the B. rapa subspecies. Written records from antiquity that document these crops are consistent with these inferences. The concordance between our age estimates of domestication events with historical records provides unique support for our demographic inferences. © 2017 John Wiley & Sons Ltd.
Benchmarking Relatedness Inference Methods with Genome-Wide Data from Thousands of Relatives.

PubMed

Ramstetter, Monica D; Dyer, Thomas D; Lehman, Donna M; Curran, Joanne E; Duggirala, Ravindranath; Blangero, John; Mezey, Jason G; Williams, Amy L

2017-09-01

Inferring relatedness from genomic data is an essential component of genetic association studies, population genetics, forensics, and genealogy. While numerous methods exist for inferring relatedness, thorough evaluation of these approaches in real data has been lacking. Here, we report an assessment of 12 state-of-the-art pairwise relatedness inference methods using a data set with 2485 individuals contained in several large pedigrees that span up to six generations. We find that all methods have high accuracy (92-99%) when detecting first- and second-degree relationships, but their accuracy dwindles to <43% for seventh-degree relationships. However, most identical by descent (IBD) segment-based methods inferred seventh-degree relatives correct to within one relatedness degree for >76% of relative pairs. Overall, the most accurate methods are Estimation of Recent Shared Ancestry (ERSA) and approaches that compute total IBD sharing using the output from GERMLINE and Refined IBD to infer relatedness. Combining information from the most accurate methods provides little accuracy improvement, indicating that novel approaches, such as new methods that leverage relatedness signals from multiple samples, are needed to achieve a sizeable jump in performance. Copyright © 2017 Ramstetter et al.
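The scoring idea in this record (exact degree versus "within one relatedness degree") can be illustrated with a minimal sketch. It is not any of the 12 benchmarked methods. It assumes the textbook expectation that the kinship coefficient of degree-d relatives in an outbred pedigree is 2^-(d+1); the function names `degree_from_kinship` and `fraction_within` are hypothetical.

```python
import math


def degree_from_kinship(phi):
    """Map a kinship coefficient to the nearest relatedness degree.

    Assumption: phi = 2 ** -(d + 1) for degree d (d = 0 means duplicates
    or identical twins). Non-positive values are treated as unrelated.
    """
    if phi <= 0:
        return None
    # Round -log2(phi) - 1 to the nearest integer (halves round up).
    degree = math.floor(-math.log2(phi) - 1 + 0.5)
    return max(degree, 0)


def fraction_within(true_degrees, inferred_degrees, tolerance=1):
    """Fraction of pairs whose inferred degree is within `tolerance` of the truth."""
    hits = sum(
        1
        for t, i in zip(true_degrees, inferred_degrees)
        if i is not None and abs(t - i) <= tolerance
    )
    return hits / len(true_degrees)
```

With `tolerance=0` the second function gives exact-degree accuracy; with the default it gives the looser "within one degree" score described above.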
Inference of gene regulatory networks from genome-wide knockout fitness data

PubMed Central

Wang, Liming; Wang, Xiaodong; Arkin, Adam P.; Samoilov, Michael S.

2013-01-01

Motivation: Genome-wide fitness is an emerging type of high-throughput biological data generated for individual organisms by creating libraries of knockouts, subjecting them to broad ranges of environmental conditions, and measuring the resulting clone-specific fitnesses. Since fitness is an organism-scale measure of gene regulatory network behaviour, it may offer certain advantages when insights into such phenotypical and functional features are of primary interest over individual gene expression. Previous works have shown that genome-wide fitness data can be used to uncover novel gene regulatory interactions, when compared with results of more conventional gene expression analysis. Yet, to date, few algorithms have been proposed for systematically using genome-wide mutant fitness data for gene regulatory network inference. Results: In this article, we describe a model and propose an inference algorithm for using fitness data from knockout libraries to identify underlying gene regulatory networks. Unlike most prior methods, the presented approach captures not only the structural but also the dynamical and non-linear nature of the biomolecular systems involved. A state–space model with a non-linear basis is used for dynamically describing gene regulatory networks. Network structure is then elucidated by estimating unknown model parameters. An unscented Kalman filter is used to cope with the non-linearities introduced in the model, which also enables the algorithm to run in on-line mode for practical use. Here, we demonstrate that the algorithm provides satisfying results for both synthetic data and empirical measurements of the GAL network in the yeast Saccharomyces cerevisiae and the TyrR–LiuR network in the bacterium Shewanella oneidensis. Availability: MATLAB code and datasets are available to download at http://www.duke.edu/~lw174/Fitness.zip and http://genomics.lbl.gov/supplemental/fitness-bioinf/ Contact: wangx@ee.columbia.edu or mssamoilov@lbl.gov Supplementary information
Distribution, Community Composition, and Potential Metabolic Activity of Bacterioplankton in an Urbanized Mediterranean Sea Coastal Zone

PubMed Central

Richa, Kumari; Balestra, Cecilia; Piredda, Roberta; Benes, Vladimir; Borra, Marco; Passarelli, Augusto; Margiotta, Francesca; Saggiomo, Maria; Biffali, Elio; Sanges, Remo; Scanlan, David J.

2017-01-01

ABSTRACT Bacterioplankton are fundamental components of marine ecosystems and influence the entire biosphere by contributing to the global biogeochemical cycles of key elements. Yet, there is a significant gap in knowledge about their diversity and specific activities, as well as environmental factors that shape their community composition and function. Here, the distribution and diversity of surface bacterioplankton along the coastline of the Gulf of Naples (GON; Italy) were investigated using flow cytometry coupled with high-throughput sequencing of the 16S rRNA gene. Heterotrophic bacteria numerically dominated the bacterioplankton and comprised mainly Alphaproteobacteria, Gammaproteobacteria, and Bacteroidetes. Distinct communities occupied river-influenced, coastal, and offshore sites, as indicated by Bray-Curtis dissimilarity, distance metric (UniFrac), linear discriminant analysis effect size (LEfSe), and multivariate analyses. The heterogeneity in diversity and community composition was mainly due to salinity and changes in environmental conditions across sites, as defined by nutrient and chlorophyll a concentrations. Bacterioplankton communities were composed of a few dominant taxa and a large proportion (92%) of rare taxa (here defined as operational taxonomic units [OTUs] accounting for <0.1% of the total sequence abundance), the majority of which were unique to each site. The relationship between 16S rRNA and the 16S rRNA gene, i.e., between potential metabolic activity and abundance, was positive for the whole community. However, analysis of individual OTUs revealed high rRNA-to-rRNA gene ratios for most (71.6% ± 16.7%) of the rare taxa, suggesting that these low-abundance organisms were potentially active and hence might be playing an important role in ecosystem diversity and functioning in the GON. IMPORTANCE The study of bacterioplankton in coastal zones is of critical importance, considering that these areas are highly productive and
Structuring of Bacterioplankton Diversity in a Large Tropical Bay

PubMed Central

Gregoracci, Gustavo B.; Nascimento, Juliana R.; Cabral, Anderson S.; Paranhos, Rodolfo; Valentin, Jean L.; Thompson, Cristiane C.; Thompson, Fabiano L.

2012-01-01

Structuring of bacterioplanktonic populations and factors that determine the structuring of specific niche partitions have been demonstrated only for a limited number of colder water environments. In order to better understand the physical chemical and biological parameters that may influence bacterioplankton diversity and abundance, we examined their productivity, abundance and diversity in the second largest Brazilian tropical bay (Guanabara Bay, GB), as well as seawater physical chemical and biological parameters of GB. The inner bay location with higher nutrient input favored higher microbial (including vibrio) growth. Metagenomic analysis revealed a predominance of Gammaproteobacteria in this location, while GB locations with lower nutrient concentration favored Alphaproteobacteria and Flavobacteria. According to the subsystems (SEED) functional analysis, GB has a distinctive metabolic signature, comprising a higher number of sequences in the metabolism of phosphorus and aromatic compounds and a lower number of sequences in the photosynthesis subsystem. The apparent phosphorus limitation appears to influence the GB metagenomic signature of the three locations. Phosphorus is also one of the main factors determining changes in the abundance of planktonic vibrios, suggesting that nutrient limitation can be observed at community (metagenomic) and population levels (total prokaryote and vibrio counts). PMID:22363639
Spatially uniform but temporally variable bacterioplankton in a semi-enclosed coastal area.

PubMed

Meziti, Alexandra; Kormas, Konstantinos A; Moustaka-Gouni, Maria; Karayanni, Hera

2015-07-01

Studies focusing on the temporal and spatial dynamics of bacterioplankton communities within littoral areas undergoing direct influences from the coast are quite limited. In addition, they are more complicated to resolve compared to communities in the open ocean. In order to elucidate the effects of spatial vs. temporal variability on bacterial communities in a highly land-influenced semi-enclosed gulf, surface bacterioplankton communities from five coastal sites in Igoumenitsa Gulf (Ionian Sea, Greece) were analyzed over a nine-month period using 16S rDNA 454-pyrosequencing. Temporal differences were more pronounced than spatial ones, with lower diversity indices observed during the summer months. During winter and early spring, bacterial communities were dominated by SAR11 representatives, while this pattern changed in May when they were abruptly replaced by members of Flavobacteriales, Pseudomonadales, and Alteromonadales. Additionally, correlation analysis showed high negative correlations between the presence of SAR11 OTUs in relation to temperature and sunlight that might have driven, directly or indirectly, the disappearance of these OTUs in the summer months. The dominance of SAR11 during the winter months further supported the global distribution of the clade, not only in the open-sea, but also in coastal systems. This study revealed that specific bacteria exhibited distinct succession patterns in an anthropogenic-impacted coastal system. The major bacterioplankton component was represented by commonly found marine bacteria exhibiting seasonal dynamics, while freshwater and terrestrial-related phylotypes were absent. Copyright © 2015 Elsevier GmbH. All rights reserved.

Non-random assembly of bacterioplankton communities in the subtropical North Pacific Ocean.

PubMed

Eiler, Alexander; Hayakawa, Darin H; Rappé, Michael S

2011-01-01

The exploration of bacterial diversity in the global ocean has revealed new taxa and previously unrecognized metabolic potential; however, our understanding of what regulates this diversity is limited. Using terminal restriction fragment length polymorphism (T-RFLP) data from bacterial small-subunit ribosomal RNA genes we show that, independent of depth and time, a large fraction of bacterioplankton co-occurrence patterns are non-random in the oligotrophic North Pacific subtropical gyre (NPSG). Pair-wise correlations of all identified operational taxonomic units (OTUs) revealed a high degree of significance, with 6.6% of the pair-wise co-occurrences being negatively correlated and 20.7% of them being positive. The most abundant OTUs, putatively identified as Prochlorococcus, SAR11, and SAR116 bacteria, were among the most correlated OTUs. As expected, bacterial community composition lacked statistically significant patterns of seasonality in the mostly stratified water column except in a few depth horizons of the sunlit surface waters, with higher frequency variations in community structure apparently related to populations associated with the deep chlorophyll maximum. Communities were structured vertically into epipelagic, mesopelagic, and bathypelagic populations. Permutation-based statistical analyses of T-RFLP data and their corresponding metadata revealed a broad range of putative environmental drivers controlling bacterioplankton community composition in the NPSG, including concentrations of inorganic nutrients and phytoplankton pigments. Together, our results suggest that deterministic forces such as environmental filtering and interactions among taxa determine bacterioplankton community patterns, and consequently affect ecosystem functions in the NPSG.
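The pair-wise co-occurrence tally described in this record can be sketched as follows. This is only an illustration: it uses Pearson correlation with a fixed cutoff as a stand-in for the significance testing of the study, and the names `pearson`, `cooccurrence_summary`, and `cutoff` are assumptions rather than anything taken from the paper.

```python
from itertools import combinations
import math


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-occurrence signal
    return sxy / math.sqrt(sxx * syy)


def cooccurrence_summary(abundance, cutoff=0.8):
    """Fractions of OTU pairs that are strongly positively or negatively correlated.

    `abundance` maps an OTU name to its abundances across the same samples.
    The cutoff is an arbitrary assumption, not a significance test.
    """
    pairs = list(combinations(sorted(abundance), 2))
    positive = negative = 0
    for a, b in pairs:
        r = pearson(abundance[a], abundance[b])
        if r >= cutoff:
            positive += 1
        elif r <= -cutoff:
            negative += 1
    return {"positive": positive / len(pairs), "negative": negative / len(pairs)}
```

A real analysis would replace the cutoff with a permutation-based or multiple-testing-corrected significance criterion.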
Metabolic Roles of Uncultivated Bacterioplankton Lineages in the Northern Gulf of Mexico “Dead Zone”

PubMed Central

Seitz, Kiley W.; Temperton, Ben; Gillies, Lauren E.; Rabalais, Nancy N.; Henrissat, Bernard; Mason, Olivia U.

2017-01-01

ABSTRACT Marine regions that have seasonal to long-term low dissolved oxygen (DO) concentrations, sometimes called “dead zones,” are increasing in number and severity around the globe with deleterious effects on ecology and economics. One of the largest of these coastal dead zones occurs on the continental shelf of the northern Gulf of Mexico (nGOM), which results from eutrophication-enhanced bacterioplankton respiration and strong seasonal stratification. Previous research in this dead zone revealed the presence of multiple cosmopolitan bacterioplankton lineages that have eluded cultivation, and thus their metabolic roles in this ecosystem remain unknown. We used a coupled shotgun metagenomic and metatranscriptomic approach to determine the metabolic potential of Marine Group II Euryarchaeota, SAR406, and SAR202. We recovered multiple high-quality, nearly complete genomes from all three groups as well as candidate phyla usually associated with anoxic environments—Parcubacteria (OD1) and Peregrinibacteria. Two additional groups with putative assignments to ACD39 and PAUC34f supplement the metabolic contributions by uncultivated taxa. Our results indicate active metabolism in all groups, including prevalent aerobic respiration, with concurrent expression of genes for nitrate reduction in SAR406 and SAR202, and dissimilatory nitrite reduction to ammonia and sulfur reduction by SAR406. We also report a variety of active heterotrophic carbon processing mechanisms, including degradation of complex carbohydrate compounds by SAR406, SAR202, ACD39, and PAUC34f. Together, these data help constrain the metabolic contributions from uncultivated groups in the nGOM during periods of low DO and suggest roles for these organisms in the breakdown of complex organic matter. PMID:28900024
Inferring causal genomic alterations in breast cancer using gene expression data

PubMed Central

2011-01-01

Background One of the primary objectives in cancer research is to identify causal genomic alterations, such as somatic copy number variation (CNV) and somatic mutations, during tumor development. Many valuable studies lack genomic data to detect CNV; therefore, methods that are able to infer CNVs from gene expression data would help maximize the value of these studies. Results We developed a framework for identifying recurrent regions of CNV and distinguishing the cancer driver genes from the passenger genes in the regions. By inferring CNV regions across many datasets we were able to identify 109 recurrent amplified/deleted CNV regions. Many of these regions are enriched for genes involved in many important processes associated with tumorigenesis and cancer progression. Genes in these recurrent CNV regions were then examined in the context of gene regulatory networks to prioritize putative cancer driver genes. The cancer driver genes uncovered by the framework include not only well-known oncogenes but also a number of novel cancer susceptibility genes validated via siRNA experiments. Conclusions To our knowledge, this is the first effort to systematically identify and validate drivers for expression-based CNV regions in breast cancer. The framework, in which wavelet analysis of expression-based copy number alteration is coupled with gene regulatory network analysis, provides a blueprint for leveraging genomic data to identify key regulatory components and gene targets. This integrative approach can be applied to many other large-scale gene expression studies and other novel types of cancer data such as next-generation sequencing based expression (RNA-Seq) as well as CNV data. PMID:21806811
Inferring network structure in non-normal and mixed discrete-continuous genomic data.

PubMed

Bhadra, Anindya; Rao, Arvind; Baladandayuthapani, Veerabhadran

2018-03-01

Inferring dependence structure through undirected graphs is crucial for uncovering the major modes of multivariate interaction among high-dimensional genomic markers that are potentially associated with cancer. Traditionally, conditional independence has been studied using sparse Gaussian graphical models for continuous data and sparse Ising models for discrete data. However, there are two clear situations when these approaches are inadequate. The first occurs when the data are continuous but display non-normal marginal behavior such as heavy tails or skewness, rendering an assumption of normality inappropriate. The second occurs when a part of the data is ordinal or discrete (e.g., presence or absence of a mutation) and the other part is continuous (e.g., expression levels of genes or proteins). In this case, the existing Bayesian approaches typically employ a latent variable framework for the discrete part that precludes inferring conditional independence among the data that are actually observed. The current article overcomes these two challenges in a unified framework using Gaussian scale mixtures. Our framework is able to handle continuous data that are not normal and data that are of mixed continuous and discrete nature, while still being able to infer a sparse conditional sign independence structure among the observed data. Extensive performance comparison in simulations with alternative techniques and an analysis of a real cancer genomics data set demonstrate the effectiveness of the proposed approach. © 2017, The International Biometric Society.
Extensive Error in the Number of Genes Inferred from Draft Genome Assemblies

PubMed Central

Denton, James F.; Lugo-Martinez, Jose; Tucker, Abraham E.; Schrider, Daniel R.; Warren, Wesley C.; Hahn, Matthew W.

2014-01-01

Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process. PMID:25474019
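The comparison described in this record, counting gene families whose copy number differs between a draft and a higher-quality assembly, can be sketched as below. The function name `compare_family_counts` and the dictionary layout (family name mapped to gene count) are illustrative assumptions, not the authors' pipeline.

```python
def compare_family_counts(reference, draft):
    """Summarize gene-family copy-number disagreements between two annotations.

    Both arguments map a gene family name to the number of genes annotated
    in that family. Families missing from one annotation count as zero there.
    """
    families = set(reference) | set(draft)
    # Families where the draft annotates more copies than the reference.
    over = sum(1 for f in families if draft.get(f, 0) > reference.get(f, 0))
    # Families where the draft annotates fewer copies than the reference.
    under = sum(1 for f in families if draft.get(f, 0) < reference.get(f, 0))
    return {
        "over": over,
        "under": under,
        "wrong_fraction": (over + under) / len(families),
    }
```

Tracking over- and under-counts separately reflects the observation above that draft assemblies both add and subtract genes, for instance when one gene is fragmented across several contigs and annotated as several.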
Extensive error in the number of genes inferred from draft genome assemblies.

PubMed

Denton, James F; Lugo-Martinez, Jose; Tucker, Abraham E; Schrider, Daniel R; Warren, Wesley C; Hahn, Matthew W

2014-12-01

Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.
Bacterioplankton diversity and community composition in the Southern Lagoon of Venice.

PubMed

Simonato, Francesca; Gómez-Pereira, Paola R; Fuchs, Bernhard M; Amann, Rudolf

2010-04-01

The Lagoon of Venice is a large water basin that exchanges water with the Northern Adriatic Sea through three large inlets. In this study, the 16S rRNA approach was used to investigate the bacterial diversity and community composition within the southern basin of the Lagoon of Venice and at one inlet in October 2007 and June 2008. Comparative sequence analysis of 645 mostly partial 16S rRNA gene sequences indicated high diversity and dominance of Alphaproteobacteria, Gammaproteobacteria and Bacteroidetes at the lagoon as well as at the inlet station, therefore pointing to significant mixing. Many of these sequences were close to the 16S rRNA of marine, often coastal, bacterioplankton, such as the Roseobacter clade, the family Vibrionaceae, and class Flavobacteria. Sequences of Actinobacteria were indicators of a freshwater input. The composition of the bacterioplankton was quantified by catalyzed reporter deposition fluorescence in situ hybridization (CARD-FISH) with a set of rRNA-targeted oligonucleotide probes. CARD-FISH counts corroborated the dominance of members of the phyla Alphaproteobacteria, Gammaproteobacteria and Bacteroidetes. When assessed by a probe set for the quantification of selected clades within Alphaproteobacteria and Gammaproteobacteria, bacterioplankton composition differed between October 2007 and June 2008, and also between the inlet and the lagoon. In particular, members of the readily culturable copiotrophic gammaproteobacterial genera Vibrio, Alteromonas and Pseudoalteromonas were enriched in the southern basin of the Lagoon of Venice. Interestingly, the alphaproteobacterial SAR11 clade and related clusters were also present in high abundances at the inlet and within the lagoon, which was indicative of inflow of water from the open sea.
Sensitivity of bacterioplankton nitrogen metabolism to eutrophication in sub-tropical coastal waters of Key West, Florida.

PubMed

Hoch, Matthew P; Dillon, Kevin S; Coffin, Richard B; Cifuentes, Luis A

2008-05-01

Expression of intracellular ammonium assimilation enzymes was used to assess the response of nitrogen (N) metabolism in bacterioplankton to N-loading of sub-tropical coastal waters of Key West, Florida. Specific activities of glutamine synthetase (GS) and total glutamate dehydrogenase (GDHT) were measured on the bacterial size fraction (<0.8 µm) to assess N-deplete versus N-replete metabolic states, respectively. Enzyme results were compared to concentrations of dissolved organic matter and nutrients and to the biomass and production of phytoplankton and bacteria. Concentrations of dissolved inorganic N (DIN), dissolved organic N (DON), and dissolved organic carbon (DOC) positively correlated with specific activities of GDHT and negatively correlated with that of GS. Total dissolved N (TDN) concentration explained 81% of variance in the bacterioplankton GDHT:GS activity ratio. The GDHT:GS ratio, TDN, DOC, and bacterial parameters decreased in magnitude along a tidally dynamic trophic gradient from north of Key West to south at the reef tract, which is consistent with the combined effects of localized coastal eutrophication and tidal exchange of seawater from the Southwest Florida Shelf and Florida Strait. The N-replete bacterioplankton north of Key West can regenerate ammonium, which sustains primary production transported south to the reef. The range in GDHT:GS ratios was 5-30 times greater than that for commonly used indicators of planktonic eutrophication, which emphasizes the sensitivity of bacterioplankton N-metabolism to changes in N-bioavailability caused by nutrient pollution in sub-tropical coastal waters and the utility of the GDHT:GS ratio as a bioindicator of N-replete conditions.
Distribution, Community Composition, and Potential Metabolic Activity of Bacterioplankton in an Urbanized Mediterranean Sea Coastal Zone.

PubMed

Richa, Kumari; Balestra, Cecilia; Piredda, Roberta; Benes, Vladimir; Borra, Marco; Passarelli, Augusto; Margiotta, Francesca; Saggiomo, Maria; Biffali, Elio; Sanges, Remo; Scanlan, David J; Casotti, Raffaella

2017-09-01

Bacterioplankton are fundamental components of marine ecosystems and influence the entire biosphere by contributing to the global biogeochemical cycles of key elements. Yet, there is a significant gap in knowledge about their diversity and specific activities, as well as environmental factors that shape their community composition and function. Here, the distribution and diversity of surface bacterioplankton along the coastline of the Gulf of Naples (GON; Italy) were investigated using flow cytometry coupled with high-throughput sequencing of the 16S rRNA gene. Heterotrophic bacteria numerically dominated the bacterioplankton and comprised mainly Alphaproteobacteria, Gammaproteobacteria, and Bacteroidetes. Distinct communities occupied river-influenced, coastal, and offshore sites, as indicated by Bray-Curtis dissimilarity, distance metric (UniFrac), linear discriminant analysis effect size (LEfSe), and multivariate analyses. The heterogeneity in diversity and community composition was mainly due to salinity and changes in environmental conditions across sites, as defined by nutrient and chlorophyll a concentrations. Bacterioplankton communities were composed of a few dominant taxa and a large proportion (92%) of rare taxa (here defined as operational taxonomic units [OTUs] accounting for <0.1% of the total sequence abundance), the majority of which were unique to each site. The relationship between 16S rRNA and the 16S rRNA gene, i.e., between potential metabolic activity and abundance, was positive for the whole community. However, analysis of individual OTUs revealed high rRNA-to-rRNA gene ratios for most (71.6% ± 16.7%) of the rare taxa, suggesting that these low-abundance organisms were potentially active and hence might be playing an important role in ecosystem diversity and functioning in the GON. IMPORTANCE The study of bacterioplankton in coastal zones is of critical importance, considering that these areas are highly productive and anthropogenically
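Two quantities named in this record, the rare-taxon cutoff (OTUs below 0.1% of total sequence abundance) and Bray-Curtis dissimilarity between sites, can be sketched as follows. The function names and the input layout are assumptions made for illustration; the study's actual processing pipeline is not described here.

```python
def rare_otus(counts, threshold=0.001):
    """Return OTUs whose share of total sequence abundance is below `threshold`.

    `counts` maps an OTU name to its read count; 0.001 corresponds to the
    <0.1% definition of rare taxa quoted above.
    """
    total = sum(counts.values())
    return sorted(otu for otu, n in counts.items() if n / total < threshold)


def bray_curtis(site_a, site_b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    Both vectors list counts for the same OTUs in the same order.
    Returns 0 for identical communities and 1 for no shared OTUs.
    """
    numerator = sum(abs(a - b) for a, b in zip(site_a, site_b))
    denominator = sum(a + b for a, b in zip(site_a, site_b))
    return numerator / denominator
```

In practice, counts are usually rarefied or converted to relative abundances before sites are compared, so that differences in sequencing depth do not dominate the dissimilarity.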
Inference of population splits and mixtures from genome-wide allele frequency data.

PubMed

Pickrell, Joseph K; Pritchard, Jonathan K

2012-01-01

Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In our model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data, we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and "ancient" Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com.
Metabolic Roles of Uncultivated Bacterioplankton Lineages in the Northern Gulf of Mexico "Dead Zone".

PubMed

Thrash, J Cameron; Seitz, Kiley W; Baker, Brett J; Temperton, Ben; Gillies, Lauren E; Rabalais, Nancy N; Henrissat, Bernard; Mason, Olivia U

2017-09-12

Marine regions that have seasonal to long-term low dissolved oxygen (DO) concentrations, sometimes called "dead zones," are increasing in number and severity around the globe with deleterious effects on ecology and economics. One of the largest of these coastal dead zones occurs on the continental shelf of the northern Gulf of Mexico (nGOM), which results from eutrophication-enhanced bacterioplankton respiration and strong seasonal stratification. Previous research in this dead zone revealed the presence of multiple cosmopolitan bacterioplankton lineages that have eluded cultivation, and thus their metabolic roles in this ecosystem remain unknown. We used a coupled shotgun metagenomic and metatranscriptomic approach to determine the metabolic potential of Marine Group II Euryarchaeota, SAR406, and SAR202. We recovered multiple high-quality, nearly complete genomes from all three groups as well as candidate phyla usually associated with anoxic environments: Parcubacteria (OD1) and Peregrinibacteria. Two additional groups with putative assignments to ACD39 and PAUC34f supplement the metabolic contributions by uncultivated taxa. Our results indicate active metabolism in all groups, including prevalent aerobic respiration, with concurrent expression of genes for nitrate reduction in SAR406 and SAR202, and dissimilatory nitrite reduction to ammonia and sulfur reduction by SAR406. We also report a variety of active heterotrophic carbon processing mechanisms, including degradation of complex carbohydrate compounds by SAR406, SAR202, ACD39, and PAUC34f. Together, these data help constrain the metabolic contributions from uncultivated groups in the nGOM during periods of low DO and suggest roles for these organisms in the breakdown of complex organic matter. IMPORTANCE Dead zones receive their name primarily from the reduction of eukaryotic macrobiota (demersal fish, shrimp, etc.) that are also key coastal fisheries. Excess nutrients contributed from anthropogenic activity
Streamlining and Large Ancestral Genomes in Archaea Inferred with a Phylogenetic Birth-and-Death Model

PubMed Central

Miklós, István

2009-01-01

Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study the evolution of complete gene repertoires, we propose here a mathematical framework that does not rely on resolved gene family histories. We show that so-called phylogenetic profiles, formed by family sizes across multiple genomes, are sufficient to infer principal evolutionary trends. The main novelty in our approach is an efficient algorithm to compute the likelihood of a phylogenetic profile in a model of birth-and-death processes acting on a phylogeny. We examine known gene families in 28 archaeal genomes using a probabilistic model that involves lineage- and family-specific components of gene acquisition, duplication, and loss. The model enables us to consider all possible histories when inferring statistics about archaeal evolution. According to our reconstruction, most lineages are characterized by a net loss of gene families. Major increases in gene repertoire have occurred only a few times. Our reconstruction underlines the importance of persistent streamlining processes in shaping genome composition in Archaea. It also suggests that early archaeal genomes were as complex as typical modern ones, and even show signs, in the case of the methanogenic ancestor, of an extremely large gene repertoire. PMID:19570746
SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees.

PubMed

Yu, Xiaoyu; Reva, Oleg N

2018-01-01

Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, the unsuitability of standard phylogenetic tools for processing genome-wide data, and the lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent with the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA.
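The alignment-free comparison described in the SWPhylo record above can be illustrated with a minimal sketch. It counts overlapping tetranucleotides in two sequences and compares the resulting 256-element frequency vectors. The abstract does not state how SWPhylo normalizes OUPs or which distance it uses, so the plain relative frequencies and the Euclidean distance below are assumptions, and the names `KMERS`, `tetranucleotide_frequencies`, and `oup_distance` are hypothetical.

```python
from collections import Counter
from itertools import product
import math

# All 256 tetranucleotides in a fixed order, so vectors are comparable.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return relative frequencies of the 256 tetranucleotides in seq.

    Windows containing characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotide")
    return [counts[k] / total for k in KMERS]


def oup_distance(freqs_a, freqs_b):
    """Euclidean distance between two frequency vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(freqs_a, freqs_b)))


if __name__ == "__main__":
    # Toy sequences, not real genomes.
    genome_a = "ACGTACGTTTGACCA" * 20
    genome_b = "ACGTTCGATTGACGA" * 20
    d = oup_distance(tetranucleotide_frequencies(genome_a),
                     tetranucleotide_frequencies(genome_b))
    print(f"pairwise distance: {d:.4f}")
```

A matrix of such pairwise distances over many genomes could then be passed to any distance-based tree-building method; which one SWPhylo uses is not stated in the abstract.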
Distinct Seasonal Patterns of Bacterioplankton Abundance and Dominance of Phyla α-Proteobacteria and Cyanobacteria in Qinhuangdao Coastal Waters Off the Bohai Sea

PubMed Central

He, Yaodong; Sen, Biswarup; Zhou, Shuangyan; Xie, Ningdong; Zhang, Yongfeng; Zhang, Jianle; Wang, Guangyi

2017-01-01

Qinhuangdao coastal waters in northern China are heavily impacted by anthropogenic and natural activities, and we anticipate a direct influence of the impact on the bacterioplankton abundance and diversity inhabiting the adjacent coastal areas. To ascertain the anthropogenic influences, we first evaluated the seasonal abundance patterns and diversity of bacterioplankton in the coastal areas with varied levels of natural and anthropogenic activities and then analyzed the environmental factors which influenced the abundance patterns. Results indicated distinct patterns in bacterioplankton abundance across the warm and cold seasons in all stations. Total bacterial abundance in the stations ranged from 8.67 × 10⁴ to 2.08 × 10⁶ cells/mL and had significant (p < 0.01) positive correlation with total phosphorus (TP), which indicated TP as the key monitoring parameter for anthropogenic impact on nutrient cycling. Proteobacteria and Cyanobacteria were the most abundant phyla in the Qinhuangdao coastal waters. Redundancy analysis revealed significant (p < 0.01) influence of temperature, dissolved oxygen and chlorophyll a on the spatiotemporal abundance pattern of α-Proteobacteria and Cyanobacteria groups. Among the 19 identified bacterioplankton subgroups, α-Proteobacteria (phylum Proteobacteria) was the dominant one followed by Family II (phylum Cyanobacteria), representing 19.1–55.2% and 2.3–54.2% of total sequences, respectively. An inverse relationship (r = -0.82) was observed between the two dominant subgroups, α-Proteobacteria and Family II. A wide range of inverse Simpson index (10.2 to 105) revealed spatial heterogeneity of bacterioplankton diversity likely resulting from the varied anthropogenic and natural influences. Overall, our results suggested that seasonal variations impose substantial influence on shaping bacterioplankton abundance patterns. In addition, the predominance of only a few cosmopolitan species in the Qinhuangdao coastal waters was
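The inverse Simpson index cited in the Qinhuangdao record above is the reciprocal of the sum of squared relative abundances, 1 / Σ pᵢ². The sketch below shows that calculation on made-up counts; the function name `inverse_simpson` and the example numbers are illustrative only and are not taken from the study.

```python
def inverse_simpson(counts):
    """Inverse Simpson diversity, 1 / sum(p_i^2), from per-taxon counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Convert counts to relative abundances and sum their squares.
    simpson = sum((c / total) ** 2 for c in counts)
    return 1.0 / simpson


if __name__ == "__main__":
    # Hypothetical sequence counts per taxon for two samples.
    even_sample = [25, 25, 25, 25]      # four equally abundant taxa
    skewed_sample = [97, 1, 1, 1]       # one taxon dominates
    print(inverse_simpson(even_sample))
    print(inverse_simpson(skewed_sample))
```

The index equals the number of taxa when all are equally abundant and approaches 1 as a single taxon dominates, which is why a wide range of values across stations is read as heterogeneity in diversity.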
High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs.

PubMed

Dilthey, Alexander T; Gourraud, Pierre-Antoine; Mentzer, Alexander J; Cereb, Nezih; Iqbal, Zamin; McVean, Gil

2016-10-01

Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30-250 CPU hours per sample) remain a significant
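The third step in the HLA record above, choosing the most likely pair of alleles at a locus, can be illustrated with a generic diploid likelihood. The abstract only says that a "simple likelihood framework" is used, so the model below is an assumption and not the HLA*PRG implementation: each read is taken to come from either allele of the pair with probability 0.5, and the per-read, per-allele probabilities are supplied as input. The function name and the allele labels A1, A2, A3 are hypothetical.

```python
from itertools import combinations_with_replacement
import math


def best_allele_pair(read_likelihoods):
    """Return (pair, log_likelihood) for the most likely unordered allele pair.

    read_likelihoods is a list with one dict per read, mapping each candidate
    allele name to P(read | allele). All probabilities must be positive.
    """
    alleles = sorted(read_likelihoods[0])
    best_pair, best_ll = None, -math.inf
    for a, b in combinations_with_replacement(alleles, 2):
        # Assumed model: a read originates from either allele with equal chance.
        ll = sum(math.log(0.5 * r[a] + 0.5 * r[b]) for r in read_likelihoods)
        if ll > best_ll:
            best_pair, best_ll = (a, b), ll
    return best_pair, best_ll


if __name__ == "__main__":
    # Toy data: three reads fit A1 well, three reads fit A2 well.
    reads = ([{"A1": 0.9, "A2": 0.01, "A3": 0.01}] * 3
             + [{"A1": 0.01, "A2": 0.9, "A3": 0.01}] * 3)
    pair, ll = best_allele_pair(reads)
    print(pair, round(ll, 3))
```

Enumerating all unordered pairs is quadratic in the number of candidate alleles, which is workable for a toy example but would need pruning against a database of thousands of alleles.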
High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs

PubMed Central

Dilthey, Alexander T.; Gourraud, Pierre-Antoine; McVean, Gil

2016-01-01

Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30–250 CPU hours per sample) remain a significant
Nearly a decade-long repeatable seasonal diversity patterns of bacterioplankton communities in the eutrophic Lake Donghu (Wuhan, China)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yan, Qingyun; Stegen, James C.; Yu, Yuhe

Uncovering which environmental factors have the greatest influence on community diversity patterns and how ecological processes govern community turnover are key questions related to understanding community assembly mechanisms. Although we have good understanding of plant and animal community assembly, the mechanisms regulating diversity patterns of aquatic bacterial communities in lake ecosystems remain poorly understood. Here we present nearly a decade-long time-series study of bacterioplankton communities from the eutrophic Lake Donghu (Wuhan, China) using 16S rRNA gene amplicon sequencing. We found strong repeatable seasonal patterns for the overall community, common (detected in more than 50% samples) and dominant bacterial taxa (relative abundance > 1%). Moreover, community composition tracked the seasonal temperature gradient, indicating that temperature is an important environmental factor controlling observed diversity patterns. Total phosphorus also contributed significantly to the seasonal shifts in bacterioplankton composition. However, any spatial pattern across the main lake areas was overwhelmed by temporal variability in this eutrophic lake system. Phylogenetic analysis further indicated that 75%-82% of community turnover was governed by homogeneous selection, suggesting that the bacterioplankton communities are mainly controlled by niche-based processes. However, dominant niches available within seasons might be occupied by similar combinations of bacterial taxa with modest dispersal rates throughout this lake system. This study gives us important insights into community assembly and seasonal turnover of lake bacterioplankton; it may also be useful to predict temporal patterns of other planktonic communities.
SWPhylo – A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees

PubMed Central

Yu, Xiaoyu; Reva, Oleg N

2018-01-01

Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, the unsuitability of standard phylogenetic tools for processing genome-wide data, and the lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent with the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA. PMID:29511354
INFLUENCE OF LIGHT ON BACTERIOPLANKTON PRODUCTION AND RESPIRATION IN A SUBTROPICAL CORAL REEF

EPA Science Inventory

The influence of sunlight on bacterioplankton production (14C-leucine (Leu) and 3H-thymidine (TdR) incorporation; changes in cell abundances) and O2 consumption was investigated in a shallow subtropical coral reef located near Key Largo, Florida. Quartz (light) and opaque (dark) ...
Marine bacterioplankton community turnover within seasonally hypoxic waters of a subtropical sound: Devil's Hole, Bermuda.

PubMed

Parsons, Rachel J; Nelson, Craig E; Carlson, Craig A; Denman, Carmen C; Andersson, Andreas J; Kledzik, Andrew L; Vergin, Kevin L; McNally, Sean P; Treusch, Alexander H; Giovannoni, Stephen J

2015-10-01

Understanding bacterioplankton community dynamics in coastal hypoxic environments is relevant to global biogeochemistry because coastal hypoxia is increasing worldwide. The temporal dynamics of bacterioplankton communities were analysed throughout the illuminated water column of Devil's Hole, Bermuda during the 6-week annual transition from a strongly stratified water column with suboxic and high-pCO2 bottom waters to a fully mixed and ventilated state during 2008. A suite of culture-independent methods provided a quantitative spatiotemporal characterization of bacterioplankton community changes, including both direct counts and rRNA gene sequencing. During stratification, the surface waters were dominated by the SAR11 clade of Alphaproteobacteria and the cyanobacterium Synechococcus. In the suboxic bottom waters, cells from the order Chlorobiales prevailed, with gene sequences indicating members of the genera Chlorobium and Prosthecochloris, anoxygenic photoautotrophs that utilize sulfide as a source of electrons for photosynthesis. Transitional zones of hypoxia also exhibited elevated levels of methane- and sulfur-oxidizing bacteria relative to the overlying waters. The abundances of both Thaumarchaeota and Euryarchaeota were elevated in the suboxic bottom waters (> 10⁹ cells L⁻¹). Following convective mixing, the entire water column returned to a community typical of oxygenated waters, with Euryarchaeota only averaging 5% of cells, and Chlorobiales and Thaumarchaeota absent. © 2014 Society for Applied Microbiology and John Wiley & Sons Ltd.

Effect of signal compounds and incubation conditions on the culturability of freshwater bacterioplankton.

PubMed

Bruns, Alke; Nübel, Ulrich; Cypionka, Heribert; Overmann, Jörg

2003-04-01

The effect of signal compounds and of different incubation conditions on the culturability (i.e., the fraction of all cells capable of growth) of natural bacterioplankton from the eutrophic lake Zwischenahner Meer was investigated over a period of 20 months. Numbers of growing cells were determined by the most-probable-number technique in liquid media containing low concentrations (10 µM) of the signal compounds N-(oxohexanoyl)-DL-homoserine lactone, N-(butyryl)-DL-homoserine lactone, cyclic AMP (cAMP), or ATP. cAMP was the most effective signal compound, leading to significantly increased cultivation efficiencies of up to 10% of the total bacterial counts. Microautoradiography with [2,8-³H]cAMP, combined with fluorescence in situ hybridization, demonstrated that cAMP was taken up by 18% of all cells. The bacterial cAMP uptake systems had a very low Km value of bacterioplankton assemblage. Sequence comparison revealed that two members of the Actinomycetales which reached high numbers in the natural bacterioplankton assemblage could actually be enriched by our cultivation approach.
BACTERIOPLANKTON DYNAMICS IN PENSACOLA BAY, FL, USA: ROLE OF PHYTOPLANKTON AND DETRITAL CARBON SOURCES

EPA Science Inventory

Bacterioplankton Dynamics in Pensacola Bay, FL, USA: Role of Phytoplankton and Detrital Carbon Sources (Abstract). To be presented at the 16th Biennial Conference of the Estuarine Research Foundation, ERF 2001: An Estuarine Odyssey, 4-8 November 2001, St. Pete Beach, FL. 1 p. (ER...
Inferring human population size and separation history from multiple genome sequences.

PubMed

Schiffels, Stephan; Durbin, Richard

2014-08-01

The availability of complete human genome sequences from populations across the world has given rise to new population genetic inference methods that explicitly model ancestral relationships under recombination and mutation. So far, application of these methods to evolutionary history more recent than 20,000-30,000 years ago and to population separations has been limited. Here we present a new method that overcomes these shortcomings. The multiple sequentially Markovian coalescent (MSMC) analyzes the observed pattern of mutations in multiple individuals, focusing on the first coalescence between any two individuals. Results from applying MSMC to genome sequences from nine populations across the world suggest that the genetic separation of non-African ancestors from African Yoruban ancestors started long before 50,000 years ago and give information about human population history as recent as 2,000 years ago, including the bottleneck in the peopling of the Americas and separations within Africa, East Asia and Europe.
Hybrid Origins of Citrus Varieties Inferred from DNA Marker Analysis of Nuclear and Organelle Genomes.

PubMed

Shimizu, Tokurou; Kitajima, Akira; Nonaka, Keisuke; Yoshioka, Terutaka; Ohta, Satoshi; Goto, Shingo; Toyoda, Atsushi; Fujiyama, Asao; Mochizuki, Takako; Nagasaki, Hideki; Kaminuma, Eli; Nakamura, Yasukazu

2016-01-01

Most indigenous citrus varieties are assumed to be natural hybrids, but their parentage has so far been determined in only a few cases because of their wide genetic diversity and the low transferability of DNA markers. Here we infer the parentage of indigenous citrus varieties using simple sequence repeat and indel markers developed from various citrus genome sequence resources. Parentage tests with 122 known hybrids using the selected DNA markers certify their transferability among those hybrids. Identity tests confirm that most variant strains are selected mutants, but we find four types of kunenbo (Citrus nobilis) and three types of tachibana (Citrus tachibana) for which we suggest different origins. Structure analysis with DNA markers that are in Hardy-Weinberg equilibrium deduce three basic taxa coinciding with the current understanding of citrus ancestors. Genotyping analysis of 101 indigenous citrus varieties with 123 selected DNA markers infers the parentages of 22 indigenous citrus varieties including Satsuma, Temple, and iyo, and single parents of 45 indigenous citrus varieties, including kunenbo, C. ichangensis, and Ichang lemon by allele-sharing and parentage tests. Genotyping analysis of chloroplast and mitochondrial genomes using 11 DNA markers classifies their cytoplasmic genotypes into 18 categories and deduces the combination of seed and pollen parents. Likelihood ratio analysis verifies the inferred parentages with significant scores. The reconstructed genealogy identifies 12 types of varieties consisting of Kishu, kunenbo, yuzu, koji, sour orange, dancy, kobeni mikan, sweet orange, tachibana, Cleopatra, willowleaf mandarin, and pummelo, which have played pivotal roles in the occurrence of these indigenous varieties. The inferred parentage of the indigenous varieties confirms their hybrid origins, as found by recent studies.
Hybrid Origins of Citrus Varieties Inferred from DNA Marker Analysis of Nuclear and Organelle Genomes

PubMed Central

Kitajima, Akira; Nonaka, Keisuke; Yoshioka, Terutaka; Ohta, Satoshi; Goto, Shingo; Toyoda, Atsushi; Fujiyama, Asao; Mochizuki, Takako; Nagasaki, Hideki; Kaminuma, Eli; Nakamura, Yasukazu

2016-01-01

Most indigenous citrus varieties are assumed to be natural hybrids, but their parentage has so far been determined in only a few cases because of their wide genetic diversity and the low transferability of DNA markers. Here we infer the parentage of indigenous citrus varieties using simple sequence repeat and indel markers developed from various citrus genome sequence resources. Parentage tests with 122 known hybrids using the selected DNA markers certify their transferability among those hybrids. Identity tests confirm that most variant strains are selected mutants, but we find four types of kunenbo (Citrus nobilis) and three types of tachibana (Citrus tachibana) for which we suggest different origins. Structure analysis with DNA markers that are in Hardy–Weinberg equilibrium deduce three basic taxa coinciding with the current understanding of citrus ancestors. Genotyping analysis of 101 indigenous citrus varieties with 123 selected DNA markers infers the parentages of 22 indigenous citrus varieties including Satsuma, Temple, and iyo, and single parents of 45 indigenous citrus varieties, including kunenbo, C. ichangensis, and Ichang lemon by allele-sharing and parentage tests. Genotyping analysis of chloroplast and mitochondrial genomes using 11 DNA markers classifies their cytoplasmic genotypes into 18 categories and deduces the combination of seed and pollen parents. Likelihood ratio analysis verifies the inferred parentages with significant scores. The reconstructed genealogy identifies 12 types of varieties consisting of Kishu, kunenbo, yuzu, koji, sour orange, dancy, kobeni mikan, sweet orange, tachibana, Cleopatra, willowleaf mandarin, and pummelo, which have played pivotal roles in the occurrence of these indigenous varieties. The inferred parentage of the indigenous varieties confirms their hybrid origins, as found by recent studies. PMID:27902727
Low-Pass Genome-Wide Sequencing and Variant Inference Using Identity-by-Descent in an Isolated Human Population

PubMed Central

Gusev, A.; Shah, M. J.; Kenny, E. E.; Ramachandran, A.; Lowe, J. K.; Salit, J.; Lee, C. C.; Levandowsky, E. C.; Weaver, T. N.; Doan, Q. C.; Peckham, H. E.; McLaughlin, S. F.; Lyons, M. R.; Sheth, V. N.; Stoffel, M.; De La Vega, F. M.; Friedman, J. M.; Breslow, J. L.

2012-01-01

Whole-genome sequencing in an isolated population with few founders directly ascertains variants from the population bottleneck that may be rare elsewhere. In such populations, shared haplotypes allow imputation of variants in unsequenced samples without resorting to complex statistical methods as in studies of outbred cohorts. We focus on an isolated population cohort from the Pacific Island of Kosrae, Micronesia, where we previously collected SNP array and rich phenotype data for the majority of the population. We report identification of long regions with haplotypes co-inherited between pairs of individuals and methodology to leverage such shared genetic content for imputation. Our estimates show that sequencing as few as 40 personal genomes allows for inference in up to 60% of the 3000-person cohort at the average locus. We ascertained a pilot data set of whole-genome sequences from seven Kosraean individuals, with average 5× coverage. This assay identified 5,735,306 unique sites of which 1,212,831 were previously unknown. Additionally, these variants are unusually enriched for alleles that are rare in other populations when compared to geographic neighbors (published Korean genome SJK). We used the presence of shared haplotypes between the seven Kosraean individuals to estimate expected imputation accuracy of known and novel homozygous variants at 99.6% and 97.3%, respectively. This study presents whole-genome analysis of a homogenous isolate population with emphasis on optimal rare variant inference. PMID:22135348
Gene order in rosid phylogeny, inferred from pairwise syntenies among extant genomes

PubMed Central

2012-01-01

Background: Ancestral gene order reconstruction for flowering plants has lagged behind developments in yeasts, insects and higher animals, because of the recency of widespread plant genome sequencing, sequencers' embargoes on public data use, paralogies due to whole genome duplication (WGD) and fractionation of undeleted duplicates, extensive paralogy from other sources, and the computational cost of existing methods. Results: We address these problems, using the gene order of four core eudicot genomes (cacao, castor bean, papaya and grapevine) that have escaped any recent WGD events, and two others (poplar and cucumber) that descend from independent WGDs, in inferring the ancestral gene order of the rosid clade and those of its main subgroups, the fabids and malvids. We improve and adapt techniques including the OMG method for extracting large, paralogy-free, multiple orthologies from conflated pairwise synteny data among the six genomes and the PATHGROUPS approach for ancestral gene order reconstruction in a given phylogeny, where some genomes may be descendants of WGD events. We use the gene order evidence to evaluate the hypothesis that the order Malpighiales belongs to the malvids rather than as traditionally assigned to the fabids. Conclusions: Gene orders of ancestral eudicot species, involving 10,000 or more genes can be reconstructed in an efficient, parsimonious and consistent way, despite paralogies due to WGD and other processes. Pairwise genomic syntenies provide appropriate input to a parameter-free procedure of multiple ortholog identification followed by gene-order reconstruction in solving instances of the "small phylogeny" problem. PMID:22759433
BACTERIOPLANKTON DYNAMICS IN NORTHERN SAN FRANCISCO BAY: ROLE OF PARTICLE ASSOCIATION AND SEASONAL FRESHWATER FLOW

EPA Science Inventory

Bacterioplankton abundance and metabolic characteristics were observed in northern San Francisco Bay, California, during spring and summer 1996 at three sites: Central Bay, Suisun Bay, and the Sacramento River. These sites spanned a salinity gradient from marine to freshwater, an...
Paleolithic Contingent in Modern Japanese: Estimation and Inference using Genome-wide Data

PubMed Central

He, Yungang; Wang, Wei R.; Xu, Shuhua; Jin, Li; SNP Consortium, Pan-Asia

2012-01-01

The genetic origins of Japanese populations have been controversial. Upper Paleolithic Japanese, i.e., Jomon, developed independently in the Japanese islands for more than 10,000 years until the isolation was ended by influxes of continental immigrants about 2,000 years ago. However, knowledge of the origin of Jomon and its contribution to the gene pool of contemporary Japanese is still limited, despite extensive studies using mtDNA and Y chromosomes. In this report, we aimed to infer the origin of Jomon and to estimate its contribution to Japanese by fitting an admixture model with missing data from Jomon to genome-wide data from 94 worldwide populations. Our results showed that the genetic contributions of Jomon, the Paleolithic contingent in Japanese, are 54.3∼62.3% in Ryukyuans and 23.1∼39.5% in mainland Japanese, respectively. Utilizing inferred allele frequencies of the Jomon population, we further showed the Paleolithic contingent in Japanese had a Northeast Asia origin. PMID:22482036
The Diversity of the Limnohabitans Genus, an Important Group of Freshwater Bacterioplankton, by Characterization of 35 Isolated Strains

PubMed Central

Kasalický, Vojtěch; Jezbera, Jan; Hahn, Martin W.; Šimek, Karel

2013-01-01

Bacteria of the genus Limnohabitans, more precisely the R-BT lineage, have a prominent role in freshwater bacterioplankton communities due to their high rates of substrate uptake and growth, growth on algal-derived substrates and high mortality rates from bacterivory. Moreover, due to their generally larger mean cell volume, compared to typical bacterioplankton cells, they contribute over-proportionally to total bacterioplankton biomass. Here we present genetic, morphological and ecophysiological properties of 35 bacterial strains affiliated with the Limnohabitans genus newly isolated from 11 non-acidic European freshwater habitats. The low genetic diversity indicated by the previous studies using the ribosomal SSU gene highly contrasted with the surprisingly rich morphologies and different patterns in substrate utilization of isolated strains. Therefore, the intergenic spacer between 16S and 23S rRNA genes was successfully tested as a fine-scale marker to delineate individual lineages and even genotypes. For further studies, we propose the division of the Limnohabitans genus into five lineages (provisionally named as LimA, LimB, LimC, LimD and LimE) and also additional sublineages within the most diversified lineage LimC. Such a delineation is supported by the morphology of isolated strains which predetermine large differences in their ecology. PMID:23505469
Distribution of bacterioplankton with active metabolism in waters of the St. Anna Trough, Kara Sea, in autumn 2011

NASA Astrophysics Data System (ADS)

Mosharova, I. V.; Mosharov, S. A.; Ilinskiy, V. V.

2017-01-01

The distribution of bacterioplankton with active electron transport chains, as well as bacteria with intact cell membranes, was investigated for the first time in the region of St. Anna Trough in the Kara Sea. The average number of bacteria with active electron transport chains in the waters of the St. Anna Trough was 15.55 × 10³ cells mL⁻¹ (the limits of variation were 1.06 to 92.17 × 10³ cells mL⁻¹). The average number of bacteria with intact membranes was 33.46 × 10³ cells mL⁻¹ (the limits of variation were 6.78 to 103.18 × 10³ cells mL⁻¹). Almost all bacterioplankton microorganisms in the studied area were potentially viable, and the average share of bacteria with intact membranes was 92.1% of the total number of bacterioplankton (TNB) (the limits of variation were 76.2 to 98.4%). The share of bacteria with active metabolisms was 38.2% of the TNB (the limits of variation were 5.6 to 93.4%). The shares of the bacteria with active metabolisms were maximum in areas with the most stable environmental conditions (on the shelf and in deep water), whereas on the slope, where the gradients of water temperature and salinity were maximum, these values were lower.
Influence of macrophyte decomposition on growth rate and community structure of Okefenokee Swamp bacterioplankton.

PubMed

Murray, R E; Hodson, R E

1986-02-01

Dissolved substances released during decomposition of the white water lily (Nymphaea odorata) can alter the growth rate of Okefenokee Swamp bacterioplankton. In microcosm experiments dissolved compounds released from senescent Nymphaea leaves caused a transient reduction in the abundance and activity of water column bacterioplankton, followed by a period of intense bacterial growth. Rates of [³H]thymidine incorporation and turnover of dissolved D-glucose were depressed by over 85%, 3 h after the addition of Nymphaea leachates to microcosms containing Okefenokee Swamp water. Bacterial activity subsequently recovered; after 20 h [³H]thymidine incorporation in leachate-treated microcosms was 10-fold greater than that in control microcosms. The recovery of activity was due to a shift in the composition of the bacterial population toward resistance to the inhibitory compounds present in Nymphaea leachates. Inhibitory compounds released during the decomposition of aquatic macrophytes thus act as selective agents which alter the community structure of the bacterial population with respect to leachate resistance. Soluble compounds derived from macrophyte decomposition influence the rate of bacterial secondary production and the availability of microbial biomass to microconsumers.
Influence of macrophyte decomposition on growth rate and community structure of Okefenokee Swamp bacterioplankton

DOE Office of Scientific and Technical Information (OSTI.GOV)

Murray, R.E.; Hodson, R.E.

1986-02-01

Dissolved substances released during decomposition of the white water lily (Nymphaea odorata) can alter the growth rate of Okefenokee Swamp bacterioplankton. In microcosm experiments dissolved compounds released from senescent Nymphaea leaves caused a transient reduction in the abundance and activity of water column bacterioplankton, followed by a period of intense bacterial growth. Rates of [³H]thymidine incorporation and turnover of dissolved D-glucose were depressed by over 85%, 3 h after the addition of Nymphaea leachates to microcosms containing Okefenokee Swamp water. Bacterial activity subsequently recovered; after 20 h [³H]thymidine incorporation in leachate-treated microcosms was 10-fold greater than that in control microcosms. The recovery of activity was due to a shift in the composition of the bacterial population toward resistance to the inhibitory compounds present in Nymphaea leachates. Inhibitory compounds released during the decomposition of aquatic macrophytes thus act as selective agents which alter the community structure of the bacterial population with respect to leachate resistance. Soluble compounds derived from macrophyte decomposition influence the rate of bacterial secondary production and the availability of microbial biomass to microconsumers.
Seasonality Affects the Diversity and Composition of Bacterioplankton Communities in Dongjiang River, a Drinking Water Source of Hong Kong.

PubMed

Sun, Wei; Xia, Chunyu; Xu, Meiying; Guo, Jun; Sun, Guoping

2017-01-01

Water quality is the most vital criterion for rivers serving as drinking water sources, and it changes periodically with the seasons. Such fluctuation is believed to be associated with state shifts of the resident bacterial community. To date, however, seasonality effects on bacterioplankton community patterns in large rivers serving as drinking water sources are still poorly understood. Here we investigated the intra-annual bacterial community structure in the Dongjiang River, a drinking water source of Hong Kong, using high-throughput pyrosequencing in concert with geochemical property measurements during dry and wet seasons. Our results showed that Proteobacteria, Actinobacteria, and Bacteroidetes were the dominant phyla of bacterioplankton communities, which varied in composition and distribution from dry to wet seasons and exhibited profound seasonal changes. Actinobacteria, Bacteroidetes, and Cyanobacteria seemed to be more associated with seasonality: the relative abundances of Actinobacteria and Bacteroidetes were significantly higher in the dry season than in the wet season (p < 0.01), while the relative abundance of Cyanobacteria was about 10-fold higher in the wet season than in the dry season. Temperature and [Formula: see text]-N concentration represented key contributing factors to the observed seasonal variations. These findings help understand the roles of various bacterioplankton and their interactions with the biogeochemical processes in the river ecosystem.
Response of marine bacterioplankton pH homeostasis gene expression to elevated CO2

NASA Astrophysics Data System (ADS)

Bunse, Carina; Lundin, Daniel; Karlsson, Christofer M. G.; Akram, Neelam; Vila-Costa, Maria; Palovaara, Joakim; Svensson, Lovisa; Holmfeldt, Karin; González, José M.; Calvo, Eva; Pelejero, Carles; Marrasé, Cèlia; Dopson, Mark; Gasol, Josep M.; Pinhassi, Jarone

2016-05-01

Human-induced ocean acidification impacts marine life. Marine bacteria are major drivers of biogeochemical nutrient cycles and energy fluxes; hence, understanding their performance under projected climate change scenarios is crucial for assessing ecosystem functioning. Whereas genetic and physiological responses of phytoplankton to ocean acidification are being disentangled, corresponding functional responses of bacterioplankton to pH reduction from elevated CO2 are essentially unknown. Here we show, from metatranscriptome analyses of a phytoplankton bloom mesocosm experiment, that marine bacteria responded to lowered pH by enhancing the expression of genes encoding proton pumps, such as respiration complexes, proteorhodopsin and membrane transporters. Moreover, taxonomic transcript analysis showed that distinct bacterial groups expressed different pH homeostasis genes in response to elevated CO2. These responses were substantial for numerous pH homeostasis genes under low-chlorophyll conditions (chlorophyll a <2.5 μg l⁻¹); however, the changes in gene expression under high-chlorophyll conditions (chlorophyll a >20 μg l⁻¹) were low. Given that proton expulsion through pH homeostasis mechanisms is energetically costly, these findings suggest that bacterioplankton adaptation to ocean acidification could have long-term effects on the economy of ocean ecosystems.
From algae to angiosperms–inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes

PubMed Central

2014-01-01

Background: Next-generation sequencing has provided a wealth of plastid genome sequence data from an increasingly diverse set of green plants (Viridiplantae). Although these data have helped resolve the phylogeny of numerous clades (e.g., green algae, angiosperms, and gymnosperms), their utility for inferring relationships across all green plants is uncertain. Viridiplantae originated 700-1500 million years ago and may comprise as many as 500,000 species. This clade represents a major source of photosynthetic carbon and contains an immense diversity of life forms, including some of the smallest and largest eukaryotes. Here we explore the limits and challenges of inferring a comprehensive green plant phylogeny from available complete or nearly complete plastid genome sequence data. Results: We assembled protein-coding sequence data for 78 genes from 360 diverse green plant taxa with complete or nearly complete plastid genome sequences available from GenBank. Phylogenetic analyses of the plastid data recovered well-supported backbone relationships and strong support for relationships that were not observed in previous analyses of major subclades within Viridiplantae. However, there also is evidence of systematic error in some analyses. In several instances we obtained strongly supported but conflicting topologies from analyses of nucleotides versus amino acid characters, and the considerable variation in GC content among lineages and within single genomes affected the phylogenetic placement of several taxa. Conclusions: Analyses of the plastid sequence data recovered a strongly supported framework of relationships for green plants. This framework includes: i) the placement of Zygnematophyceae as sister to land plants (Embryophyta), ii) a clade of extant gymnosperms (Acrogymnospermae) with cycads + Ginkgo sister to remaining extant gymnosperms and with gnetophytes (Gnetophyta) sister to non-Pinaceae conifers (Gnecup trees), and iii) within the monilophyte clade
Bacterioplankton in Antarctic ocean waters during late austral winter: abundance, frequency of dividing cells, and estimates of production.

PubMed

Hanson, R B; Shafer, D; Ryan, T; Pope, D H; Lowery, H K

1983-05-01

Bacterioplankton productivity in Antarctic waters of the eastern South Pacific Ocean and Drake Passage was estimated by direct counts and frequency of dividing cells (FDC). Total bacterioplankton assemblages were enumerated by epifluorescent microscopy. The experimentally determined relationship between in situ FDC and the potential instantaneous growth rate constant (μ) is best described by the regression equation ln μ = 0.081 FDC − 3.73. In the eastern South Pacific Ocean, bacterioplankton abundance (2 x 10 to 3.5 x 10 cells per ml) and FDC (11%) were highest at the Polar Front (Antarctic Convergence). North of the Subantarctic Front, abundance and FDC were between 1 x 10 to 2 x 10 cells per ml and 3 to 5%, respectively, and were vertically homogeneous to a depth of 600 m. In Drake Passage, abundance (10 x 10 cells per ml) and FDC (16%) were highest in waters south of the Polar Front and near the sea ice. Subantarctic waters in Drake Passage contained 4 x 10 cells per ml with 4 to 5% FDC. Instantaneous growth rate constants ranged between 0.029 and 0.088 h⁻¹. Using estimates of potential μ and measured standing stocks, we estimated productivity to range from 0.62 μg of C per liter per day in the eastern South Pacific Ocean to 17.1 μg of C per liter per day in the Drake Passage near the sea ice.
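The regression quoted in this abstract can be evaluated directly. The following is a minimal Python sketch, not the authors' code: the function names are hypothetical, and the doubling-time conversion (ln 2 / μ, the standard relation for exponential growth) is an addition that does not appear in the abstract. With FDC = 16% the regression gives roughly 0.088 per hour, which is consistent with the upper end of the range of growth rate constants reported above.

```python
import math


def growth_rate_from_fdc(fdc_percent: float) -> float:
    """Potential instantaneous growth rate constant mu (per hour).

    Uses the regression quoted in the abstract:
        ln(mu) = 0.081 * FDC - 3.73, with FDC given in percent.
    """
    return math.exp(0.081 * fdc_percent - 3.73)


def doubling_time_hours(mu: float) -> float:
    """Doubling time for exponential growth (assumption: t_d = ln 2 / mu)."""
    return math.log(2) / mu


if __name__ == "__main__":
    # FDC values mentioned in the abstract (3-5%, 11%, 16%).
    for fdc in (3, 5, 11, 16):
        mu = growth_rate_from_fdc(fdc)
        print(f"FDC {fdc:>2}% -> mu = {mu:.3f} per h, "
              f"doubling time = {doubling_time_hours(mu):.1f} h")
```

Converting μ into carbon production additionally needs the standing stock and a carbon-per-cell factor, neither of which is recoverable from this record, so the sketch stops at μ.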
Proteomic-based stable isotope probing reveals taxonomically Distinct Patterns in Amino Acid Assimilation by Coastal Marine Bacterioplankton

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bryson, Samuel; Li, Zhou; Pett-Ridge, Jennifer

Heterotrophic marine bacterioplankton are a critical component of the carbon cycle, processing nearly a quarter of annual global primary production, yet defining how substrate utilization preferences and resource partitioning structure these microbial communities remains a challenge. In this study, we utilized proteomics-based stable isotope probing (proteomic SIP) to characterize the assimilation of amino acids by coastal marine bacterioplankton populations. We incubated microcosms of seawater collected from Newport, OR and Monterey Bay, CA with 1 μM 13C-amino acids for 15 and 32 hours. Subsequent analysis of 13C incorporation into protein biomass quantified the frequency and extent of isotope enrichment for identified proteins. Using these metrics we tested whether amino acid assimilation patterns were different for specific bacterioplankton populations. Proteins associated with Rhodobacterales and Alteromonadales tended to have a significantly high number of tandem mass spectra from 13C-enriched peptides, while Flavobacteriales and SAR11 proteins generally had significantly low numbers of 13C-enriched spectra. Rhodobacterales proteins associated with amino acid transport and metabolism had an increased frequency of 13C-enriched spectra at time-point 2, while Alteromonadales ribosomal proteins were 13C-enriched across time-points. Overall, proteomic SIP facilitated quantitative comparisons of dissolved free amino acids assimilation by specific taxa, both between sympatric populations and between protein functional groups within discrete populations, allowing an unprecedented examination of population-level metabolic responses to resource acquisition in complex microbial communities.
Proteomic-based stable isotope probing reveals taxonomically Distinct Patterns in Amino Acid Assimilation by Coastal Marine Bacterioplankton

DOE PAGES

Bryson, Samuel; Li, Zhou; Pett-Ridge, Jennifer; ...

2016-04-26

Heterotrophic marine bacterioplankton are a critical component of the carbon cycle, processing nearly a quarter of annual global primary production, yet defining how substrate utilization preferences and resource partitioning structure these microbial communities remains a challenge. In this study, we utilized proteomics-based stable isotope probing (proteomic SIP) to characterize the assimilation of amino acids by coastal marine bacterioplankton populations. We incubated microcosms of seawater collected from Newport, OR and Monterey Bay, CA with 1 μM 13C-amino acids for 15 and 32 hours. Subsequent analysis of 13C incorporation into protein biomass quantified the frequency and extent of isotope enrichment for identified proteins. Using these metrics we tested whether amino acid assimilation patterns were different for specific bacterioplankton populations. Proteins associated with Rhodobacterales and Alteromonadales tended to have a significantly high number of tandem mass spectra from 13C-enriched peptides, while Flavobacteriales and SAR11 proteins generally had significantly low numbers of 13C-enriched spectra. Rhodobacterales proteins associated with amino acid transport and metabolism had an increased frequency of 13C-enriched spectra at time-point 2, while Alteromonadales ribosomal proteins were 13C-enriched across time-points. Overall, proteomic SIP facilitated quantitative comparisons of dissolved free amino acids assimilation by specific taxa, both between sympatric populations and between protein functional groups within discrete populations, allowing an unprecedented examination of population-level metabolic responses to resource acquisition in complex microbial communities.
Copy-number analysis and inference of subclonal populations in cancer genomes using Sclust.

PubMed

Cun, Yupeng; Yang, Tsun-Po; Achter, Viktor; Lang, Ulrich; Peifer, Martin

2018-06-01

The genomes of cancer cells constantly change during pathogenesis. This evolutionary process can lead to the emergence of drug-resistant mutations in subclonal populations, which can hinder therapeutic intervention in patients. Data derived from massively parallel sequencing can be used to infer these subclonal populations using tumor-specific point mutations. The accurate determination of copy-number changes and tumor impurity is necessary to reliably infer subclonal populations by mutational clustering. This protocol describes how to use Sclust, a copy-number analysis method with a recently developed mutational clustering approach. In a series of simulations and comparisons with alternative methods, we have previously shown that Sclust accurately determines copy-number states and subclonal populations. Performance tests show that the method is computationally efficient, with copy-number analysis and mutational clustering taking <10 min. Sclust is designed such that even non-experts in computational biology or bioinformatics with basic knowledge of the Linux/Unix command-line syntax should be able to carry out analyses of subclonal populations.

Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data.

PubMed

Bhaskar, Anand; Wang, Y X Rachel; Song, Yun S

2015-02-01

With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions. © 2015 Bhaskar et al.; Published by Cold Spring Harbor Laboratory Press.
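As background for the "expected frequency spectrum under the coalescent" mentioned in this abstract, the simplest textbook case may help: for a constant-size population under the infinite-sites model, the expected number of sites at which the derived allele appears i times in a sample of n sequences is θ/i. The Python sketch below illustrates only that baseline, together with Watterson's classical estimator of θ. It is not the authors' method, which handles piecewise-exponential size histories and uses automatic differentiation, and all function names are hypothetical.

```python
from typing import List


def expected_sfs(n: int, theta: float) -> List[float]:
    """Expected site frequency spectrum for a sample of n sequences.

    Textbook result for a constant-size population under the infinite-sites
    model: E[xi_i] = theta / i for i = 1 .. n-1, where theta is the
    population-scaled mutation rate.
    """
    return [theta / i for i in range(1, n)]


def watterson_theta(segregating_sites: int, n: int) -> float:
    """Watterson's estimator: theta_W = S / a_n, with a_n = sum_{i<n} 1/i."""
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites / a_n


if __name__ == "__main__":
    # Toy values chosen for illustration only.
    print(expected_sfs(5, 2.0))
    print(watterson_theta(25, 5))
```

Departures of an observed spectrum from the θ/i shape, such as an excess of rare variants, are the kind of signal that frequency-spectrum methods interpret as population growth.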
LASSIM-A network inference toolbox for genome-wide mechanistic modeling.

PubMed

Magnusson, Rasmus; Mariotti, Guido Pio; Köpsén, Mattias; Lövfors, William; Gawel, Danuta R; Jörnsten, Rebecka; Linde, Jörg; Nordling, Torbjörn E M; Nyman, Elin; Schulze, Sylvie; Nestor, Colm E; Zhang, Huan; Cedersund, Gunnar; Benson, Mikael; Tjärnberg, Andreas; Gustafsson, Mika

2017-06-01

systems-level data. We demonstrate the power of this approach by inferring a mechanistically motivated, genome-wide model of the Th2 transcription regulatory system, which plays an important role in several immune related diseases.
Inferring causal relationships between phenotypes using summary statistics from genome-wide association studies.

PubMed

Meng, Xiang-He; Shen, Hui; Chen, Xiang-Ding; Xiao, Hong-Mei; Deng, Hong-Wen

2018-03-01

Genome-wide association studies (GWAS) have successfully identified numerous genetic variants associated with diverse complex phenotypes and diseases, and provided tremendous opportunities for further analyses using summary association statistics. Recently, Pickrell et al. developed a robust method for causal inference using independent putative causal SNPs. However, this method may fail to infer the causal relationship between two phenotypes when only a limited number of independent putative causal SNPs are identified. Here, we extended Pickrell's method to make it more applicable to general situations. We extended the causal inference method by replacing the putative causal SNPs with the lead SNPs (the set of the most significant SNPs in each independent locus) and tested the performance of our extended method using both simulation and empirical data. Simulations suggested that when the same number of genetic variants is used, our extended method had a similar distribution of the test statistic under the null model, as well as comparable power under the causal model, compared with the original method by Pickrell et al. In practice, however, our extended method would generally be more powerful because the number of independent lead SNPs was often larger than the number of independent putative causal SNPs, and including more SNPs would not cause more false positives. By applying our extended method to summary statistics from GWAS for blood metabolites and femoral neck bone mineral density (FN-BMD), we successfully identified ten blood metabolites that may causally influence FN-BMD. We extended a causal inference method for inferring putative causal relationships between two phenotypes using summary statistics from GWAS, and identified a number of potential causal metabolites for FN-BMD, which may provide novel insights into the pathophysiological mechanisms underlying osteoporosis.
Network inference and network response identification: moving genome-scale data to the next level of biological discovery

PubMed Central

Veiga, Diogo F. T.; Dutta, Bhaskar; Balázsi, Gábor

2011-01-01

The escalating amount of genome-scale data demands a pragmatic stance from the research community. How can we utilize this deluge of information to better understand biology, cure diseases, or engage cells in bioremediation or biomaterial production for various purposes? A research pipeline moving new sequence, expression and binding data towards practical end goals seems to be necessary. While most individual researchers are not motivated by such well-articulated pragmatic end goals, the scientific community has already self-organized to successfully convert genomic data into fundamentally new biological knowledge and practical applications. Here we review two important steps in this workflow: network inference and network response identification, applied to transcriptional regulatory networks. Among network inference methods, we concentrate on relevance networks due to their conceptual simplicity. We classify and discuss network response identification approaches as either data-centric or network-centric. Finally, we conclude with an outlook on what is still missing from these approaches and what may be ahead on the road to biological discovery. PMID:20174676
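The "conceptual simplicity" of relevance networks can be shown in a few lines: score every pair of genes by the similarity of their expression profiles and keep the pairs whose score passes a threshold. The Python sketch below assumes Pearson correlation as the similarity measure and a cutoff of 0.8. Both are illustrative choices (mutual information is another common score), the toy expression values are invented, and none of the names come from the review.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def relevance_network(expression, threshold=0.8):
    """Return edges (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles across five conditions.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.1, 9.8],  # rises with geneA
    "geneC": [5.0, 4.1, 2.9, 2.2, 0.9],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],  # only weakly related to the others
}

if __name__ == "__main__":
    for g1, g2, r in relevance_network(toy_expression):
        print(f"{g1} -- {g2}  (r = {r:+.2f})")
```

Edges found this way are undirected associations. They do not indicate which gene regulates which, a limitation of this family of methods.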
Deep Learning for Population Genetic Inference.

PubMed

Sheehan, Sara; Song, Yun S

2016-03-01

Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.
Phylogenetic shifts of bacterioplankton community composition along the Pearl Estuary: the potential impact of hypoxia and nutrients

PubMed Central

Liu, Jiwen; Fu, Bingbing; Yang, Hongmei; Zhao, Meixun; He, Biyan; Zhang, Xiao-Hua

2015-01-01

The significance of salinity in shaping bacterial communities dwelling in estuarine areas has been well documented. However, the influences of other environmental factors such as dissolved oxygen and nutrients in determining distribution patterns of both individual taxa and bacterial communities inhabiting local estuarine regions remain elusive. Here, bacterioplankton community structures of surface and bottom waters from eight sites along the Pearl Estuary were characterized with 16S rRNA gene pyrosequencing. The results showed significant differences in bacterioplankton communities between freshwater and saltwater sites, and further between surface and bottom waters of saltwater sites. Synechococcus dominated the surface water of saltwater sites while Oceanospirillales, SAR11 and SAR406 were prevalent in the bottom water. Betaproteobacteria was abundant in freshwater sites, with no significant difference between water layers. Phylogenetic shifts in taxa affiliated with the same clade were also detected. Dissolved oxygen explained most of the bacterial community variation in the redundancy analysis targeting only freshwater sites, whereas nutrients and salinity explained most of the variation across all samples in the Pearl Estuary. Methylophilales (mainly PE2 clade) was positively correlated with dissolved oxygen, whereas Rhodocyclales (mainly R.12up clade) was negatively correlated. Moreover, high nutrient inputs to the freshwater area of the Pearl Estuary have shifted the bacterial communities toward copiotrophic groups, such as Sphingomonadales. The present study demonstrated that the overall nutrients and freshwater hypoxia play important roles in determining bacterioplankton compositions and provided insights into the potential ecological roles of specific taxa in estuarine environments. PMID:25713564
Seasonal dynamics of bacterioplankton community in a large, shallow, highly dynamic freshwater lake.

PubMed

Kong, Zhaoyu; Kou, Wenbo; Ma, Yantian; Yu, Haotian; Ge, Gang; Wu, Lan

2018-05-23

The spatio-temporal shifts of bacterioplankton communities can mirror transitions in their functional traits in aquatic ecosystems. However, our understanding of the spatio-temporal variation of bacterioplankton community compositions (BCCs) within large, shallow and highly dynamic freshwater lakes is still limited. Here we examined the seasonal and spatial variability of BCCs in the Poyang Lake by 16S rRNA gene amplicon sequencing to explore how hydrological changes affect the BCCs. Principal coordinate analysis showed that the BCCs varied significantly among four sampling seasons, but not spatially. The seasonal changes of BCCs were mainly attributed to the differences between autumn and spring/winter. Higher alpha diversity indices were observed in autumn. Redundancy analysis indicated that the BCCs covaried with water level, pH, temperature, total phosphorus, ammoniacal nitrogen, electrical conductivity, total nitrogen, and turbidity. Among them, water level was the key determinant separating autumn BCCs from the BCCs in other seasons. A significantly lower relative abundance of Burkholderiales (betI and betVII) and a higher relative abundance of Actinomycetales (acI, acTH1 and acTH2) were found in autumn than in other seasons. Overall, our results suggest that water level changes associated with pH, temperature and nutrient status shaped the seasonal patterns of BCCs in the Poyang Lake.
A Detailed History of Intron-rich Eukaryotic Ancestors Inferred from a Global Survey of 100 Complete Genomes

PubMed Central

Csuros, Miklos; Rogozin, Igor B.; Koonin, Eugene V.

2011-01-01

Protein-coding genes in eukaryotes are interrupted by introns, but intron densities widely differ between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6–7 introns per kilobase of coding sequence, whereas most of the other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model (Markov Chain Monte Carlo, MCMC) on 245 orthologous genes from 99 genomes representing the three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. An excellent agreement between the MCMC and ML inferences is demonstrated whereas Dollo parsimony introduces a noticeable bias in the estimations, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing. PMID:21935348
Bayesian Population Genomic Inference of Crossing Over and Gene Conversion

PubMed Central

Padhukasahasram, Badri; Rannala, Bruce

2011-01-01

Meiotic recombination is a fundamental cellular mechanism in sexually reproducing organisms, and its different forms, crossing over and gene conversion, both play an important role in shaping genetic variation in populations. Here, we describe a coalescent-based full-likelihood Markov chain Monte Carlo (MCMC) method for jointly estimating the crossing-over, gene-conversion, and mean tract length parameters from population genomic data under a Bayesian framework. Although computationally more expensive than methods that use approximate likelihoods, the relative efficiency of our method is expected to be optimal in theory. Furthermore, it is also possible to obtain a posterior sample of genealogies for the data using this method. We first check the performance of the new method on simulated data and verify its correctness. We also extend the method for inference under models with variable gene-conversion and crossing-over rates and demonstrate its ability to identify recombination hotspots. Then, we apply the method to two empirical data sets that were sequenced in the telomeric regions of the X chromosome of Drosophila melanogaster. Our results indicate that gene conversion occurs more frequently than crossing over in the su-w and su-s gene sequences while the local rates of crossing over as inferred by our program are not low. The mean tract lengths for gene-conversion events are estimated to be ∼70 bp and 430 bp, respectively, for these data sets. Finally, we discuss ideas and optimizations for reducing the execution time of our algorithm. PMID:21840857
Deep Learning for Population Genetic Inference

PubMed Central

Sheehan, Sara; Song, Yun S.

2016-01-01

Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908
Free-Living and Particle-Associated Bacterioplankton in Large Rivers of the Mississippi River Basin Demonstrate Biogeographic Patterns

PubMed Central

Millar, Justin J.; Payne, Jason T.; Ochs, Clifford A.

2014-01-01

The different drainage basins of large rivers such as the Mississippi River represent interesting systems in which to study patterns in freshwater microbial biogeography. Spatial variability in bacterioplankton communities in six major rivers (the Upper Mississippi, Missouri, Illinois, Ohio, Tennessee, and Arkansas) of the Mississippi River Basin was characterized using Ion Torrent 16S rRNA amplicon sequencing. When all systems were combined, particle-associated (>3 μm) bacterial assemblages were found to be different from free-living bacterioplankton in terms of overall community structure, partly because of differences in the proportional abundance of sequences affiliated with major bacterial lineages (Alphaproteobacteria, Cyanobacteria, and Planctomycetes). Both particle-associated and free-living communities ordinated by river system, a pattern that was apparent even after rare sequences or those affiliated with Cyanobacteria were removed from the analyses. Ordination of samples by river system correlated with environmental characteristics of each river, such as nutrient status and turbidity. Communities in the Upper Mississippi and the Missouri and in the Ohio and the Tennessee, pairs of rivers that join each other, contained similar taxa in terms of presence-absence data but differed in the proportional abundance of major lineages. The most common sequence types detected in particle-associated communities were picocyanobacteria in the Synechococcus/Prochlorococcus/Cyanobium (Syn/Pro) clade, while free-living communities also contained a high proportion of LD12 (SAR11/Pelagibacter)-like Alphaproteobacteria. This research shows that while different tributaries of large river systems such as the Mississippi River harbor distinct bacterioplankton communities, there is also microhabitat variation such as that between free-living and particle-associated assemblages. PMID:25217018
Analysis of Composition and Structure of Coastal to Mesopelagic Bacterioplankton Communities in the Northern Gulf of Mexico

PubMed Central

King, Gary M.; Smith, Conor B.; Tolar, Bradley; Hollibaugh, James T.

2013-01-01

16S rRNA gene amplicons were pyrosequenced to assess bacterioplankton community composition, diversity, and phylogenetic community structure for 17 stations in the northern Gulf of Mexico (nGoM) sampled in March 2010. Statistical analyses showed that samples from depths ≤100 m differed distinctly from deeper samples. SAR 11 α-Proteobacteria and Bacteroidetes dominated communities at depths ≤100 m, which were characterized by high α-Proteobacteria/γ-Proteobacteria ratios (α/γ > 1.7). Thaumarchaeota, Firmicutes, and δ-Proteobacteria were relatively abundant in deeper waters, and α/γ ratios were low (<1). Canonical correlation analysis indicated that δ- and γ-Proteobacteria, Thaumarchaeota, and Firmicutes correlated positively with depth; α-Proteobacteria and Bacteroidetes correlated positively with temperature and dissolved oxygen; Actinobacteria, β-Proteobacteria, and Verrucomicrobia correlated positively with a measure of suspended particles. Diversity indices did not vary with depth or other factors, which indicated that richness and evenness elements of bacterioplankton communities might develop independently of nGoM physical-chemical variables. Phylogenetic community structure as measured by the net relatedness (NRI) and nearest taxon (NTI) indices also did not vary with depth. NRI values indicated that most of the communities were comprised of OTUs more distantly related to each other in whole community comparisons than expected by chance. NTI values derived from phylogenetic distances of the closest neighbor for each OTU in a given community indicated that OTUs tended to occur in clusters to a greater extent than expected by chance. This indicates that “habitat filtering” might play an important role in nGoM bacterioplankton species assembly, and that such filtering occurs throughout the water column. PMID:23346078
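The net relatedness index mentioned in the abstract above can be illustrated with a small sketch. It assumes the commonly used definition, NRI = -(MPD_obs - mean(MPD_null)) / sd(MPD_null), where MPD is the mean pairwise phylogenetic distance among the taxa in a community, so that positive values indicate clustering and negative values indicate taxa that are more distantly related than expected by chance. The toy distance matrix, the function names, and the null model (random draws of taxa from the pool) are illustrative assumptions, not the authors' pipeline.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def toy_distances():
    """Hypothetical pool of six taxa in two clades (distance 1 within, 10 between)."""
    clade = [0, 0, 0, 1, 1, 1]
    return [[0.0 if i == j else (1.0 if clade[i] == clade[j] else 10.0)
             for j in range(6)] for i in range(6)]


def mean_pairwise_distance(dist, taxa):
    """Mean phylogenetic distance over all pairs of taxa in a community."""
    pairs = list(combinations(taxa, 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def net_relatedness_index(dist, community, n_null=999, seed=0):
    """NRI under an assumed null model: random communities of equal richness."""
    rng = random.Random(seed)
    pool = range(len(dist))
    observed = mean_pairwise_distance(dist, community)
    null = [mean_pairwise_distance(dist, rng.sample(pool, len(community)))
            for _ in range(n_null)]
    # Sign is flipped so that phylogenetic clustering gives a positive index.
    return -(observed - mean(null)) / pstdev(null)


dist = toy_distances()
clustered = net_relatedness_index(dist, [0, 1, 2])   # all taxa from one clade
dispersed = net_relatedness_index(dist, [0, 1, 3])   # taxa span both clades
```

The nearest taxon index follows the same pattern with the mean nearest-neighbor distance in place of the mean pairwise distance.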
Flow Sorting of Marine Bacterioplankton after Fluorescence In Situ Hybridization

PubMed Central

Sekar, Raju; Fuchs, Bernhard M.; Amann, Rudolf; Pernthaler, Jakob

2004-01-01

We describe an approach to sort cells from coastal North Sea bacterioplankton by flow cytometry after in situ hybridization with rRNA-targeted horseradish peroxidase-labeled oligonucleotide probes and catalyzed fluorescent reporter deposition (CARD-FISH). In a sample from spring 2003 >90% of the cells were detected by CARD-FISH with a bacterial probe (EUB338). Approximately 30% of the microbial assemblage was affiliated with the Cytophaga-Flavobacterium lineage of the Bacteroidetes (CFB group) (probe CF319a), and almost 10% was targeted by a probe for the β-proteobacteria (probe BET42a). A protocol was optimized to detach cells hybridized with EUB338, BET42a, and CF319a from membrane filters (recovery rate, 70%) and to sort the cells by flow cytometry. The purity of sorted cells was >95%. 16S rRNA gene clone libraries were constructed from hybridized and sorted cells (S-EUB, S-BET, and S-CF libraries) and from unhybridized and unsorted cells (UNHYB library). Sequences related to the CFB group were significantly more frequent in the S-CF library (66%) than in the UNHYB library (13%). No enrichment of β-proteobacterial sequence types was found in the S-BET library, but novel sequences related to Nitrosospira were found exclusively in this library. These bacteria, together with members of marine clade OM43, represented >90% of the β-proteobacteria in the water sample, as determined by CARD-FISH with specific probes. This illustrates that a combination of CARD-FISH and flow sorting might be a powerful approach to study the diversity and potentially the activity and the genomes of different bacterial populations in aquatic habitats. PMID:15466568
Efficient and accurate causal inference with hidden confounders from genome-transcriptome variation data

PubMed Central

2017-01-01

Mapping gene expression as a quantitative trait using whole-genome sequencing and transcriptome analysis makes it possible to discover the functional consequences of genetic variation. We developed a novel method and ultra-fast software, Findr, for highly accurate causal inference between gene expression traits using cis-regulatory DNA variations as causal anchors, which improves on current methods by taking into consideration hidden confounders and weak regulations. Findr outperformed existing methods on the DREAM5 Systems Genetics challenge and on the prediction of microRNA and transcription factor targets in human lymphoblastoid cells, while being nearly a million times faster. Findr is publicly available at https://github.com/lingfeiwang/findr. PMID:28821014
Diversity in UV sensitivity and recovery potential among bacterioneuston and bacterioplankton isolates.

PubMed

Santos, A L; Lopes, S; Baptista, I; Henriques, I; Gomes, N C M; Almeida, A; Correia, A; Cunha, A

2011-04-01

This study assessed the variability in UV-B (280-320 nm) sensitivity of selected bacterial isolates from the surface microlayer and underlying water of the Ria de Aveiro (Portugal) estuary and their ability to recover from previous UV-induced stress. Bacterial suspensions were exposed to UV-B radiation (3·3 W m⁻²). Effects on culturability and activity were assessed from colony counts and ³H-leucine incorporation rates, respectively. Among the tested isolates, wide variability in UV-B-induced inhibition of culturability (37·4-99·3%) and activity (36·0-98·0%) was observed. Incubation of UV-B-irradiated suspensions under reactivating regimes (UV-A, 3·65 W m⁻²; photosynthetic active radiation, 40 W m⁻²; dark) also revealed diversity in the extent of recovery from UV-B stress. Trends of enhanced resistance of culturability (up to 15·0%) and enhanced recovery in activity (up to 52·0%) were observed in bacterioneuston isolates. Bacterioneuston isolates were less sensitive and recovered more rapidly from UV-B stress than bacterioplankton isolates, showing enhanced reduction in their metabolism during the irradiation period and decreased culturability during the recovery process compared to bacterioplankton. UV exposure can affect the diversity and activity of microbial communities by selecting UV-resistant strains and altering their metabolic activity towards protective strategies. © 2011 The Authors. Letters in Applied Microbiology © 2011 The Society for Applied Microbiology.
Influence of Macrophyte Decomposition on Growth Rate and Community Structure of Okefenokee Swamp Bacterioplankton

PubMed Central

Murray, Robert E.; Hodson, Robert E.

1986-01-01

Dissolved substances released during decomposition of the white water lily (Nymphaea odorata) can alter the growth rate of Okefenokee Swamp bacterioplankton. In microcosm experiments dissolved compounds released from senescent Nymphaea leaves caused a transient reduction in the abundance and activity of water column bacterioplankton, followed by a period of intense bacterial growth. Rates of [3H]thymidine incorporation and turnover of dissolved d-glucose were depressed by over 85%, 3 h after the addition of Nymphaea leachates to microcosms containing Okefenokee Swamp water. Bacterial activity subsequently recovered; after 20 h [3H]thymidine incorporation in leachate-treated microcosms was 10-fold greater than that in control microcosms. The recovery of activity was due to a shift in the composition of the bacterial population toward resistance to the inhibitory compounds present in Nymphaea leachates. Inhibitory compounds released during the decomposition of aquatic macrophytes thus act as selective agents which alter the community structure of the bacterial population with respect to leachate resistance. Soluble compounds derived from macrophyte decomposition influence the rate of bacterial secondary production and the availability of microbial biomass to microconsumers. PMID:16346986
Advances in computer simulation of genome evolution: toward more realistic evolutionary genomics analysis by approximate bayesian computation.

PubMed

Arenas, Miguel

2015-04-01

NGS technologies enable fast and cheap generation of genomic data. Nevertheless, ancestral genome inference is not straightforward because of complex evolutionary processes acting on this material, such as inversions, translocations, and other genome rearrangements that, in addition to their implicit complexity, can co-occur and confound ancestral inferences. Models of genome evolution that accommodate such complex genomic events have recently begun to emerge. This letter explores these novel evolutionary models and proposes their incorporation into robust statistical approaches based on computer simulations, such as approximate Bayesian computation, that may produce a more realistic evolutionary analysis of genomic data. Advantages and pitfalls in using these analytical methods are discussed. Potential applications of these ancestral genomic inferences are also pointed out.
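As a rough illustration of the simulation-based approach named in the abstract above, the following sketch shows approximate Bayesian computation by rejection sampling. It uses a deliberately simple stand-in model (a single event probability with a uniform prior and a binomial simulator) rather than any genome rearrangement model; the function names, tolerance, and number of draws are assumptions chosen for illustration only.

```python
import random


def simulate(p, n_trials, rng):
    """Toy simulator: number of events in n_trials, each occurring with probability p."""
    return sum(rng.random() < p for _ in range(n_trials))


def abc_rejection(observed, n_trials, n_draws=5000, tolerance=3, seed=1):
    """Keep prior draws whose simulated summary statistic falls close to the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw a candidate parameter from a uniform prior on [0, 1)
        if abs(simulate(p, n_trials, rng) - observed) <= tolerance:
            accepted.append(p)
    return accepted


# Hypothetical data: 30 events observed in 100 trials.
posterior = abc_rejection(observed=30, n_trials=100)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted draws approximate the posterior distribution of the parameter; in a realistic analysis the simulator would be a model of genome evolution and the comparison would use several summary statistics.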
Contrasting patterns of free-living bacterioplankton diversity in macrophyte-dominated versus phytoplankton blooming regimes in Dianchi Lake, a shallow lake in China

NASA Astrophysics Data System (ADS)

Wang, Yujing; Li, Huabing; Xing, Peng; Wu, Qinglong

2017-03-01

Freshwater shallow lakes typically exhibit two alternative stable states under certain nutrient loadings: macrophyte-dominated and phytoplankton-dominated water regimes. An ecosystem regime shift from macrophytes to phytoplankton blooming typically reduces the number of species of invertebrates and fishes and results in the homogenization of communities in freshwater lakes. We investigated how microbial biodiversity has responded to a shift of the ecosystem regime in Dianchi Lake, which was previously fully covered with submerged macrophytes but currently harbors both ecological states. We observed marked divergence in the diversity and community composition of bacterioplankton between the two regimes. Although species richness, estimated as the number of operational taxonomic units and phylogenetic diversity (PD), was higher in the phytoplankton dominated ecosystem after this shift, the dissimilarity of bacterioplankton community across space decreased. This decrease in beta diversity was accompanied by loss of planktonic bacteria unique to the macrophyte-dominated ecosystem. Mantel tests between bacterioplankton community distances and Euclidian distance of environmental parameters indicated that this reduced bacterial community differentiation primarily reflected the loss of environmental niches, particularly in the macrophyte regime. The loss of this small-scale heterogeneity in bacterial communities should be considered when assessing long-term biodiversity changes in response to ecosystem regime conversions in freshwater lakes.
Single-cell genomics-based analysis of virus–host interactions in marine surface bacterioplankton

DOE PAGES

Labonté, Jessica M.; Swan, Brandon K.; Poulos, Bonnie; ...

2015-04-07

Viral infections dynamically alter the composition and metabolic potential of marine microbial communities and the evolutionary trajectories of host populations with resulting feedback on biogeochemical cycles. It is quite possible that all microbial populations in the ocean are impacted by viral infections. Our knowledge of virus–host relationships, however, has been limited to a minute fraction of cultivated host groups. Here, we utilized single-cell sequencing to obtain genomic blueprints of viruses inside or attached to individual bacterial and archaeal cells captured in their native environment, circumventing the need for host and virus cultivation. Furthermore, a combination of comparative genomics, metagenomic fragment recruitment, sequence anomalies and irregularities in sequence coverage depth and genome recovery were utilized to detect viruses and to decipher modes of virus–host interactions. Members of all three tailed phage families were identified in 20 out of 58 phylogenetically and geographically diverse single amplified genomes (SAGs) of marine bacteria and archaea. At least four phage–host interactions had the characteristics of late lytic infections, all of which were found in metabolically active cells. One virus had genetic potential for lysogeny. Our findings include the first known viruses of Thaumarchaeota, Marinimicrobia, Verrucomicrobia and Gammaproteobacteria clusters SAR86 and SAR92. Viruses were also found in SAGs of Alphaproteobacteria and Bacteroidetes. A high fragment recruitment of viral metagenomic reads confirmed that most of the SAG-associated viruses are abundant in the ocean. This study demonstrates that single-cell genomics, in conjunction with sequence-based computational tools, enables in situ, cultivation-independent insights into host–virus interactions in complex microbial communities.
Demographic Divergence History of Pied Flycatcher and Collared Flycatcher Inferred from Whole-Genome Re-sequencing Data

PubMed Central

Nadachowska-Brzyska, Krystyna; Burri, Reto; Olason, Pall I.; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

2013-01-01

Profound knowledge of demographic history is a prerequisite for the understanding and inference of processes involved in the evolution of population differentiation and speciation. Together with new coalescent-based methods, the recent availability of genome-wide data enables investigation of differentiation and divergence processes at unprecedented depth. We combined two powerful approaches, full Approximate Bayesian Computation analysis (ABC) and pairwise sequentially Markovian coalescent modeling (PSMC), to reconstruct the demographic history of the split between two avian speciation model species, the pied flycatcher and collared flycatcher. Using whole-genome re-sequencing data from 20 individuals, we investigated 15 demographic models including different levels and patterns of gene flow, and changes in effective population size over time. ABC provided high support for recent (mode 0.3 my, range <0.7 my) species divergence, declines in effective population size of both species since their initial divergence, and unidirectional recent gene flow from pied flycatcher into collared flycatcher. The estimated divergence time and population size changes, supported by PSMC results, suggest that the ancestral species persisted through one of the glacial periods of middle Pleistocene and then split into two large populations that first increased in size before going through severe bottlenecks and expanding into their current ranges. Secondary contact appears to have been established after the last glacial maximum. The severity of the bottlenecks at the last glacial maximum is indicated by the discrepancy between current effective population sizes (20,000–80,000) and census sizes (5–50 million birds) of the two species. The recent divergence time challenges the supposition that avian speciation is a relatively slow process with extended times for intrinsic postzygotic reproductive barriers to evolve. Our study emphasizes the importance of using genome-wide data to

Demographic divergence history of pied flycatcher and collared flycatcher inferred from whole-genome re-sequencing data.

PubMed

Nadachowska-Brzyska, Krystyna; Burri, Reto; Olason, Pall I; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

2013-11-01

Profound knowledge of demographic history is a prerequisite for the understanding and inference of processes involved in the evolution of population differentiation and speciation. Together with new coalescent-based methods, the recent availability of genome-wide data enables investigation of differentiation and divergence processes at unprecedented depth. We combined two powerful approaches, full Approximate Bayesian Computation analysis (ABC) and pairwise sequentially Markovian coalescent modeling (PSMC), to reconstruct the demographic history of the split between two avian speciation model species, the pied flycatcher and collared flycatcher. Using whole-genome re-sequencing data from 20 individuals, we investigated 15 demographic models including different levels and patterns of gene flow, and changes in effective population size over time. ABC provided high support for recent (mode 0.3 my, range <0.7 my) species divergence, declines in effective population size of both species since their initial divergence, and unidirectional recent gene flow from pied flycatcher into collared flycatcher. The estimated divergence time and population size changes, supported by PSMC results, suggest that the ancestral species persisted through one of the glacial periods of middle Pleistocene and then split into two large populations that first increased in size before going through severe bottlenecks and expanding into their current ranges. Secondary contact appears to have been established after the last glacial maximum. The severity of the bottlenecks at the last glacial maximum is indicated by the discrepancy between current effective population sizes (20,000-80,000) and census sizes (5-50 million birds) of the two species. The recent divergence time challenges the supposition that avian speciation is a relatively slow process with extended times for intrinsic postzygotic reproductive barriers to evolve. Our study emphasizes the importance of using genome-wide data to
Pseudoscorpion mitochondria show rearranged genes and genome-wide reductions of RNA gene sizes and inferred structures, yet typical nucleotide composition bias

PubMed Central

2012-01-01

Background: Pseudoscorpions are chelicerates and have historically been viewed as being most closely related to solifuges, harvestmen, and scorpions. No mitochondrial genomes of pseudoscorpions have been published, but the mitochondrial genomes of some lineages of Chelicerata possess unusual features, including short rRNA genes and tRNA genes that lack sequence to encode arms of the canonical cloverleaf-shaped tRNA. Additionally, some chelicerates possess an atypical guanine-thymine nucleotide bias on the major coding strand of their mitochondrial genomes. Results: We sequenced the mitochondrial genomes of two divergent taxa from the chelicerate order Pseudoscorpiones. We find that these genomes possess unusually short tRNA genes that do not encode cloverleaf-shaped tRNA structures. Indeed, in one genome, all 22 tRNA genes lack sequence to encode canonical cloverleaf structures. We also find that the large ribosomal RNA genes are substantially shorter than those of most arthropods. We inferred secondary structures of the LSU rRNAs from both pseudoscorpions, and find that they have lost multiple helices. Based on comparisons with the crystal structure of the bacterial ribosome, two of these helices were likely contact points with tRNA T-arms or D-arms as they pass through the ribosome during protein synthesis. The mitochondrial gene arrangements of both pseudoscorpions differ from the ancestral chelicerate gene arrangement. One genome is rearranged with respect to the location of protein-coding genes, the small rRNA gene, and at least 8 tRNA genes. The other genome contains 6 tRNA genes in novel locations. Most chelicerates with rearranged mitochondrial genes show a genome-wide reversal of the CA nucleotide bias typical for arthropods on their major coding strand, and instead possess a GT bias. Yet despite their extensive rearrangement, these pseudoscorpion mitochondrial genomes possess a CA bias on the major coding strand. Phylogenetic analyses of all 13
The evolutionary history of termites as inferred from 66 mitochondrial genomes.

PubMed

Bourguignon, Thomas; Lo, Nathan; Cameron, Stephen L; Šobotník, Jan; Hayashi, Yoshinobu; Shigenobu, Shuji; Watanabe, Dai; Roisin, Yves; Miura, Toru; Evans, Theodore A

2015-02-01

Termites have colonized many habitats and are among the most abundant animals in tropical ecosystems, which they modify considerably through their actions. The timing of their rise in abundance and of the dispersal events that gave rise to modern termite lineages is not well understood. To shed light on termite origins and diversification, we sequenced the mitochondrial genome of 48 termite species and combined them with 18 previously sequenced termite mitochondrial genomes for phylogenetic and molecular clock analyses using multiple fossil calibrations. The 66 genomes represent most major clades of termites. Unlike previous phylogenetic studies based on fewer molecular data, our phylogenetic tree is fully resolved for the lower termites. The phylogenetic positions of Macrotermitinae and Apicotermitinae are also resolved as the basal groups in the higher termites, but in the crown termitid groups, including Termitinae + Syntermitinae + Nasutitermitinae + Cubitermitinae, the position of some nodes remains uncertain. Our molecular clock tree indicates that the lineages leading to termites and Cryptocercus roaches diverged 170 Ma (153-196 Ma 95% confidence interval [CI]), that modern Termitidae arose 54 Ma (46-66 Ma 95% CI), and that the crown termitid group arose 40 Ma (35-49 Ma 95% CI). This indicates that the distribution of basal termite clades was influenced by the final stages of the breakup of Pangaea. Our inference of ancestral geographic ranges shows that the Termitidae, which includes more than 75% of extant termite species, most likely originated in Africa or Asia, and acquired their pantropical distribution after a series of dispersal and subsequent diversification events. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Dynamics and estimates of growth and loss rates of bacterioplankton in a temperate freshwater system.

PubMed

Jugnia, Louis-B; Sime-Ngando, Télesphore; Gilbert, Daniel

2006-10-01

The growth rate and losses of bacterioplankton in the epilimnion of an oligo-mesotrophic reservoir were simultaneously estimated using three different methods for each process. Bacterial production was determined by means of the tritiated thymidine incorporation method, the dialysis bag method and the dilution method, while bacterial mortality was assessed with the dilution method, the disappearance of thymidine-labeled natural cells and ingestion of fluorescent bacterial tracers by heterotrophic flagellates. The different methods used to estimate bacterial growth rates yielded similar results. On the other hand, the mortality rates obtained with the dilution method were significantly lower than those obtained with the use of thymidine-labeled natural cells. The bacterial ingestion rate by flagellates accounted on average for 39% of total bacterial mortality estimated by the dilution method, but this value fell to 5% when the total mortality was measured by the thymidine-labeling method. Bacterial abundance and production varied in opposite phase to flagellate abundance and the various bacterial mortality rates. All this points to the critical importance of methodological aspects in the elaboration of quantitative models of matter and energy flows over time through microbial trophic networks in aquatic systems, and highlights the role of bacterioplankton as a source of carbon for higher trophic levels in the studied system.
Network analysis reveals seasonal variation of co-occurrence correlations between Cyanobacteria and other bacterioplankton.

PubMed

Zhao, Dayong; Shen, Feng; Zeng, Jin; Huang, Rui; Yu, Zhongbo; Wu, Qinglong L

2016-12-15

Association network approaches have recently been proposed as a means for exploring the associations between bacterial communities. In the present study, high-throughput sequencing was employed to investigate the seasonal variations in the composition of bacterioplankton communities in six eutrophic urban lakes of Nanjing City, China. Over 150,000 16S rRNA sequences were derived from 52 water samples, and correlation-based network analyses were conducted. Our results demonstrated that the architecture of the co-occurrence networks varied in different seasons. Cyanobacteria played various roles in the ecological networks during different seasons. Co-occurrence patterns revealed that members of Cyanobacteria shared a very similar niche and they had weak positive correlations with other phyla in summer. To explore the effect of environmental factors on species-species co-occurrence networks and to determine the most influential environmental factors, the original positive network was simplified by module partitioning and by calculating module eigengenes. Module eigengene analysis indicated that temperature only affected some Cyanobacteria; the rest were mainly affected by nitrogen associated factors throughout the year. Cyanobacteria were dominant in summer which may result from strong co-occurrence patterns and suitable living conditions. Overall, this study has improved our understanding of the roles of Cyanobacteria and other bacterioplankton in ecological networks. Copyright © 2016 Elsevier B.V. All rights reserved.
Genome Alignment Spanning Major Poaceae Lineages Reveals Heterogeneous Evolutionary Rates and Alters Inferred Dates for Key Evolutionary Events.

PubMed

Wang, Xiyin; Wang, Jingpeng; Jin, Dianchuan; Guo, Hui; Lee, Tae-Ho; Liu, Tao; Paterson, Andrew H

2015-06-01

Multiple comparisons among genomes can clarify their evolution, speciation, and functional innovations. To date, the genome sequences of eight grasses representing the most economically important Poaceae (grass) clades have been published, and their genomic-level comparison is an essential foundation for evolutionary, functional, and translational research. Using a formal and conservative approach, we aligned these genomes. Direct comparison of paralogous gene pairs all duplicated simultaneously reveals striking variation in evolutionary rates among whole genomes, with nucleotide substitution slowest in rice and up to 48% faster in other grasses, adding a new dimension to the value of rice as a grass model. We reconstructed ancestral genome contents for major evolutionary nodes, potentially contributing to understanding the divergence and speciation of grasses. Recent fossil evidence suggests revisions of the estimated dates of key evolutionary events, implying that the pan-grass polyploidization occurred ∼96 million years ago and could not be related to the Cretaceous-Tertiary mass extinction as previously inferred. Adjusted dating to reflect both updated fossil evidence and lineage-specific evolutionary rates suggested that maize subgenome divergence and maize-sorghum divergence were virtually simultaneous, a coincidence that would be explained if polyploidization directly contributed to speciation. This work lays a solid foundation for Poaceae translational genomics. Copyright © 2015 The Author. Published by Elsevier Inc. All rights reserved.
Genome-Wide SNP Genotyping to Infer the Effects on Gene Functions in Tomato

PubMed Central

Hirakawa, Hideki; Shirasawa, Kenta; Ohyama, Akio; Fukuoka, Hiroyuki; Aoki, Koh; Rothan, Christophe; Sato, Shusei; Isobe, Sachiko; Tabata, Satoshi

2013-01-01

The genotype data of 7054 single nucleotide polymorphism (SNP) loci in 40 tomato lines, including inbred lines, F1 hybrids, and wild relatives, were collected using Illumina's Infinium and GoldenGate assay platforms, the latter of which was utilized in our previous study. The dendrogram based on the genotype data corresponded well to the breeding types of tomato and wild relatives. The SNPs were classified into six categories according to their positions in the genes predicted on the tomato genome sequence. The genes with SNPs were annotated by homology searches against the nucleotide and protein databases, as well as by domain searches, and they were classified into the functional categories defined by the NCBI's eukaryotic orthologous groups (KOG). To infer the SNPs' effects on the gene functions, the three-dimensional structures of the 843 proteins that were encoded by the genes with SNPs causing missense mutations were constructed by homology modelling, and 200 of these proteins were considered to carry non-synonymous amino acid substitutions in the predicted functional sites. The SNP information obtained in this study is available at the Kazusa Tomato Genomics Database (http://plant1.kazusa.or.jp/tomato/). PMID:23482505
Coastal bacterioplankton community diversity along a latitudinal gradient in Latin America by means of V6 tag pyrosequencing.

PubMed

Thompson, Fabiano L; Bruce, Thiago; Gonzalez, Alessandra; Cardoso, Alexander; Clementino, Maysa; Costagliola, Marcela; Hozbor, Constanza; Otero, Ernesto; Piccini, Claudia; Peressutti, Silvia; Schmieder, Robert; Edwards, Robert; Smith, Mathew; Takiyama, Luis Roberto; Vieira, Ricardo; Paranhos, Rodolfo; Artigas, Luis Felipe

2011-02-01

The bacterioplankton diversity of coastal waters along a latitudinal gradient between Puerto Rico and Argentina was analyzed using a total of 134,197 high-quality sequences from the V6 hypervariable region of the small-subunit ribosomal RNA gene (16S rRNA) (mean length of 60 nt). Most of the OTUs were assigned to Proteobacteria, Bacteroidetes, Cyanobacteria, and Actinobacteria, corresponding to approx. 80% of the total number of sequences. The number of OTUs corresponding to species varied between 937 and 1946 in the seven locations. Proteobacteria appeared at high frequency in the seven locations. An enrichment of Cyanobacteria was observed in Puerto Rico, whereas an enrichment of Bacteroidetes was detected in the Argentinian shelf and Uruguayan coastal lagoons. The highest numbers of sequences of Actinobacteria and Acidobacteria were obtained in the Amazon estuary mouth. The rarefaction curves and Good's coverage estimator for species diversity suggested a significant coverage, with values ranging between 92 and 97% for Good's coverage. Conserved taxa corresponded to approx. 52% of all sequences. This study suggests that human-contaminated environments may influence bacterioplankton diversity.
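The coverage estimator cited in the abstract above has a simple closed form, C = 1 - n1/N, where n1 is the number of OTUs observed exactly once (singletons) and N is the total number of sequences in the sample. A minimal sketch follows; the function name and the example counts are hypothetical and are not data from the study.

```python
def goods_coverage(otu_counts):
    """Good's coverage: 1 - (number of singleton OTUs / total number of sequences)."""
    total = sum(otu_counts)
    if total == 0:
        raise ValueError("at least one sequence is required")
    singletons = sum(1 for count in otu_counts if count == 1)
    return 1.0 - singletons / total


# Hypothetical sample: five OTUs, two of them seen only once, 20 sequences in total.
coverage = goods_coverage([10, 5, 1, 1, 3])
```

A value close to 1 means that few sequences belong to OTUs seen only once, which is how the reported range of 92 to 97% is usually read.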
Algorithm of OMA for large-scale orthology inference

PubMed Central

Roth, Alexander CJ; Gonnet, Gaston H; Dessimoz, Christophe

2008-01-01

Background: OMA is a project that aims to identify orthologs within publicly available, complete genomes. With 657 genomes analyzed to date, OMA is one of the largest projects of its kind. Results: The algorithm of OMA improves upon the standard bidirectional best-hit approach in several respects: it uses evolutionary distances instead of scores, considers distance inference uncertainty, includes many-to-many orthologous relations, and accounts for differential gene losses. Herein, we describe in detail the algorithm for inference of orthology and provide the rationale for parameter selection through multiple tests. Conclusion: OMA contains several novel improvements for orthology inference and provides a unique dataset of large-scale orthology assignments. PMID:19055798
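For context, the standard bidirectional best-hit approach that the abstract above uses as its baseline can be sketched in a few lines. This is the baseline only, not the OMA algorithm: it works on raw alignment scores and returns one-to-one pairs. The gene names, scores, and function names are hypothetical.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target; scores maps (query, target) -> score."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical alignment scores between genes of genome A and genome B.
a_vs_b = {("a1", "b1"): 90, ("a1", "b2"): 40, ("a2", "b2"): 70, ("a3", "b2"): 80}
b_vs_a = {("b1", "a1"): 88, ("b2", "a3"): 79, ("b2", "a2"): 65}
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

In this toy input, gene a2 is left unpaired even though it matches b2 well, which hints at why the abstract lists many-to-many relations among the improvements.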
Inferring evolutionary responses of Anolis carolinensis introduced into the Ogasawara archipelago using whole genome sequence data.

PubMed

Tamate, Satoshi; Iwasaki, Watal M; Krysko, Kenneth L; Camposano, Brian J; Mori, Hideaki; Funayama, Ryo; Nakayama, Keiko; Makino, Takashi; Kawata, Masakado

2017-12-21

Invasive species can often expand rapidly and establish in novel environments through adaptive evolution, resulting in devastating effects on native communities. However, it is unclear whether genetic variation at the whole-genome level is actually reduced in the introduced populations and which genetic changes have occurred in response to new environments. In the 1960s, Anolis carolinensis was introduced onto one of the Ogasawara Islands, Japan, and subsequently expanded its range rapidly throughout two of the islands. Morphological comparison showed that lower hindlimb length in the introduced populations tended to be longer than that in its native Florida populations. Using re-sequenced whole genomic data, we estimated that the effective population size at the time of introduction was actually small (less than 50). We also inferred putative genomic regions subject to natural selection after this introduction event using SweeD and a method based on Tajima's D, π and FST. Five candidate genes that were potentially subject to selection were identified by both methods. The results suggest that there were standing variations that could potentially contribute to adaptation to nonnative environments despite the founder population being small.
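Two of the statistics named in the abstract above, π and Tajima's D, can be computed from a set of aligned sequences as in the following sketch. It uses the standard formulas from Tajima (1989), with π expressed as the mean number of pairwise differences over the whole alignment rather than per site. The toy alignment and the function names are hypothetical, and this is not the scan procedure used in the study.

```python
from itertools import combinations
from math import sqrt


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences (pi) across all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return differences / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns with more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def tajimas_d(seqs):
    """Tajima's D: scaled difference between pi and Watterson's estimate S / a1."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (nucleotide_diversity(seqs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical alignment of four sequences with three segregating sites.
alignment = ["AAAA", "AAAT", "AATT", "ATTT"]
pi = nucleotide_diversity(alignment)
d = tajimas_d(alignment)
```

Strongly negative values of D in a genomic window are often interpreted as a possible signature of a selective sweep or of population expansion, so scans of this kind are usually combined with other evidence.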
The Recombination Landscape in Wild House Mice Inferred Using Population Genomic Data.

PubMed

Booker, Tom R; Ness, Rob W; Keightley, Peter D

2017-09-01

Characterizing variation in the rate of recombination across the genome is important for understanding several evolutionary processes. Previous analysis of the recombination landscape in laboratory mice has revealed that the different subspecies have different suites of recombination hotspots. It is unknown, however, whether hotspots identified in laboratory strains reflect the hotspot diversity of natural populations or whether broad-scale variation in the rate of recombination is conserved between subspecies. In this study, we constructed fine-scale recombination rate maps for a natural population of the Eastern house mouse, Mus musculus castaneus. We performed simulations to assess the accuracy of recombination rate inference in the presence of phase errors, and we used a novel approach to quantify phase error. The spatial distribution of recombination events is strongly positively correlated between our castaneus map and a map constructed using inbred lines derived predominantly from M. m. domesticus. Recombination hotspots in wild castaneus show little overlap, however, with the locations of double-strand breaks in wild-derived house mouse strains. Finally, we also find that genetic diversity in M. m. castaneus is positively correlated with the rate of recombination, consistent with pervasive natural selection operating in the genome. Our study suggests that recombination rate variation is conserved at broad scales between house mouse subspecies, but it is not strongly conserved at fine scales. Copyright © 2017 by the Genetics Society of America.
Diel fluctuations in the abundance and community diversity of coastal bacterioplankton assemblages over a tidal cycle.

PubMed

Olapade, Ola A

2012-01-01

Diel changes in the abundance and community diversity of the bacterioplankton assemblages within the Pacific Ocean at a fixed location in Monterey Bay, California (USA) were examined with several culture-independent approaches (i.e., nucleic acid staining, fluorescence in situ hybridization [FISH], and 16S ribosomal RNA gene libraries) over a tidal cycle. FISH analyses revealed the quantitative predominance of bacterial members belonging to the Cytophaga-Flavobacterium cluster as well as two Proteobacteria (α- and γ-) subclasses within the bacterioplankton assemblages, more so during high tide (HT) and outgoing tide (OT) than during the other tidal events. Although the clone libraries showed that the majority of the sequences were similar to the 16S rRNA gene sequences of unknown bacteria (32% to 73%), operational taxonomic units from members of the α-Proteobacteria, Bacteroidetes, Firmicutes, and Cyanobacteria were also well represented during the four tidal events examined. Comparatively, sequence diversity was highest in OT, lowest in low tide, and very similar between HT and incoming tide. The results indicate that the dynamics of bacterial occurrence and diversity were more pronounced during HT and OT, which further indicates the ecological importance of several environmental variables, including temperature, light intensity, and nutrient availability, that also fluctuate during these tidal events in marine systems.
Freshwater bacterioplankton richness in oligotrophic lakes depends on nutrient availability rather than on species–area relationships

PubMed Central

Logue, Jürg Brendan; Langenheder, Silke; Andersson, Anders F; Bertilsson, Stefan; Drakare, Stina; Lanzén, Anders; Lindström, Eva S

2012-01-01

A central goal in ecology is to grasp the mechanisms that underlie and maintain biodiversity, and patterns in its spatial distribution can provide clues about those mechanisms. Here, we investigated what might determine the bacterioplankton richness (BR) in lakes by means of 454 pyrosequencing of the 16S rRNA gene. We further provide a BR estimate based upon a sampling depth and accuracy, which, to our knowledge, are unsurpassed for freshwater bacterioplankton communities. Our examination of 22 669 sequences per lake showed that freshwater BR in fourteen nutrient-poor lakes was positively influenced by nutrient availability. Our study is, thus, consistent with the finding that the supply of available nutrients is a major driver of species richness; a pattern that may well be universally valid to the world of both micro- and macro-organisms. We, furthermore, observed that BR increased with elevated landscape position, most likely as a consequence of differences in nutrient availability. Finally, BR decreased with increasing lake and catchment area; that is, negative species–area relationships (SARs) were recorded, a finding that re-opens the debate about whether positive SARs can indeed be found in the microbial world and whether positive SARs can in fact be pronounced as one of the few ‘laws’ in ecology. PMID:22170419
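A species–area relationship is commonly summarized with the power-law form S = c·A^z, where a negative exponent z corresponds to the negative SAR reported above. The abstract does not state which model the authors fitted, so the sketch below is only an assumption-laden illustration: it estimates z as the least-squares slope of log10(richness) on log10(area). The function name `sar_exponent` and the numbers are made up and are not data from the study.

```python
from math import log10


def sar_exponent(areas, richness):
    """Least-squares slope z of log10(S) on log10(A), assuming S = c * A**z."""
    xs = [log10(a) for a in areas]
    ys = [log10(s) for s in richness]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up illustrative values: richness halves for each tenfold
# increase in area, so the fitted exponent is negative.
areas_km2 = [1.0, 10.0, 100.0]
richness = [100.0, 50.0, 25.0]
z = sar_exponent(areas_km2, richness)
```

A fitted z below zero means richness falls as area grows, the opposite of the positive relationship usually reported for macro-organisms.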
The Causal Meaning of Genomic Predictors and How It Affects Construction and Comparison of Genome-Enabled Selection Models

PubMed Central

Valente, Bruno D.; Morota, Gota; Peñagaricano, Francisco; Gianola, Daniel; Weigel, Kent; Rosa, Guilherme J. M.

2015-01-01

The term “effect” in additive genetic effect suggests a causal meaning. However, inferences of such quantities for selection purposes are typically viewed and conducted as a prediction task. Predictive ability as tested by cross-validation is currently the most acceptable criterion for comparing models and evaluating new methodologies. Nevertheless, it does not directly indicate if predictors reflect causal effects. Such evaluations would require causal inference methods that are not typical in genomic prediction for selection. This suggests that the usual approach to infer genetic effects contradicts the label of the quantity inferred. Here we investigate if genomic predictors for selection should be treated as standard predictors or if they must reflect a causal effect to be useful, requiring causal inference methods. Conducting the analysis as a prediction or as a causal inference task affects, for example, how covariates of the regression model are chosen, which may heavily affect the magnitude of genomic predictors and therefore selection decisions. We demonstrate that selection requires learning causal genetic effects. However, genomic predictors from some models might capture noncausal signal, providing good predictive ability but poorly representing true genetic effects. Simulated examples are used to show that aiming for predictive ability may lead to poor modeling decisions, while causal inference approaches may guide the construction of regression models that better infer the target genetic effect even when they underperform in cross-validation tests. In conclusion, genomic selection models should be constructed to aim primarily for identifiability of causal genetic effects, not for predictive ability. PMID:25908318
Integration of Multiple Genomic and Phenotype Data to Infer Novel miRNA-Disease Associations

PubMed Central

Zhou, Meng; Cheng, Liang; Yang, Haixiu; Wang, Jing; Sun, Jie; Wang, Zhenzhen

2016-01-01

MicroRNAs (miRNAs) play an important role in the development and progression of human diseases. The identification of disease-associated miRNAs will be helpful for understanding the molecular mechanisms of diseases at the post-transcriptional level. Based on different types of genomic data sources, computational methods for miRNA-disease association prediction have been proposed. However, individual sources of genomic data tend to be incomplete and noisy; therefore, the integration of various types of genomic data for inferring reliable miRNA-disease associations is urgently needed. In this study, we present a computational framework, CHNmiRD, for identifying miRNA-disease associations by integrating multiple genomic and phenotype data, including protein-protein interaction data, gene ontology data, experimentally verified miRNA-target relationships, disease phenotype information and known miRNA-disease connections. The performance of CHNmiRD was evaluated by experimentally verified miRNA-disease associations, which achieved an area under the ROC curve (AUC) of 0.834 for 5-fold cross-validation. In particular, CHNmiRD displayed excellent performance for diseases without any known related miRNAs. The results of case studies for three human diseases (glioblastoma, myocardial infarction and type 1 diabetes) showed that all of the top 10 ranked miRNAs having no known associations with these three diseases in existing miRNA-disease databases were directly or indirectly confirmed by our latest literature mining. All these results demonstrated the reliability and efficiency of CHNmiRD, and it is anticipated that CHNmiRD will serve as a powerful bioinformatics method for mining novel disease-related miRNAs and providing a new perspective into molecular mechanisms underlying human diseases at the post-transcriptional level. CHNmiRD is freely available at http://www.bio-bigdata.com/CHNmiRD. PMID:26849207
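For readers unfamiliar with the evaluation metric quoted above, the sketch below shows one common way to compute an AUC and to split items into five folds. It says nothing about how CHNmiRD itself scores associations: the function names `auc` and `k_folds`, the labels, and the scores are hypothetical, and in a real cross-validation each held-out fold would be scored by a model fitted on the other four.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive item receives the higher score; ties count as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def k_folds(n_items, k=5):
    """Assign item indices 0..n_items-1 to k folds in round-robin order."""
    return [list(range(i, n_items, k)) for i in range(k)]


# Hypothetical held-out labels (1 = known association) and scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
value = auc(labels, scores)
folds = k_folds(10)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so the reported 0.834 means a known association outranks a randomly chosen unlabeled pair roughly 83% of the time.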
Integration of Multiple Genomic and Phenotype Data to Infer Novel miRNA-Disease Associations.

PubMed

Shi, Hongbo; Zhang, Guangde; Zhou, Meng; Cheng, Liang; Yang, Haixiu; Wang, Jing; Sun, Jie; Wang, Zhenzhen

2016-01-01

MicroRNAs (miRNAs) play an important role in the development and progression of human diseases. The identification of disease-associated miRNAs will be helpful for understanding the molecular mechanisms of diseases at the post-transcriptional level. Based on different types of genomic data sources, computational methods for miRNA-disease association prediction have been proposed. However, individual sources of genomic data tend to be incomplete and noisy; therefore, the integration of various types of genomic data for inferring reliable miRNA-disease associations is urgently needed. In this study, we present a computational framework, CHNmiRD, for identifying miRNA-disease associations by integrating multiple genomic and phenotype data, including protein-protein interaction data, gene ontology data, experimentally verified miRNA-target relationships, disease phenotype information and known miRNA-disease connections. The performance of CHNmiRD was evaluated by experimentally verified miRNA-disease associations, which achieved an area under the ROC curve (AUC) of 0.834 for 5-fold cross-validation. In particular, CHNmiRD displayed excellent performance for diseases without any known related miRNAs. The results of case studies for three human diseases (glioblastoma, myocardial infarction and type 1 diabetes) showed that all of the top 10 ranked miRNAs having no known associations with these three diseases in existing miRNA-disease databases were directly or indirectly confirmed by our latest literature mining. All these results demonstrated the reliability and efficiency of CHNmiRD, and it is anticipated that CHNmiRD will serve as a powerful bioinformatics method for mining novel disease-related miRNAs and providing a new perspective into molecular mechanisms underlying human diseases at the post-transcriptional level. CHNmiRD is freely available at http://www.bio-bigdata.com/CHNmiRD.
Inferring the Minimal Genome of Mesoplasma florum by Comparative Genomics and Transposon Mutagenesis.

PubMed

Baby, Vincent; Lachance, Jean-Christophe; Gagnon, Jules; Lucier, Jean-François; Matteau, Dominick; Knight, Tom; Rodrigue, Sébastien

2018-01-01

The creation and comparison of minimal genomes will help better define the most fundamental mechanisms supporting life. Mesoplasma florum is a near-minimal, fast-growing, nonpathogenic bacterium potentially amenable to genome reduction efforts. In a comparative genomic study of 13 M. florum strains, including 11 newly sequenced genomes, we have identified the core genome and open pangenome of this species. Our results show that all of the strains have approximately 80% of their gene content in common. Of the remaining 20%, 17% of the genes were found in multiple strains and 3% were unique to any given strain. On the basis of random transposon mutagenesis, we also estimated that ~290 out of 720 genes are essential for M. florum L1 in rich medium. We next evaluated different genome reduction scenarios for M. florum L1 by using gene conservation and essentiality data, as well as comparisons with the first working approximation of a minimal organism, Mycoplasma mycoides JCVI-syn3.0. Our results suggest that 409 of the 473 M. mycoides JCVI-syn3.0 genes have orthologs in M. florum L1. Conversely, 57 putatively essential M. florum L1 genes have no homolog in M. mycoides JCVI-syn3.0. This suggests differences in minimal genome compositions, even for these evolutionarily closely related bacteria. IMPORTANCE The last years have witnessed the development of whole-genome cloning and transplantation methods and the complete synthesis of entire chromosomes. Recently, the first minimal cell, Mycoplasma mycoides JCVI-syn3.0, was created. Despite these milestone achievements, several questions remain to be answered. For example, is the composition of minimal genomes virtually identical in phylogenetically related species? On the basis of comparative genomics and transposon mutagenesis, we investigated this question by using an alternative model, Mesoplasma florum, that is also amenable to genome reduction efforts. Our results suggest that the creation of additional minimal
Estimating Bacterioplankton Production by Measuring [³H]thymidine Incorporation in a Eutrophic Swedish Lake

PubMed Central

Bell, Russell T.; Ahlgren, Gunnel M.; Ahlgren, Ingemar

1983-01-01

Bacterioplankton abundance, [³H]thymidine incorporation, ¹⁴CO₂ uptake in the dark, and fractionated primary production were measured on several occasions between June and August 1982 in eutrophic Lake Norrviken, Sweden. Bacterioplankton abundance and carbon biomass ranged from 0.5 × 10⁹ to 2.4 × 10⁹ cells liter⁻¹ and 7 to 47 μg of C liter⁻¹, respectively. The average bacterial cell volume was 0.185 μm³. [³H]thymidine incorporation into cold-trichloroacetic acid-insoluble material ranged from 12 × 10⁻¹² to 200 × 10⁻¹² mol liter⁻¹ h⁻¹. Bacterial carbon production rates were estimated to be 0.2 to 7.1 μg of C liter⁻¹ h⁻¹. Bacterial production estimates from [³H]thymidine incorporation and ¹⁴CO₂ uptake in the dark agreed when activity was high but diverged when activity was low and when blue-green algae (cyanobacteria) dominated the phytoplankton. Size fractionation indicated negligible uptake of [³H]thymidine in the >3-μm fraction during a chrysophycean bloom in early June. We found that >50% of the ³H activity was in the >3-μm fraction in late August; this phenomenon was most likely due to Microcystis spp., their associated bacteria, or both. Over 60% of the ¹⁴CO₂ uptake in the dark was attributed to algae on each sampling occasion. Algal exudate was an important carbon source for planktonic bacteria. Bacterial production was roughly 50% of primary production. PMID:16346304
Flavobacteria Blooms in Four Eutrophic Lakes: Linking Population Dynamics of Freshwater Bacterioplankton to Resource Availability

PubMed Central

Eiler, Alexander; Bertilsson, Stefan

2007-01-01

Heterotrophic bacteria are major contributors to biogeochemical cycles and influence water quality. Still, the lack of representative isolates and the few quantitative surveys leave the ecological role and significance of single bacterial populations to be revealed. Here we analyzed the diversity and dynamics of freshwater Flavobacteria populations in four eutrophic temperate lakes. From each lake, clone libraries were constructed using primers specific for either the class Flavobacteria or Bacteria. Sequencing of 194 Flavobacteria clones from 8 libraries revealed a diverse freshwater Flavobacteria community and distinct differences among lakes. Abundance and seasonal dynamics of Flavobacteria were assessed by quantitative PCR with class-specific primers. In parallel, the dynamics of individual populations within the Flavobacteria community were assessed with terminal restriction fragment length polymorphism analysis using identical primers. The contribution of Flavobacteria to the total bacterioplankton community ranged from 0.4 to almost 100% (average, 24%). Blooms where Flavobacteria represented more than 30% of the bacterioplankton were observed at different times in the four lakes. In general, high proportions of Flavobacteria appeared during episodes of high bacterial production. Phylogenetic analyses combined with Flavobacteria community fingerprints suggested dominance of two Flavobacteria lineages. Both drastic alterations in total Flavobacteria and in community composition of this class significantly correlated with bacterial production, emphasizing that resource availability is an important driver of heterotrophic bacterial succession in eutrophic lakes. PMID:17435002
Covariance Between Genotypic Effects and its Use for Genomic Inference in Half-Sib Families

PubMed Central

Wittenburg, Dörte; Teuscher, Friedrich; Klosa, Jan; Reinsch, Norbert

2016-01-01

In livestock, current statistical approaches utilize extensive molecular data, e.g., single nucleotide polymorphisms (SNPs), to improve the genetic evaluation of individuals. The number of model parameters increases with the number of SNPs, so the multicollinearity between covariates can affect the results obtained using whole genome regression methods. In this study, dependencies between SNPs due to linkage and linkage disequilibrium among the chromosome segments were explicitly considered in methods used to estimate the effects of SNPs. The population structure affects the extent of such dependencies, so the covariance among SNP genotypes was derived for half-sib families, which are typical in livestock populations. Conditional on the SNP haplotypes of the common parent (sire), the theoretical covariance was determined using the haplotype frequencies of the population from which the individual parent (dam) was derived. The resulting covariance matrix was included in a statistical model for a trait of interest, and this covariance matrix was then used to specify prior assumptions for SNP effects in a Bayesian framework. The approach was applied to one family in simulated scenarios (few and many quantitative trait loci) and using semireal data obtained from dairy cattle to identify genome segments that affect performance traits, as well as to investigate the impact on predictive ability. Compared with a method that does not explicitly consider any of the relationships among predictor variables, the accuracy of genetic value prediction was improved by 10–22%. The results show that the inclusion of dependence is particularly important for genomic inference based on small sample sizes. PMID:27402363

Improved orthologous databases to ease protozoan targets inference.

PubMed

Kotowski, Nelson; Jardim, Rodrigo; Dávila, Alberto M R

2015-09-29

Homology inference helps in identifying similarities, as well as differences, among organisms, which provides better insight into how closely related one might be to another. In addition, comparative genomics pipelines are widely adopted tools designed using different bioinformatics applications and algorithms. In this article, we propose a methodology to build improved orthologous databases with the potential to aid in protozoan target identification, one of the many tasks which benefit from comparative genomics tools. Our analyses are based on OrthoSearch, a comparative genomics pipeline originally designed to infer orthologs through protein-profile comparison, supported by an HMM-based, reciprocal-best-hits approach. Our methodology allows OrthoSearch to confront two orthologous databases and to generate an improved new one. This database can later be used to infer potential protozoan targets through a similarity analysis against the human genome. The protein sequences of Cryptosporidium hominis, Entamoeba histolytica and Leishmania infantum genomes were comparatively analyzed against three orthologous databases: (i) EggNOG KOG, (ii) ProtozoaDB and (iii) KEGG Orthology (KO). That allowed us to create two new orthologous databases, "KO + EggNOG KOG" and "KO + EggNOG KOG + ProtozoaDB", with 16,938 and 27,701 orthologous groups, respectively. These new orthologous databases were used for a regular OrthoSearch run. By confronting "KO + EggNOG KOG" and "KO + EggNOG KOG + ProtozoaDB" databases and protozoan species we were able to detect the following total of orthologous groups and coverage (relation between the inferred orthologous groups and the species total number of proteins): Cryptosporidium hominis: 1,821 (11 %) and 3,254 (12 %); Entamoeba histolytica: 2,245 (13 %) and 5,305 (19 %); Leishmania infantum: 2,702 (16 %) and 4,760 (17 %). Using our HMM-based methodology and the largest created orthologous database, it was possible to infer 13
Inferring patterns of folktale diffusion using genomic data

PubMed Central

Bortolini, Eugenio; Pagani, Luca; Sarno, Stefania; Boattini, Alessio; Sazzini, Marco; da Silva, Sara Graça; Martini, Gessica; Metspalu, Mait; Pettener, Davide; Luiselli, Donata; Tehrani, Jamshid J.

2017-01-01

Observable patterns of cultural variation are consistently intertwined with demic movements, cultural diffusion, and adaptation to different ecological contexts [Cavalli-Sforza and Feldman (1981) Cultural Transmission and Evolution: A Quantitative Approach; Boyd and Richerson (1985) Culture and the Evolutionary Process]. The quantitative study of gene–culture coevolution has focused in particular on the mechanisms responsible for change in frequency and attributes of cultural traits, the spread of cultural information through demic and cultural diffusion, and detecting relationships between genetic and cultural lineages. Here, we make use of worldwide whole-genome sequences [Pagani et al. (2016) Nature 538:238–242] to assess the impact of processes involving population movement and replacement on cultural diversity, focusing on the variability observed in folktale traditions (n = 596) [Uther (2004) The Types of International Folktales: A Classification and Bibliography. Based on the System of Antti Aarne and Stith Thompson] in Eurasia. We find that a model of cultural diffusion predicted by isolation-by-distance alone is not sufficient to explain the observed patterns, especially at small spatial scales (up to ∼4,000 km). We also provide an empirical approach to infer presence and impact of ethnolinguistic barriers preventing the unbiased transmission of both genetic and cultural information. After correcting for the effect of ethnolinguistic boundaries, we show that, of the alternative models that we propose, the one entailing cultural diffusion biased by linguistic differences is the most plausible. Additionally, we identify 15 tales that are more likely to be predominantly transmitted through population movement and replacement and locate putative focal areas for a set of tales that are spread worldwide. PMID:28784786
Inferring patterns of folktale diffusion using genomic data.

PubMed

Bortolini, Eugenio; Pagani, Luca; Crema, Enrico R; Sarno, Stefania; Barbieri, Chiara; Boattini, Alessio; Sazzini, Marco; da Silva, Sara Graça; Martini, Gessica; Metspalu, Mait; Pettener, Davide; Luiselli, Donata; Tehrani, Jamshid J

2017-08-22

Observable patterns of cultural variation are consistently intertwined with demic movements, cultural diffusion, and adaptation to different ecological contexts [Cavalli-Sforza and Feldman (1981) Cultural Transmission and Evolution: A Quantitative Approach; Boyd and Richerson (1985) Culture and the Evolutionary Process]. The quantitative study of gene-culture coevolution has focused in particular on the mechanisms responsible for change in frequency and attributes of cultural traits, the spread of cultural information through demic and cultural diffusion, and detecting relationships between genetic and cultural lineages. Here, we make use of worldwide whole-genome sequences [Pagani et al. (2016) Nature 538:238-242] to assess the impact of processes involving population movement and replacement on cultural diversity, focusing on the variability observed in folktale traditions (n = 596) [Uther (2004) The Types of International Folktales: A Classification and Bibliography. Based on the System of Antti Aarne and Stith Thompson] in Eurasia. We find that a model of cultural diffusion predicted by isolation-by-distance alone is not sufficient to explain the observed patterns, especially at small spatial scales (up to ∼4,000 km). We also provide an empirical approach to infer presence and impact of ethnolinguistic barriers preventing the unbiased transmission of both genetic and cultural information. After correcting for the effect of ethnolinguistic boundaries, we show that, of the alternative models that we propose, the one entailing cultural diffusion biased by linguistic differences is the most plausible. Additionally, we identify 15 tales that are more likely to be predominantly transmitted through population movement and replacement and locate putative focal areas for a set of tales that are spread worldwide.
Inferring species divergence times using pairwise sequential Markovian coalescent modelling and low-coverage genomic data.

PubMed

Cahill, James A; Soares, André E R; Green, Richard E; Shapiro, Beth

2016-07-19

Understanding when species diverged aids in identifying the drivers of speciation, but the end of gene flow between populations can be difficult to ascertain from genetic data. We explore the use of pairwise sequential Markovian coalescent (PSMC) modelling to infer the timing of divergence between species and populations. PSMC plots generated using artificial hybrid genomes show rapid increases in effective population size at the time when the two parent lineages diverge, and this approach has been used previously to infer divergence between human lineages. We show that, even without high coverage or phased input data, PSMC can detect the end of significant gene flow between populations by comparing the PSMC output from artificial hybrids to the output of simulations with known demographic histories. We then apply PSMC to detect divergence times among lineages within two real datasets: great apes and bears within the genus Ursus. Our results confirm most previously proposed divergence times for these lineages, and suggest that gene flow between recently diverged lineages may have been common among bears and great apes, including up to one million years of continued gene flow between chimpanzees and bonobos after the formation of the Congo River. This article is part of the themed issue 'Dating species divergences using rocks and clocks'. © 2016 The Author(s).
Inferences of drug responses in cancer cells from cancer genomic features and compound chemical and therapeutic properties

PubMed Central

Wang, Yongcui; Fang, Jianwen; Chen, Shilong

2016-01-01

Accurately predicting the response of a cancer patient to a therapeutic agent is a core goal of precision medicine. Existing approaches have relied primarily on genomic alterations in cancer cells that have been treated with different drugs. Here we focus on predicting drug response based on integration of heterogeneous pharmacogenomic data from both the cell and drug sides. Through a systematic approach, named PDRCC (Predict Drug Response in Cancer Cells), the cancer genomic alterations and compound chemical and therapeutic properties were incorporated to determine the chemotherapeutic response in cancer patients. Using the Cancer Cell Line Encyclopedia (CCLE) study as the benchmark dataset, all pharmacogenomic data exhibited their roles in inferring the relationships between cancer cells and drugs. When integrating both genomic resources and compound information, the prediction coverage was significantly increased. The validity of PDRCC was also supported by its effectiveness in uncovering unknown cell-drug associations with database and literature evidence. It sets the stage for clinical testing of novel therapeutic strategies, such as the sensitive association between cancer cell ‘A549_LUNG’ and compound ‘Topotecan’. In conclusion, PDRCC offers the possibility of faster, safer, and cheaper development of novel anti-cancer therapeutics in early-stage clinical trials. PMID:27645580
Inferences of drug responses in cancer cells from cancer genomic features and compound chemical and therapeutic properties

NASA Astrophysics Data System (ADS)

Wang, Yongcui; Fang, Jianwen; Chen, Shilong

2016-09-01

Accurately predicting the response of a cancer patient to a therapeutic agent is a core goal of precision medicine. Existing approaches have relied primarily on genomic alterations in cancer cells that have been treated with different drugs. Here we focus on predicting drug response based on integration of heterogeneous pharmacogenomic data from both the cell and drug sides. Through a systematic approach, named PDRCC (Predict Drug Response in Cancer Cells), the cancer genomic alterations and compound chemical and therapeutic properties were incorporated to determine the chemotherapeutic response in cancer patients. Using the Cancer Cell Line Encyclopedia (CCLE) study as the benchmark dataset, all pharmacogenomic data exhibited their roles in inferring the relationships between cancer cells and drugs. When integrating both genomic resources and compound information, the prediction coverage was significantly increased. The validity of PDRCC was also supported by its effectiveness in uncovering unknown cell-drug associations with database and literature evidence. It sets the stage for clinical testing of novel therapeutic strategies, such as the sensitive association between cancer cell ‘A549_LUNG’ and compound ‘Topotecan’. In conclusion, PDRCC offers the possibility of faster, safer, and cheaper development of novel anti-cancer therapeutics in early-stage clinical trials.
Inferences of drug responses in cancer cells from cancer genomic features and compound chemical and therapeutic properties.

PubMed

Wang, Yongcui; Fang, Jianwen; Chen, Shilong

2016-09-20

Accurately predicting the response of a cancer patient to a therapeutic agent is a core goal of precision medicine. Existing approaches have relied primarily on genomic alterations in cancer cells that have been treated with different drugs. Here we focus on predicting drug response based on integration of heterogeneous pharmacogenomic data from both the cell and drug sides. Through a systematic approach, named PDRCC (Predict Drug Response in Cancer Cells), the cancer genomic alterations and compound chemical and therapeutic properties were incorporated to determine the chemotherapeutic response in cancer patients. Using the Cancer Cell Line Encyclopedia (CCLE) study as the benchmark dataset, all pharmacogenomic data exhibited their roles in inferring the relationships between cancer cells and drugs. When integrating both genomic resources and compound information, the prediction coverage was significantly increased. The validity of PDRCC was also supported by its effectiveness in uncovering unknown cell-drug associations with database and literature evidence. It sets the stage for clinical testing of novel therapeutic strategies, such as the sensitive association between cancer cell 'A549_LUNG' and compound 'Topotecan'. In conclusion, PDRCC offers the possibility of faster, safer, and cheaper development of novel anti-cancer therapeutics in early-stage clinical trials.
N₂ Fixation by Unicellular Bacterioplankton from the Atlantic and Pacific Oceans: Phylogeny and In Situ Rates

PubMed Central

Falcón, Luisa I.; Carpenter, Edward J.; Cipriano, Frank; Bergman, Birgitta; Capone, Douglas G.

2004-01-01

N2-fixing proteobacteria (α and γ) and unicellular cyanobacteria are common in both the tropical North Atlantic and Pacific oceans. In near-surface waters proteobacterial nifH transcripts were present during both night and day while unicellular cyanobacterial nifH transcripts were present during the nighttime only, suggesting separation of N2 fixation and photosynthesis by unicellular cyanobacteria. Phylogenetic relationships among unicellular cyanobacteria from both oceans were determined after sequencing of a conserved region of 16S ribosomal DNA (rDNA) of cyanobacteria, and results showed that they clustered together, regardless of the ocean of origin. However, sequencing of nifH transcripts of unicellular cyanobacteria from both oceans showed that they clustered separately. This suggests that unicellular cyanobacteria from the tropical North Atlantic and subtropical North Pacific share a common ancestry (16S rDNA) and that potential unicellular N2 fixers have diverged (nifH). N2 fixation rates for unicellular bacterioplankton (including small cyanobacteria) from both oceans were determined in situ according to the acetylene reduction and 15N2 protocols. The results showed that rates of fixation by bacterioplankton can be almost as high as those of fixation by the colonial N2-fixing marine cyanobacteria Trichodesmium spp. in the tropical North Atlantic but that rates are much lower in the subtropical North Pacific. PMID:14766553
Flow cytometric monitoring of bacterioplankton phenotypic diversity predicts high population-specific feeding rates by invasive dreissenid mussels.

PubMed

Props, Ruben; Schmidt, Marian L; Heyse, Jasmine; Vanderploeg, Henry A; Boon, Nico; Denef, Vincent J

2018-02-01

Species invasion is an important disturbance to ecosystems worldwide, yet knowledge about the impacts of invasive species on bacterial communities remains sparse. Using a novel approach, we simultaneously detected phenotypic and derived taxonomic change in a natural bacterioplankton community when subjected to feeding pressure by quagga mussels, a widespread aquatic invasive species. We detected a significant decrease in diversity within 1 h of feeding and a total diversity loss of 11.6 ± 4.1% after 3 h. This loss of microbial diversity was caused by the selective removal of high nucleic acid populations (29 ± 5% after 3 h). We were able to track the community diversity at high temporal resolution by calculating phenotypic diversity estimates from flow cytometry (FCM) data of minute amounts of sample. Through parallel FCM and 16S rRNA gene amplicon sequencing analysis of environments spanning a broad diversity range, we showed that the two approaches resulted in highly correlated diversity measures and captured the same seasonal and lake-specific patterns in community composition. Based on our results, we predict that selective feeding by invasive dreissenid mussels directly impacts the microbial component of the carbon cycle, as it may drive bacterioplankton communities toward less diverse and potentially less productive states. © 2017 Society for Applied Microbiology and John Wiley & Sons Ltd.
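The abstract above does not say which diversity index was computed from the flow cytometry data. As a minimal sketch, assuming the cytometric fingerprint has already been reduced to per-bin event counts and assuming Hill numbers as the diversity measure, the calculation could look like the following Python. The function name `hill_number` and the example counts are hypothetical and are not taken from the study.

```python
import numpy as np

def hill_number(bin_counts, q):
    """Hill diversity of order q from non-negative per-bin event counts.

    Assumption: each bin of a flow cytometry fingerprint is treated like
    a 'taxon' and its event count like an abundance.
    """
    counts = np.asarray(bin_counts, dtype=float)
    # Relative abundance of occupied bins only (empty bins carry no weight).
    p = counts[counts > 0] / counts.sum()
    if q == 1:
        # Order 1 is defined as the limit q -> 1: exp(Shannon entropy).
        return float(np.exp(-np.sum(p * np.log(p))))
    return float(np.sum(p ** q) ** (1.0 / (1.0 - q)))

# Hypothetical fingerprints: one even, one dominated by a single bin.
even = [250, 250, 250, 250]
skewed = [970, 10, 10, 10]
for q in (0, 1, 2):
    print(q, hill_number(even, q), hill_number(skewed, q))
```

Order 0 counts occupied bins (richness), while higher orders weight dominant bins more heavily, so a fingerprint dominated by a single bin scores lower at orders 1 and 2 even when its richness is unchanged.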
Community assembly processes underlying phytoplankton and bacterioplankton across a hydrologic change in a human-impacted river.

PubMed

Isabwe, Alain; Yang, Jun R; Wang, Yongming; Liu, Lemian; Chen, Huihuang; Yang, Jun

2018-07-15

Although the influence of microbial community assembly processes on aquatic ecosystem function and biodiversity is well known, the processes that govern planktonic communities in human-impacted rivers remain largely unstudied. Here, we used multivariate statistics and a null model approach to test the hypothesis that environmental conditions and obstructed dispersal opportunities dictate a deterministic community assembly for phytoplankton and bacterioplankton across contrasting hydrographic conditions in a subtropical mid-sized river (Jiulong River, southeast China). Variation partitioning analysis showed that the explanatory power of local environmental variables was larger than that of the spatial variables for both plankton communities during the dry season. During the wet season, phytoplankton community variation was mainly explained by local environmental variables, whereas the variance in bacterioplankton was explained by both environmental and spatial predictors. The null model based on Raup-Crick coefficients for both planktonic groups suggested little evidence of stochastic processes involving dispersal and random distribution. Our results showed that hydrological change and landscape structure act together to cause divergence in communities along the river channel, thereby dictating a deterministic assembly, and that selection exceeds dispersal limitation during the dry season. Therefore, to protect the ecological integrity of human-impacted rivers, watershed managers should consider not only local environmental conditions but also dispersal routes to account for the effect of the regional species pool on local communities. Copyright © 2018 Elsevier B.V. All rights reserved.
Phylotype Dynamics of Bacterial P Utilization Genes in Microbialites and Bacterioplankton of a Monomictic Endorheic Lake.

PubMed

Valdespino-Castillo, Patricia M; Alcántara-Hernández, Rocío J; Merino-Ibarra, Martín; Alcocer, Javier; Macek, Miroslav; Moreno-Guillén, Octavio A; Falcón, Luisa I

2017-02-01

Microbes can modulate ecosystem function since they harbor a vast genetic potential for biogeochemical cycling. The spatial and temporal dynamics of this genetic diversity should be acknowledged to establish a link between ecosystem function and community structure. In this study, we analyzed the genetic diversity of bacterial phosphorus utilization genes in two microbial assemblages, microbialites and bacterioplankton of Lake Alchichica, a semiclosed (i.e., endorheic) system with marked seasonality that varies in nutrient conditions, temperature, dissolved oxygen, and water column stability. We focused on dissolved organic phosphorus (DOP) utilization gene dynamics during contrasting mixing and stratification periods. Bacterial alkaline phosphatases (phoX and phoD) and alkaline beta-propeller phytases (bpp) were surveyed. DOP utilization genes showed different dynamics evidenced by a marked change within an intra-annual period and a differential circadian pattern of expression. Although Lake Alchichica is a semiclosed system, this dynamic turnover of phylotypes (from lake circulation to stratification) points to a different potential of DOP utilization by the microbial communities within periods. DOP utilization gene dynamics was different among genetic markers and among assemblages (microbialite vs. bacterioplankton). As estimated by the system's P mass balance, P inputs and outputs were similar in magnitude (difference was <10 %). A theoretical estimation of water column P monoesters was used to calculate the potential P fraction that can be remineralized on an annual basis. Overall, bacterial groups including Proteobacteria (Alpha and Gamma) and Bacteroidetes seem to be key participants in DOP utilization responses.
A core phylogeny of Dictyostelia inferred from genomes representative of the eight major and minor taxonomic divisions of the group.

PubMed

Singh, Reema; Schilde, Christina; Schaap, Pauline

2016-11-17

Dictyostelia are a well-studied group of organisms with colonial multicellularity, which are members of the mostly unicellular Amoebozoa. A phylogeny based on SSU rDNA data subdivided all Dictyostelia into four major groups, but left the position of the root and of six group-intermediate taxa unresolved. Recent phylogenies inferred from 30 or 213 proteins from sequenced genomes positioned the root between two branches, each containing two major groups, but lacked data to position the group-intermediate taxa. Since the positions of these early diverging taxa are crucial for understanding the evolution of phenotypic complexity in Dictyostelia, we sequenced six representative genomes of early diverging taxa. We retrieved orthologs of 47 housekeeping proteins with an average size of 890 amino acids from six newly sequenced and eight published genomes of Dictyostelia and unicellular Amoebozoa and inferred phylogenies from single and concatenated protein sequence alignments. Concatenated alignments of all 47 proteins, and four out of five subsets of nine concatenated proteins, all produced the same consensus phylogeny with 100% statistical support. Trees inferred from just two out of the 47 proteins individually reproduced the consensus phylogeny, highlighting that single gene phylogenies will rarely reflect correct species relationships. However, sets of two or three concatenated proteins again reproduced the consensus phylogeny, indicating that a small selection of genes suffices for low-cost classification of as yet unincorporated or newly discovered dictyostelid and amoebozoan taxa by gene amplification. The multi-locus consensus phylogeny shows that groups 1 and 2 are sister clades in branch I, with the group-intermediate taxon D. polycarpum positioned as outgroup to group 2. Branch II consists of groups 3 and 4, with the group-intermediate taxon Polysphondylium violaceum positioned as sister to group 4, and the group-intermediate taxon Dictyostelium polycephalum
Ecophysiology of Freshwater Verrucomicrobia Inferred from Metagenome-Assembled Genomes

PubMed Central

He, Shaomei; Stevens, Sarah L. R.; Chan, Leong-Keat; Bertilsson, Stefan; Glavina del Rio, Tijana; Tringe, Susannah G.; Malmstrom, Rex R.

2017-01-01

ABSTRACT Microbes are critical in carbon and nutrient cycling in freshwater ecosystems. Members of the Verrucomicrobia are ubiquitous in such systems, and yet their roles and ecophysiology are not well understood. In this study, we recovered 19 Verrucomicrobia draft genomes by sequencing 184 time-series metagenomes from a eutrophic lake and a humic bog that differ in carbon source and nutrient availabilities. These genomes span four of the seven previously defined Verrucomicrobia subdivisions and greatly expand knowledge of the genomic diversity of freshwater Verrucomicrobia. Genome analysis revealed their potential role as (poly)saccharide degraders in freshwater, uncovered interesting genomic features for this lifestyle, and suggested their adaptation to nutrient availabilities in their environments. Verrucomicrobia populations differ significantly between the two lakes in glycoside hydrolase gene abundance and functional profiles, reflecting the autochthonous and terrestrially derived allochthonous carbon sources of the two ecosystems, respectively. Interestingly, a number of genomes recovered from the bog contained gene clusters that potentially encode a novel porin-multiheme cytochrome c complex and might be involved in extracellular electron transfer in the anoxic humus-rich environment. Notably, most epilimnion genomes have large numbers of so-called “Planctomycete-specific” cytochrome c-encoding genes, which exhibited distribution patterns nearly opposite to those seen with glycoside hydrolase genes, probably associated with the different levels of environmental oxygen availability and carbohydrate complexity between lakes/layers. Overall, the recovered genomes represent a major step toward understanding the role, ecophysiology, and distribution of Verrucomicrobia in freshwater. IMPORTANCE Freshwater Verrucomicrobia spp. are cosmopolitan in lakes and rivers, and yet their roles and ecophysiology are not well understood, as cultured freshwater
Evolutionary Inference across Eukaryotes Identifies Specific Pressures Favoring Mitochondrial Gene Retention.

PubMed

Johnston, Iain G; Williams, Ben P

2016-02-24

Since their endosymbiotic origin, mitochondria have lost most of their genes. Although many selective mechanisms underlying the evolution of mitochondrial genomes have been proposed, a data-driven exploration of these hypotheses is lacking, and a quantitatively supported consensus remains absent. We developed HyperTraPS, a methodology coupling stochastic modeling with Bayesian inference, to identify the ordering of evolutionary events and suggest their causes. Using 2015 complete mitochondrial genomes, we inferred evolutionary trajectories of mtDNA gene loss across the eukaryotic tree of life. We find that proteins comprising the structural cores of the electron transport chain are preferentially encoded within mitochondrial genomes across eukaryotes. A combination of high GC content and high protein hydrophobicity is required to explain patterns of mtDNA gene retention; a model that accounts for these selective pressures can also predict the success of artificial gene transfer experiments in vivo. This work provides a general method for data-driven inference of the ordering of evolutionary and progressive events, here identifying the distinct features shaping mitochondrial genomes of present-day species. Copyright © 2016 Elsevier Inc. All rights reserved.
Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach

PubMed Central

Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

2016-01-01

Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles. PMID:26943927
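As a rough illustration of the first class of summary statistics named in the abstract above, the following Python sketch computes a folded allele frequency spectrum from unphased, unpolarized diploid SNP genotypes. It is not the PopSizeABC implementation: the function name `folded_afs`, the 0/1/2 genotype coding, and the toy matrix are assumptions made for this example.

```python
import numpy as np

def folded_afs(genotypes):
    """Folded allele frequency spectrum from diploid genotypes.

    genotypes: array of shape (n_individuals, n_snps) whose entries are
    0, 1 or 2 copies of an arbitrarily chosen allele (assumed coding;
    no phasing or ancestral-state information is needed).
    Returns an array of length n_individuals where entry k-1 is the
    number of SNPs whose minor allele is carried by k chromosomes.
    """
    g = np.asarray(genotypes).astype(int)
    n_chrom = 2 * g.shape[0]
    allele_counts = g.sum(axis=0)
    # Folding: the rarer of the two alleles defines the class, so the
    # result does not depend on which allele was counted.
    minor_counts = np.minimum(allele_counts, n_chrom - allele_counts)
    spectrum = np.bincount(minor_counts, minlength=n_chrom // 2 + 1)
    return spectrum[1:]  # drop the monomorphic class (minor count 0)

# Toy data: 3 individuals (6 chromosomes), 4 SNPs.
toy = [[0, 1, 2, 2],
       [0, 0, 2, 1],
       [1, 0, 2, 2]]
print(folded_afs(toy))
```

Because the minimum of the two allele counts is used, recoding any SNP by swapping its alleles (replacing g with 2 - g) leaves the spectrum unchanged, which is why this statistic can be computed from unpolarized data.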
GIGA: a simple, efficient algorithm for gene tree inference in the genomic age.

PubMed

Thomas, Paul D

2010-06-09

Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. We compared trees produced by GIGA to those in the TreeFam database, and they
Genomics and functional genomics in Chlamydomonas reinhardtii

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blaby, Ian K.; Blaby-Haas, Crysten E.

The availability of the Chlamydomonas reinhardtii nuclear genome sequence continues to enable researchers to address biological questions relevant to algae, land plants and animals in unprecedented ways. As we continue to characterize and understand biological processes in C. reinhardtii and translate that knowledge to other systems, we are faced with the realization that many genes encode proteins without a defined function. The field of functional genomics aims to close this gap between genome sequence and protein function. Transcriptomes, proteomes and phenomes can each provide layers of gene-specific functional data while supplying a global snapshot of cellular behavior under different conditions. Herein we present a brief history of functional genomics, the present status of the C. reinhardtii genome, how genome-wide experiments can aid in supplying protein function inferences, and provide an outlook for functional genomics in C. reinhardtii.
Genomics and functional genomics in Chlamydomonas reinhardtii

DOE PAGES

Blaby, Ian K.; Blaby-Haas, Crysten E.

2017-03-21

The availability of the Chlamydomonas reinhardtii nuclear genome sequence continues to enable researchers to address biological questions relevant to algae, land plants and animals in unprecedented ways. As we continue to characterize and understand biological processes in C. reinhardtii and translate that knowledge to other systems, we are faced with the realization that many genes encode proteins without a defined function. The field of functional genomics aims to close this gap between genome sequence and protein function. Transcriptomes, proteomes and phenomes can each provide layers of gene-specific functional data while supplying a global snapshot of cellular behavior under different conditions. Herein we present a brief history of functional genomics, the present status of the C. reinhardtii genome, how genome-wide experiments can aid in supplying protein function inferences, and provide an outlook for functional genomics in C. reinhardtii.
The Passive Yet Successful Way of Planktonic Life: Genomic and Experimental Analysis of the Ecology of a Free-Living Polynucleobacter Population

PubMed Central

Hahn, Martin W.; Scheuerl, Thomas; Jezberová, Jitka; Koll, Ulrike; Jezbera, Jan; Šimek, Karel; Vannini, Claudia; Petroni, Giulio; Wu, Qinglong L.

2012-01-01

Background: The bacterial taxon Polynucleobacter necessarius subspecies asymbioticus represents a group of planktonic freshwater bacteria with cosmopolitan and ubiquitous distribution in standing freshwater habitats. These bacteria comprise <1% to 70% (on average about 20%) of total bacterioplankton cells in various freshwater habitats. The ubiquity of this taxon was recently explained by intra-taxon ecological diversification, i.e. specialization of lineages to specific environmental conditions; however, details on specific adaptations are not known. Here we investigated by means of genomic and experimental analyses the ecological adaptation of a persistent population dwelling in a small acidic pond. Findings: The investigated population (F10 lineage) contributed on average 11% to total bacterioplankton in the pond during the vegetation periods (ice-free period, usually May to November). Only a low degree of genetic diversification of the population could be revealed. These bacteria are characterized by a small genome size (2.1 Mb), a relatively small number of genes involved in transduction of environmental signals, and the lack of motility and quorum sensing. Experiments indicated that these bacteria live as chemoorganotrophs by mainly utilizing low-molecular-weight substrates derived from photooxidation of humic substances. Conclusions: Evolutionary genome streamlining resulted in a highly passive lifestyle so far only known among free-living bacteria from pelagic marine taxa dwelling in environmentally stable nutrient-poor off-shore systems. Surprisingly, such a lifestyle is also successful in a highly dynamic and nutrient-richer environment such as the water column of the investigated pond, which was undergoing complete mixis and pronounced stratification in diurnal cycles. Obviously, metabolic and ecological versatility is not a prerequisite for long-lasting establishment of abundant bacterial populations under highly dynamic environmental conditions. Caution
The passive yet successful way of planktonic life: genomic and experimental analysis of the ecology of a free-living polynucleobacter population.

PubMed

Hahn, Martin W; Scheuerl, Thomas; Jezberová, Jitka; Koll, Ulrike; Jezbera, Jan; Šimek, Karel; Vannini, Claudia; Petroni, Giulio; Wu, Qinglong L

2012-01-01

The bacterial taxon Polynucleobacter necessarius subspecies asymbioticus represents a group of planktonic freshwater bacteria with cosmopolitan and ubiquitous distribution in standing freshwater habitats. These bacteria comprise <1% to 70% (on average about 20%) of total bacterioplankton cells in various freshwater habitats. The ubiquity of this taxon was recently explained by intra-taxon ecological diversification, i.e. specialization of lineages to specific environmental conditions; however, details on specific adaptations are not known. Here we investigated by means of genomic and experimental analyses the ecological adaptation of a persistent population dwelling in a small acidic pond. The investigated population (F10 lineage) contributed on average 11% to total bacterioplankton in the pond during the vegetation periods (ice-free period, usually May to November). Only a low degree of genetic diversification of the population could be revealed. These bacteria are characterized by a small genome size (2.1 Mb), a relatively small number of genes involved in transduction of environmental signals, and the lack of motility and quorum sensing. Experiments indicated that these bacteria live as chemoorganotrophs by mainly utilizing low-molecular-weight substrates derived from photooxidation of humic substances. Evolutionary genome streamlining resulted in a highly passive lifestyle so far only known among free-living bacteria from pelagic marine taxa dwelling in environmentally stable nutrient-poor off-shore systems. Surprisingly, such a lifestyle is also successful in a highly dynamic and nutrient-richer environment such as the water column of the investigated pond, which was undergoing complete mixis and pronounced stratification in diurnal cycles. Obviously, metabolic and ecological versatility is not a prerequisite for long-lasting establishment of abundant bacterial populations under highly dynamic environmental conditions. Caution should be exercised when

Algorithmic methods to infer the evolutionary trajectories in cancer progression

PubMed Central

Graudenzi, Alex; Ramazzotti, Daniele; Sanz-Pamplona, Rebeca; De Sano, Luca; Mauri, Giancarlo; Moreno, Victor; Antoniotti, Marco; Mishra, Bud

2016-01-01

The genomic evolution inherent to cancer relates directly to a renewed focus on the voluminous next-generation sequencing data and machine learning for the inference of explanatory models of how the (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the “selective advantage” relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular, and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications because it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations, and progression model inference. We demonstrate PiCnIc’s ability to reproduce much of the current knowledge on colorectal cancer progression as well as to suggest novel experimentally verifiable hypotheses. PMID:27357673
The phylogenomic position of the grey nurse shark Carcharias taurus Rafinesque, 1810 (Lamniformes, Odontaspididae) inferred from the mitochondrial genome.

PubMed

Bowden, Deborah L; Vargas-Caro, Carolina; Ovenden, Jennifer R; Bennett, Michael B; Bustamante, Carlos

2016-11-01

The complete mitochondrial genome of the grey nurse shark Carcharias taurus is described from 25 963 828 sequences obtained using Illumina NGS technology. Total length of the mitogenome is 16 715 bp, consisting of 2 rRNAs, 13 protein-coding regions, 22 tRNA and 2 non-coding regions thus updating the previously published mitogenome for this species. The phylogenomic reconstruction inferred from the mitogenome of 15 species of Lamniform and Carcharhiniform sharks supports the inclusion of C. taurus in a clade with the Lamnidae and Cetorhinidae. This complete mitogenome contributes to ongoing investigation into the monophyly of the Family Odontaspididae.
Bacterioplankton communities of Crater Lake, OR: Dynamic changes with euphotic zone food web structure and stable deep water populations

USGS Publications Warehouse

Urbach, E.; Vergin, K.L.; Larson, G.L.; Giovannoni, S.J.

2007-01-01

The distribution of bacterial and archaeal species in Crater Lake plankton varies dramatically over depth and with time, as assessed by hybridization of group-specific oligonucleotides to RNA extracted from lakewater. Nonmetric, multidimensional scaling (MDS) analysis of relative bacterial phylotype densities revealed complex relationships among assemblages sampled from depth profiles in July, August and September of 1997 through 1999. CL500-11 green nonsulfur bacteria (Phylum Chloroflexi) and marine Group I crenarchaeota are consistently dominant groups in the oxygenated deep waters at 300 and 500 m. Other phylotypes found in the deep waters are similar to surface and mid-depth populations and vary with time. Euphotic zone assemblages are dominated either by β-proteobacteria or CL120-10 verrucomicrobia, and ACK4 actinomycetes. MDS analyses of euphotic zone populations in relation to environmental variables and phytoplankton and zooplankton population structures reveal apparent links between Daphnia pulicaria zooplankton population densities and microbial community structure. These patterns may reflect food web interactions that link kokanee salmon population densities to community structure of the bacterioplankton, via fish predation on Daphnia with cascading consequences to Daphnia bacterivory and predation on bacterivorous protists. These results demonstrate a stable bottom-water microbial community. They also extend previous observations of food web-driven changes in euphotic zone bacterioplankton community structure to an oligotrophic setting. © 2007 Springer Science+Business Media B.V.
Quantification of Carbon and Phosphorus Co-Limitation in Bacterioplankton: New Insights on an Old Topic

PubMed Central

Dorado-García, Irene; Medina-Sánchez, Juan Manuel; Herrera, Guillermo; Cabrerizo, Marco J.; Carrillo, Presentación

2014-01-01

Because the nature of the main resource that limits bacterioplankton (e.g. organic carbon [C] or phosphorus [P]) has biogeochemical implications concerning organic C accumulation in freshwater ecosystems, empirical knowledge is needed concerning how bacteria respond to these two resources, available alone or together. We performed field experiments of resource manipulation (2×2 factorial design, with the addition of C, P, or both combined) in two Mediterranean freshwater ecosystems with contrasting trophic states (oligotrophy vs. eutrophy) and trophic natures (autotrophy vs. heterotrophy, measured as gross primary production:respiration ratio). Overall, the two resources synergistically co-limited bacterioplankton, i.e. the magnitude of the response of bacterial production and abundance to the two resources combined was higher than the additive response in both ecosystems. However, bacteria also responded positively to single P and C additions in the eutrophic ecosystem, but not to single C in the oligotrophic one, consistent with the value of the ratio between bacterial C demand and algal C supply. Accordingly, the trophic nature rather than the trophic state of the ecosystems proves to be a key feature determining the expected types of resource co-limitation of bacteria, as summarized in a proposed theoretical framework. The actual types of co-limitation shifted over time and partially deviated (a lesser degree of synergism) from the theoretical expectations, particularly in the eutrophic ecosystem. These deviations may be explained by extrinsic ecological forces to physiological limitations of bacteria, such as predation, whose role in our experiments is supported by the relationship between the dynamics of bacteria and bacterivores tested by SEMs (structural equation models). Our study, in line with the increasingly recognized role of freshwater ecosystems in the global C cycle, suggests that further attention should be focussed on the biotic interactions that
Drivers of coastal bacterioplankton community diversity and structure along a nutrient gradient in the East China Sea

NASA Astrophysics Data System (ADS)

He, Jiaying; Wang, Kai; Xiong, Jinbo; Guo, Annan; Zhang, Demin; Fei, Yuejun; Ye, Xiansen

2017-04-01

Anthropogenic nutrient discharge poses widespread threats to coastal ecosystems and has increased environmental gradients from coast to sea. Bacterioplankton play crucial roles in coastal biogeochemical cycling, and a variety of factors affect bacterial community diversity and structure. We used 16S rRNA gene pyrosequencing to investigate the spatial variation in bacterial community composition (BCC) across five sites on a coast-offshore gradient in the East China Sea. Overall, bacterial alpha-diversity did not differ across sites, except that richness and phylogenetic diversity were lower in the offshore sites, and the highest alpha-diversity was found in the most landward site, with Chl-a being the main factor. BCCs generally clustered into coastal and offshore groups. Chl-a explained 12.3% of the variation in BCCs, more than that explained by either the physicochemical (5.7%) or spatial (8.5%) variables. Nutrients (particularly nitrate and phosphate), along with phytoplankton abundance, were more important than other physicochemical factors, co-explaining 20.0% of the variation in BCCs. Additionally, a series of discriminant families (primarily affiliated with Gammaproteobacteria and Alphaproteobacteria), whose relative abundances correlated with Chl-a, DIN, and phosphate concentrations, were identified, implying their potential to indicate phytoplankton blooms and nutrient enrichment in this marine ecosystem. This study provides insight into bacterioplankton response patterns along a coast-offshore gradient, with phytoplankton abundance increasing in the offshore sites. Time-series sampling across multiple transects should be performed to determine the seasonal and spatial patterns in bacterial diversity and community structure along this gradient.
Drivers of coastal bacterioplankton community diversity and structure along a nutrient gradient in the East China Sea

NASA Astrophysics Data System (ADS)

He, Jiaying; Wang, Kai; Xiong, Jinbo; Guo, Annan; Zhang, Demin; Fei, Yuejun; Ye, Xiansen

2018-03-01

Anthropogenic nutrient discharge poses widespread threats to coastal ecosystems and has increased environmental gradients from coast to sea. Bacterioplankton play crucial roles in coastal biogeochemical cycling, and a variety of factors affect bacterial community diversity and structure. We used 16S rRNA gene pyrosequencing to investigate the spatial variation in bacterial community composition (BCC) across five sites on a coast-offshore gradient in the East China Sea. Overall, bacterial alpha-diversity did not differ across sites, except that richness and phylogenetic diversity were lower in the offshore sites, and the highest alpha-diversity was found in the most landward site, with Chl-a being the main factor. BCCs generally clustered into coastal and offshore groups. Chl-a explained 12.3% of the variation in BCCs, more than that explained by either the physicochemical (5.7%) or spatial (8.5%) variables. Nutrients (particularly nitrate and phosphate), along with phytoplankton abundance, were more important than other physicochemical factors, co-explaining 20.0% of the variation in BCCs. Additionally, a series of discriminant families (primarily affiliated with Gammaproteobacteria and Alphaproteobacteria), whose relative abundances correlated with Chl-a, DIN, and phosphate concentrations, were identified, implying their potential to indicate phytoplankton blooms and nutrient enrichment in this marine ecosystem. This study provides insight into bacterioplankton response patterns along a coast-offshore gradient, with phytoplankton abundance increasing in the offshore sites. Time-series sampling across multiple transects should be performed to determine the seasonal and spatial patterns in bacterial diversity and community structure along this gradient.
Gene-network inference by message passing

NASA Astrophysics Data System (ADS)

Braunstein, A.; Pagnani, A.; Weigt, M.; Zecchina, R.

2008-01-01

The inference of gene-regulatory processes from gene-expression data belongs to the major challenges of computational systems biology. Here we address the problem from a statistical-physics perspective and develop a message-passing algorithm which is able to infer sparse, directed and combinatorial regulatory mechanisms. Using the replica technique, the algorithmic performance can be characterized analytically for artificially generated data. The algorithm is applied to genome-wide expression data of baker's yeast under various environmental conditions. We find clear cases of combinatorial control, and enrichment in common functional annotations of regulated genes and their regulators.
Phytozome Comparative Plant Genomics Portal

DOE Office of Scientific and Technical Information (OSTI.GOV)

Goodstein, David; Batra, Sajeev; Carlson, Joseph

2014-09-09

The Dept. of Energy Joint Genome Institute is a genomics user facility supporting DOE mission science in the areas of Bioenergy, Carbon Cycling, and Biogeochemistry. The Plant Program at the JGI applies genomic, analytical, computational and informatics platforms and methods to:
1. Understand and accelerate the improvement (domestication) of bioenergy crops
2. Characterize and moderate plant response to climate change
3. Use comparative genomics to identify constrained elements and infer gene function
4. Build high quality genomic resource platforms of JGI Plant Flagship genomes for functional and experimental work
5. Expand functional genomic resources for Plant Flagship genomes
Viral quasispecies inference from 454 pyrosequencing

PubMed Central

2013-01-01

Background: Many potentially life-threatening infectious viruses are highly mutable in nature. Characterizing the fittest variants within a quasispecies from infected patients is expected to allow unprecedented opportunities to investigate the relationship between quasispecies diversity and disease epidemiology. The advent of next-generation sequencing technologies has allowed the study of virus diversity with high-throughput sequencing, although these methods come with higher error rates, which can artificially increase diversity. Results: Here we introduce a novel computational approach that incorporates base quality scores from next-generation sequencers for reconstructing viral genome sequences and that simultaneously infers the number of variants present within a quasispecies. Comparisons on simulated and clinical data on dengue virus suggest that the novel approach provides a more accurate inference of the underlying number of variants within the quasispecies, which is vital for clinical efforts in mapping the within-host viral diversity. Sequence alignments generated by our approach are also found to exhibit lower rates of error. Conclusions: The ability to infer the viral quasispecies colony that is present within a human host provides the potential for a more accurate classification of the viral phenotype. Understanding the genomics of viruses will be relevant not just to studying how to control or even eradicate these viral infectious diseases, but also in learning about the innate protection in the human host against the viruses. PMID:24308284
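The abstract above does not describe how base quality scores enter the reconstruction, so the following is only a minimal sketch of the general idea, not the authors' method. It relies on the standard Phred definition (error probability = 10^(-Q/10)); the function names and the per-column weighted vote are illustrative assumptions.

```python
from collections import defaultdict
from typing import Sequence


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def weighted_consensus_base(bases: str, quals: Sequence[float]) -> str:
    """Pick the base with the largest summed weight at one alignment column.

    Each read's base call is weighted by (1 - error probability), so a few
    low-quality calls can be outvoted by one high-quality call.
    Hypothetical helper for illustration only.
    """
    if len(bases) != len(quals):
        raise ValueError("need exactly one quality score per base call")
    if not bases:
        raise ValueError("need at least one base call")
    weights = defaultdict(float)
    for base, q in zip(bases, quals):
        weights[base] += 1.0 - phred_to_error_prob(q)
    return max(weights, key=weights.get)
```

For example, with calls "AAG" and qualities [2, 2, 40], the two low-quality A calls carry less combined weight than the single high-quality G, so an unweighted majority vote and the weighted vote disagree. This is the kind of sequencing-error inflation of diversity that quality-aware methods aim to reduce.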
GIGA: a simple, efficient algorithm for gene tree inference in the genomic age

PubMed Central

2010-01-01

Background: Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. Results: We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. Conclusions: GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. We compared trees produced by GIGA to those in
Genome-wide inference of regulatory networks in Streptomyces coelicolor.

PubMed

Castro-Melchor, Marlene; Charaniya, Salim; Karypis, George; Takano, Eriko; Hu, Wei-Shou

2010-10-18

The onset of antibiotics production in Streptomyces species is co-ordinated with differentiation events. An understanding of the genetic circuits that regulate these coupled biological phenomena is essential to discover and engineer the pharmacologically important natural products made by these species. The availability of genomic tools and access to a large warehouse of transcriptome data for the model organism, Streptomyces coelicolor, provides incentive to decipher the intricacies of the regulatory cascades and develop biologically meaningful hypotheses. In this study, more than 500 samples of genome-wide temporal transcriptome data, comprising wild-type and more than 25 regulatory gene mutants of Streptomyces coelicolor probed across multiple stress and medium conditions, were investigated. Information based on transcript and functional similarity was used to update a previously-predicted whole-genome operon map and further applied to predict transcriptional networks constituting modules enriched in diverse functions such as secondary metabolism, and sigma factor. The predicted network displays a scale-free architecture with a small-world property observed in many biological networks. The networks were further investigated to identify functionally-relevant modules that exhibit functional coherence and a consensus motif in the promoter elements indicative of DNA-binding elements. Despite the enormous experimental as well as computational challenges, a systems approach for integrating diverse genome-scale datasets to elucidate complex regulatory networks is beginning to emerge. We present an integrated analysis of transcriptome data and genomic features to refine a whole-genome operon map and to construct regulatory networks at the cistron level in Streptomyces coelicolor. The functionally-relevant modules identified in this study pose as potential targets for further studies and verification.
Thinking too positive? Revisiting current methods of population genetic selection inference.

PubMed

Bank, Claudia; Ewing, Gregory B; Ferrer-Admettla, Anna; Foll, Matthieu; Jensen, Jeffrey D

2014-12-01

In the age of next-generation sequencing, the availability of increasing amounts and improved quality of data at decreasing cost ought to allow for a better understanding of how natural selection is shaping the genome than ever before. However, alternative forces, such as demography and background selection (BGS), obscure the footprints of positive selection that we would like to identify. In this review, we illustrate recent developments in this area, and outline a roadmap for improved selection inference. We argue (i) that the development and obligatory use of advanced simulation tools is necessary for improved identification of selected loci, (ii) that genomic information from multiple time points will enhance the power of inference, and (iii) that results from experimental evolution should be utilized to better inform population genomic studies. Copyright © 2014 Elsevier Ltd. All rights reserved.
Revealing Less Derived Nature of Cartilaginous Fish Genomes with Their Evolutionary Time Scale Inferred with Nuclear Genes

PubMed Central

Renz, Adina J.; Meyer, Axel; Kuraku, Shigehiro

2013-01-01

Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmobranchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates in reconstructing their evolutionary processes. Their accurate evolutionary time scale is indispensable for better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge on the time scale of cartilaginous fish evolution largely relies on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on the Bayesian inference resulted in the mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which resulted in a lower substitution rate with a factor of at least 2.4 in comparison to tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fish at the molecular level as a genome-wide phenomenon. PMID:23825540
Revealing less derived nature of cartilaginous fish genomes with their evolutionary time scale inferred with nuclear genes.

PubMed

Renz, Adina J; Meyer, Axel; Kuraku, Shigehiro

2013-01-01

Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmobranchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates in reconstructing their evolutionary processes. Their accurate evolutionary time scale is indispensable for better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge on the time scale of cartilaginous fish evolution largely relies on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on the Bayesian inference resulted in the mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which resulted in a lower substitution rate with a factor of at least 2.4 in comparison to tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fish at the molecular level as a genome-wide phenomenon.
Genome-Wide SNP Discovery, Genotyping and Their Preliminary Applications for Population Genetic Inference in Spotted Sea Bass (Lateolabrax maculatus)

PubMed Central

Wang, Juan; Xue, Dong-Xiu; Zhang, Bai-Dong; Li, Yu-Long; Liu, Bing-Jian; Liu, Jin-Xian

2016-01-01

Next-generation sequencing and the collection of genome-wide single-nucleotide polymorphisms (SNPs) allow identifying fine-scale population genetic structure and genomic regions under selection. The spotted sea bass (Lateolabrax maculatus) is a non-model species of ecological and commercial importance and widely distributed in northwestern Pacific. A total of 22 648 SNPs was discovered across the genome of L. maculatus by paired-end sequencing of restriction-site associated DNA (RAD-PE) for 30 individuals from two populations. The nucleotide diversity (π) for each population was 0.0028±0.0001 in Dandong and 0.0018±0.0001 in Beihai, respectively. Shallow but significant genetic differentiation was detected between the two populations analyzed by using both the whole data set (FST = 0.0550, P < 0.001) and the putatively neutral SNPs (FST = 0.0347, P < 0.001). However, the two populations were highly differentiated based on the putatively adaptive SNPs (FST = 0.6929, P < 0.001). Moreover, a total of 356 SNPs representing 298 unique loci were detected as outliers putatively under divergent selection by FST-based outlier tests as implemented in BAYESCAN and LOSITAN. Functional annotation of the contigs containing putatively adaptive SNPs yielded hits for 22 of 55 (40%) significant BLASTX matches. Candidate genes for local selection constituted a wide array of functions, including binding, catalytic and metabolic activities, etc. The analyses with the SNPs developed in the present study highlighted the importance of genome-wide genetic variation for inference of population structure and local adaptation in L. maculatus. PMID:27336696
Genome-Wide SNP Discovery, Genotyping and Their Preliminary Applications for Population Genetic Inference in Spotted Sea Bass (Lateolabrax maculatus).

PubMed

Wang, Juan; Xue, Dong-Xiu; Zhang, Bai-Dong; Li, Yu-Long; Liu, Bing-Jian; Liu, Jin-Xian

2016-01-01

Next-generation sequencing and the collection of genome-wide single-nucleotide polymorphisms (SNPs) allow identifying fine-scale population genetic structure and genomic regions under selection. The spotted sea bass (Lateolabrax maculatus) is a non-model species of ecological and commercial importance and widely distributed in northwestern Pacific. A total of 22 648 SNPs was discovered across the genome of L. maculatus by paired-end sequencing of restriction-site associated DNA (RAD-PE) for 30 individuals from two populations. The nucleotide diversity (π) for each population was 0.0028±0.0001 in Dandong and 0.0018±0.0001 in Beihai, respectively. Shallow but significant genetic differentiation was detected between the two populations analyzed by using both the whole data set (FST = 0.0550, P < 0.001) and the putatively neutral SNPs (FST = 0.0347, P < 0.001). However, the two populations were highly differentiated based on the putatively adaptive SNPs (FST = 0.6929, P < 0.001). Moreover, a total of 356 SNPs representing 298 unique loci were detected as outliers putatively under divergent selection by FST-based outlier tests as implemented in BAYESCAN and LOSITAN. Functional annotation of the contigs containing putatively adaptive SNPs yielded hits for 22 of 55 (40%) significant BLASTX matches. Candidate genes for local selection constituted a wide array of functions, including binding, catalytic and metabolic activities, etc. The analyses with the SNPs developed in the present study highlighted the importance of genome-wide genetic variation for inference of population structure and local adaptation in L. maculatus.
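The record above reports nucleotide diversity (π) and FST values without giving formulas. As a rough illustration, π can be computed as the mean number of pairwise differences per site, and one simple FST estimator is 1 - H_w/H_b (mean within-population over mean between-population pairwise differences). This sketch assumes aligned, equal-length haplotype strings and is not the procedure implemented in BAYESCAN or LOSITAN; all function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Count the sites at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean pairwise differences per site among the sequences of one sample."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    pairs = list(combinations(seqs, 2))
    total = sum(hamming(a, b) for a, b in pairs)
    return total / (len(pairs) * len(seqs[0]))


def fst_from_pairwise_differences(pop1: Sequence[str], pop2: Sequence[str]) -> float:
    """Illustrative estimator F_ST = 1 - H_w / H_b.

    H_w is the average of the two within-population diversities;
    H_b is the mean per-site difference between sequences drawn
    from different populations.
    """
    length = len(pop1[0])
    h_within = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2
    between = sum(hamming(a, b) for a in pop1 for b in pop2)
    h_between = between / (len(pop1) * len(pop2) * length)
    if h_between == 0:
        raise ValueError("no between-population differences; F_ST is undefined")
    return 1 - h_within / h_between
```

With toy populations ["AAAA", "AAAT"] and ["TTTT", "TTTA"], each population has π = 0.25, the between-population term is 0.875, and the estimator returns about 0.714. Real RAD-seq data would need handling of missing genotypes and unequal sample sizes, which this sketch omits.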
Energetic differences between bacterioplankton trophic groups and coral reef resistance

PubMed Central

McDole Somera, Tracey; Bailey, Barbara; Barott, Katie; Grasis, Juris; Hatay, Mark; Hilton, Brett J.; Hisakawa, Nao; Nosrat, Bahador; Nulton, James; Silveira, Cynthia B.; Sullivan, Chris; Brainard, Russell E.; Rohwer, Forest

2016-01-01

Coral reefs are among the most productive and diverse marine ecosystems on the Earth. They are also particularly sensitive to changing energetic requirements by different trophic levels. Microbialization specifically refers to the increase in the energetic metabolic demands of microbes relative to macrobes and is significantly correlated with increasing human influence on coral reefs. In this study, metabolic theory of ecology is used to quantify the relative contributions of two broad bacterioplankton groups, autotrophs and heterotrophs, to energy flux on 27 Pacific coral reef ecosystems experiencing human impact to varying degrees. The effective activation energy required for photosynthesis is lower than the average energy of activation for the biochemical reactions of the Krebs cycle, and changes in the proportional abundance of these two groups can greatly affect rates of energy and materials cycling. We show that reef-water communities with a higher proportional abundance of microbial autotrophs expend more metabolic energy per gram of microbial biomass. Increased energy and materials flux through fast energy channels (i.e. water-column associated microbial autotrophs) may dampen the detrimental effects of increased heterotrophic loads (e.g. coral disease) on coral reef systems experiencing anthropogenic disturbance. PMID:27097927
Energetic differences between bacterioplankton trophic groups and coral reef resistance.

PubMed

McDole Somera, Tracey; Bailey, Barbara; Barott, Katie; Grasis, Juris; Hatay, Mark; Hilton, Brett J; Hisakawa, Nao; Nosrat, Bahador; Nulton, James; Silveira, Cynthia B; Sullivan, Chris; Brainard, Russell E; Rohwer, Forest

2016-04-27

Coral reefs are among the most productive and diverse marine ecosystems on the Earth. They are also particularly sensitive to changing energetic requirements by different trophic levels. Microbialization specifically refers to the increase in the energetic metabolic demands of microbes relative to macrobes and is significantly correlated with increasing human influence on coral reefs. In this study, metabolic theory of ecology is used to quantify the relative contributions of two broad bacterioplankton groups, autotrophs and heterotrophs, to energy flux on 27 Pacific coral reef ecosystems experiencing human impact to varying degrees. The effective activation energy required for photosynthesis is lower than the average energy of activation for the biochemical reactions of the Krebs cycle, and changes in the proportional abundance of these two groups can greatly affect rates of energy and materials cycling. We show that reef-water communities with a higher proportional abundance of microbial autotrophs expend more metabolic energy per gram of microbial biomass. Increased energy and materials flux through fast energy channels (i.e. water-column associated microbial autotrophs) may dampen the detrimental effects of increased heterotrophic loads (e.g. coral disease) on coral reef systems experiencing anthropogenic disturbance. © 2016 The Author(s).
The evolutionary history of Saccharomyces species inferred from completed mitochondrial genomes and revision in the ‘yeast mitochondrial genetic code’

PubMed Central

Szabóová, Dana; Bielik, Peter; Poláková, Silvia; Šoltys, Katarína; Jatzová, Katarína; Szemes, Tomáš

2017-01-01

The yeast Saccharomyces are widely used to test ecological and evolutionary hypotheses. A large number of nuclear genomic DNA sequences are available, but mitochondrial genomic data are insufficient. We completed mitochondrial DNA (mtDNA) sequencing from Illumina MiSeq reads for all Saccharomyces species. All are circularly mapped molecules decreasing in size with phylogenetic distance from Saccharomyces cerevisiae but with similar gene content including regulatory and selfish elements like origins of replication, introns, free-standing open reading frames or GC clusters. Their most profound feature is species-specific alteration in gene order. The genetic code differs slightly from the well-established yeast mitochondrial code as GUG is used rarely as the translation start and CGA and CGC code for arginine. The multilocus phylogeny, inferred from mtDNA, does not correlate with the trees derived from nuclear genes. mtDNA data demonstrate that Saccharomyces cariocanus should be assigned as a separate species and Saccharomyces bayanus CBS 380T should not be considered as a distinct species due to mtDNA nearly identical to Saccharomyces uvarum mtDNA. Apparently, comparison of mtDNAs should not be neglected in genomic studies as it is an important tool to understand the origin and evolutionary history of some yeast species. PMID:28992063
Transient changes in bacterioplankton communities induced by the submarine volcanic eruption of El Hierro (Canary Islands).

PubMed

Ferrera, Isabel; Arístegui, Javier; González, José M; Montero, María F; Fraile-Nuez, Eugenio; Gasol, Josep M

2015-01-01

The submarine volcanic eruption occurring near El Hierro (Canary Islands) in October 2011 provided a unique opportunity to determine the effects of such events on the microbial populations of the surrounding waters. The birth of a new underwater volcano produced a large plume of vent material detectable from space that led to abrupt changes in the physical-chemical properties of the water column. We combined flow cytometry and 454-pyrosequencing of 16S rRNA gene amplicons (V1-V3 regions for Bacteria and V3-V5 for Archaea) to monitor the area around the volcano through the eruptive and post-eruptive phases (November 2011 to April 2012). Flow cytometric analyses revealed higher abundance and relative activity (expressed as a percentage of high-nucleic acid content cells) of heterotrophic prokaryotes during the eruptive process as compared to post-eruptive stages. Changes observed in populations detectable by flow cytometry were more evident at depths closer to the volcano (~70-200 m), coinciding also with oxygen depletion. Alpha-diversity analyses revealed that species richness (Chao1 index) decreased during the eruptive phase; however, no dramatic changes in community composition were observed. The most abundant taxa during the eruptive phase were similar to those in the post-eruptive stages and to those typically prevalent in oceanic bacterioplankton communities (i.e. the alphaproteobacterial SAR11 group, the Flavobacteriia class of the Bacteroidetes and certain groups of Gammaproteobacteria). Yet, although at low abundance, we also detected the presence of taxa not typically found in bacterioplankton communities such as the Epsilonproteobacteria and members of the candidate division ZB3, particularly during the eruptive stage. These groups are often associated with deep-sea hydrothermal vents or sulfur-rich springs. Both cytometric and sequence analyses showed that once the eruption ceased, evidences of the volcano-induced changes were no longer observed.

Transient Changes in Bacterioplankton Communities Induced by the Submarine Volcanic Eruption of El Hierro (Canary Islands)

PubMed Central

Ferrera, Isabel; Arístegui, Javier; González, José M.; Montero, María F.; Fraile-Nuez, Eugenio; Gasol, Josep M.

2015-01-01

The submarine volcanic eruption occurring near El Hierro (Canary Islands) in October 2011 provided a unique opportunity to determine the effects of such events on the microbial populations of the surrounding waters. The birth of a new underwater volcano produced a large plume of vent material detectable from space that led to abrupt changes in the physical-chemical properties of the water column. We combined flow cytometry and 454-pyrosequencing of 16S rRNA gene amplicons (V1–V3 regions for Bacteria and V3–V5 for Archaea) to monitor the area around the volcano through the eruptive and post-eruptive phases (November 2011 to April 2012). Flow cytometric analyses revealed higher abundance and relative activity (expressed as a percentage of high-nucleic acid content cells) of heterotrophic prokaryotes during the eruptive process as compared to post-eruptive stages. Changes observed in populations detectable by flow cytometry were more evident at depths closer to the volcano (~70–200 m), coinciding also with oxygen depletion. Alpha-diversity analyses revealed that species richness (Chao1 index) decreased during the eruptive phase; however, no dramatic changes in community composition were observed. The most abundant taxa during the eruptive phase were similar to those in the post-eruptive stages and to those typically prevalent in oceanic bacterioplankton communities (i.e. the alphaproteobacterial SAR11 group, the Flavobacteriia class of the Bacteroidetes and certain groups of Gammaproteobacteria). Yet, although at low abundance, we also detected the presence of taxa not typically found in bacterioplankton communities such as the Epsilonproteobacteria and members of the candidate division ZB3, particularly during the eruptive stage. These groups are often associated with deep-sea hydrothermal vents or sulfur-rich springs. Both cytometric and sequence analyses showed that once the eruption ceased, evidences of the volcano-induced changes were no longer observed.
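The Chao1 index mentioned above estimates total richness from the number of taxa seen exactly once (F1) and exactly twice (F2) in a sample. The sketch below uses the bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), which stays defined when F2 is zero. The input format (a list of per-OTU read counts) and the function name are assumptions for illustration, and the abstract does not say which Chao1 variant or software the authors used.

```python
from typing import Iterable


def chao1(otu_counts: Iterable[int]) -> float:
    """Bias-corrected Chao1 richness estimate from per-OTU read counts.

    S_obs is the number of OTUs with at least one read, F1 the number
    of singletons, and F2 the number of doubletons.
    """
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

For a sample with counts [10, 5, 1, 1, 1, 2, 2, 0], there are 7 observed OTUs, 3 singletons and 2 doubletons, giving an estimate of 8.0. Because the estimate depends on rare taxa, samples are usually rarefied to equal sequencing depth before richness is compared across them.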
Alignment-free inference of hierarchical and reticulate phylogenomic relationships.

PubMed

Bernard, Guillaume; Chan, Cheong Xin; Chan, Yao-Ban; Chua, Xin-Yi; Cong, Yingnan; Hogan, James M; Maetschke, Stefan R; Ragan, Mark A

2017-06-30

We are amidst an ongoing flood of sequence data arising from the application of high-throughput technologies, and a concomitant fundamental revision in our understanding of how genomes evolve individually and within the biosphere. Workflows for phylogenomic inference must accommodate data that are not only much larger than before, but often more error prone and perhaps misassembled, or not assembled in the first place. Moreover, genomes of microbes, viruses and plasmids evolve not only by tree-like descent with modification but also by incorporating stretches of exogenous DNA. Thus, next-generation phylogenomics must address computational scalability while rethinking the nature of orthogroups, the alignment of multiple sequences and the inference and comparison of trees. New phylogenomic workflows have begun to take shape based on so-called alignment-free (AF) approaches. Here, we review the conceptual foundations of AF phylogenetics for the hierarchical (vertical) and reticulate (lateral) components of genome evolution, focusing on methods based on k-mers. We reflect on what seems to be successful, and on where further development is needed. © The Author 2017. Published by Oxford University Press.
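As a small illustration of the k-mer idea behind alignment-free comparison, the sketch below decomposes two sequences into sets of overlapping k-mers and reports their Jaccard distance. This is one of the simplest possible AF measures and is shown only as an assumption-laden example; the review covers a range of k-mer methods, and the function names and default k here are hypothetical.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of overlapping substrings of length k in seq."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: str, b: str, k: int = 3) -> float:
    """1 - |shared k-mers| / |all k-mers| for two unaligned sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        raise ValueError("both sequences are shorter than k")
    return 1 - len(ka & kb) / len(union)
```

For "ACGTACGT" and "ACGTTCGT" with k = 3, the sequences share 2 of 7 distinct k-mers, so the distance is 5/7. A matrix of such pairwise distances can then be passed to a distance-based tree method without any multiple sequence alignment, which is what makes the approach attractive for unassembled or very large data sets.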
Impacts of combined overfishing and oil spills on the plankton trophodynamics of the West Florida shelf over the last half century of 1965-2011: A two-dimensional simulation analysis of biotic state transitions, from a zooplankton- to a bacterioplankton-modulated ecosystem.

NASA Astrophysics Data System (ADS)

Walsh, J. J.; Lenes, J. M.; Darrow, B.; Parks, A.; Weisberg, R. H.

2016-03-01

Over 50 years of multiple anthropogenic perturbations, Florida zooplankton stocks of the northeastern Gulf of Mexico declined ten-fold, with increments of mainly dominant toxic dinoflagellate harmful algal blooms (HABs), rather than diatoms, and a shift in loci of nutrient remineralization and oxygen depletion by bacterioplankton, from the sea floor to near surface waters. Yet, lytic bacterial biomass and associated ammonification only increased at most five-fold over the same time period, with consequently little indication of new, expanded "dead zones" of diatom-induced hypoxia. After bacterial lysis of intact cells of these increased HABs, the remaining residues of zooplankton biomass decrements evidently instead exited the water column as malign aerosolized HAB asthma triggers, correlated by co-traveling mercury aerosols, within wind-borne sea sprays. To unravel the causal mechanisms of these inferred decadal food web transitions, a 36-state variable plankton model of algal, bacterial, protozoan, and copepod component communities replicated daily time series of each plankton group's representatives on the West Florida shelf (WFS) during 1965-2011. At the lower phytoplankton trophic levels, 52% of the ungrazed HAB increments, between 1965-1967 and 2001-2002 before recent oil spills, remained in the water column to kill fishes and fuel bacterioplankton. But, another 48% of the WFS primary production then left the ocean's surface as a harbinger of increased public health hazards during continuing sea spray exports of salts, HAB toxins, and Hg poisons. Following the Deepwater Horizon petroleum releases in 2010, little additional change of element partition among the altered importance of WFS food web components of the trophic pyramid then pertained between 2001-2002 and 2010-2011, despite when anomalous upwelled nutrient supplies instead favored retrograde benign, oil-tolerant diatoms over the HABs during 2010. Indeed, by 2011 HABs were back, with biomass
Whole-genome alignment.

PubMed

Dewey, Colin N

2012-01-01

Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction, and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses, such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make the most effective use of our rapidly growing databases of whole genomes.
Predicting Protein Function by Genomic Context: Quantitative Evaluation and Qualitative Inferences

PubMed Central

Huynen, Martijn; Snel, Berend; Lathe, Warren; Bork, Peer

2000-01-01

Various new methods have been proposed to predict functional interactions between proteins based on the genomic context of their genes. The types of genomic context that they use are Type I: the fusion of genes; Type II: the conservation of gene-order or co-occurrence of genes in potential operons; and Type III: the co-occurrence of genes across genomes (phylogenetic profiles). Here we compare these types for their coverage, their correlations with various types of functional interaction, and their overlap with homology-based function assignment. We apply the methods to Mycoplasma genitalium, the standard benchmarking genome in computational and experimental genomics. Quantitatively, conservation of gene order is the technique with the highest coverage, applying to 37% of the genes. By combining gene order conservation with gene fusion (6%), the co-occurrence of genes in operons in absence of gene order conservation (8%), and the co-occurrence of genes across genomes (11%), significant context information can be obtained for 50% of the genes (the categories overlap). Qualitatively, we observe that the functional interactions between genes are stronger as the requirements for physical neighborhood on the genome are more stringent, while the fraction of potential false positives decreases. Moreover, only in cases in which gene order is conserved in a substantial fraction of the genomes, in this case six out of twenty-five, does a single type of functional interaction (physical interaction) clearly dominate (>80%). In other cases, complementary function information from homology searches, which is available for most of the genes with significant genomic context, is essential to predict the type of interaction. Using a combination of genomic context and homology searches, new functional features can be predicted for 10% of M. genitalium genes. PMID:10958638
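
Illustrative sketch (not taken from the cited paper): the Type III context described above, co-occurrence of genes across genomes, can be pictured as comparing presence/absence profiles. The Jaccard score, the 0.8 threshold, and the gene names below are assumptions chosen for illustration only; the abstract does not state which scoring rule the authors used.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles (lists of 0/1)."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def cooccurring_pairs(profiles, threshold=0.8):
    """Return (gene, gene, score) for gene pairs whose profiles score >= threshold.

    `profiles` maps a gene name to its presence/absence vector across genomes.
    The threshold is a hypothetical cut-off, not a published value.
    """
    genes = sorted(profiles)
    pairs = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            score = jaccard(profiles[g], profiles[h])
            if score >= threshold:
                pairs.append((g, h, score))
    return pairs


# Hypothetical profiles over five genomes (1 = gene present, 0 = absent).
profiles = {
    "geneA": [1, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 1, 0],
    "geneC": [0, 0, 1, 0, 1],
}
print(cooccurring_pairs(profiles))
```

Genes with matching profiles (geneA and geneB here) would be candidates for a functional link; as the abstract notes, homology information is usually still needed to say what kind of link it is.
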
Indexcov: fast coverage quality control for whole-genome sequencing.

PubMed

Pedersen, Brent S; Collins, Ryan L; Talkowski, Michael E; Quinlan, Aaron R

2017-11-01

The BAM and CRAM formats provide a supplementary linear index that facilitates rapid access to sequence alignments in arbitrary genomic regions. Comparing consecutive entries in a BAM or CRAM index allows one to infer the number of alignment records per genomic region for use as an effective proxy of sequence depth in each genomic region. Based on these properties, we have developed indexcov, an efficient estimator of whole-genome sequencing coverage to rapidly identify samples with aberrant coverage profiles, reveal large-scale chromosomal anomalies, recognize potential batch effects, and infer the sex of a sample. Indexcov is available at https://github.com/brentp/goleft under the MIT license. © The Authors 2017. Published by Oxford University Press.
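
Illustrative sketch (not the indexcov implementation, which is distributed with goleft): the idea of comparing consecutive index entries can be shown on a plain list of BAI-style virtual file offsets, one per 16,384-bp window. In the BAM specification a virtual offset stores the compressed file offset in its upper 48 bits, so the difference between neighbouring entries approximates how much alignment data falls in each window. The function name, the median scaling, and the example offsets are assumptions for illustration.

```python
from statistics import median

TILE_BP = 16_384  # width of one linear-index window in the BAI format


def scaled_depth(virtual_offsets):
    """Approximate relative depth per window from consecutive linear-index entries.

    n offsets yield n - 1 values, because the last window has no following
    entry to compare against. Values are scaled by the median non-empty
    window, so a value near 1.0 means "typical" coverage.
    """
    # Keep only the compressed-file offset (upper 48 bits of the virtual offset).
    file_offsets = [v >> 16 for v in virtual_offsets]
    sizes = [b - a for a, b in zip(file_offsets, file_offsets[1:])]
    nonzero = [s for s in sizes if s > 0]
    if not nonzero:
        return [0.0] * len(sizes)
    typical = median(nonzero)
    return [s / typical for s in sizes]


# Hypothetical offsets: the third window holds twice as much data as the others.
offsets = [0 << 16, 100 << 16, 200 << 16, 400 << 16, 500 << 16]
print(scaled_depth(offsets))
```

A run of windows well above or below 1.0 along a chromosome is the kind of signal the abstract describes as a large-scale anomaly or a sex-chromosome dosage difference.
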
Inference of Markovian properties of molecular sequences from NGS data and applications to comparative genomics.

PubMed

Ren, Jie; Song, Kai; Deng, Minghua; Reinert, Gesine; Cannon, Charles H; Sun, Fengzhu

2016-04-01

Next-generation sequencing (NGS) technologies generate large amounts of short read data for many different organisms. The fact that NGS reads are generally short makes it challenging to assemble the reads and reconstruct the original genome sequence. For clustering genomes using such NGS data, word-count based alignment-free sequence comparison is a promising approach, but for this approach, the underlying expected word counts are essential. A plausible model for this underlying distribution of word counts is given through modeling the DNA sequence as a Markov chain (MC). For single long sequences, efficient statistics are available to estimate the order of MCs and the transition probability matrix for the sequences. As NGS data do not provide a single long sequence, inference methods on Markovian properties of sequences based on single long sequences cannot be directly used for NGS short read data. Here we derive a normal approximation for such word counts. We also show that the traditional Chi-square statistic has an approximate gamma distribution, using the Lander-Waterman model for physical mapping. We propose several methods to estimate the order of the MC based on NGS reads and evaluate those using simulations. We illustrate the applications of our results by clustering genomic sequences of several vertebrate and tree species based on NGS reads using alignment-free sequence dissimilarity measures. We find that the estimated order of the MC has a considerable effect on the clustering results, and that the clustering results that use an MC of the estimated order give a plausible clustering of the species. Our implementation of the statistics developed here is available as R package 'NGS.MC' at http://www-rcf.usc.edu/∼fsun/Programs/NGS-MC/NGS-MC.html. Contact: fsun@usc.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
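
Illustrative sketch (not the NGS.MC package): a plain maximum-likelihood estimate of Markov chain transition probabilities from word counts pooled over short reads. Words are counted within each read only, so no word spans a read boundary. This is a simplified stand-in, assuming error-free reads on one strand; it does not reproduce the normal or gamma approximations or the order-estimation statistics derived in the paper, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(reads, order=1):
    """Estimate P(next base | previous `order` bases) from pooled read word counts.

    Returns a dict mapping each observed (order + 1)-mer to the estimated
    probability of its last base given its first `order` bases.
    """
    counts = Counter()
    for read in reads:
        # Slide a window of length order + 1 along each read separately.
        for i in range(len(read) - order):
            counts[read[i:i + order + 1]] += 1

    # Total count of each context (the first `order` bases of a word).
    context_totals = defaultdict(int)
    for word, n in counts.items():
        context_totals[word[:-1]] += n

    return {word: n / context_totals[word[:-1]] for word, n in counts.items()}


# Two hypothetical reads.
probs = transition_probabilities(["ACGT", "ACGA"], order=1)
print(probs["GT"], probs["GA"], probs["AC"])
```

With real data the counts would also need to handle reverse complements, ambiguous bases, and uneven coverage, which is part of what makes the NGS setting harder than the single-long-sequence case discussed in the abstract.
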
Genomic Repeat Abundances Contain Phylogenetic Signal

PubMed Central

Dodsworth, Steven; Chase, Mark W.; Kelly, Laura J.; Leitch, Ilia J.; Macas, Jiří; Novák, Petr; Piednoël, Mathieu; Weiss-Schneeweiss, Hanna; Leitch, Andrew R.

2015-01-01

A large proportion of genomic information, particularly repetitive elements, is usually ignored when researchers are using next-generation sequencing. Here we demonstrate the usefulness of this repetitive fraction in phylogenetic analyses, utilizing comparative graph-based clustering of next-generation sequence reads, which results in abundance estimates of different classes of genomic repeats. Phylogenetic trees are then inferred based on the genome-wide abundance of different repeat types treated as continuously varying characters; such repeats are scattered across chromosomes and in angiosperms can constitute a majority of nuclear genomic DNA. In six diverse examples, five angiosperms and one insect, this method provides generally well-supported relationships at interspecific and intergeneric levels that agree with results from more standard phylogenetic analyses of commonly used markers. We propose that this methodology may prove especially useful in groups where there is little genetic differentiation in standard phylogenetic markers. At the same time as providing data for phylogenetic inference, this method additionally yields a wealth of data for comparative studies of genome evolution. PMID:25261464
multi-dice: r package for comparative population genomic inference under hierarchical co-demographic models of independent single-population size changes.

PubMed

Xue, Alexander T; Hickerson, Michael J

2017-11-01

Population genetic data from multiple taxa can address comparative phylogeographic questions about community-scale response to environmental shifts, and a useful strategy to this end is to employ hierarchical co-demographic models that directly test multi-taxa hypotheses within a single, unified analysis. This approach has been applied to classical phylogeographic data sets such as mitochondrial barcodes as well as reduced-genome polymorphism data sets that can yield 10,000s of SNPs, produced by emergent technologies such as RAD-seq and GBS. A strategy for the latter had been accomplished by adapting the site frequency spectrum to a novel summarization of population genomic data across multiple taxa called the aggregate site frequency spectrum (aSFS), which potentially can be deployed under various inferential frameworks including approximate Bayesian computation, random forest and composite likelihood optimization. Here, we introduce the r package multi-dice, a wrapper program that exploits existing simulation software for flexible execution of hierarchical model-based inference using the aSFS, which is derived from reduced genome data, as well as mitochondrial data. We validate several novel software features such as applying alternative inferential frameworks, enforcing a minimal threshold of time surrounding co-demographic pulses and specifying flexible hyperprior distributions. In sum, multi-dice provides comparative analysis within the familiar R environment while allowing a high degree of user customization, and will thus serve as a tool for comparative phylogeography and population genomics. © 2017 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
MicroScope: a platform for microbial genome annotation and comparative genomics

PubMed Central

Vallenet, D.; Engelen, S.; Mornico, D.; Cruveiller, S.; Fleury, L.; Lajus, A.; Rouy, Z.; Roche, D.; Salvignol, G.; Scarpelli, C.; Médigue, C.

2009-01-01

The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope’s rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of
Insights into archaeal evolution and symbiosis from the genomes of a nanoarchaeon and its inferred crenarchaeal host from Obsidian Pool, Yellowstone National Park

PubMed Central

2013-01-01

Background: A single cultured marine organism, Nanoarchaeum equitans, represents the Nanoarchaeota branch of symbiotic Archaea, with a highly reduced genome and unusual features such as multiple split genes. Results: The first terrestrial hyperthermophilic member of the Nanoarchaeota was collected from Obsidian Pool, a thermal feature in Yellowstone National Park, separated by single cell isolation, and sequenced together with its putative host, a Sulfolobales archaeon. Both the new Nanoarchaeota (Nst1) and N. equitans lack most biosynthetic capabilities, and phylogenetic analysis of ribosomal RNA and protein sequences indicates that the two form a deep-branching archaeal lineage. However, the Nst1 genome is more than 20% larger, and encodes a complete gluconeogenesis pathway as well as the full complement of archaeal flagellum proteins. With a larger genome, a smaller repertoire of split protein encoding genes and no split non-contiguous tRNAs, Nst1 appears to have experienced less severe genome reduction than N. equitans. These findings imply that, rather than representing ancestral characters, the extremely compact genomes and multiple split genes of Nanoarchaeota are derived characters associated with their symbiotic or parasitic lifestyle. The inferred host of Nst1 is potentially autotrophic, with a streamlined genome and simplified central and energetic metabolism as compared to other Sulfolobales. Conclusions: Comparison of the N. equitans and Nst1 genomes suggests that the marine and terrestrial lineages of Nanoarchaeota share a common ancestor that was already a symbiont of another archaeon. The two distinct Nanoarchaeota-host genomic data sets offer novel insights into the evolution of archaeal symbiosis and parasitism, enabling further studies of the cellular and molecular mechanisms of these relationships. Reviewers: This article was reviewed by Patrick Forterre, Bettina Siebers (nominated by Michael Galperin) and Purification Lopez-Garcia. PMID:23607440
Insights into archaeal evolution and symbiosis from the genomes of a nanoarchaeon and its inferred crenarchaeal host from Obsidian Pool, Yellowstone National Park.

PubMed

Podar, Mircea; Makarova, Kira S; Graham, David E; Wolf, Yuri I; Koonin, Eugene V; Reysenbach, Anna-Louise

2013-04-22

A single cultured marine organism, Nanoarchaeum equitans, represents the Nanoarchaeota branch of symbiotic Archaea, with a highly reduced genome and unusual features such as multiple split genes. The first terrestrial hyperthermophilic member of the Nanoarchaeota was collected from Obsidian Pool, a thermal feature in Yellowstone National Park, separated by single cell isolation, and sequenced together with its putative host, a Sulfolobales archaeon. Both the new Nanoarchaeota (Nst1) and N. equitans lack most biosynthetic capabilities, and phylogenetic analysis of ribosomal RNA and protein sequences indicates that the two form a deep-branching archaeal lineage. However, the Nst1 genome is more than 20% larger, and encodes a complete gluconeogenesis pathway as well as the full complement of archaeal flagellum proteins. With a larger genome, a smaller repertoire of split protein encoding genes and no split non-contiguous tRNAs, Nst1 appears to have experienced less severe genome reduction than N. equitans. These findings imply that, rather than representing ancestral characters, the extremely compact genomes and multiple split genes of Nanoarchaeota are derived characters associated with their symbiotic or parasitic lifestyle. The inferred host of Nst1 is potentially autotrophic, with a streamlined genome and simplified central and energetic metabolism as compared to other Sulfolobales. Comparison of the N. equitans and Nst1 genomes suggests that the marine and terrestrial lineages of Nanoarchaeota share a common ancestor that was already a symbiont of another archaeon. The two distinct Nanoarchaeota-host genomic data sets offer novel insights into the evolution of archaeal symbiosis and parasitism, enabling further studies of the cellular and molecular mechanisms of these relationships. This article was reviewed by Patrick Forterre, Bettina Siebers (nominated by Michael Galperin) and Purification Lopez-Garcia.
Optofluidic Single-Cell Genome Amplification of Sub-micron Bacteria in the Ocean Subsurface

PubMed Central

Landry, Zachary C.; Vergin, Kevin; Mannenbach, Christopher; Block, Stephen; Yang, Qiao; Blainey, Paul; Carlson, Craig; Giovannoni, Stephen

2018-01-01

Optofluidic single-cell genome amplification was used to obtain genome sequences from sub-micron cells collected from the euphotic and mesopelagic zones of the northwestern Sargasso Sea. Plankton cells were visually selected and manually sorted with an optical trap, yielding 20 partial genome sequences representing seven bacterial phyla. Two organisms, E01-9C-26 (Gammaproteobacteria), represented by four single cell genomes, and Opi.OSU.00C, an uncharacterized Verrucomicrobia, were the first of their types retrieved by single cell genome sequencing and were studied in detail. Metagenomic data showed that E01-9C-26 is found throughout the dark ocean, while Opi.OSU.00C was observed to bloom transiently in the nutrient-depleted euphotic zone of the late spring and early summer. The E01-9C-26 genomes had an estimated size of 4.76–5.05 Mbps, and contained “O” and “W”-type monooxygenase genes related to methane and ammonium monooxygenases that were previously reported from ocean metagenomes. Metabolic reconstruction indicated E01-9C-26 are likely versatile methylotrophs capable of scavenging C1 compounds, methylated compounds, reduced sulfur compounds, and a wide range of amines, including D-amino acids. The genome sequences identified E01-9C-26 as a source of “O” and “W”-type monooxygenase genes related to methane and ammonium monooxygenases that were previously reported from ocean metagenomes, but are of unknown function. In contrast, Opi.OSU.00C genomes encode genes for catabolizing carbohydrate compounds normally associated with eukaryotic phytoplankton. This exploration of optofluidics showed that it was effective for retrieving diverse single-cell bacterioplankton genomes and has potential advantages in microbiology applications that require working with small sample volumes or targeting cells by their morphology.
Genome-wide inference of transcription factor-DNA binding specificity in cell regeneration using a combination strategy.

PubMed

Wang, Xiaofeng; Zhang, Aiqun; Ren, Weizheng; Chen, Caiyu; Dong, Jiahong

2012-11-01

The cell growth, development, and regeneration of tissue and organ are associated with a large number of gene regulation events, which are mediated in part by transcription factors (TFs) binding to cis-regulatory elements involved in the genome. Predicting the binding affinity and inferring the binding specificity of TF-DNA interactions at the genomic level would be fundamentally helpful for our understanding of the molecular mechanism and biological implication underlying sequence-specific TF-DNA recognition. In this study, we report the development of a combination method to characterize the interaction behavior of an 11-mer oligonucleotide segment and its mutations with the Gcn4p protein, a homodimeric, basic leucine zipper TF, and to predict the binding affinity and specificity of potential Gcn4p binders at the genome-wide scale. In this procedure, a position-mutated energy matrix is created based on molecular modeling analysis of native and mutated Gcn4p-DNA complex structures to describe the position-independent interaction energy profile of Gcn4p with different nucleotide types at each position of the oligonucleotide, and the energy terms extracted from the matrix and their interactives are then correlated with experimentally measured affinities of 19268 distinct oligonucleotides using statistical modeling methodology. Subsequently, the best of the built regression models is successfully applied to screen for potential high-affinity Gcn4p binders in the complete genome. The findings arising from this study are briefly listed below: (i) The 11 positions of oligonucleotides are highly interactive and non-additive in contribution to Gcn4p-DNA binding affinity; (ii) Indirect conformational effects upon nucleotide mutations as well as associated subtle changes in interfacial atomic contacts, but not the direct nonbonded interactions, are primarily responsible for the sequence-specific recognition; (iii) The intrinsic synergistic effects among the sequence
Modulated Modularity Clustering as an Exploratory Tool for Functional Genomic Inference

PubMed Central

Stone, Eric A.; Ayroles, Julien F.

2009-01-01

In recent years, the advent of high-throughput assays, coupled with their diminishing cost, has facilitated a systems approach to biology. As a consequence, massive amounts of data are currently being generated, requiring efficient methodology aimed at the reduction of scale. Whole-genome transcriptional profiling is a standard component of systems-level analyses, and to reduce scale and improve inference, clustering genes is common. Since clustering is often the first step toward generating hypotheses, cluster quality is critical. Conversely, because the validation of cluster-driven hypotheses is indirect, it is critical that quality clusters not be obtained by subjective means. In this paper, we present a new objective-based clustering method and demonstrate that it yields high-quality results. Our method, modulated modularity clustering (MMC), seeks community structure in graphical data. MMC modulates the connection strengths of edges in a weighted graph to maximize an objective function (called modularity) that quantifies community structure. The result of this maximization is a clustering through which tightly-connected groups of vertices emerge. Our application is to systems genetics, and we quantitatively compare MMC both to the hierarchical clustering method most commonly employed and to three popular spectral clustering approaches. We further validate MMC through analyses of human and Drosophila melanogaster expression data, demonstrating that the clusters we obtain are biologically meaningful. We show MMC to be effective and suitable to applications of large scale. In light of these features, we advocate MMC as a standard tool for exploration and hypothesis generation. PMID:19424432
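
Illustrative sketch (not the MMC method itself): the objective the abstract calls modularity can be computed for a weighted, undirected graph and a given assignment of vertices to clusters using the standard Newman form, Q = (1/2m) * sum over same-cluster pairs of [A_ij - k_i k_j / (2m)]. The modulation of edge weights and the search over clusterings that define MMC are not shown; the function name and the toy graph are assumptions for illustration.

```python
def modularity(weights, communities):
    """Newman modularity of a partition of a weighted, undirected graph.

    `weights` is a symmetric adjacency matrix (list of lists);
    `communities[i]` is the cluster label of vertex i.
    """
    n = len(weights)
    strength = [sum(row) for row in weights]  # weighted degree k_i
    total = sum(strength)                     # equals 2m for a symmetric matrix
    if total == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                # Observed weight minus the weight expected by chance.
                q += weights[i][j] - strength[i] * strength[j] / total
    return q / total


# Toy graph: two disjoint edges, 0-1 and 2-3, each of weight 1.
A = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
print(modularity(A, [0, 0, 1, 1]))  # vertices grouped by edge
print(modularity(A, [0, 0, 0, 0]))  # everything in one cluster
```

Grouping the vertices by their edges scores higher than lumping them together, which is the sense in which maximizing modularity lets tightly connected groups emerge.
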
Genomics and the challenging translation into conservation practice

Treesearch

Aaron B. A. Shafer; Jochen B. W. Wolf; Paulo C. Alves; Linnea Bergstrom; Michael W. Bruford; Ioana Brannstrom; Guy Colling; Love Dalen; Luc De Meester; Robert Ekblom; Katie D. Fawcett; Simone Fior; Mehrdad Hajibabaei; Jason A. Hill; A. Rus Hoezel; Jacob Hoglund; Evelyn L. Jensen; Johannes Krause; Torsten N. Kristensen; Michael Krutzen; John K. McKay; Anita J. Norman; Rob Ogden; E. Martin Osterling; N. Joop Ouborg; John Piccolo; Danijela Popovic; Craig R. Primmer; Floyd A. Reed; Marie Roumet; Jordi Salmona; Tamara Schenekar; Michael K. Schwartz; Gernot Segelbacher; Helen Senn; Jens Thaulow; Mia Valtonen; Andrew Veale; Philippine Vergeer; Nagarjun Vijay; Carles Vila; Matthias Weissensteiner; Lovisa Wennerstrom; Christopher W. Wheat; Piotr Zielinski

2015-01-01

The global loss of biodiversity continues at an alarming rate. Genomic approaches have been suggested as a promising tool for conservation practice as scaling up to genome-wide data can improve traditional conservation genetic inferences and provide qualitatively novel insights. However, the generation of genomic data and subsequent analyses and interpretations remain...
Gramene 2013: comparative plant genomics resources.

PubMed

Monaco, Marcela K; Stein, Joshua; Naithani, Sushma; Wei, Sharon; Dharmawardhana, Palitha; Kumari, Sunita; Amarasinghe, Vindhya; Youens-Clark, Ken; Thomason, James; Preece, Justin; Pasternak, Shiran; Olson, Andrew; Jiao, Yinping; Lu, Zhenyuan; Bolser, Dan; Kerhornou, Arnaud; Staines, Dan; Walts, Brandon; Wu, Guanming; D'Eustachio, Peter; Haw, Robin; Croft, David; Kersey, Paul J; Stein, Lincoln; Jaiswal, Pankaj; Ware, Doreen

2014-01-01

Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology.
The Pattern of Change in the Abundances of Specific Bacterioplankton Groups Is Consistent across Different Nutrient-Enriched Habitats in Crete

PubMed Central

Fodelianakis, Stilianos; Papageorgiou, Nafsika; Pitta, Paraskevi; Kasapidis, Panagiotis; Karakassis, Ioannis

2014-01-01

A common source of disturbance for coastal aquatic habitats is nutrient enrichment through anthropogenic activities. Although the water column bacterioplankton communities in these environments have been characterized in some cases, changes in α-diversity and/or the abundances of specific taxonomic groups across enriched habitats remain unclear. Here, we investigated the bacterial community changes at three different nutrient-enriched and adjacent undisturbed habitats along the north coast of Crete, Greece: a fish farm, a closed bay within a town with low water renewal rates, and a city port where the level of nutrient enrichment and the trophic status of the habitat were different. Even though changes in α-diversity were different at each site, we observed across the sites a common change pattern accounting for most of the community variation for five of the most abundant bacterial groups: a decrease in the abundance of the Pelagibacteraceae and SAR86 and an increase in the abundance of the Alteromonadaceae, Rhodobacteraceae, and Cryomorphaceae in the impacted sites. The abundances of the groups that increased and decreased in the impacted sites were significantly correlated (positively and negatively, respectively) with the total heterotrophic bacterial counts and the concentrations of dissolved organic carbon and/or dissolved nitrogen and chlorophyll α, indicating that the common change pattern was associated with nutrient enrichment. Our results provide an in situ indication concerning the association of specific bacterioplankton groups with nutrient enrichment. These groups could potentially be used as indicators for nutrient enrichment if the pattern is confirmed over a broader spatial and temporal scale by future studies. PMID:24747897
Exploring Microdiversity in Novel Kordia sp. (Bacteroidetes) with Proteorhodopsin from the Tropical Indian Ocean via Single Amplified Genomes

PubMed Central

Royo-Llonch, Marta; Ferrera, Isabel; Cornejo-Castillo, Francisco M.; Sánchez, Pablo; Salazar, Guillem; Stepanauskas, Ramunas; González, José M.; Sieracki, Michael E.; Speich, Sabrina; Stemmann, Lars; Pedrós-Alió, Carlos; Acinas, Silvia G.

2017-01-01

Marine Bacteroidetes constitute a very abundant bacterioplankton group in the oceans that plays a key role in recycling particulate organic matter and includes several photoheterotrophic members containing proteorhodopsin. Relatively few marine Bacteroidetes species have been described and, moreover, they correspond to cultured isolates, which in most cases do not represent the actual abundant or ecologically relevant microorganisms in the natural environment. In this study, we explored the microdiversity of 98 Single Amplified Genomes (SAGs) retrieved from the surface waters of the underexplored North Indian Ocean, whose most closely related isolate is Kordia algicida OT-1. Using Multi Locus Sequencing Analysis (MLSA) we found no microdiversity in the tested conserved phylogenetic markers (16S rRNA and 23S rRNA genes), the fast-evolving Internal Transcribed Spacer and the functional markers proteorhodopsin and the beta-subunit of RNA polymerase. Furthermore, we carried out a Fragment Recruitment Analysis (FRA) with marine metagenomes to learn about the distribution and dynamics of this microorganism in different locations, depths and size fractions. This analysis indicated that this taxon belongs to the rare biosphere, showing its highest abundance after upwelling-induced phytoplankton blooms and sinking to the deep ocean with large organic matter particles. This uncultured Kordia lineage likely represents a novel Kordia species (Kordia sp. CFSAG39SUR) that contains the proteorhodopsin gene and has a widespread spatial and vertical distribution. The combination of SAGs and MLSA makes a valuable approach to infer putative ecological roles of uncultured abundant microorganisms. PMID:28790980

Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants

PubMed Central

Conte, Matthieu G; Gaillard, Sylvain; Droc, Gaetan; Perin, Christophe

2008-01-01

Background: Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to reduce gene function inference and to aid in the identification of conserved or divergent genetic pathways between several species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics takes into account phylogenetic information from high-throughput genome annotation and is the most straightforward way to infer orthologs. However, procedures for automatic detection of orthologs are still scarce and suffer from several limitations. Results: We developed a procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana. Firstly, we established an efficient method to cluster A. thaliana and O. sativa full proteomes into gene families. Then, we developed an optimized phylogenomics pipeline for ortholog inference. We validated the full procedure using test sets of orthologs and paralogs to demonstrate that our method outperforms pairwise methods for ortholog predictions. Conclusion: Our procedure achieved a high level of accuracy in predicting ortholog and paralog relationships. Phylogenomic predictions for all validated gene families in both species were easily achieved and we can conclude that our methodology outperforms similarly based methods. PMID:18426584
Comparative inference of duplicated genes produced by polyploidization in soybean genome.

PubMed

Yang, Yanmei; Wang, Jinpeng; Di, Jianyong

2013-01-01

Soybean (Glycine max) is one of the most important crop plants for providing protein and oil. Given its economic and scientific value, it is important to investigate the soybean genome. Polyploidy is a widespread and recursive phenomenon during plant evolution, and it can generate large numbers of duplicated genes, which are an important resource for genetic innovation. Improved sequence alignment criteria and statistical analysis are used to identify and characterize duplicated genes produced by polyploidization in soybean. Based on the collinearity method, genes duplicated by whole genome duplication account for 70.3% in soybean. From the statistical analysis of the molecular distances between duplicated genes, our study indicates that whole genome duplication occurred more than once during the evolution of the soybean genome, and that the duplicated genes are often distributed near the ends of chromosomes.
Quality of Computationally Inferred Gene Ontology Annotations

PubMed Central

Škunca, Nives; Altenhoff, Adrian; Dessimoz, Christophe

2012-01-01

Gene Ontology (GO) has established itself as the undisputed standard for protein function annotation. Most annotations are inferred electronically, i.e. without individual curator supervision, but they are widely considered unreliable. At the same time, we crucially depend on those automated annotations, as most newly sequenced genomes are non-model organisms. Here, we introduce a methodology to systematically and quantitatively evaluate electronic annotations. By exploiting changes in successive releases of the UniProt Gene Ontology Annotation database, we assessed the quality of electronic annotations in terms of specificity, reliability, and coverage. Overall, we not only found that electronic annotations have significantly improved in recent years, but also that their reliability now rivals that of annotations inferred by curators when they use evidence other than experiments from primary literature. This work provides the means to identify the subset of electronic annotations that can be relied upon—an important outcome given that >98% of all annotations are inferred without direct curation. PMID:22693439
Bayesian reconstruction of transmission within outbreaks using genomic variants.

PubMed

De Maio, Nicola; Worby, Colin J; Wilson, Daniel J; Stoesser, Nicole

2018-04-01

Pathogen genome sequencing can reveal details of transmission histories and is a powerful tool in the fight against infectious disease. In particular, within-host pathogen genomic variants identified through heterozygous nucleotide base calls are a potential source of information to identify linked cases and infer direction and time of transmission. However, using such data effectively to model disease transmission presents a number of challenges, including differentiating genuine variants from those observed due to sequencing error, as well as the specification of a realistic model for within-host pathogen population dynamics. Here we propose a new Bayesian approach to transmission inference, BadTrIP (BAyesian epiDemiological TRansmission Inference from Polymorphisms), that explicitly models evolution of pathogen populations in an outbreak, transmission (including transmission bottlenecks), and sequencing error. BadTrIP enables the inference of host-to-host transmission from pathogen sequencing data and epidemiological data. By assuming that genomic variants are unlinked, our method does not require the computationally intensive and unreliable reconstruction of individual haplotypes. Using simulations we show that BadTrIP is robust in most scenarios and can accurately infer transmission events by efficiently combining information from genetic and epidemiological sources; thanks to its realistic model of pathogen evolution and the inclusion of epidemiological data, BadTrIP is also more accurate than existing approaches. BadTrIP is distributed as an open source package (https://bitbucket.org/nicofmay/badtrip) for the phylogenetic software BEAST2. We apply our method to reconstruct transmission history at the early stages of the 2014 Ebola outbreak, showcasing the power of within-host genomic variants to reconstruct transmission events.
Multi-InDel Analysis for Ancestry Inference of Sub-Populations in China

PubMed Central

Sun, Kuan; Ye, Yi; Luo, Tao; Hou, Yiping

2016-01-01

Ancestry inference is of great interest in diverse areas of scientific research, including forensic biology, medical genetics and anthropology. Various methods have been published for distinguishing populations. However, few reports address sub-populations (such as ethnic groups) within Asian populations, owing to the limitation of markers. We treated several InDel loci located very close together in physical position as a single marker, termed a multi-InDel. The multi-InDel shows potential as an Ancestry Inference Marker (AIM). In this study, we performed a genome-wide scan for multi-InDels as AIMs. After examining the FST distributions in the 1000 Genomes Database, 12 candidates were selected and validated for eastern Asian populations. A multiplexed assay was developed as a panel to genotype the 12 multi-InDel markers simultaneously. Ancestry component analysis with STRUCTURE and principal component analysis (PCA) were employed to estimate its capability for ancestry inference. Furthermore, ancestry assignments of trial individuals were conducted. The panel proved to be very effective when 210 samples from Han and Tibetan individuals in China were tested. The panel of multi-InDel markers exhibited considerable potency in ancestry inference and is suggested for application in forensic practice and genetic population studies. PMID:28004788
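To illustrate the kind of FST screening mentioned in this abstract, the following is a minimal sketch of Wright's FST for a single biallelic marker in two populations. It assumes equal population weights; the function name and the example frequencies are hypothetical and are not taken from the study, which does not state which FST estimator it used.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic marker in two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within each population.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # marker is monomorphic overall, so no differentiation is measurable
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical allele frequencies for illustration only.
    print(fst_biallelic(0.2, 0.8))
```

Markers whose allele frequencies differ strongly between populations score closer to 1 and are therefore the more informative candidates for ancestry inference.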
Across language families: Genome diversity mirrors linguistic variation within Europe

PubMed Central

Longobardi, Giuseppe; Ghirotto, Silvia; Guardiano, Cristina; Tassi, Francesca; Benazzo, Andrea; Ceolin, Andrea

2015-01-01

Objectives: The notion that patterns of linguistic and biological variation may cast light on each other and on population histories dates back to Darwin's times; yet, turning this intuition into a proper research program has met with serious methodological difficulties, especially affecting language comparisons. This article takes advantage of two new tools of comparative linguistics: a refined list of Indo‐European cognate words, and a novel method of language comparison estimating linguistic diversity from a universal inventory of grammatical polymorphisms, and hence enabling comparison even across different families. We corroborated the method and used it to compare patterns of linguistic and genomic variation in Europe. Materials and Methods: Two sets of linguistic distances, lexical and syntactic, were inferred from these data and compared with measures of geographic and genomic distance through a series of matrix correlation tests. Linguistic and genomic trees were also estimated and compared. A method (Treemix) was used to infer migration episodes after the main population splits. Results: We observed significant correlations between genomic and linguistic diversity, the latter inferred from data on both Indo‐European and non‐Indo‐European languages. Contrary to previous observations, on the European scale, language proved a better predictor of genomic differences than geography. Inferred episodes of genetic admixture following the main population splits found convincing correlates also in the linguistic realm. Discussion: These results pave the ground for previously unfeasible cross‐disciplinary analyses at the worldwide scale, encompassing populations of distant language families. Am J Phys Anthropol 157:630–640, 2015. © 2015 Wiley Periodicals, Inc. PMID:26059462
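The abstract does not say which matrix correlation test was applied; one common choice for comparing two distance matrices is the Mantel test. The sketch below is a minimal pure-Python version under that assumption, with hypothetical function names and a one-sided permutation p-value. It expects square, symmetric matrices over the same ordered set of populations and does not guard against constant matrices.

```python
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix into a list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def mantel(d1, d2, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value) for two distance matrices."""
    rng = random.Random(seed)
    n = len(d1)
    flat1 = upper_triangle(d1)
    observed = pearson(flat1, upper_triangle(d2))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Permute rows and columns of d2 together, which keeps it a valid distance matrix.
        rng.shuffle(order)
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(flat1, upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

Permuting whole rows and columns, rather than individual cells, is what distinguishes this from an ordinary correlation test: pairwise distances that share a population are not independent observations.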
SAR202 Genomes from the Dark Ocean Predict Pathways for the Oxidation of Recalcitrant Dissolved Organic Matter

PubMed Central

Landry, Zachary; Swan, Brandon K.; Herndl, Gerhard J.; Stepanauskas, Ramunas

2017-01-01

Deep-ocean regions beyond the reach of sunlight contain an estimated 615 Pg of dissolved organic matter (DOM), much of which persists for thousands of years. It is thought that bacteria oxidize DOM until it is too dilute or refractory to support microbial activity. We analyzed five single-amplified genomes (SAGs) from the abundant SAR202 clade of dark-ocean bacterioplankton and found they encode multiple families of paralogous enzymes involved in carbon catabolism, including several families of oxidative enzymes that we hypothesize participate in the degradation of cyclic alkanes. The five partial genomes encoded 152 flavin mononucleotide/F420-dependent monooxygenases (FMNOs), many of which are predicted to be type II Baeyer-Villiger monooxygenases (BVMOs) that catalyze oxygen insertion into semilabile alicyclic alkanes. The large number of oxidative enzymes, as well as other families of enzymes that appear to play complementary roles in catabolic pathways, suggests that SAR202 might catalyze final steps in the biological oxidation of relatively recalcitrant organic compounds to refractory compounds that persist. PMID:28420738
The dynamic evolutionary history of genome size in North American woodland salamanders.

PubMed

Newman, Catherine E; Gregory, T Ryan; Austin, Christopher C

2017-04-01

The genus Plethodon is the most species-rich salamander genus in North America, and nearly half of its species face an uncertain future. It is also one of the most diverse families in terms of genome sizes, which range from 1C = 18.2 to 69.3 pg, or 5-20 times larger than the human genome. Large genome size in salamanders results in part from accumulation of transposable elements and is associated with various developmental and physiological traits. However, genome sizes have been reported for only 25% of the species of Plethodon (14 of 55). We collected genome size data for Plethodon serratus to supplement an ongoing phylogeographic study, reconstructed the evolutionary history of genome size in Plethodontidae, and inferred probable genome sizes for the 41 species missing empirical data. Results revealed multiple genome size changes in Plethodon: genomes of western Plethodon increased, whereas genomes of eastern Plethodon decreased, followed by additional decreases or subsequent increases. The estimated genome size of P. serratus was 21 pg. New understanding of variation in genome size evolution, along with genome size inferences for previously unstudied taxa, provide a foundation for future studies on the biology of plethodontid salamanders.
fastBMA: scalable network inference and transitive reduction.

PubMed

Hung, Ling-Hong; Shi, Kaiyuan; Wu, Migao; Young, William Chad; Raftery, Adrian E; Yeung, Ka Yee

2017-10-01

Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel, and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the network by mapping the transitive reduction to an easily solved shortest-path problem. We evaluated the performance of fastBMA on synthetic data and experimental genome-wide time series yeast and human datasets. When using a single CPU core, fastBMA is up to 100 times faster than the next fastest method, LASSO, with increased accuracy. It is a memory-efficient, parallel, and distributed application that scales to human genome-wide expression data. A 10 000-gene regulation network can be obtained in a matter of hours using a 32-core cloud cluster (2 nodes of 16 cores). fastBMA is a significant improvement over its predecessor ScanBMA. It is more accurate and orders of magnitude faster than other fast network inference methods such as the one based on LASSO. The improved scalability allows it to calculate networks from genome scale data in a reasonable time frame. The transitive reduction method can improve accuracy in denser networks. fastBMA is available as code (M.I.T. license) from GitHub (https://github.com/lhhunghimself/fastBMA), as part of the updated networkBMA Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/networkBMA.html) and as ready-to-deploy Docker images (https://hub.docker.com/r/biodepot/fastbma/). © The Authors 2017. Published by Oxford University Press.
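The idea of casting transitive reduction as a shortest-path problem can be sketched as follows. This is not fastBMA's implementation: it assumes each edge carries a confidence in (0, 1], converts it to a cost of -log(confidence), and drops a direct edge when some indirect path between the same two nodes has a strictly lower total cost. All function names are hypothetical, and self-loops are assumed absent.

```python
import heapq
import math


def shortest_cost(adj, src, dst, skip_edge):
    """Dijkstra shortest-path cost from src to dst, ignoring one direct edge."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, cost in adj.get(node, {}).items():
            if (node, nxt) == skip_edge:
                continue
            nd = d + cost
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return math.inf


def transitive_reduction(edges):
    """edges: {(u, v): confidence}. Returns the edges not explained by a better indirect path.

    Every edge is judged against the original graph, not a progressively pruned one.
    """
    adj = {}
    for (u, v), p in edges.items():
        adj.setdefault(u, {})[v] = -math.log(p)
    kept = {}
    for (u, v), p in edges.items():
        indirect = shortest_cost(adj, u, v, skip_edge=(u, v))
        if indirect >= -math.log(p):
            kept[(u, v)] = p
    return kept


if __name__ == "__main__":
    # Hypothetical confidences: A->C is weak and is better explained by A->B->C.
    print(transitive_reduction({("A", "B"): 0.9, ("B", "C"): 0.9, ("A", "C"): 0.5}))
```

Using -log(confidence) as the cost means the cost of a path is the negative log of the product of its edge confidences, so "shortest path" corresponds to "most probable chain of regulation" under an independence assumption.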
Demographic History of the Genus Pan Inferred from Whole Mitochondrial Genome Reconstructions

PubMed Central

Tucci, Serena; de Manuel, Marc; Ghirotto, Silvia; Benazzo, Andrea; Prado-Martinez, Javier; Lorente-Galdos, Belen; Nam, Kiwoong; Dabad, Marc; Hernandez-Rodriguez, Jessica; Comas, David; Navarro, Arcadi; Schierup, Mikkel H.; Andres, Aida M.; Barbujani, Guido; Hvilsom, Christina; Marques-Bonet, Tomas

2016-01-01

The genus Pan is the closest genus to our own and it includes two species, Pan paniscus (bonobos) and Pan troglodytes (chimpanzees). The latter comprises four subspecies, all highly endangered. The study of the genus Pan has been incessantly complicated by the intricate relationship among subspecies and the statistical limitations imposed by the reduced number of samples or genomic markers analyzed. Here, we present a new method to reconstruct complete mitochondrial genomes (mitogenomes) from whole genome shotgun (WGS) datasets, mtArchitect, showing that its reconstructions are highly accurate and consistent with long-range PCR mitogenomes. We used this approach to build the mitochondrial genomes of 20 newly sequenced samples which, together with available genomes, allowed us to analyze the hitherto most complete Pan mitochondrial genome dataset including 156 chimpanzee and 44 bonobo individuals, with a proportional contribution from all chimpanzee subspecies. We estimated the separation time between chimpanzees and bonobos around 1.15 million years ago (Mya) [0.81–1.49]. Further, we found that under the most probable genealogical model the two clades of chimpanzees, Western + Nigeria-Cameroon and Central + Eastern, separated at 0.59 Mya [0.41–0.78] with further internal separations at 0.32 Mya [0.22–0.43] and 0.16 Mya [0.17–0.34], respectively. Finally, for a subset of our samples, we compared nuclear versus mitochondrial genomes and we found that chimpanzee subspecies have different patterns of nuclear and mitochondrial diversity, which could be a result of either processes affecting the mitochondrial genome, such as hitchhiking or background selection, or a result of population dynamics. PMID:27345955
Real-Time Pathogen Detection in the Era of Whole-Genome Sequencing and Big Data: Comparison of k-mer and Site-Based Methods for Inferring the Genetic Distances among Tens of Thousands of Salmonella Samples.

PubMed

Pettengill, James B; Pightling, Arthur W; Baugher, Joseph D; Rand, Hugh; Strain, Errol

2016-01-01

The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging due to both biological (evolutionarily diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). When analyzing empirical data (whole-genome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.
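To make the k-mer-based measures concrete, here is a minimal sketch of a Jaccard distance computed on the full k-mer sets of two sequences. The function names, the choice of k, and the toy sequences are illustrative assumptions; tools such as Mash approximate this quantity with MinHash sketches rather than enumerating every k-mer.

```python
def kmer_set(seq: str, k: int) -> set:
    """All distinct substrings of length k in seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(seq_a: str, seq_b: str, k: int = 3) -> float:
    """1 - |A intersect B| / |A union B| over the k-mer sets of the two sequences."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        return 0.0  # both sequences shorter than k: nothing to compare
    return 1.0 - len(a & b) / len(union)


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(jaccard_distance("ACGTA", "ACGTT", k=3))
```

Because a k-mer that is missing from one sample (for example through an assembly gap) shrinks the intersection just as a real difference would, this measure treats absent data as informative, which is the weakness the abstract attributes to k-mer profile distances.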
seXY: a tool for sex inference from genotype arrays.

PubMed

Qian, David C; Busam, Jonathan A; Xiao, Xiangjun; O'Mara, Tracy A; Eeles, Rosalind A; Schumacher, Frederick R; Phelan, Catherine M; Amos, Christopher I

2017-02-15

Checking concordance between reported sex and genotype-inferred sex is a crucial quality control measure in genome-wide association studies (GWAS). However, limited insights exist regarding the true accuracy of software that infers sex from genotype array data. We present seXY, a logistic regression model trained on both X chromosome heterozygosity and Y chromosome missingness, that consistently demonstrated >99.5% sex inference accuracy in cross-validation for 889 males and 5,361 females enrolled in prostate cancer and ovarian cancer GWAS. Compared to PLINK, one of the most popular tools for sex inference in GWAS that assesses only X chromosome heterozygosity, seXY achieved marginally better male classification and 3% more accurate female classification. Availability: https://github.com/Christopher-Amos-Lab/seXY. Contact: Christopher.I.Amos@dartmouth.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
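The two predictors named in this abstract can be sketched as below. This is only an illustration of the general approach, not the seXY code: the genotype encoding (0, 1 or 2 alternate alleles, `None` for a missing call), the function names, and especially the logistic coefficients are hypothetical placeholders, whereas the real model's coefficients are fitted to training data.

```python
import math


def x_heterozygosity(x_genotypes):
    """Fraction of called X-chromosome genotypes that are heterozygous (coded 1)."""
    called = [g for g in x_genotypes if g is not None]
    if not called:
        return 0.0
    return sum(1 for g in called if g == 1) / len(called)


def y_missingness(y_genotypes):
    """Fraction of Y-chromosome markers with no genotype call."""
    if not y_genotypes:
        return 1.0
    return sum(1 for g in y_genotypes if g is None) / len(y_genotypes)


def prob_female(x_het, y_miss, intercept=-6.0, w_het=20.0, w_miss=6.0):
    """Logistic model on the two features; the coefficients are made up for illustration."""
    z = intercept + w_het * x_het + w_miss * y_miss
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical sample: heterozygous X calls and mostly uncalled Y markers.
    x_calls = [0, 1, 2, 1, None, 1, 0]
    y_calls = [None, None, None, 0]
    print(prob_female(x_heterozygosity(x_calls), y_missingness(y_calls)))
```

Males carry a single X, so they should show almost no heterozygous X calls, and females lack a Y, so most Y markers fail to call. Combining both signals is what lets a model of this kind recover cases where one feature alone is ambiguous.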
A prior-based integrative framework for functional transcriptional regulatory network inference

PubMed Central

Siahpirani, Alireza F.

2017-01-01

Transcriptional regulatory networks specify regulatory proteins controlling the context-specific expression levels of genes. Inference of genome-wide regulatory networks is central to understanding gene regulation, but remains an open challenge. Expression-based network inference is among the most popular methods to infer regulatory networks; however, networks inferred from such methods have low overlap with experimentally derived (e.g. ChIP-chip and transcription factor (TF) knockouts) networks. Currently we have a limited understanding of this discrepancy. To address this gap, we first develop a regulatory network inference algorithm, based on probabilistic graphical models, to integrate expression with auxiliary datasets supporting a regulatory edge. Second, we comprehensively analyze our and other state-of-the-art methods on different expression perturbation datasets. Networks inferred by integrating sequence-specific motifs with expression have substantially greater agreement with experimentally derived networks, while remaining more predictive of expression than motif-based networks. Our analysis suggests natural genetic variation as the most informative perturbation for network inference, and identifies core TFs whose targets are predictable from expression. Multiple reasons make the identification of targets of other TFs difficult, including network architecture and insufficient variation of TF mRNA level. Finally, we demonstrate the utility of our inference algorithm to infer stress-specific regulatory networks and for regulator prioritization. PMID:27794550
The common ground of genomics and systems biology

PubMed Central

2014-01-01

The rise of systems biology is intertwined with that of genomics, yet their primordial relationship to one another is ill-defined. We discuss how the growth of genomics provided a critical boost to the popularity of systems biology. We describe the parts of genomics that share common areas of interest with systems biology today in the areas of gene expression, network inference, chromatin state analysis, pathway analysis, personalized medicine, and upcoming areas of synergy as genomics continues to expand its scope across all biomedical fields. PMID:25033072
Independent evolution of genomic characters during major metazoan transitions.

PubMed

Simakov, Oleg; Kawashima, Takeshi

2017-07-15

Metazoan evolution encompasses a vast evolutionary time scale spanning over 600 million years. Our ability to infer ancestral metazoan characters, both morphological and functional, is limited by our understanding of the nature and evolutionary dynamics of the underlying regulatory networks. Increasing coverage of metazoan genomes enables us to identify the evolutionary changes of the relevant genomic characters such as the loss or gain of coding sequences, gene duplications, micro- and macro-synteny, and non-coding element evolution in different lineages. In this review we describe recent advances in our understanding of ancestral metazoan coding and non-coding features, as deduced from genomic comparisons. Some genomic changes such as innovations in gene and linkage content occur at different rates across metazoan clades, suggesting some level of independence among genomic characters. While their contribution to biological innovation remains largely unclear, we review recent literature about certain genomic changes that do correlate with changes to specific developmental pathways and metazoan innovations. In particular, we discuss the origins of the recently described pharyngeal cluster which is conserved across deuterostome genomes, and highlight different genomic features that have contributed to the evolution of this group. We also assess our current capacity to infer ancestral metazoan states from gene models and comparative genomics tools and elaborate on the future directions of metazoan comparative genomics relevant to evo-devo studies. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Landscape-scale spatial abundance distributions discriminate core from random components of boreal lake bacterioplankton.

PubMed

Niño-García, Juan Pablo; Ruiz-González, Clara; Del Giorgio, Paul A

2016-12-01

Aquatic bacterial communities harbour thousands of coexisting taxa. To meet the challenge of discriminating between a 'core' and a sporadically occurring 'random' component of these communities, we explored the spatial abundance distribution of individual bacterioplankton taxa across 198 boreal lakes and their associated fluvial networks (188 rivers). We found that all taxa could be grouped into four distinct categories based on model statistical distributions (normal like, bimodal, logistic and lognormal). The distribution patterns across lakes and their associated river networks showed that lake communities are composed of a core of taxa whose distribution appears to be linked to in-lake environmental sorting (normal-like and bimodal categories), and a large fraction of mostly rare bacteria (94% of all taxa) whose presence appears to be largely random and linked to downstream transport in aquatic networks (logistic and lognormal categories). These rare taxa are thus likely to reflect species sorting at upstream locations, providing a perspective of the conditions prevailing in entire aquatic networks rather than only in lakes. © 2016 John Wiley & Sons Ltd/CNRS.
Do neighboring lakes share common taxa of bacterioplankton? Comparison of 16S rDNA fingerprints and sequences from three geographic regions.

PubMed

Lindström, E S; Leskinen, E

2002-07-01

Bacterioplankton community composition was studied in 12 lakes in three different geographic regions in Scandinavia using denaturing gradient gel electrophoresis (DGGE) and sequencing of 16S rDNA. Area-specific abundant taxa were found in the lakes in two of the regions. In the region of Uppland the lakes had an alpha-proteobacterium, belonging to the subgroup Alpha V in common. The Alpha V bacteria appeared to be favored by neutral or higher pH values. The lakes in Lappland were found to harbor Actinobacteria, which appeared to be favored in bog lakes. No abundant taxon was found to be in common for the lakes in Svalbard, the third region studied.
Adaptive Change Inferred from Genomic Population Analysis of the ST93 Epidemic Clone of Community-Associated Methicillin-Resistant Staphylococcus aureus

PubMed Central

Stinear, Timothy P.; Holt, Kathryn E.; Chua, Kyra; Stepnell, Justin; Tuck, Kellie L.; Coombs, Geoffrey; Harrison, Paul Francis; Seemann, Torsten; Howden, Benjamin P.

2014-01-01

Community-associated methicillin-resistant Staphylococcus aureus (CA-MRSA) has emerged as a major public health problem around the world. In Australia, ST93-IV[2B] is the dominant CA-MRSA clone and displays significantly greater virulence than other S. aureus. Here, we have examined the evolution of ST93 via genomic analysis of 12 MSSA and 44 MRSA ST93 isolates, collected from around Australia over a 17-year period. Comparative analysis revealed a core genome of 2.6 Mb, sharing greater than 99.7% nucleotide identity. The accessory genome was 0.45 Mb and comprised additional mobile DNA elements, harboring resistance to erythromycin, trimethoprim, and tetracycline. Phylogenetic inference revealed a molecular clock and suggested that a single clone of methicillin susceptible, Panton-Valentine leukocidin (PVL) positive, ST93 S. aureus likely spread from North Western Australia in the early 1970s, acquiring methicillin resistance at least twice in the mid 1990s. We also explored associations between genotype and important MRSA phenotypes including oxacillin MIC and production of exotoxins (α-hemolysin [Hla], δ-hemolysin [Hld], PSMα3, and PVL). High-level expression of Hla is a signature feature of ST93 and reduced expression in eight isolates was readily explained by mutations in the agr locus. However, subtle but significant decreases in Hld were also noted over time that coincided with decreasing oxacillin resistance and were independent of agr mutations. The evolution of ST93 S. aureus is thus associated with a reduction in both exotoxin expression and oxacillin MIC, suggesting MRSA ST93 isolates are under pressure for adaptive change. PMID:24482534
Co-Inheritance Analysis within the Domains of Life Substantially Improves Network Inference by Phylogenetic Profiling

PubMed Central

Shin, Junha; Lee, Insuk

2015-01-01

Phylogenetic profiling, a network inference method based on gene inheritance profiles, has been widely used to construct functional gene networks in microbes. However, its utility for network inference in higher eukaryotes has been limited. An improved algorithm with an in-depth understanding of pathway evolution may overcome this limitation. In this study, we investigated the effects of taxonomic structures on co-inheritance analysis using 2,144 reference species in four query species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens. We observed three clusters of reference species based on a principal component analysis of the phylogenetic profiles, which correspond to the three domains of life—Archaea, Bacteria, and Eukaryota—suggesting that pathways inherit primarily within specific domains or lower-ranked taxonomic groups during speciation. Hence, the co-inheritance pattern within a taxonomic group may be eroded by confounding inheritance patterns from irrelevant taxonomic groups. We demonstrated that co-inheritance analysis within domains substantially improved network inference not only in microbe species but also in the higher eukaryotes, including humans. Although we observed two sub-domain clusters of reference species within Eukaryota, co-inheritance analysis within these sub-domain taxonomic groups only marginally improved network inference. Therefore, we conclude that co-inheritance analysis within domains is the optimal approach to network inference with the given reference species. The construction of a series of human gene networks with increasing sample sizes of the reference species for each domain revealed that the size of the high-accuracy networks increased as additional reference species genomes were included, suggesting that within-domain co-inheritance analysis will continue to expand human gene networks as genomes of additional species are sequenced. Taken together, we propose that co
Pan-Genomic Analysis Provides Insights into the Genomic Variation and Evolution of Salmonella Paratyphi A

PubMed Central

Chen, Chunxia; Cui, Xiaoying; Yu, Jun; Xiao, Jingfa; Kan, Biao

2012-01-01

Salmonella Paratyphi A (S. Paratyphi A) is a highly adapted, human-specific pathogen that causes paratyphoid fever. Cases of paratyphoid fever have recently been increasing, and the disease is becoming a major public health concern, especially in Eastern and Southern Asia. To investigate the genomic variation and evolution of S. Paratyphi A, a pan-genomic analysis was performed on five newly sequenced S. Paratyphi A strains and two other reference strains. A whole genome comparison revealed that the seven genomes are collinear and that their organization is highly conserved. The high rate of substitutions in part of the core genome indicates that there are frequent homologous recombination events. Based on the changes in the pan-genome size and cluster number (both in the core functional genes and core pseudogenes), it can be inferred that the sharply increasing number of pseudogene clusters may have strong correlation with the inactivation of functional genes, and indicates that the S. Paratyphi A genome is being degraded. PMID:23028950

Annotation-based inference of transporter function.

PubMed

Lee, Thomas J; Paulsen, Ian; Karp, Peter

2008-07-01

We present a method for inferring and constructing transport reactions for transporter proteins based primarily on the analysis of the names of individual proteins in the genome annotation of an organism. Transport reactions are declarative descriptions of transporter activities, and thus can be manipulated computationally, unlike free-text protein names. Once transporter activities are encoded as transport reactions, a number of computational analyses are possible including database queries by transporter activity; inclusion of transporters into an automatically generated metabolic-map diagram that can be painted with omics data to aid in their interpretation; detection of anomalies in the metabolic and transport networks, such as substrates that are transported into the cell but are not inputs to any metabolic reaction or pathway; and comparative analyses of the transport capabilities of different organisms. On randomly selected organisms, the method achieves precision and recall rates of 0.93 and 0.90, respectively in identifying transporter proteins by name within the complete genome. The method obtains 67.5% accuracy in predicting complete transport reactions; if allowance is made for predictions that are overly general yet not incorrect, reaction prediction accuracy is 82.5%. The method is implemented as part of PathoLogic, the inference component of the Pathway Tools software. Pathway Tools is freely available to researchers at non-commercial institutions, including source code; a fee applies to commercial institutions. Supplementary data are available at Bioinformatics online.
msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding

PubMed Central

Gilad, Yoav; Pritchard, Jonathan K.; Stephens, Matthew

2015-01-01

Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the ChIP-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede. PMID:26406244
msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding.

PubMed

Raj, Anil; Shim, Heejung; Gilad, Yoav; Pritchard, Jonathan K; Stephens, Matthew

2015-01-01

Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the ChIP-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede.
Optimal rates for phylogenetic inference and experimental design in the era of genome-scale datasets.

PubMed

Dornburg, Alex; Su, Zhuo; Townsend, Jeffrey P

2018-06-25

With the rise of genome-scale datasets there has been a call for increased data scrutiny and careful selection of loci appropriate for attempting the resolution of a phylogenetic problem. Such loci are desired to maximize phylogenetic information content while minimizing the risk of homoplasy. Theory posits the existence of characters that evolve under such an optimum rate, and efforts to determine optimal rates of inference have been a cornerstone of phylogenetic experimental design for over two decades. However, both theoretical and empirical investigations of optimal rates have varied dramatically in their conclusions, ranging from no relationship to a tight relationship between the rate of change and phylogenetic utility. Here we synthesize these apparently contradictory views, demonstrating both empirical and theoretical conditions under which each is correct. We find that optimal rates of characters, not genes, are generally robust to most experimental design decisions. Moreover, consideration of site rate heterogeneity within a given locus is critical to accurate predictions of utility. Factors such as taxon sampling or the targeted number of characters providing support for a topology are additionally critical to the predictions of phylogenetic utility based on the rate of character change. Further, optimality of rates and predictions of phylogenetic utility are not equivalent, demonstrating the need for further development of comprehensive theory of phylogenetic experimental design.
Comparative analyses of plastid genomes from fourteen Cornales species: inferences for phylogenetic relationships and genome evolution.

PubMed

Fu, Chao-Nan; Li, Hong-Tao; Milne, Richard; Zhang, Ting; Ma, Peng-Fei; Yang, Jing; Li, De-Zhu; Gao, Lian-Ming

2017-12-08

The Cornales is the basal lineage of the asterids, the largest angiosperm clade. Phylogenetic relationships within the order were previously not fully resolved. Fifteen plastid genomes representing 14 species, ten genera and seven families of Cornales were newly sequenced for comparative analyses of genome features, evolution, and phylogenomics based on different partitioning schemes and filtering strategies. All plastomes of the 14 Cornales species had the typical quadripartite structure with a genome size ranging from 156,567 bp to 158,715 bp, which included two inverted repeats (25,859-26,451 bp) separated by a large single-copy region (86,089-87,835 bp) and a small single-copy region (18,250-18,856 bp). These plastomes encoded the same set of 114 unique genes including 31 transfer RNA, 4 ribosomal RNA and 79 coding genes, with an identical gene order across all examined Cornales species. Two genes (rpl22 and ycf15) contained premature stop codons in seven and five species respectively. The phylogenetic relationships among all sampled species were fully resolved with maximum support. Different filtering strategies (none, light and strict) of sequence alignment did not have an effect on these relationships. The topology recovered from coding and noncoding data sets was the same as for the whole plastome, regardless of filtering strategy. Moreover, mutational hotspots and highly informative regions were identified. Phylogenetic relationships among families and intergeneric relationships within family of Cornales were well resolved. Different filtering strategies and partitioning schemes do not influence the relationships. Plastid genomes have great potential to resolve deep phylogenetic relationships of plants.
Limitations to estimating bacterial cross-species transmission using genetic and genomic markers: inferences from simulation modeling

PubMed Central

Benavides, Julio A; Cross, Paul C; Luikart, Gordon; Creel, Scott

2014-01-01

Cross-species transmission (CST) of bacterial pathogens has major implications for human health, livestock, and wildlife management because it determines whether control actions in one species may have subsequent effects on other potential host species. The study of bacterial transmission has benefitted from methods measuring two types of genetic variation: variable number of tandem repeats (VNTRs) and single nucleotide polymorphisms (SNPs). However, it is unclear whether these data can distinguish between different epidemiological scenarios. We used a simulation model with two host species and known transmission rates (within and between species) to evaluate the utility of these markers for inferring CST. We found that CST estimates are biased for a wide range of parameters when based on VNTRs and a most parsimonious reconstructed phylogeny. However, estimations of CST rates lower than 5% can be achieved with relatively low bias using as few as 250 SNPs. CST estimates are sensitive to several parameters, including the number of mutations accumulated since introduction, stochasticity, the genetic difference of strains introduced, and the sampling effort. Our results suggest that, even with whole-genome sequences, unbiased estimates of CST will be difficult when sampling is limited, mutation rates are low, or for pathogens that were recently introduced. PMID:25469159
The genome of Theobroma cacao.

PubMed

Argout, Xavier; Salse, Jerome; Aury, Jean-Marc; Guiltinan, Mark J; Droc, Gaetan; Gouzy, Jerome; Allegre, Mathilde; Chaparro, Cristian; Legavre, Thierry; Maximova, Siela N; Abrouk, Michael; Murat, Florent; Fouet, Olivier; Poulain, Julie; Ruiz, Manuel; Roguet, Yolande; Rodier-Goud, Maguy; Barbosa-Neto, Jose Fernandes; Sabot, Francois; Kudrna, Dave; Ammiraju, Jetty Siva S; Schuster, Stephan C; Carlson, John E; Sallet, Erika; Schiex, Thomas; Dievart, Anne; Kramer, Melissa; Gelley, Laura; Shi, Zi; Bérard, Aurélie; Viot, Christopher; Boccara, Michel; Risterucci, Ange Marie; Guignon, Valentin; Sabau, Xavier; Axtell, Michael J; Ma, Zhaorong; Zhang, Yufan; Brown, Spencer; Bourge, Mickael; Golser, Wolfgang; Song, Xiang; Clement, Didier; Rivallan, Ronan; Tahi, Mathias; Akaza, Joseph Moroh; Pitollat, Bertrand; Gramacho, Karina; D'Hont, Angélique; Brunel, Dominique; Infante, Diogenes; Kebe, Ismael; Costet, Pierre; Wing, Rod; McCombie, W Richard; Guiderdoni, Emmanuel; Quetier, Francis; Panaud, Olivier; Wincker, Patrick; Bocs, Stephanie; Lanaud, Claire

2011-02-01

We sequenced and assembled the draft genome of Theobroma cacao, an economically important tropical-fruit tree crop that is the source of chocolate. This assembly corresponds to 76% of the estimated genome size and contains almost all previously described genes, with 82% of these genes anchored on the 10 T. cacao chromosomes. Analysis of this sequence information highlighted specific expansion of some gene families during evolution, for example, flavonoid-related genes. It also provides a major source of candidate genes for T. cacao improvement. Based on the inferred paleohistory of the T. cacao genome, we propose an evolutionary scenario whereby the ten T. cacao chromosomes were shaped from an ancestor through eleven chromosome fusions.
A new fast method for inferring multiple consensus trees using k-medoids.

PubMed

Tahiri, Nadia; Willems, Matthieu; Makarenkov, Vladimir

2018-04-05

Gene trees carry important information about specific evolutionary patterns which characterize the evolution of the corresponding gene families. However, a reliable species consensus tree cannot be inferred from a multiple sequence alignment of a single gene family or from the concatenation of alignments corresponding to gene families having different evolutionary histories. These evolutionary histories can be quite different due to horizontal transfer events or to ancient gene duplications which cause the emergence of paralogs within a genome. Many methods have been proposed to infer a single consensus tree from a collection of gene trees. Still, the application of these tree merging methods can lead to the loss of specific evolutionary patterns which characterize some gene families or some groups of gene families. Thus, the problem of inferring multiple consensus trees from a given set of gene trees becomes relevant. We describe a new fast method for inferring multiple consensus trees from a given set of phylogenetic trees (i.e. additive trees or X-trees) defined on the same set of species (i.e. objects or taxa). The traditional consensus approach yields a single consensus tree. We use the popular k-medoids partitioning algorithm to divide a given set of trees into several clusters of trees. We propose novel versions of the well-known Silhouette and Caliński-Harabasz cluster validity indices that are adapted for tree clustering with k-medoids. The efficiency of the new method was assessed using both synthetic and real data, such as a well-known phylogenetic dataset consisting of 47 gene trees inferred for 14 archaeal organisms. The method described here allows inference of multiple consensus trees from a given set of gene trees. It can be used to identify groups of gene trees having similar intragroup and different intergroup evolutionary histories. The main advantage of our method is that it is much faster than the existing tree clustering approaches, while
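As a rough illustration of the partitioning step described in this abstract, the sketch below runs a basic alternating k-medoids loop on a precomputed matrix of pairwise tree distances. It is not the authors' implementation: the function name `k_medoids`, the starting medoids, and the distance values are hypothetical, and the adapted Silhouette and Caliński-Harabasz indices mentioned above are not reproduced.

```python
def k_medoids(dist, init, max_iter=100):
    """Cluster items given a symmetric distance matrix and initial medoid indices.

    Returns a dict mapping each medoid index to the list of its members.
    This is a simple alternating heuristic, so the result is a local optimum
    that depends on the (assumed) initial medoids.
    """
    medoids = list(init)
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: attach every item to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(len(dist)):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: in each cluster, choose the member with the smallest
        # summed distance to the other members (keep the medoid if empty).
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break  # medoids stopped changing
        medoids = new_medoids
    return clusters


# Hypothetical pairwise distances between five gene trees
# (for example, Robinson-Foulds distances); trees 0-2 and 3-4 form two groups.
dist = [
    [0, 2, 2, 8, 8],
    [2, 0, 2, 8, 8],
    [2, 2, 0, 8, 8],
    [8, 8, 8, 0, 2],
    [8, 8, 8, 2, 0],
]
clusters = k_medoids(dist, init=[0, 3])
print(clusters)
```

Each resulting cluster could then be summarized by its own consensus tree. In practice the number of clusters would be chosen with a validity index rather than fixed in advance, as the abstract indicates.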
Phylogenomic Analysis and Dynamic Evolution of Chloroplast Genomes in Salicaceae

PubMed Central

Huang, Yuan; Wang, Jun; Yang, Yongping; Fan, Chuanzhu; Chen, Jiahui

2017-01-01

Chloroplast genomes of plants are highly conserved in both gene order and gene content. Analysis of the whole chloroplast genome is known to provide much more informative DNA sites and thus generates high resolution for plant phylogenies. Here, we report the complete chloroplast genomes of three Salix species in family Salicaceae. Phylogeny of Salicaceae inferred from complete chloroplast genomes is generally consistent with previous studies but resolved with higher statistical support. Incongruences of phylogeny, however, are observed in genus Populus, which most likely results from homoplasy. By comparing three Salix chloroplast genomes with the published chloroplast genomes of other Salicaceae species, we demonstrate that the synteny and length of chloroplast genomes in Salicaceae are highly conserved but experienced dynamic evolution among species. We identify seven positively selected chloroplast genes in Salicaceae, which might be related to the adaptive evolution of Salicaceae species. Comparative chloroplast genome analysis within the family also indicates that some chloroplast genes were lost or became pseudogenes, implying that these chloroplast genes were horizontally transferred to the nucleus genome. Based on the complete nucleus genome sequences from two Salicaceae species, we remarkably find that the entire chloroplast genome was indeed transferred and integrated into the nucleus genome at least once in the individual used for the reference genome of P. trichocarpa. This observation, along with presence of the large nuclear plastid DNA (NUPTs) and NUPTs-containing multiple chloroplast genes in their original order in the chloroplast genome, favors the DNA-mediated hypothesis of organelle to nucleus DNA transfer. Overall, the phylogenomic analysis using chloroplast complete genomes clearly elucidates the phylogeny of Salicaceae. The identification of positively selected chloroplast genes and dynamic chloroplast-to-nucleus gene transfers in Salicaceae provide
Inferring transposons activity chronology by TRANScendence - TEs database and de-novo mining tool.

PubMed

Startek, Michał Piotr; Nogły, Jakub; Gromadka, Agnieszka; Grzebelus, Dariusz; Gambin, Anna

2017-10-16

The constant progress in sequencing technology leads to ever increasing amounts of genomic data. In the light of current evidence, transposable elements (TEs) are becoming useful tools for learning about the evolution of the host genome. Software for genome-wide detection and analysis of TEs is therefore of great interest. Here we describe a computational tool for mining, classifying and storing TEs from newly sequenced genomes. This is an online, web-based, user-friendly service that enables users to upload their own genomic data and perform de-novo searches for TEs. The detected TEs are automatically analyzed, compared to reference databases, annotated, clustered into families, and stored in a TE repository. In addition, the genome-wide nesting structure of the detected elements is identified and analyzed by a new method for inferring the evolutionary history of TEs. We illustrate the functionality of our tool by performing a full-scale analysis of the TE landscape in the Medicago truncatula genome. TRANScendence is an effective tool for the de-novo annotation and classification of transposable elements in newly-acquired genomes. Its streamlined interface makes it well-suited for evolutionary studies.
Real-Time Pathogen Detection in the Era of Whole-Genome Sequencing and Big Data: Comparison of k-mer and Site-Based Methods for Inferring the Genetic Distances among Tens of Thousands of Salmonella Samples

DOE PAGES

Pettengill, James B.; Pightling, Arthur W.; Baugher, Joseph D.; ...

2016-11-10

The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging due to both biological (evolutionarily diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). Finally, when analyzing empirical data (whole-genome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.
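The contrast drawn in this abstract between k-mer and site-based distances can be illustrated with a toy sketch. The code below is not the pipeline evaluated in the study: the helper names `kmer_set`, `jaccard_distance` and `site_distance`, the sequences, and the choice of k = 3 are all assumptions made for illustration. It shows how a k-mer Jaccard distance treats missing sequence as a difference, whereas a site-based count over the shared positions does not.

```python
def kmer_set(seq: str, k: int) -> set:
    """Return the set of all k-mers (length-k substrings) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A & B| / |A | B|; defined here as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def site_distance(x: str, y: str) -> int:
    """Count differing positions between two equal-length aligned sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for p, q in zip(x, y) if p != q)


# Hypothetical data: a complete sequence and an incomplete "assembly" of it.
full = "ACGTACGGTCA"
partial = full[:6]  # identical where present, but the tail is missing

d_kmer = jaccard_distance(kmer_set(full, 3), kmer_set(partial, 3))
d_site = site_distance(full[:6], partial)
print(d_kmer, d_site)
```

Here the two samples are identical at every shared site, so the site-based count is zero, while the k-mer distance is inflated purely by the missing region.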
Real-Time Pathogen Detection in the Era of Whole-Genome Sequencing and Big Data: Comparison of k-mer and Site-Based Methods for Inferring the Genetic Distances among Tens of Thousands of Salmonella Samples

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pettengill, James B.; Pightling, Arthur W.; Baugher, Joseph D.

The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging due to both biological (evolutionarily diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). Finally, when analyzing empirical data (whole-genome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.
From the Beauty of Genomic Landscapes to the Strength of Transcriptional Mechanisms.

PubMed

Natoli, Gioacchino

2016-03-24

Genomic analyses are commonly used to infer trends and broad rules underlying transcriptional control. The innovative approach by Tong et al. to interrogate genomic datasets allows extracting mechanistic information on the specific regulation of individual genes. Copyright © 2016 Elsevier Inc. All rights reserved.
Genomes as documents of evolutionary history: a probabilistic macrosynteny model for the reconstruction of ancestral genomes

PubMed Central

Nakatani, Yoichiro; McLysaght, Aoife

2017-01-01

Abstract Motivation: It has been argued that whole-genome duplication (WGD) exerted a profound influence on the course of evolution. For the purpose of fully understanding the impact of WGD, several formal algorithms have been developed for reconstructing pre-WGD gene order in yeast and plant. However, to the best of our knowledge, those algorithms have never been successfully applied to WGD events in teleost and vertebrate, impeded by extensive gene shuffling and gene losses. Results: Here, we present a probabilistic model of macrosynteny (i.e. conserved linkage or chromosome-scale distribution of orthologs), develop a variational Bayes algorithm for inferring the structure of pre-WGD genomes, and study estimation accuracy by simulation. Then, by applying the method to the teleost WGD, we demonstrate effectiveness of the algorithm in a situation where gene-order reconstruction algorithms perform relatively poorly due to a high rate of rearrangement and extensive gene losses. Our high-resolution reconstruction reveals previously overlooked small-scale rearrangements, necessitating a revision to previous views on genome structure evolution in teleost and vertebrate. Conclusions: We have reconstructed the structure of a pre-WGD genome by employing a variational Bayes approach that was originally developed for inferring topics from millions of text documents. Interestingly, comparison of the macrosynteny and topic model algorithms suggests that macrosynteny can be regarded as documents on ancestral genome structure. From this perspective, the present study would seem to provide a textbook example of the prevalent metaphor that genomes are documents of evolutionary history. Availability and implementation: The analysis data are available for download at http://www.gen.tcd.ie/molevol/supp_data/MacrosyntenyTGD.zip, and the software written in Java is available upon request. Contact: yoichiro.nakatani@tcd.ie or aoife.mclysaght@tcd.ie Supplementary information
Genomes as documents of evolutionary history: a probabilistic macrosynteny model for the reconstruction of ancestral genomes.

PubMed

Nakatani, Yoichiro; McLysaght, Aoife

2017-07-15

It has been argued that whole-genome duplication (WGD) exerted a profound influence on the course of evolution. For the purpose of fully understanding the impact of WGD, several formal algorithms have been developed for reconstructing pre-WGD gene order in yeast and plant. However, to the best of our knowledge, those algorithms have never been successfully applied to WGD events in teleost and vertebrate, impeded by extensive gene shuffling and gene losses. Here, we present a probabilistic model of macrosynteny (i.e. conserved linkage or chromosome-scale distribution of orthologs), develop a variational Bayes algorithm for inferring the structure of pre-WGD genomes, and study estimation accuracy by simulation. Then, by applying the method to the teleost WGD, we demonstrate effectiveness of the algorithm in a situation where gene-order reconstruction algorithms perform relatively poorly due to a high rate of rearrangement and extensive gene losses. Our high-resolution reconstruction reveals previously overlooked small-scale rearrangements, necessitating a revision to previous views on genome structure evolution in teleost and vertebrate. We have reconstructed the structure of a pre-WGD genome by employing a variational Bayes approach that was originally developed for inferring topics from millions of text documents. Interestingly, comparison of the macrosynteny and topic model algorithms suggests that macrosynteny can be regarded as documents on ancestral genome structure. From this perspective, the present study would seem to provide a textbook example of the prevalent metaphor that genomes are documents of evolutionary history. The analysis data are available for download at http://www.gen.tcd.ie/molevol/supp_data/MacrosyntenyTGD.zip , and the software written in Java is available upon request. yoichiro.nakatani@tcd.ie or aoife.mclysaght@tcd.ie. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All
Inferring species trees from incongruent multi-copy gene trees using the Robinson-Foulds distance

PubMed Central

2013-01-01

Background Constructing species trees from multi-copy gene trees remains a challenging problem in phylogenetics. One difficulty is that the underlying genes can be incongruent due to evolutionary processes such as gene duplication and loss, deep coalescence, or lateral gene transfer. Gene tree estimation errors may further exacerbate the difficulties of species tree estimation. Results We present a new approach for inferring species trees from incongruent multi-copy gene trees that is based on a generalization of the Robinson-Foulds (RF) distance measure to multi-labeled trees (mul-trees). We prove that it is NP-hard to compute the RF distance between two mul-trees; however, it is easy to calculate this distance between a mul-tree and a singly-labeled species tree. Motivated by this, we formulate the RF problem for mul-trees (MulRF) as follows: Given a collection of multi-copy gene trees, find a singly-labeled species tree that minimizes the total RF distance from the input mul-trees. We develop and implement a fast SPR-based heuristic algorithm for the NP-hard MulRF problem. We compare the performance of the MulRF method (available at http://genome.cs.iastate.edu/CBL/MulRF/) with several gene tree parsimony approaches using gene tree simulations that incorporate gene tree error, gene duplications and losses, and/or lateral transfer. The MulRF method produces more accurate species trees than gene tree parsimony approaches. We also demonstrate that the MulRF method infers in minutes a credible plant species tree from a collection of nearly 2,000 gene trees. Conclusions Our new phylogenetic inference method, based on a generalized RF distance, makes it possible to quickly estimate species trees from large genomic data sets. Since the MulRF method, unlike gene tree parsimony, is based on a generic tree distance measure, it is appealing for analyses of genomic data sets, in which many processes such as deep coalescence, recombination, gene duplication and losses as
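For readers unfamiliar with the Robinson-Foulds distance that this record builds on, the following is a minimal sketch of the standard rooted, singly-labeled case only. It does not implement the MulRF generalization to multi-labeled trees or the SPR-based heuristic described in the abstract. The nested-tuple tree encoding and the function names `clades` and `rf_distance` are assumptions made for illustration.

```python
def clades(tree) -> set:
    """Collect the leaf set under every internal node of a nested-tuple tree."""
    found = set()

    def walk(node) -> frozenset:
        if not isinstance(node, tuple):  # a leaf label such as "A"
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def rf_distance(t1, t2) -> int:
    """Rooted RF distance: the number of clades found in exactly one tree."""
    return len(clades(t1) ^ clades(t2))


# Hypothetical four-taxon trees written as nested tuples.
balanced = (("A", "B"), ("C", "D"))
swapped = (("A", "C"), ("B", "D"))
ladder = ((("A", "B"), "C"), "D")

print(rf_distance(balanced, balanced))
print(rf_distance(balanced, swapped))
print(rf_distance(balanced, ladder))
```

A species-tree search of the kind described above would sum such distances over all input gene trees and look for the candidate tree that minimizes the total.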
Molecular phylogeny of Systellognatha (Plecoptera: Arctoperlaria) inferred from mitochondrial genome sequences.

PubMed

Chen, Zhi-Teng; Zhao, Meng-Yuan; Xu, Cheng; Du, Yu-Zhou

2018-05-01

The infraorder Systellognatha is the most species-rich clade in the insect order Plecoptera and includes six families in two superfamilies: Pteronarcyoidea (Pteronarcyidae, Peltoperlidae, and Styloperlidae) and Perloidea (Perlidae, Perlodidae, and Chloroperlidae). To resolve the debatable phylogeny of Systellognatha, we carried out the first mitochondrial phylogenetic analysis covering all the six families, including three newly sequenced mitogenomes from two families (Perlodidae and Peltoperlidae) and 15 published mitogenomes. The three newly reported mitogenomes share conserved mitogenomic features with other sequenced stoneflies. For phylogenetic analyses, we assembled five datasets with two inference methods to assess their influence on topology and nodal support within Systellognatha. The results indicated that inclusion of the third codon positions of PCGs, exclusion of rRNA genes, the use of nucleotide datasets and Bayesian inference could improve the phylogenetic reconstruction of Systellognatha. The monophyly of Perloidea was supported in the mitochondrial phylogeny, but Pteronarcyoidea was recovered as paraphyletic and remained controversial. In this mitochondrial phylogenetic study, the relationships within Systellognatha were recovered as (((Perlidae + (Perlodidae + Chloroperlidae)) + (Pteronarcyidae + Styloperlidae)) + Peltoperlidae). Copyright © 2018 Elsevier B.V. All rights reserved.
Permanent draft genomes of the Rhodopirellula maiorica strain SM1.

PubMed

Richter, Michael; Richter-Heitmann, Tim; Klindworth, Anna; Wegner, Carl-Eric; Frank, Carsten S; Harder, Jens; Glöckner, Frank Oliver

2014-02-01

The genome of Rhodopirellula maiorica strain SM1 was sequenced as a permanent draft to complement the full genome sequence of the type strain Rhodopirellula baltica SH1(T). This isolate is part of a larger study to infer the biogeography of Rhodopirellula species in European marine waters, as well as to amend the genus description of R. baltica. This genomics resource article is the fifth of a series of five publications reporting in total eight new permanent draft genomes of Rhodopirellula species. Copyright © 2013 Elsevier B.V. All rights reserved.
Inferring nucleosome positions with their histone mark annotation from ChIP data

PubMed Central

Mammana, Alessandro; Vingron, Martin; Chung, Ho-Ryun

2013-01-01

Motivation: The nucleosome is the basic repeating unit of chromatin. It contains two copies each of the four core histones H2A, H2B, H3 and H4 and about 147 bp of DNA. The residues of the histone proteins are subject to numerous post-translational modifications, such as methylation or acetylation. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a technique that provides genome-wide occupancy data of these modified histone proteins, and it requires appropriate computational methods. Results: We present NucHunter, an algorithm that uses the data from ChIP-seq experiments directed against many histone modifications to infer positioned nucleosomes. NucHunter annotates each of these nucleosomes with the intensities of the histone modifications. We demonstrate that these annotations can be used to infer nucleosomal states with distinct correlations to underlying genomic features and chromatin-related processes, such as transcriptional start sites, enhancers, elongation by RNA polymerase II and chromatin-mediated repression. Thus, NucHunter is a versatile tool that can be used to predict positioned nucleosomes from a panel of histone modification ChIP-seq experiments and infer distinct histone modification patterns associated to different chromatin states. Availability: The software is available at http://epigen.molgen.mpg.de/nuchunter/. Contact: chung@molgen.mpg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23981350
A Genome-Wide Landscape of Retrocopies in Primate Genomes.

PubMed

Navarro, Fábio C P; Galante, Pedro A F

2015-07-29

Gene duplication is a key factor contributing to phenotype diversity across and within species. Although the availability of complete genomes has led to the extensive study of genomic duplications, the dynamics and variability of gene duplications mediated by retrotransposition are not well understood. Here, we predict mRNA retrotransposition and use comparative genomics to investigate their origin and variability across primates. Analyzing seven anthropoid primate genomes, we found a similar number of mRNA retrotranspositions (∼7,500 retrocopies) in Catarrhini (Old World Monkeys, including humans), but a surprisingly large number of retrocopies (∼10,000) in Platyrrhini (New World Monkeys), which may be a by-product of higher long interspersed nuclear element 1 activity in these genomes. By inferring retrocopy orthology, we dated most of the primate retrocopy origins, and estimated a decrease in the fixation rate in recent primate history, implying a smaller number of species-specific retrocopies. Moreover, using RNA-Seq data, we identified approximately 3,600 expressed retrocopies. As expected, most of these retrocopies are located near or within known genes, present tissue-specific and even species-specific expression patterns, and show no expression correlation with their parental genes. Taken together, our results provide further evidence that mRNA retrotransposition is an active mechanism in primate evolution and suggest that retrocopies may not only introduce great genetic variability between lineages but also create a large reservoir of potentially functional new genomic loci in primate genomes. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Genomics-informed isolation and characterization of a symbiotic Nanoarchaeota system from a terrestrial geothermal environment.

PubMed

Wurch, Louie; Giannone, Richard J; Belisle, Bernard S; Swift, Carolyn; Utturkar, Sagar; Hettich, Robert L; Reysenbach, Anna-Louise; Podar, Mircea

2016-07-05

Biological features can be inferred, based on genomic data, for many microbial lineages that remain uncultured. However, cultivation is important for characterizing an organism's physiology and testing its genome-encoded potential. Here we use single-cell genomics to infer cultivation conditions for the isolation of an ectosymbiotic Nanoarchaeota ('Nanopusillus acidilobi') and its host (Acidilobus, a crenarchaeote) from a terrestrial geothermal environment. The cells of 'Nanopusillus' are among the smallest known cellular organisms (100-300 nm). They appear to have a complete genetic information processing machinery, but lack almost all primary biosynthetic functions as well as respiration and ATP synthesis. Genomic and proteomic comparison with its distant relative, the marine Nanoarchaeum equitans illustrate an ancient, common evolutionary history of adaptation of the Nanoarchaeota to ectosymbiosis, so far unique among the Archaea.
Translational Genomics Research Institute (TGen): Quantified Cancer Cell Line Encyclopedia (CCLE) RNA-seq Data | Office of Cancer Genomics

Cancer.gov

Many applications analyze quantified transcript-level abundances to make inferences. Having completed this computation across the large sample set, the CTD2 Center at the Translational Genomics Research Institute presents the quantified data in a straightforward, consolidated form for these types of analyses.
Phylogenetic Invariants for Metazoan Mitochondrial Genome Evolution.

PubMed

Sankoff; Blanchette

1998-01-01

The method of phylogenetic invariants was developed to apply to aligned sequence data generated, according to a stochastic substitution model, for N species related through an unknown phylogenetic tree. The invariants are functions of the probabilities of the observable N-tuples, which are identically zero, over all choices of branch length, for some trees. Evaluating the invariants associated with all possible trees, using observed N-tuple frequencies over all sequence positions, enables us to rapidly infer the generating tree. An aspect of evolution at the genomic level much studied recently is the rearrangements of gene order along the chromosome from one species to another. Instead of the substitutions responsible for sequence evolution, we examine the non-local processes responsible for genome rearrangements such as inversion of arbitrarily long segments of chromosomes. By treating the potential adjacency of each possible pair of genes as a "position", an appropriate "substitution" model can be recognized as governing the rearrangement process, and a probabilistically principled phylogenetic inference can be set up. We calculate the invariants for this process for N=5, and apply them to mitochondrial genome data from coelomate metazoans, showing how they resolve key aspects of branching order.
Occurrence and Expression of Gene Transfer Agent Genes in Marine Bacterioplankton

PubMed Central

Biers, Erin J.; Wang, Kui; Pennington, Catherine; Belas, Robert; Chen, Feng; Moran, Mary Ann

2008-01-01

Genes with homology to the transduction-like gene transfer agent (GTA) were observed in genome sequences of three cultured members of the marine Roseobacter clade. A broader search for homologs for this host-controlled virus-like gene transfer system identified likely GTA systems in cultured Alphaproteobacteria, and particularly in marine bacterioplankton representatives. Expression of GTA genes and extracellular release of GTA particles (∼50 to 70 nm) was demonstrated experimentally for the Roseobacter clade member Silicibacter pomeroyi DSS-3, and intraspecific gene transfer was documented. GTA homologs are surprisingly infrequent in marine metagenomic sequence data, however, and the role of this lateral gene transfer mechanism in ocean bacterioplankton communities remains unclear. PMID:18359833
Similar genomic proportions of copy number variation within gray wolves and modern dog breeds inferred from whole genome sequencing.

PubMed

Serres-Armero, Aitor; Povolotskaya, Inna S; Quilez, Javier; Ramirez, Oscar; Santpere, Gabriel; Kuderna, Lukas F K; Hernandez-Rodriguez, Jessica; Fernandez-Callejo, Marcos; Gomez-Sanchez, Daniel; Freedman, Adam H; Fan, Zhenxin; Novembre, John; Navarro, Arcadi; Boyko, Adam; Wayne, Robert; Vilà, Carles; Lorente-Galdos, Belen; Marques-Bonet, Tomas

2017-12-19

Whole genome re-sequencing data from dogs and wolves are now commonly used to study how natural and artificial selection have shaped the patterns of genetic diversity. Single nucleotide polymorphisms, microsatellites and variants in mitochondrial DNA have been interrogated for links to specific phenotypes or signals of domestication. However, copy number variation (CNV), despite its increasingly recognized importance as a contributor to phenotypic diversity, has not been extensively explored in canids. Here, we develop a new accurate probabilistic framework to create fine-scale genomic maps of segmental duplications (SDs), compare patterns of CNV across groups and investigate their role in the evolution of the domestic dog by using information from 34 canine genomes. Our analyses show that duplicated regions are enriched in genes and hence likely possess functional importance. We identify 86 loci with large CNV differences between dogs and wolves, enriched in genes responsible for sensory perception, immune response, metabolic processes, etc. In striking contrast to the observed loss of nucleotide diversity in domestic dogs following the population bottlenecks that occurred during domestication and breed creation, we find a similar proportion of CNV loci in dogs and wolves, suggesting that other dynamics are acting to particularly select for CNVs with potentially functional impacts. This work is the first comparison of genome wide CNV patterns in domestic and wild canids using whole-genome sequencing data and our findings contribute to the study of the impact of novel kinds of genetic changes on the evolution of the domestic dog.
Intercoalescence time distribution of incomplete gene genealogies in temporally varying populations, and applications in population genetic inference.

PubMed

Chen, Hua

2013-03-01

Tracing back to a specific time T in the past, the genealogy of a sample of haplotypes may not have reached their common ancestor and may leave m lineages extant. For such an incomplete genealogy truncated at a specific time T in the past, the distribution and expectation of the intercoalescence times conditional on T are derived in an exact form in this paper for populations of deterministically time-varying sizes, specifically, for populations growing exponentially. The derived intercoalescence time distribution can be integrated into the coalescent-based joint allele frequency spectrum (JAFS) theory, and is useful for population genetic inference from large-scale genomic data, without relying on computationally intensive approaches, such as importance sampling and Markov Chain Monte Carlo (MCMC) methods. The inference of several important parameters relying on this derived conditional distribution is demonstrated: quantifying population growth rate and onset time, and estimating the number of ancestral lineages at a specific ancient time. Simulation studies confirm the validity of the derivation and the statistical efficiency of the methods using the derived intercoalescence time distribution. Two examples of real data are given to show the inference of the population growth rate of a European sample from the NIEHS Environmental Genome Project, and the number of ancient lineages of 31 mitochondrial genomes from Tibetan populations. © 2013 Blackwell Publishing Ltd/University College London.
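To make the idea of a genealogy truncated at time T concrete, the following minimal Python sketch simulates a coalescent genealogy backward in time and stops at T, returning the number m of lineages still extant and the intercoalescence times observed before T. It is an illustration only, not the exact derivation described in the abstract: the function name, its parameters, the time units (coalescent units of the present-day population size), and the time-rescaling treatment of exponential growth are all assumptions introduced here.

```python
import math
import random


def truncated_genealogy(n, T, growth_rate=0.0, rng=None):
    """Simulate a coalescent genealogy of n lineages truncated at time T.

    Hypothetical helper for illustration only. Time runs backward from the
    present in coalescent units of the present-day population size.
    growth_rate = 0 gives a constant-size population; growth_rate = r > 0
    assumes N(t) = N0 * exp(-r * t) looking backward (exponential growth
    forward in time), handled by rescaling time.

    Returns (m, intervals): m lineages remain at time T, and intervals
    lists the intercoalescence times of the events completed before T.
    """
    if rng is None:
        rng = random.Random()
    k = n            # current number of lineages
    tau = 0.0        # cumulative time on the rescaled, constant-size clock
    prev_t = 0.0     # real time of the previous coalescence event
    intervals = []
    while k > 1:
        # On the rescaled clock, k lineages coalesce at rate k*(k-1)/2.
        tau += rng.expovariate(k * (k - 1) / 2)
        # Map rescaled time back to real time: t = log(1 + r*tau) / r.
        if growth_rate > 0:
            t = math.log1p(growth_rate * tau) / growth_rate
        else:
            t = tau
        if t > T:
            break    # the genealogy is truncated before this event
        intervals.append(t - prev_t)
        prev_t = t
        k -= 1
    return k, intervals


if __name__ == "__main__":
    m, intervals = truncated_genealogy(n=20, T=0.1, growth_rate=5.0,
                                       rng=random.Random(1))
    print("lineages extant at T:", m)
    print("intercoalescence times before T:", intervals)
```

Repeating the simulation many times gives an empirical distribution of m and of the intercoalescence times conditional on T, which is the kind of quantity a simulation study could compare against an analytical result.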
Translational Genomics Research Institute: Quantified Cancer Cell Line Encyclopedia (CCLE) RNA-seq Data | Office of Cancer Genomics

Cancer.gov

Many applications analyze quantified transcript-level abundances to make inferences. Having completed this computation across the large sample set, the CTD2 Center at the Translational Genomics Research Institute presents the quantified data in a straightforward, consolidated form for these types of analyses.
Genome at Juncture of Early Human Migration: A Systematic Analysis of Two Whole Genomes and Thirteen Exomes from Kuwaiti Population Subgroup of Inferred Saudi Arabian Tribe Ancestry

PubMed Central

Alsmadi, Osama; Hebbar, Prashantha; Antony, Dinu; Behbehani, Kazem; Thanaraj, Thangavel Alphonse

2014-01-01

The population of the State of Kuwait is composed of three genetic subgroups of inferred Persian, Saudi Arabian tribe and Bedouin ancestry. The Saudi Arabian tribe subgroup traces its origin to the Najd region of Saudi Arabia. By sequencing two whole genomes and thirteen exomes from this subgroup at high coverage (>40X), we identify 4,950,724 Single Nucleotide Polymorphisms (SNPs), 515,802 indels and 39,762 structural variations. Of the identified variants, 10,098 (8.3%) exomic SNPs, 139,923 (2.9%) non-exomic SNPs, 5,256 (54.3%) exomic indels, and 374,959 (74.08%) non-exomic indels are ‘novel’. Up to 8,070 (79.9%) of the reported novel biallelic exomic SNPs are seen in low frequency (minor allele frequency <5%). We observe 5,462 known and 1,004 novel potentially deleterious nonsynonymous SNPs. Allele frequencies of common SNPs from the 15 exomes are significantly correlated with those from genotype data of a larger cohort of 48 individuals (Pearson correlation coefficient, 0.91; p < 2.2×10⁻¹⁶). A set of 2,485 SNPs show significantly different allele frequencies when compared to populations from other continents. Two notable variants having risk alleles in high frequencies in this subgroup are: a nonsynonymous deleterious SNP (rs2108622 [19:g.15990431C>T] from CYP4F2 gene [MIM:*604426]) associated with warfarin dosage levels [MIM:#122700] required to elicit normal anticoagulant response; and a 3′ UTR SNP (rs6151429 [22:g.51063477T>C] from ARSA gene [MIM:*607574]) associated with Metachromatic Leukodystrophy [MIM:#250100]. Hemoglobin Riyadh variant (identified for the first time in a Saudi Arabian woman) is observed in the exome data. The mitochondrial haplogroup profiles of the 15 individuals are consistent with the haplogroup diversity seen in Saudi Arabian natives, who are believed to have received substantial gene flow from Africa and eastern provenance. We present the first genome resource imperative for designing future genetic studies in Saudi Arabian
Away from darkness: a review on the effects of solar radiation on heterotrophic bacterioplankton activity

PubMed Central

Ruiz-González, Clara; Simó, Rafel; Sommaruga, Ruben; Gasol, Josep M.

2013-01-01

Heterotrophic bacterioplankton are main consumers of dissolved organic matter (OM) in aquatic ecosystems, including the sunlit upper layers of the ocean and freshwater bodies. Their well-known sensitivity to ultraviolet radiation (UVR), together with some recently discovered mechanisms bacteria have evolved to benefit from photosynthetically available radiation (PAR), suggest that natural sunlight plays a relevant, yet difficult to predict role in modulating bacterial biogeochemical functions in aquatic ecosystems. Three decades of experimental work assessing the effects of sunlight on natural bacterial heterotrophic activity reveal responses ranging from high stimulation to total inhibition. In this review, we compile the existing studies on the topic and discuss the potential causes underlying these contrasting results, with special emphasis on the largely overlooked influences of the community composition and the previous light exposure conditions, as well as the different temporal and spatial scales at which exposure to solar radiation fluctuates. These intricate sunlight-bacteria interactions have implications for our understanding of carbon fluxes in aquatic systems, yet further research is necessary before we can accurately evaluate or predict the consequences of increasing surface UVR levels associated with global change. PMID:23734148
Inference of cancer-specific gene regulatory networks using soft computing rules.

PubMed

Wang, Xiaosheng; Gotoh, Osamu

2010-03-24

Perturbations of gene regulatory networks are essentially responsible for oncogenesis. Therefore, inferring the gene regulatory networks is a key step to overcoming cancer. In this work, we propose a method for inferring directed gene regulatory networks based on soft computing rules, which can identify important cause-effect regulatory relations of gene expression. First, we identify important genes associated with a specific cancer (colon cancer) using a supervised learning approach. Next, we reconstruct the gene regulatory networks by inferring the regulatory relations among the identified genes, and their regulated relations by other genes within the genome. We obtain two meaningful findings. One is that upregulated genes are regulated by more genes than downregulated ones, while downregulated genes regulate more genes than upregulated ones. The other one is that tumor suppressors suppress tumor activators and activate other tumor suppressors strongly, while tumor activators activate other tumor activators and suppress tumor suppressors weakly, indicating the robustness of biological systems. These findings provide valuable insights into the pathogenesis of cancer.
CMIP: a software package capable of reconstructing genome-wide regulatory networks using gene expression data.

PubMed

Zheng, Guangyong; Xu, Yaochen; Zhang, Xiujun; Liu, Zhi-Ping; Wang, Zhuo; Chen, Luonan; Zhu, Xin-Guang

2016-12-23

A gene regulatory network (GRN) represents interactions of genes inside a cell or tissue, in which vertexes and edges stand for genes and their regulatory interactions respectively. Reconstruction of gene regulatory networks, in particular, genome-scale networks, is essential for comparative exploration of different species and mechanistic investigation of biological processes. Currently, most network inference methods are computationally intensive; they are usually effective for small-scale tasks (e.g., networks with a few hundred genes) but struggle to construct GRNs at genome scale. Here, we present a software package for gene regulatory network reconstruction at a genomic level, in which gene interaction is measured by the conditional mutual information measurement using a parallel computing framework (so the package is named CMIP). The package is a greatly improved implementation of our previous PCA-CMI algorithm. In CMIP, we provide not only an automatic threshold determination method but also an effective parallel computing framework for network inference. Performance tests on benchmark datasets show that the accuracy of CMIP is comparable to that of most current network inference methods. Moreover, running tests on synthetic datasets demonstrate that CMIP can handle large datasets, especially genome-wide datasets, within an acceptable time period. In addition, successful application to a real genomic dataset confirms the practical applicability of the package. This new software package provides a powerful tool for genomic network reconstruction to the biological community. The software can be accessed at http://www.picb.ac.cn/CMIP/ .
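As a rough illustration of how conditional mutual information (CMI) can separate direct from indirect associations between genes, the sketch below estimates CMI for three expression vectors under a Gaussian assumption, where I(X;Y|Z) = -½ ln(1 − ρ²) and ρ is the partial correlation of X and Y given Z. This is not CMIP's code: the function names, the single conditioning gene, the Gaussian estimator, and the toy data are all assumptions made here for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(x, y):
    """Mutual information I(X;Y) in nats, assuming joint normality."""
    r = pearson(x, y)
    return -0.5 * math.log(1.0 - r * r)


def gaussian_cmi(x, y, z):
    """Conditional mutual information I(X;Y|Z) in nats for one
    conditioning variable, assuming joint normality."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    # First-order partial correlation of x and y given z.
    partial = (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    return -0.5 * math.log(1.0 - partial * partial)


def toy_expression(n=2000, seed=0):
    """Hypothetical toy data: regulator z drives both x and y,
    which have no direct interaction with each other."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [v + rng.gauss(0, 1) for v in z]
    y = [v + rng.gauss(0, 1) for v in z]
    return x, y, z


if __name__ == "__main__":
    x, y, z = toy_expression()
    print("I(X;Y)   =", gaussian_mi(x, y))
    print("I(X;Y|Z) =", gaussian_cmi(x, y, z))
```

In the toy data, x and y look associated when considered alone, but the association nearly vanishes once the shared regulator z is conditioned on. A network inference procedure could drop such an edge when the CMI falls below a chosen threshold; how that threshold is chosen is left open in this sketch.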
Southeast Asian origins of five Hill Tribe populations and correlation of genetic to linguistic relationships inferred with genome-wide SNP data

PubMed Central

Listman, JB; Malison, RT; Sanichwankul, K; Ittiwut, C; Mutirangura, A; Gelernter, J

2010-01-01

In Thailand, the term Hill Tribe is used to describe populations whose members traditionally practice slash and burn agriculture and reside in the mountains. These tribes are thought to have migrated throughout Asia for up to 5,000 years, including migrations through Southern China and/or Southeast Asia. There have been continuous migrations southward from China into Thailand for approximately the past thousand years and the present geographic range of any given tribe straddles multiple political borders. As none of these populations have autochthonous scripts, written histories have, until recently, been externally produced. Northern Asian, Tibetan, and Siberian origins of Hill Tribes have been proposed. All purport endogamy and have non-mutually intelligible languages. In order to test hypotheses regarding the geographic origins of these populations, relatedness and migrations among them and neighboring populations, and whether their genetic relationships correspond with their linguistic relationships, we analyzed 2445 genome-wide SNP markers in 118 individuals from five Thai Hill Tribe populations (Akha, Hmong, Karen, Lahu, and Lisu), 90 individuals from majority Thai populations, and 826 individuals from Asian and Oceanean HGDP and HapMap populations using a Bayesian clustering method. Considering these results within the context of results of recent large-scale studies of Asian geographic genetic variation allows us to infer a shared Southeast Asian origin of these five Hill Tribe populations as well as ancestry components that distinguish among them seen in successive levels of clustering. In addition, the inferred level of shared ancestry among the Hill Tribes corresponds well to relationships among their languages. PMID:20979205
Southeast Asian origins of five Hill Tribe populations and correlation of genetic to linguistic relationships inferred with genome-wide SNP data.

PubMed

Listman, J B; Malison, R T; Sanichwankul, K; Ittiwut, C; Mutirangura, A; Gelernter, J

2011-02-01

In Thailand, the term Hill Tribe is used to describe populations whose members traditionally practice slash and burn agriculture and reside in the mountains. These tribes are thought to have migrated throughout Asia for up to 5,000 years, including migrations through Southern China and/or Southeast Asia. There have been continuous migrations southward from China into Thailand for approximately the past thousand years and the present geographic range of any given tribe straddles multiple political borders. As none of these populations have autochthonous scripts, written histories have, until recently, been externally produced. Northern Asian, Tibetan, and Siberian origins of Hill Tribes have been proposed. All purport endogamy and have nonmutually intelligible languages. To test hypotheses regarding the geographic origins of these populations, relatedness and migrations among them and neighboring populations, and whether their genetic relationships correspond with their linguistic relationships, we analyzed 2,445 genome-wide SNP markers in 118 individuals from five Thai Hill Tribe populations (Akha, Hmong, Karen, Lahu, and Lisu), 90 individuals from majority Thai populations, and 826 individuals from Asian and Oceanean HGDP and HapMap populations using a Bayesian clustering method. Considering these results within the context of results of recent large-scale studies of Asian geographic genetic variation allows us to infer a shared Southeast Asian origin of these five Hill Tribe populations as well as ancestry components that distinguish among them seen in successive levels of clustering. In addition, the inferred level of shared ancestry among the Hill Tribes corresponds well to relationships among their languages. 2010 Wiley-Liss, Inc.
CGBayesNets: Conditional Gaussian Bayesian Network Learning and Inference with Mixed Discrete and Continuous Data

PubMed Central

McGeachie, Michael J.; Chang, Hsun-Hsien; Weiss, Scott T.

2014-01-01

Bayesian Networks (BN) have been a popular predictive modeling formalism in bioinformatics, but their application in modern genomics has been slowed by an inability to cleanly handle domains with mixed discrete and continuous variables. Existing free BN software packages either discretize continuous variables, which can lead to information loss, or do not include inference routines, which makes prediction with the BN impossible. We present CGBayesNets, a BN package focused around prediction of a clinical phenotype from mixed discrete and continuous variables, which fills these gaps. CGBayesNets implements Bayesian likelihood and inference algorithms for the conditional Gaussian Bayesian network (CGBNs) formalism, one appropriate for predicting an outcome of interest from, e.g., multimodal genomic data. We provide four different network learning algorithms, each making a different tradeoff between computational cost and network likelihood. CGBayesNets provides a full suite of functions for model exploration and verification, including cross validation, bootstrapping, and AUC manipulation. We highlight several results obtained previously with CGBayesNets, including predictive models of wood properties from tree genomics, leukemia subtype classification from mixed genomic data, and robust prediction of intensive care unit mortality outcomes from metabolomic profiles. We also provide detailed example analysis on public metabolomic and gene expression datasets. CGBayesNets is implemented in MATLAB and available as MATLAB source code, under an Open Source license and anonymous download at http://www.cgbayesnets.com. PMID:24922310
CGBayesNets: conditional Gaussian Bayesian network learning and inference with mixed discrete and continuous data.

PubMed

McGeachie, Michael J; Chang, Hsun-Hsien; Weiss, Scott T

2014-06-01

Bayesian Networks (BN) have been a popular predictive modeling formalism in bioinformatics, but their application in modern genomics has been slowed by an inability to cleanly handle domains with mixed discrete and continuous variables. Existing free BN software packages either discretize continuous variables, which can lead to information loss, or do not include inference routines, which makes prediction with the BN impossible. We present CGBayesNets, a BN package focused around prediction of a clinical phenotype from mixed discrete and continuous variables, which fills these gaps. CGBayesNets implements Bayesian likelihood and inference algorithms for the conditional Gaussian Bayesian network (CGBNs) formalism, one appropriate for predicting an outcome of interest from, e.g., multimodal genomic data. We provide four different network learning algorithms, each making a different tradeoff between computational cost and network likelihood. CGBayesNets provides a full suite of functions for model exploration and verification, including cross validation, bootstrapping, and AUC manipulation. We highlight several results obtained previously with CGBayesNets, including predictive models of wood properties from tree genomics, leukemia subtype classification from mixed genomic data, and robust prediction of intensive care unit mortality outcomes from metabolomic profiles. We also provide detailed example analysis on public metabolomic and gene expression datasets. CGBayesNets is implemented in MATLAB and available as MATLAB source code, under an Open Source license and anonymous download at http://www.cgbayesnets.com.
Lake Bacterial Assemblage Composition Is Sensitive to Biological Disturbance Caused by an Invasive Filter Feeder

PubMed Central

Denef, Vincent J.; Carrick, Hunter J.; Cavaletto, Joann; Chiang, Edna; Johengen, Thomas H.; Vanderploeg, Henry A.

2017-01-01

One approach to improve forecasts of how global change will affect ecosystem processes is to better understand how anthropogenic disturbances alter bacterial assemblages that drive biogeochemical cycles. Species invasions are important contributors to global change, but their impacts on bacterial community ecology are rarely investigated. Here, we studied direct impacts of invasive dreissenid mussels (IDMs), one of many invasive filter feeders, on freshwater lake bacterioplankton. We demonstrated that direct effects of IDMs reduced bacterial abundance and altered assemblage composition by preferentially removing larger and particle-associated bacteria. While this increased the relative abundances of many free-living bacterial taxa, some were susceptible to filter feeding, in line with efficient removal of phytoplankton cells of <2 μm. This selective removal of particle-associated and larger bacteria by IDMs altered inferred bacterial functional group representation, defined by carbon and energy source utilization. Specifically, we inferred an increased relative abundance of chemoorganoheterotrophs predicted to be capable of rhodopsin-dependent energy generation. In contrast to the few previous studies that have focused on the longer-term combined direct and indirect effects of IDMs on bacterioplankton, our study showed that IDMs act directly as a biological disturbance to which freshwater bacterial assemblages are sensitive. The negative impacts on particle-associated bacteria, which have been shown to be more active than free-living bacteria, and the inferred shifts in functional group representation raise the possibility that IDMs may directly alter bacterially mediated ecosystem functions. IMPORTANCE Freshwater bacteria play fundamental roles in global elemental cycling and are an intrinsic part of local food webs. Human activities are altering freshwater environments, and much has been learned regarding the sensitivity of bacterial assemblages to a
Lake Bacterial Assemblage Composition Is Sensitive to Biological Disturbance Caused by an Invasive Filter Feeder.

PubMed

Denef, Vincent J; Carrick, Hunter J; Cavaletto, Joann; Chiang, Edna; Johengen, Thomas H; Vanderploeg, Henry A

2017-01-01

One approach to improve forecasts of how global change will affect ecosystem processes is to better understand how anthropogenic disturbances alter bacterial assemblages that drive biogeochemical cycles. Species invasions are important contributors to global change, but their impacts on bacterial community ecology are rarely investigated. Here, we studied direct impacts of invasive dreissenid mussels (IDMs), one of many invasive filter feeders, on freshwater lake bacterioplankton. We demonstrated that direct effects of IDMs reduced bacterial abundance and altered assemblage composition by preferentially removing larger and particle-associated bacteria. While this increased the relative abundances of many free-living bacterial taxa, some were susceptible to filter feeding, in line with efficient removal of phytoplankton cells of <2 μm. This selective removal of particle-associated and larger bacteria by IDMs altered inferred bacterial functional group representation, defined by carbon and energy source utilization. Specifically, we inferred an increased relative abundance of chemoorganoheterotrophs predicted to be capable of rhodopsin-dependent energy generation. In contrast to the few previous studies that have focused on the longer-term combined direct and indirect effects of IDMs on bacterioplankton, our study showed that IDMs act directly as a biological disturbance to which freshwater bacterial assemblages are sensitive. The negative impacts on particle-associated bacteria, which have been shown to be more active than free-living bacteria, and the inferred shifts in functional group representation raise the possibility that IDMs may directly alter bacterially mediated ecosystem functions. IMPORTANCE Freshwater bacteria play fundamental roles in global elemental cycling and are an intrinsic part of local food webs. Human activities are altering freshwater environments, and much has been learned regarding the sensitivity of bacterial assemblages to a variety of
Lake Bacterial Assemblage Composition Is Sensitive to Biological Disturbance Caused by an Invasive Filter Feeder

DOE Office of Scientific and Technical Information (OSTI.GOV)

Denef, Vincent J.; Carrick, Hunter J.; Cavaletto, Joann

One approach to improve forecasts of how global change will affect ecosystem processes is to better understand how anthropogenic disturbances alter bacterial assemblages that drive biogeochemical cycles. Species invasions are important contributors to global change, but their impacts on bacterial community ecology are rarely investigated. Here, we studied direct impacts of invasive dreissenid mussels (IDMs), one of many invasive filter feeders, on freshwater lake bacterioplankton. We demonstrated that direct effects of IDMs reduced bacterial abundance and altered assemblage composition by preferentially removing larger and particle-associated bacteria. While this increased the relative abundances of many free-living bacterial taxa, some were susceptible to filter feeding, in line with efficient removal of phytoplankton cells of <2 μm. This selective removal of particle-associated and larger bacteria by IDMs altered inferred bacterial functional group representation, defined by carbon and energy source utilization. Specifically, we inferred an increased relative abundance of chemoorganoheterotrophs predicted to be capable of rhodopsin-dependent energy generation. In contrast to the few previous studies that have focused on the longer-term combined direct and indirect effects of IDMs on bacterioplankton, our study showed that IDMs act directly as a biological disturbance to which freshwater bacterial assemblages are sensitive. The negative impacts on particle-associated bacteria, which have been shown to be more active than free-living bacteria, and the inferred shifts in functional group representation raise the possibility that IDMs may directly alter bacterially mediated ecosystem functions. Freshwater bacteria play fundamental roles in global elemental cycling and are an intrinsic part of local food webs. Human activities are altering freshwater environments, and much has been learned regarding the sensitivity of bacterial assemblages to a variety of
Lake Bacterial Assemblage Composition Is Sensitive to Biological Disturbance Caused by an Invasive Filter Feeder

DOE PAGES

Denef, Vincent J.; Carrick, Hunter J.; Cavaletto, Joann; ...

2017-05-31

One approach to improve forecasts of how global change will affect ecosystem processes is to better understand how anthropogenic disturbances alter bacterial assemblages that drive biogeochemical cycles. Species invasions are important contributors to global change, but their impacts on bacterial community ecology are rarely investigated. Here, we studied direct impacts of invasive dreissenid mussels (IDMs), one of many invasive filter feeders, on freshwater lake bacterioplankton. We demonstrated that direct effects of IDMs reduced bacterial abundance and altered assemblage composition by preferentially removing larger and particle-associated bacteria. While this increased the relative abundances of many free-living bacterial taxa, some were susceptible to filter feeding, in line with efficient removal of phytoplankton cells of <2 μm. This selective removal of particle-associated and larger bacteria by IDMs altered inferred bacterial functional group representation, defined by carbon and energy source utilization. Specifically, we inferred an increased relative abundance of chemoorganoheterotrophs predicted to be capable of rhodopsin-dependent energy generation. In contrast to the few previous studies that have focused on the longer-term combined direct and indirect effects of IDMs on bacterioplankton, our study showed that IDMs act directly as a biological disturbance to which freshwater bacterial assemblages are sensitive. The negative impacts on particle-associated bacteria, which have been shown to be more active than free-living bacteria, and the inferred shifts in functional group representation raise the possibility that IDMs may directly alter bacterially mediated ecosystem functions. Freshwater bacteria play fundamental roles in global elemental cycling and are an intrinsic part of local food webs. Human activities are altering freshwater environments, and much has been learned regarding the sensitivity of bacterial assemblages to a variety of
Genomics-informed isolation and characterization of a symbiotic Nanoarchaeota system from a terrestrial geothermal environment

DOE PAGES

Wurch, Louie; Giannone, Richard J.; Belisle, Bernard S.; ...

2016-07-05

Biological features can be inferred, based on genomic data, for many microbial lineages that remain uncultured. However, cultivation is important for characterizing an organism’s physiology and testing its genome-encoded potential. Here we use single-cell genomics to infer cultivation conditions for the isolation of an ectosymbiotic Nanoarchaeota (‘Nanopusillus acidilobi’) and its host (Acidilobus, a crenarchaeote) from a terrestrial geothermal environment. The cells of ‘Nanopusillus’ are among the smallest known cellular organisms (100–300 nm). They appear to have a complete genetic information processing machinery, but lack almost all primary biosynthetic functions as well as respiration and ATP synthesis. Lastly, genomic and proteomic comparison with its distant relative, the marine Nanoarchaeum equitans illustrate an ancient, common evolutionary history of adaptation of the Nanoarchaeota to ectosymbiosis, so far unique among the Archaea.

Genomics-informed isolation and characterization of a symbiotic Nanoarchaeota system from a terrestrial geothermal environment

PubMed Central

Wurch, Louie; Giannone, Richard J.; Belisle, Bernard S.; Swift, Carolyn; Utturkar, Sagar; Hettich, Robert L.; Reysenbach, Anna-Louise; Podar, Mircea

2016-01-01

Biological features can be inferred, based on genomic data, for many microbial lineages that remain uncultured. However, cultivation is important for characterizing an organism's physiology and testing its genome-encoded potential. Here we use single-cell genomics to infer cultivation conditions for the isolation of an ectosymbiotic Nanoarchaeota (‘Nanopusillus acidilobi’) and its host (Acidilobus, a crenarchaeote) from a terrestrial geothermal environment. The cells of ‘Nanopusillus’ are among the smallest known cellular organisms (100–300 nm). They appear to have a complete genetic information processing machinery, but lack almost all primary biosynthetic functions as well as respiration and ATP synthesis. Genomic and proteomic comparison with its distant relative, the marine Nanoarchaeum equitans, illustrates an ancient, common evolutionary history of adaptation of the Nanoarchaeota to ectosymbiosis, so far unique among the Archaea. PMID:27378076
RENT+: an improved method for inferring local genealogical trees from haplotypes with recombination

PubMed Central

Mirzaei, Sajad; Wu, Yufeng

2017-01-01

Abstract Motivation: Haplotypes from one or multiple related populations share a common genealogical history. If this shared genealogy can be inferred from haplotypes, it can be very useful for many population genetics problems. However, with the presence of recombination, the genealogical history of haplotypes is complex and cannot be represented by a single genealogical tree. Therefore, inference of genealogical history with recombination is much more challenging than the case of no recombination. Results: In this paper, we present a new approach called RENT+ for the inference of local genealogical trees from haplotypes with the presence of recombination. RENT+ builds on a previous genealogy inference approach called RENT, which infers a set of related genealogical trees at different genomic positions. RENT+ represents a significant improvement over RENT in the sense that it is more effective in extracting information contained in the haplotype data about the underlying genealogy than RENT. The key components of RENT+ are several greatly enhanced genealogy inference rules. Through simulation, we show that RENT+ is more efficient and accurate than several existing genealogy inference methods. As an application, we apply RENT+ in the inference of population demographic history from haplotypes, which outperforms several existing methods. Availability and Implementation: RENT+ is implemented in Java, and is freely available for download from: https://github.com/SajadMirzaei/RentPlus. Contacts: sajad@engr.uconn.edu or ywu@engr.uconn.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28065901
Inference of directional selection and mutation parameters assuming equilibrium.

PubMed

Vogl, Claus; Bergman, Juraj

2015-12-01

In a classical study, Wright (1931) proposed a model for the evolution of a biallelic locus under the influence of mutation, directional selection and drift. He derived the equilibrium distribution of the allelic proportion conditional on the scaled mutation rate, the mutation bias and the scaled strength of directional selection. The equilibrium distribution can be used for inference of these parameters with genome-wide datasets of "site frequency spectra" (SFS). Assuming that the scaled mutation rate is low, Wright's model can be approximated by a boundary-mutation model, where mutations are introduced into the population exclusively from sites fixed for the preferred or unpreferred allelic states. With the boundary-mutation model, inference can be partitioned: (i) the shape of the SFS distribution within the polymorphic region is determined by random drift and directional selection, but not by the mutation parameters, such that inference of the selection parameter relies exclusively on the polymorphic sites in the SFS; (ii) the mutation parameters can be inferred from the amount of polymorphic and monomorphic preferred and unpreferred alleles, conditional on the selection parameter. Herein, we derive maximum likelihood estimators for the mutation and selection parameters in equilibrium and apply the method to simulated SFS data as well as empirical data from a Madagascar population of Drosophila simulans. Copyright © 2015 Elsevier Inc. All rights reserved.
Plant Comparative and Functional Genomics

DOE PAGES

Yang, Xiaohan; Leebens-Mack, Jim; Chen, Feng; ...

2015-01-01

Plants form the foundation for our global ecosystem and are essential for environmental and human health. With an increasing number of available plant genomes and tractable experimental systems, comparative and functional plant genomics research is greatly expanding our knowledge of the molecular basis of economically and nutritionally important traits in crop plants. Inferences drawn from comparative genomics are motivating experimental investigations of gene function and gene interactions. This special issue aims to highlight recent advances made in comparative and functional genomics research in plants. Nine original research articles in this special issue cover five important topics: (1) transcription factor gene families relevant to abiotic stress tolerance; (2) plant secondary metabolism; (3) transcriptome-based markers for quantitative trait loci; (4) epigenetic modifications in plant-microbe interactions; and (5) computational prediction of protein-protein interactions. Finally, the plant species studied in these articles include model species as well as nonmodel plant species of economic importance (e.g., food crops and medicinal plants).
Inferring Selective Constraint from Population Genomic Data Suggests Recent Regulatory Turnover in the Human Brain

PubMed Central

Schrider, Daniel R.; Kern, Andrew D.

2015-01-01

The comparative genomics revolution of the past decade has enabled the discovery of functional elements in the human genome via sequence comparison. However, an important class of elements, those specific to humans, is entirely missed by searching for sequence conservation across species. Here we present an analysis based on variation data among human genomes that utilizes a supervised machine learning approach for the identification of human-specific purifying selection in the genome. Using only allele frequency information from the complete low-coverage 1000 Genomes Project data set in conjunction with a support vector machine trained from known functional and nonfunctional portions of the genome, we are able to accurately identify portions of the genome constrained by purifying selection. Our method identifies previously known human-specific gains or losses of function and uncovers many novel candidates. Candidate targets for gain and loss of function along the human lineage include numerous putative regulatory regions of genes essential for normal development of the central nervous system, including a significant enrichment of gain of function events near neurotransmitter receptor genes. These results are consistent with regulatory turnover being a key mechanism in the evolution of human-specific characteristics of brain development. Finally, we show that the majority of the genome is currently unconstrained by natural selection, in agreement with what has been estimated from phylogenetic methods but in sharp contrast to estimates based on transcriptomics or other high-throughput functional methods. PMID:26590212
Characterization of St and Y genome in StStYY Elymus species (Triticeae: Poaceae) using Sequential FISH and GISH

USDA-ARS's Scientific Manuscript database

Tetraploid species possessing the StY genome could be donors to hexaploid species having the StYH, StYP, or StYW genome constitution in the genus Elymus, and a few StY species have been intensely studied to infer the origin of the Y genome. In this study, genome characterization of St and Y genome w...
Inference of higher-order relationships in the cycads from a large chloroplast data set.

PubMed

Rai, Hardeep S; O'Brien, Heath E; Reeves, Patrick A; Olmstead, Richard G; Graham, Sean W

2003-11-01

We investigated higher-order relationships in the cycads, an ancient group of seed-bearing plants, by examining a large portion of the chloroplast genome from seven species chosen to exemplify our current understanding of taxonomic diversity in the order. The regions considered span approximately 13.5 kb of unaligned data per taxon, and comprise a diverse range of coding sequences, introns and intergenic spacers dispersed throughout the plastid genome. Our results provide substantial support for most of the inferred backbone of cycad phylogeny, and weak evidence that the sister-group of the cycads among living seed plants is Ginkgo biloba. Cycas (representing Cycadaceae) is the sister-group of the remaining cycads; Dioon is part of the next most basal split. Two of the three commonly recognized families of cycads (Zamiaceae and Stangeriaceae) are not monophyletic; Stangeria is embedded within Zamiaceae, close to Zamia and Ceratozamia, and not closely allied to the other genus of Stangeriaceae, Bowenia. In contrast to the other seed plants, cycad chloroplast genomes share two features with Ginkgo: a reduced rate of evolution and an elevated transition:transversion ratio. We demonstrate that the latter aspect of their molecular evolution is unlikely to have affected inference of cycad relationships in the context of seed-plant wide analyses.
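The transition:transversion ratio mentioned in this abstract can be illustrated with a minimal sketch. It assumes two already-aligned nucleotide sequences and simply counts observed differences; it is not the authors' estimation procedure (model-based estimates correct for multiple hits), and the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_counts(seq_a, seq_b):
    """Count observed transitions and transversions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip identical sites, gaps and ambiguity codes.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        # A transition stays within purines (A<->G) or within pyrimidines (C<->T).
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def ts_tv_ratio(seq_a, seq_b):
    """Return the raw transition:transversion ratio, or None if no transversions were seen."""
    ts, tv = ts_tv_counts(seq_a, seq_b)
    return ts / tv if tv else None
```

For the toy pair `"ACGTACGT"` and `"GCGTATGA"` there are two transitions (A/G, C/T) and one transversion (T/A), so the raw ratio is 2.0.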
Inference of Expanded Lrp-Like Feast/Famine Transcription Factor Targets in a Non-Model Organism Using Protein Structure-Based Prediction

PubMed Central

Ashworth, Justin; Plaisier, Christopher L.; Lo, Fang Yin; Reiss, David J.; Baliga, Nitin S.

2014-01-01

Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer. PMID:25255272
Inference of expanded Lrp-like feast/famine transcription factor targets in a non-model organism using protein structure-based prediction.

PubMed

Ashworth, Justin; Plaisier, Christopher L; Lo, Fang Yin; Reiss, David J; Baliga, Nitin S

2014-01-01

Widespread microbial genome sequencing presents an opportunity to understand the gene regulatory networks of non-model organisms. This requires knowledge of the binding sites for transcription factors whose DNA-binding properties are unknown or difficult to infer. We adapted a protein structure-based method to predict the specificities and putative regulons of homologous transcription factors across diverse species. As a proof-of-concept we predicted the specificities and transcriptional target genes of divergent archaeal feast/famine regulatory proteins, several of which are encoded in the genome of Halobacterium salinarum. This was validated by comparison to experimentally determined specificities for transcription factors in distantly related extremophiles, chromatin immunoprecipitation experiments, and cis-regulatory sequence conservation across eighteen related species of halobacteria. Through this analysis we were able to infer that Halobacterium salinarum employs a divergent local trans-regulatory strategy to regulate genes (carA and carB) involved in arginine and pyrimidine metabolism, whereas Escherichia coli employs an operon. The prediction of gene regulatory binding sites using structure-based methods is useful for the inference of gene regulatory relationships in new species that are otherwise difficult to infer.
Determining protein function and interaction from genome analysis

DOEpatents

Eisenberg, David; Marcotte, Edward M.; Thompson, Michael J.; Pellegrini, Matteo; Yeates, Todd O.

2004-08-03

A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
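The phylogenetic-profile idea described in this record can be sketched as follows. This is an illustrative toy, not the patented method: the protein and genome names, the `max_mismatches` threshold, and the function names are all assumptions made for the example.

```python
from itertools import combinations


def phylogenetic_profile(genomes_with_protein, all_genomes):
    """Encode presence (1) or absence (0) of a protein across an ordered genome list."""
    return tuple(1 if g in genomes_with_protein else 0 for g in all_genomes)


def linked_pairs(presence, all_genomes, max_mismatches=0):
    """Return protein pairs whose profiles differ at no more than max_mismatches genomes."""
    profiles = {p: phylogenetic_profile(gs, all_genomes) for p, gs in presence.items()}
    links = []
    for p, q in combinations(sorted(profiles), 2):
        mismatches = sum(x != y for x, y in zip(profiles[p], profiles[q]))
        if mismatches <= max_mismatches:
            links.append((p, q))
    return links


# Hypothetical toy data: which genomes encode a homolog of each protein.
GENOMES = ["g1", "g2", "g3", "g4"]
PRESENCE = {
    "P1": {"g1", "g3"},
    "P2": {"g1", "g3"},
    "P3": {"g2", "g4"},
    "P4": {"g1", "g2", "g3", "g4"},
}
```

With this data, `linked_pairs(PRESENCE, GENOMES)` returns `[("P1", "P2")]`, since only those two proteins share an identical profile. A practical implementation would probably also discount proteins present in every genome, whose profiles carry little information.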
Diurnal variation in bacterioplankton composition and DNA damage in the microbial community from an Andean oligotrophic lake.

PubMed

Fernández-Zenoff, María V; Estévez, María C; Farías, María E

2014-01-01

Laguna Azul is an oligotrophic lake situated at 4,560 m above sea level and subject to a high level of solar radiation. Bacterioplankton community composition (BCC) was analysed by denaturing gradient gel electrophoresis and the impact of solar ultraviolet radiation was assessed by measuring cyclobutane pyrimidine dimers (CPD). Furthermore, pure cultures of Acinetobacter johnsonii A2 and Rhodococcus sp. A5 were exposed simultaneously and CPD accumulation was studied. Gel analyses generated a total of 7 sequences belonging to Alpha-proteobacteria (1 band), Beta-proteobacteria (1 band), Bacteroidetes (2 bands), Actinobacteria (1 band), and Firmicutes (1 band). DGGE profiles showed minimal changes in BCC and no CPD was detected even though a high level of damage was found in biodosimeters. A. johnsonii A2 showed low level of DNA damage while Rhodococcus sp. A5 exhibited high resistance since no CPD were detected under natural UV-B exposure, suggesting that the bacterial community is well adapted to this highly solar irradiated environment. Copyright © 2014 Asociación Argentina de Microbiología. Publicado por Elsevier España. All rights reserved.
Inferring Gene Family Histories in Yeast Identifies Lineage Specific Expansions

PubMed Central

Ames, Ryan M.; Money, Daniel; Lovell, Simon C.

2014-01-01

The complement of genes found in the genome is a balance between gene gain and gene loss. Knowledge of the specific genes that are gained and lost over evolutionary time allows an understanding of the evolution of biological functions. Here we use new evolutionary models to infer gene family histories across complete yeast genomes; these models allow us to estimate the relative genome-wide rates of gene birth, death, innovation and extinction (loss of an entire family) for the first time. We show that the rates of gene family evolution vary both between gene families and between species. We are also able to identify those families that have experienced rapid lineage specific expansion/contraction and show that these families are enriched for specific functions. Moreover, we find that families with specific functions are repeatedly expanded in multiple species, suggesting the presence of common adaptations and that these family expansions/contractions are not random. Additionally, we identify potential specialisations, unique to specific species, in the functions of lineage specific expanded families. These results suggest that an important mechanism in the evolution of genome content is the presence of lineage-specific gene family changes. PMID:24921666
The Tarenaya hassleriana Genome Provides Insight into Reproductive Trait and Genome Evolution of Crucifers

PubMed Central

Cheng, Shifeng; van den Bergh, Erik; Zeng, Peng; Zhong, Xiao; Xu, Jiajia; Liu, Xin; Hofberger, Johannes; de Bruijn, Suzanne; Bhide, Amey S.; Kuelahoglu, Canan; Bian, Chao; Chen, Jing; Fan, Guangyi; Kaufmann, Kerstin; Hall, Jocelyn C.; Becker, Annette; Bräutigam, Andrea; Weber, Andreas P.M.; Shi, Chengcheng; Zheng, Zhijun; Li, Wujiao; Lv, Mingju; Tao, Yimin; Wang, Junyi; Zou, Hongfeng; Quan, Zhiwu; Hibberd, Julian M.; Zhang, Gengyun; Zhu, Xin-Guang; Xu, Xun; Schranz, M. Eric

2013-01-01

The Brassicaceae, including Arabidopsis thaliana and Brassica crops, is unmatched among plants in its wealth of genomic and functional molecular data and has long served as a model for understanding gene, genome, and trait evolution. However, genome information from a phylogenetic outgroup that is essential for inferring directionality of evolutionary change has been lacking. We therefore sequenced the genome of the spider flower (Tarenaya hassleriana) from the Brassicaceae sister family, the Cleomaceae. By comparative analysis of the two lineages, we show that genome evolution following ancient polyploidy and gene duplication events affects reproductively important traits. We found an ancient genome triplication in Tarenaya (Th-α) that is independent of the Brassicaceae-specific duplication (At-α) and nested Brassica (Br-α) triplication. To showcase the potential of sister lineage genome analysis, we investigated the state of floral developmental genes and show that Brassica retains twice as many floral MADS (for MINICHROMOSOME MAINTENANCE1, AGAMOUS, DEFICIENS and SERUM RESPONSE FACTOR) genes as Tarenaya, which likely contribute to morphological diversity in Brassica. We also performed synteny analysis of gene families that confer self-incompatibility in Brassicaceae and found that the critical SERINE RECEPTOR KINASE receptor gene is derived from a lineage-specific tandem duplication. The T. hassleriana genome will facilitate future research toward elucidating the evolutionary history of Brassicaceae genomes. PMID:23983221
RENT+: an improved method for inferring local genealogical trees from haplotypes with recombination.

PubMed

Mirzaei, Sajad; Wu, Yufeng

2017-04-01

Motivation: Haplotypes from one or multiple related populations share a common genealogical history. If this shared genealogy can be inferred from haplotypes, it can be very useful for many population genetics problems. However, with the presence of recombination, the genealogical history of haplotypes is complex and cannot be represented by a single genealogical tree. Therefore, inference of genealogical history with recombination is much more challenging than the case of no recombination. Results: In this paper, we present a new approach called RENT+ for the inference of local genealogical trees from haplotypes with the presence of recombination. RENT+ builds on a previous genealogy inference approach called RENT, which infers a set of related genealogical trees at different genomic positions. RENT+ represents a significant improvement over RENT in the sense that it is more effective in extracting information contained in the haplotype data about the underlying genealogy than RENT. The key components of RENT+ are several greatly enhanced genealogy inference rules. Through simulation, we show that RENT+ is more efficient and accurate than several existing genealogy inference methods. As an application, we apply RENT+ in the inference of population demographic history from haplotypes, which outperforms several existing methods. Availability and Implementation: RENT+ is implemented in Java, and is freely available for download from: https://github.com/SajadMirzaei/RentPlus. Contact: sajad@engr.uconn.edu or ywu@engr.uconn.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Sparse representation and Bayesian detection of genome copy number alterations from microarray data.

PubMed

Pique-Regi, Roger; Monso-Varona, Jordi; Ortega, Antonio; Seeger, Robert C; Triche, Timothy J; Asgharzadeh, Shahab

2008-02-01

Genomic instability in cancer leads to abnormal genome copy number alterations (CNA) that are associated with the development and behavior of tumors. Advances in microarray technology have allowed for greater resolution in detection of DNA copy number changes (amplifications or deletions) across the genome. However, the increase in number of measured signals and accompanying noise from the array probes present a challenge in accurate and fast identification of breakpoints that define CNA. This article proposes a novel detection technique that exploits the use of piecewise constant (PWC) vectors to represent genome copy number and sparse Bayesian learning (SBL) to detect CNA breakpoints. First, a compact linear algebra representation for the genome copy number is developed from normalized probe intensities. Second, SBL is applied and optimized to infer locations where copy number changes occur. Third, a backward elimination (BE) procedure is used to rank the inferred breakpoints, and a cut-off point can be efficiently adjusted in this procedure to control for the false discovery rate (FDR). The performance of our algorithm is evaluated using simulated and real genome datasets and compared to other existing techniques. Our approach achieves the highest accuracy and lowest FDR while improving computational speed by several orders of magnitude. The proposed algorithm has been developed into a free-standing software application (GADA, Genome Alteration Detection Algorithm). http://biron.usc.edu/~piquereg/GADA
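The piecewise constant representation named in this abstract can be illustrated with a small sketch. It shows only the representation (a copy-number profile as a baseline plus a sparse set of jumps), not the SBL inference or the backward elimination step; the function names and the dictionary encoding of jumps are assumptions for illustration and do not reflect the GADA software's interface.

```python
from itertools import accumulate


def pwc_from_jumps(baseline, jumps, length):
    """Build a piecewise constant profile from a baseline level and sparse jumps.

    jumps maps a probe index i to the change in level that starts at probe i.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    steps = [jumps.get(i, 0.0) for i in range(length)]
    steps[0] += baseline
    # The cumulative sum of sparse steps yields the piecewise constant vector.
    return list(accumulate(steps))


def breakpoints(levels, tol=1e-9):
    """Return indices where the level differs from the previous probe."""
    return [i for i in range(1, len(levels)) if abs(levels[i] - levels[i - 1]) > tol]
```

For example, `pwc_from_jumps(2.0, {3: 1.0, 6: -1.0}, 8)` gives `[2.0, 2.0, 2.0, 3.0, 3.0, 3.0, 2.0, 2.0]`, and `breakpoints` recovers `[3, 6]`. On noisy probe data the jumps are unknown, and sparsity-promoting inference is what would be needed to estimate them.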
Positional orthology: putting genomic evolutionary relationships into context.

PubMed

Dewey, Colin N

2011-09-01

Orthology is a powerful refinement of homology that allows us to describe more precisely the evolution of genomes and understand the function of the genes they contain. However, because orthology is not concerned with genomic position, it is limited in its ability to describe genes that are likely to have equivalent roles in different genomes. Because of this limitation, the concept of 'positional orthology' has emerged, which describes the relation between orthologous genes that retain their ancestral genomic positions. In this review, we formally define this concept, for which we introduce the shorter term 'toporthology', with respect to the evolutionary events experienced by a gene's ancestors. Through a discussion of recent studies on the role of genomic context in gene evolution, we show that the distinction between orthology and toporthology is biologically significant. We then review a number of orthology prediction methods that take genomic context into account and thus that may be used to infer the important relation of toporthology.
Positional orthology: putting genomic evolutionary relationships into context

PubMed Central

2011-01-01

Orthology is a powerful refinement of homology that allows us to describe more precisely the evolution of genomes and understand the function of the genes they contain. However, because orthology is not concerned with genomic position, it is limited in its ability to describe genes that are likely to have equivalent roles in different genomes. Because of this limitation, the concept of ‘positional orthology’ has emerged, which describes the relation between orthologous genes that retain their ancestral genomic positions. In this review, we formally define this concept, for which we introduce the shorter term ‘toporthology’, with respect to the evolutionary events experienced by a gene’s ancestors. Through a discussion of recent studies on the role of genomic context in gene evolution, we show that the distinction between orthology and toporthology is biologically significant. We then review a number of orthology prediction methods that take genomic context into account and thus that may be used to infer the important relation of toporthology. PMID:21705766
Genome-association analysis of Korean Holstein milk traits using genomic estimated breeding value.

PubMed

Shin, Donghyun; Lee, Chul; Park, Kyoung-Do; Kim, Heebal; Cho, Kwang-Hyeon

2017-03-01

Holsteins are known as the world's highest milk-producing dairy cattle. The purpose of this study was to identify genetic regions strongly associated with milk traits (milk production, fat, and protein) using Korean Holstein data. This study was performed using single nucleotide polymorphism (SNP) chip data (Illumina BovineSNP50 Beadchip) of 911 Korean Holstein individuals. We inferred genomic estimated breeding values for each individual based on best linear unbiased prediction (BLUP) and ridge regression using BLUPF90 and R. We then performed a genome-wide association study and identified genetic regions related to milk traits. We identified 9, 6, and 17 significant genetic regions related to milk production, fat and protein, respectively. These genes are newly reported as associated with milk traits in Holsteins. This study complements recent Holstein genome-wide association studies that identified other SNPs and genes as the most significant variants. These results will help to expand the knowledge of the polygenic nature of milk production in Holsteins.

Adaptation, ecology, and evolution of the halophilic stromatolite archaeon Halococcus hamelinensis inferred through genome analyses.

PubMed

Gudhka, Reema K; Neilan, Brett A; Burns, Brendan P

2015-01-01

Halococcus hamelinensis was the first archaeon isolated from stromatolites. These geomicrobial ecosystems are thought to be some of the earliest known on Earth, yet, despite their evolutionary significance, the role of Archaea in these systems is still not well understood. Detailed here is the genome sequencing and analysis of an archaeon isolated from stromatolites. The genome of H. hamelinensis consisted of 3,133,046 base pairs with an average G+C content of 60.08% and contained 3,150 predicted coding sequences or ORFs, 2,196 (68.67%) of which were protein-coding genes with functional assignments and 954 (29.83%) of which were of unknown function. Codon usage of the H. hamelinensis genome was consistent with a highly acidic proteome, a major adaptive mechanism towards high salinity. Amino acid transport and metabolism, inorganic ion transport and metabolism, energy production and conversion, ribosomal structure, and unknown function COG genes were overrepresented. The genome of H. hamelinensis also revealed characteristics reflecting its survival in its extreme environment, including putative genes/pathways involved in osmoprotection, oxidative stress response, and UV damage repair. Finally, genome analyses indicated the presence of putative transposases as well as positive matches of genes of H. hamelinensis against various genomes of Bacteria, Archaea, and viruses, suggesting the potential for horizontal gene transfer.
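The G+C content reported in this abstract is a simple base-composition statistic. A minimal sketch of the computation, assuming a plain nucleotide string and ignoring ambiguous bases (the function name is hypothetical):

```python
def gc_content(sequence):
    """Return the percentage of G and C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)
```

For instance, `gc_content("ATGCNN")` is 50.0, because the two `N` positions are excluded from the denominator.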
Divergence times in Caenorhabditis and Drosophila inferred from direct estimates of the neutral mutation rate.

PubMed

Cutter, Asher D

2008-04-01

Accurate inference of the dates of common ancestry among species forms a central problem in understanding the evolutionary history of organisms. Molecular estimates of divergence time rely on the molecular evolutionary prediction that neutral mutations and substitutions occur at the same constant rate in genomes of related species. This underlies the notion of a molecular clock. Most implementations of this idea depend on paleontological calibration to infer dates of common ancestry, but taxa with poor fossil records must rely on external, potentially inappropriate, calibration with distantly related species. The classic biological models Caenorhabditis and Drosophila are examples of such problem taxa. Here, I illustrate internal calibration in these groups with direct estimates of the mutation rate from contemporary populations that are corrected for interfering effects of selection on the assumption of neutrality of substitutions. Divergence times are inferred among 6 species each of Caenorhabditis and Drosophila, based on thousands of orthologous groups of genes. I propose that the 2 closest known species of Caenorhabditis shared a common ancestor <24 MYA (Caenorhabditis briggsae and Caenorhabditis sp. 5) and that Caenorhabditis elegans diverged from its closest known relatives <30 MYA, assuming that these species pass through at least 6 generations per year; these estimates are much more recent than reported previously with molecular clock calibrations from non-nematode phyla. Dates inferred for the common ancestor of Drosophila melanogaster and Drosophila simulans are roughly concordant with previous studies. These revised dates have important implications for rates of genome evolution and the origin of self-fertilization in Caenorhabditis.
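The internal calibration described in this abstract rests on the standard neutral expectation that divergence accumulates along both lineages at the mutation rate. A minimal sketch under that assumption follows; it omits the paper's corrections for selection, and the function name and the numbers in the usage note are hypothetical rather than values from the study.

```python
def divergence_time_years(divergence_per_site, mutation_rate_per_generation,
                          generations_per_year):
    """Estimate time since common ancestry under a strict neutral molecular clock.

    Neutral divergence K accumulates on two lineages, so K = 2 * mu * t_generations.
    """
    if mutation_rate_per_generation <= 0 or generations_per_year <= 0:
        raise ValueError("rates must be positive")
    generations = divergence_per_site / (2.0 * mutation_rate_per_generation)
    return generations / generations_per_year
```

With illustrative inputs of 0.5 substitutions per site, a mutation rate of 1e-8 per site per generation, and 5 generations per year, the estimate is about 5 million years. Because the result is inversely proportional to `generations_per_year`, assuming a minimum number of generations per year yields an upper bound on the date, which matches the way the abstract states its estimates.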
Carnivore-specific SINEs (Can-SINEs): distribution, evolution, and genomic impact.

PubMed

Walters-Conte, Kathryn B; Johnson, Diana L E; Allard, Marc W; Pecon-Slattery, Jill

2011-01-01

Short interspersed nuclear elements (SINEs) are a type of class 1 transposable element (retrotransposon) with features that allow investigators to resolve evolutionary relationships between populations and species while providing insight into genome composition and function. Characterization of a Carnivora-specific SINE family, Can-SINEs, has aided comparative genomic studies by providing rare genomic changes and neutral sequence variants often needed to resolve difficult evolutionary questions. In addition, Can-SINEs constitute a significant source of functional diversity within Carnivora. Publication of the whole-genome sequences of domestic dog, domestic cat, and giant panda serves as a valuable resource in comparative genomic inferences gleaned from Can-SINEs. In anticipation of forthcoming studies bolstered by new genomic data, this review describes the discovery and characterization of Can-SINE motifs as well as their composition, distribution, and effect on genome function. As the contribution of noncoding sequences to genomic diversity becomes more apparent, SINEs and other transposable elements will play an increasingly large role in mammalian comparative genomics.
Carnivore-Specific SINEs (Can-SINEs): Distribution, Evolution, and Genomic Impact

PubMed Central

Johnson, Diana L.E.; Allard, Marc W.; Pecon-Slattery, Jill

2011-01-01

Short interspersed nuclear elements (SINEs) are a type of class 1 transposable element (retrotransposon) with features that allow investigators to resolve evolutionary relationships between populations and species while providing insight into genome composition and function. Characterization of a Carnivora-specific SINE family, Can-SINEs, has aided comparative genomic studies by providing rare genomic changes and neutral sequence variants often needed to resolve difficult evolutionary questions. In addition, Can-SINEs constitute a significant source of functional diversity within Carnivora. Publication of the whole-genome sequences of domestic dog, domestic cat, and giant panda serves as a valuable resource in comparative genomic inferences gleaned from Can-SINEs. In anticipation of forthcoming studies bolstered by new genomic data, this review describes the discovery and characterization of Can-SINE motifs as well as their composition, distribution, and effect on genome function. As the contribution of noncoding sequences to genomic diversity becomes more apparent, SINEs and other transposable elements will play an increasingly large role in mammalian comparative genomics. PMID:21846743
Integration of multi-omics data for integrative gene regulatory network inference.

PubMed

Zarayeneh, Neda; Ko, Euiseong; Oh, Jung Hun; Suh, Sang; Liu, Chunyu; Gao, Jean; Kim, Donghyun; Kang, Mingon

2017-01-01

Gene regulatory networks provide comprehensive insights and in-depth understanding of complex biological processes. In most research, the molecular interactions of gene regulatory networks are inferred from a single type of genomic data, e.g., gene expression data. However, gene expression is a product of sequential interactions of multiple biological processes, such as DNA sequence variations, copy number variations, histone modifications, transcription factors, and DNA methylations. The recent rapid advances of high-throughput omics technologies enable one to measure multiple types of omics data, called 'multi-omics data', that represent the various biological processes. In this paper, we propose an Integrative Gene Regulatory Network inference method (iGRN) that incorporates multi-omics data and their interactions in gene regulatory networks. In addition to gene expressions, copy number variations and DNA methylations were considered for multi-omics data in this paper. Intensive experiments were carried out with simulation data to assess iGRN's capability to infer the integrative gene regulatory network. In these experiments, iGRN showed better performance on model representation and interpretation than other integrative methods in gene regulatory network inference. iGRN was also applied to a human brain dataset of psychiatric disorders, and the biological network of psychiatric disorders was analysed.
Integration of multi-omics data for integrative gene regulatory network inference

PubMed Central

Zarayeneh, Neda; Ko, Euiseong; Oh, Jung Hun; Suh, Sang; Liu, Chunyu; Gao, Jean; Kim, Donghyun

2017-01-01

Gene regulatory networks provide comprehensive insights and in-depth understanding of complex biological processes. In most research, the molecular interactions of gene regulatory networks are inferred from a single type of genomic data, e.g., gene expression data. However, gene expression is a product of sequential interactions of multiple biological processes, such as DNA sequence variations, copy number variations, histone modifications, transcription factors, and DNA methylations. The recent rapid advances of high-throughput omics technologies enable one to measure multiple types of omics data, called ‘multi-omics data’, that represent the various biological processes. In this paper, we propose an Integrative Gene Regulatory Network inference method (iGRN) that incorporates multi-omics data and their interactions in gene regulatory networks. In addition to gene expressions, copy number variations and DNA methylations were considered for multi-omics data in this paper. Intensive experiments were carried out with simulation data to assess iGRN’s capability to infer the integrative gene regulatory network. In these experiments, iGRN showed better performance on model representation and interpretation than other integrative methods in gene regulatory network inference. iGRN was also applied to a human brain dataset of psychiatric disorders, and the biological network of psychiatric disorders was analysed. PMID:29354189
CowPI: A Rumen Microbiome Focussed Version of the PICRUSt Functional Inference Software.

PubMed

Wilkinson, Toby J; Huws, Sharon A; Edwards, Joan E; Kingston-Smith, Alison H; Siu-Ting, Karen; Hughes, Martin; Rubino, Francesco; Friedersdorff, Maximillian; Creevey, Christopher J

2018-01-01

Metataxonomic 16S rDNA based studies are a commonplace and useful tool in the research of the microbiome, but they do not provide the full investigative power of metagenomics and metatranscriptomics for revealing the functional potential of microbial communities. However, the use of metagenomic and metatranscriptomic technologies is hindered by high costs and by the skills needed to generate and interpret the data. To address this, a tool for Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt) was developed for inferring the functional potential of an observed microbiome profile, based on 16S data. This allows functional inferences to be made from metataxonomic 16S rDNA studies with little extra work or cost, but its accuracy relies on the availability of completely sequenced genomes of representative organisms from the community being investigated. The rumen microbiome is an example of a community traditionally underrepresented in genome and sequence databases, but recent efforts by projects such as the Global Rumen Census and Hungate 1000 have resulted in a wide sampling of 16S rDNA profiles and almost 500 fully sequenced microbial genomes from this environment. Using this information, we have developed "CowPI," a focused version of the PICRUSt tool provided for use by the wider scientific community in the study of the rumen microbiome. We evaluated the accuracy of CowPI and PICRUSt using two 16S datasets from the rumen microbiome: one generated from rDNA and the other from rRNA where corresponding metagenomic and metatranscriptomic data was also available. We show that the functional profiles predicted by CowPI better match estimates for both the metagenomic and metatranscriptomic datasets than those predicted by PICRUSt, and capture the higher degree of genetic variation and larger pangenomes of rumen organisms. Nonetheless, whilst being closer in terms of predictive power for the rumen microbiome, there were differences when compared to
Drivers of interannual variability in virioplankton abundance at the coastal western Antarctic peninsula and the potential effects of climate change.

PubMed

Evans, Claire; Brandsma, Joost; Pond, David W; Venables, Hugh J; Meredith, Michael P; Witte, Harry J; Stammerjohn, Sharon; Wilson, William H; Clarke, Andrew; Brussaard, Corina P D

2017-02-01

An 8-year time-series in the Western Antarctic Peninsula (WAP) with an approximately weekly sampling frequency was used to elucidate changes in virioplankton abundance and their drivers in this climatically sensitive region. Virioplankton abundances at the coastal WAP show a pronounced seasonal cycle with interannual variability in the timing and magnitude of the summer maxima. Bacterioplankton abundance is the most influential driving factor of the virioplankton, and the two exhibit closely coupled dynamics. Sea ice cover and duration predetermine levels of phytoplankton stock and thus influence virioplankton by dictating the substrates available to the bacterioplankton. However, variations in the composition of the phytoplankton community, particularly the prominence of diatoms inferred from silicate drawdown, drive interannual differences in the magnitude of the virioplankton bloom, likely again mediated through changes in the bacterioplankton. Their findings suggest that future warming within the WAP will cause changes in sea ice that will influence viruses and their microbial hosts through changes in the timing, magnitude and composition of the phytoplankton bloom. Thus, the flow of matter and energy through the viral shunt may be decreased with consequences for the Antarctic food web and element cycling. © 2017 Society for Applied Microbiology and John Wiley & Sons Ltd.
Bioinformatic Workflows for Generating Complete Plastid Genome Sequences-An Example from Cabomba (Cabombaceae) in the Context of the Phylogenomic Analysis of the Water-Lily Clade.

PubMed

Gruenstaeudl, Michael; Gerschler, Nico; Borsch, Thomas

2018-06-21

The sequencing and comparison of plastid genomes are becoming a standard method in plant genomics, and many researchers are using this approach to infer plant phylogenetic relationships. Due to the widespread availability of next-generation sequencing, plastid genome sequences are being generated at breakneck pace. This trend towards massive sequencing of plastid genomes highlights the need for standardized bioinformatic workflows. In particular, documentation and dissemination of the details of genome assembly, annotation, alignment and phylogenetic tree inference are needed, as these processes are highly sensitive to the choice of software and the precise settings used. Here, we present the procedure and results of sequencing, assembling, annotating and quality-checking of three complete plastid genomes of the aquatic plant genus Cabomba as well as subsequent gene alignment and phylogenetic tree inference. We accompany our findings by a detailed description of the bioinformatic workflow employed. Importantly, we share a total of eleven software scripts for each of these bioinformatic processes, enabling other researchers to evaluate and replicate our analyses step by step. The results of our analyses illustrate that the plastid genomes of Cabomba are highly conserved in both structure and gene content.
Diversity of bacterioplankton in the surface seawaters of Drake Passage near the Chinese Antarctic station.

PubMed

Xing, Mengxin; Li, Zhao; Wang, Wei; Sun, Mi

2015-07-01

The determination of relative abundances and distribution of different bacterial groups is a critical step toward understanding the functions of various bacteria and their surrounding environment. Few studies have focused on the taxonomic composition and functional diversity of microbial communities in Drake Passage. In this study, marine bacterioplankton communities from surface seawaters at five locations in Drake Passage were examined by 16S rRNA gene sequence analyses. The results indicated that psychrophilic bacteria were the most abundant group in Drake Passage, mainly comprising Bacillus, Aeromonas, Psychrobacter, Pseudomonas and Halomonas. Diversity analysis showed that surface seawater communities had no significant correlation with latitudinal gradient. Additionally, a clear difference among the five surface seawater communities was evident: only 1.8% of OTUs (two, both belonging to Bacillus) were found at all five locations, whereas 71% of OTUs (80) occurred at only one location. However, the few cosmopolitans had the largest population sizes. Our results support the hypothesis that the dominant bacterial groups appear to be analogous between geographical sites, but significant differences may be detected among rare bacterial groups. The microbial diversity of surface seawaters is likely to be affected by environmental factors. © FEMS 2015. All rights reserved.
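The shared-versus-unique OTU comparison reported above can be illustrated with a small presence/absence calculation. This is a minimal sketch on made-up data: the function name, OTU identifiers and site names are hypothetical, and it is not the analysis pipeline used in the study.

```python
def otu_sharing(presence):
    """Summarize how OTUs are distributed across sampling locations.

    presence maps an OTU identifier to the set of locations where it was detected.
    Returns the fraction of OTUs found at every location and at exactly one.
    """
    if not presence:
        raise ValueError("presence table is empty")
    all_locations = set().union(*presence.values())
    total = len(presence)
    everywhere = sum(1 for locs in presence.values() if locs == all_locations)
    single_site = sum(1 for locs in presence.values() if len(locs) == 1)
    return {
        "cosmopolitan_fraction": everywhere / total,
        "single_site_fraction": single_site / total,
    }


# Toy presence/absence table (hypothetical OTUs and sites).
toy = {
    "otu1": {"A", "B", "C"},
    "otu2": {"A"},
    "otu3": {"B"},
    "otu4": {"A", "B"},
}
summary = otu_sharing(toy)
print(summary)
```

Note that these fractions count OTUs, not reads, so a small cosmopolitan fraction is compatible with those few OTUs having the largest population sizes.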
An inference method from multi-layered structure of biomedical data.

PubMed

Kim, Myungjun; Nam, Yonghyun; Shin, Hyunjung

2017-05-18

A biological system is a multi-layered structure of omics, comprising the genome, epigenome, transcriptome, metabolome, proteome, etc., and can be further extended to clinical/medical layers such as diseasome, drugs, and symptoms. One advantage of omics is that an unknown component or its trait can be inferred from known omics components. The component can be inferred from components at the same level of omics or at different levels. To implement the inference process, an algorithm that can be applied to the multi-layered complex system is required. In this study, we develop a semi-supervised learning algorithm that can be applied to the multi-layered complex system. To verify the validity of the inference, the algorithm was applied to the prediction of disease co-occurrence with a two-layered network composed of a symptom layer and a disease layer. The symptom-disease layered network obtained an AUC of 0.74, a noticeable improvement over the 0.59 AUC of the single-layered disease network. If extended to the whole layered structure of omics, the proposed method is expected to produce more promising results. The novelty of this research is that it is a new integrative algorithm that incorporates the vertical structure of omics data, in contrast to existing methods that integrate the data in a parallel fashion. The results can provide an enhanced guideline for disease co-occurrence prediction and thereby serve as a valuable tool for the inference process of multi-layered biological systems.
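To make the idea of inference across layers concrete, the following is a minimal sketch of generic graph-based semi-supervised label propagation on a toy two-layer (disease and symptom) network. The abstract does not specify the algorithm, so this is a stand-in technique rather than the authors' method; the function name, the graph, the labels and the smoothing parameter mu are all assumptions made for illustration.

```python
def propagate_labels(weights, labels, mu=1.0, iterations=200):
    """Spread known labels over a weighted graph by Jacobi iteration.

    Iterates f_i <- (y_i + mu * sum_j W_ij f_j) / (1 + mu * d_i), which
    converges to the solution of (I + mu * (D - W)) f = y.
    weights: symmetric adjacency matrix (list of lists) with a zero diagonal
    labels: initial scores y, 1.0 for known positives and 0.0 otherwise
    """
    n = len(labels)
    scores = list(labels)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            degree = sum(weights[i])
            neighbor_sum = sum(weights[i][j] * scores[j] for j in range(n))
            updated.append((labels[i] + mu * neighbor_sum) / (1.0 + mu * degree))
        scores = updated
    return scores


# Hypothetical network: nodes 0-2 are diseases, nodes 3-4 are symptoms.
# Diseases 0 and 1 share symptom 3; disease 2 is linked only to symptom 4.
edges = [(0, 3), (1, 3), (2, 4)]
weights = [[0.0] * 5 for _ in range(5)]
for a, b in edges:
    weights[a][b] = weights[b][a] = 1.0

labels = [1.0, 0.0, 0.0, 0.0, 0.0]  # only disease 0 is a known positive
scores = propagate_labels(weights, labels)
print([round(s, 3) for s in scores])
```

In this toy graph, disease 1 receives a non-zero score only through the shared symptom node, while disease 2 stays at zero. That is the intuition behind adding a second layer: it creates paths between diseases that a single-layer disease network would lack.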
Complex multi-enhancer contacts captured by genome architecture mapping.

PubMed

Beagrie, Robert A; Scialdone, Antonio; Schueler, Markus; Kraemer, Dorothee C A; Chotalia, Mita; Xie, Sheila Q; Barbieri, Mariano; de Santiago, Inês; Lavitas, Liron-Mark; Branco, Miguel R; Fraser, James; Dostie, Josée; Game, Laurence; Dillon, Niall; Edwards, Paul A W; Nicodemi, Mario; Pombo, Ana

2017-03-23

The organization of the genome in the nucleus and the interactions of genes with their regulatory elements are key features of transcriptional control and their disruption can cause disease. Here we report a genome-wide method, genome architecture mapping (GAM), for measuring chromatin contacts and other features of three-dimensional chromatin topology on the basis of sequencing DNA from a large collection of thin nuclear sections. We apply GAM to mouse embryonic stem cells and identify enrichment for specific interactions between active genes and enhancers across very large genomic distances using a mathematical model termed SLICE (statistical inference of co-segregation). GAM also reveals an abundance of three-way contacts across the genome, especially between regions that are highly transcribed or contain super-enhancers, providing a level of insight into genome architecture that, owing to the technical limitations of current technologies, has previously remained unattainable. Furthermore, GAM highlights a role for gene-expression-specific contacts in organizing the genome in mammalian nuclei.
Limitations of a metabolic network-based reverse ecology method for inferring host-pathogen interactions.

PubMed

Takemoto, Kazuhiro; Aie, Kazuki

2017-05-25

Host-pathogen interactions are important in a wide range of research fields. Given the importance of metabolic crosstalk between hosts and pathogens, a metabolic network-based reverse ecology method was proposed to infer these interactions. However, the validity of this method remains unclear because of the various explanations presented and the influence of potentially confounding factors that have thus far been neglected. We re-evaluated the importance of the reverse ecology method for evaluating host-pathogen interactions while statistically controlling for confounding effects using oxygen requirement, genome, metabolic network, and phylogeny data. Our data analyses showed that host-pathogen interactions were more strongly influenced by genome size, primary network parameters (e.g., number of edges), oxygen requirement, and phylogeny than the reverse ecology-based measures. These results indicate the limitations of the reverse ecology method; however, they do not discount the importance of adopting reverse ecology approaches altogether. Rather, we highlight the need for developing more suitable methods for inferring host-pathogen interactions and conducting more careful examinations of the relationships between metabolic networks and host-pathogen interactions.
Analyses of charophyte chloroplast genomes help characterize the ancestral chloroplast genome of land plants.

PubMed

Civáň, Peter; Foster, Peter G; Embley, Martin T; Séneca, Ana; Cox, Cymon J

2014-04-01

Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes.
Analyses of Charophyte Chloroplast Genomes Help Characterize the Ancestral Chloroplast Genome of Land Plants

PubMed Central

Civáň, Peter; Foster, Peter G.; Embley, Martin T.; Séneca, Ana; Cox, Cymon J.

2014-01-01

Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes. PMID:24682153
Horizontal gene acquisitions contributed to genome expansion in insect-symbiotic Spiroplasma clarkii.

PubMed

Tsai, Yi-Ming; Chang, An; Kuo, Chih-Horng

2018-06-01

Genome reduction is a recurring theme of symbiont evolution. The genus Spiroplasma contains species that are mostly facultative insect symbionts. The typical genome sizes of those species within the Apis clade were estimated to be ∼1.0-1.4 Mb. Intriguingly, Spiroplasma clarkii was found to have a genome size that is >30% larger than the median of other species within the same clade. To investigate the molecular evolution events that led to the genome expansion of this bacterium, we determined its complete genome sequence and inferred the evolutionary origin of each protein-coding gene based on the phylogenetic distribution of homologs. Among the 1,346 annotated protein-coding genes, 641 originated from within the Apis clade while 233 were putatively acquired from outside of the clade (including 91 high-confidence candidates). Additionally, 472 were specific to S. clarkii without homologs in the current database (i.e., the origins remained unknown). The acquisition of protein-coding genes, rather than mobile genetic elements, appeared to be a major contributing factor of genome expansion. Notably, >50% of the high-confidence acquired genes are related to carbohydrate transport and metabolism, suggesting that these acquired genes contributed to the expansion of both genome size and metabolic capability. The findings of this work provide an interesting case against the general evolutionary trend observed among symbiotic bacteria and further demonstrate the flexibility of Spiroplasma genomes. For future studies, investigation of the functional integration of these acquired genes, as well as inference of their contribution to fitness, could improve our knowledge of symbiont evolution.
Sequencing and comparing whole mitochondrial genomes of animals

DOE Office of Scientific and Technical Information (OSTI.GOV)

Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

2005-04-22

Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.
Hybridization Reveals the Evolving Genomic Architecture of Speciation

PubMed Central

Kronforst, Marcus R.; Hansen, Matthew E.B.; Crawford, Nicholas G.; Gallant, Jason R.; Zhang, Wei; Kulathinal, Rob J.; Kapan, Durrell D.; Mullen, Sean P.

2014-01-01

SUMMARY The rate at which genomes diverge during speciation is unknown, as are the physical dynamics of the process. Here, we compare full genome sequences of 32 butterflies, representing five species from a hybridizing Heliconius butterfly community, to examine genome-wide patterns of introgression and infer how divergence evolves during the speciation process. Our analyses reveal that initial divergence is restricted to a small fraction of the genome, largely clustered around known wing-patterning genes. Over time, divergence evolves rapidly, due primarily to the origin of new divergent regions. Furthermore, divergent genomic regions display signatures of both selection and adaptive introgression, demonstrating the link between microevolutionary processes acting within species and the origin of species across macroevolutionary timescales. Our results provide a uniquely comprehensive portrait of the evolving species boundary due to the role that hybridization plays in reducing the background accumulation of divergence at neutral sites. PMID:24183670
An argument for mechanism-based statistical inference in cancer

PubMed Central

Ochs, Michael; Price, Nathan D.; Tomasetti, Cristian; Younes, Laurent

2015-01-01

Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning bio-markers, metabolism, cell signaling, network inference and tumorigenesis. PMID:25381197
Complex multi-enhancer contacts captured by Genome Architecture Mapping (GAM)

PubMed Central

Beagrie, Robert A.; Scialdone, Antonio; Schueler, Markus; Kraemer, Dorothee C.A.; Chotalia, Mita; Xie, Sheila Q.; Barbieri, Mariano; de Santiago, Inês; Lavitas, Liron-Mark; Branco, Miguel R.; Fraser, James; Dostie, Josée; Game, Laurence; Dillon, Niall; Edwards, Paul A.W.; Nicodemi, Mario; Pombo, Ana

2017-01-01

Summary The organization of the genome in the nucleus and the interactions of genes with their regulatory elements are key features of transcriptional control and their disruption can cause disease. We developed a novel genome-wide method, Genome Architecture Mapping (GAM), for measuring chromatin contacts, and other features of three-dimensional chromatin topology, based on sequencing DNA from a large collection of thin nuclear sections. We apply GAM to mouse embryonic stem cells and identify an enrichment for specific interactions between active genes and enhancers across very large genomic distances, using a mathematical model ‘SLICE’ (Statistical Inference of Co-segregation). GAM also reveals an abundance of three-way contacts genome-wide, especially between regions that are highly transcribed or contain super-enhancers, highlighting a previously inaccessible complexity in genome architecture and a major role for gene-expression specific contacts in organizing the genome in mammalian nuclei. PMID:28273065

The History of Slavs Inferred from Complete Mitochondrial Genome Sequences

PubMed Central

Mielnik-Sikorska, Marta; Daca, Patrycja; Malyarchuk, Boris; Derenko, Miroslava; Skonieczna, Katarzyna; Perkova, Maria; Dobosz, Tadeusz; Grzybowski, Tomasz

2013-01-01

To shed more light on the processes leading to crystallization of a Slavic identity, we investigated variability of complete mitochondrial genomes belonging to haplogroups H5 and H6 (63 mtDNA genomes) from the populations of Eastern and Western Slavs, including new samples of Poles, Ukrainians and Czechs presented here. Molecular dating implies formation of H5 approximately 11.5–16 thousand years ago (kya) in the areas of southern Europe. Within ancient haplogroup H6, dated at around 15–28 kya, there is a subhaplogroup H6c, which probably survived the last glaciation in Europe and has undergone expansion only 3–4 kya, together with the ancestors of some European groups, including the Slavs, because H6c has been detected in Czechs, Poles and Slovaks. Detailed analysis of complete mtDNAs allowed us to identify a number of lineages that seem specific for Central and Eastern Europe (H5a1f, H5a2, H5a1r, H5a1s, H5b4, H5e1a, H5u1, some subbranches of H5a1a and H6a1a9). Some of them could possibly be traced back to at least ∼4 kya, which indicates that some of the ancestors of today's Slavs (Poles, Czechs, Slovaks, Ukrainians and Russians) inhabited areas of Central and Eastern Europe much earlier than it was estimated on the basis of archaeological and historical data. We also sequenced entire mitochondrial genomes of several non-European lineages (A, C, D, G, L) found in contemporary populations of Poland and Ukraine. The analysis of these haplogroups confirms the presence of Siberian (C5c1, A8a1) and Ashkenazi-specific (L2a1l2a) mtDNA lineages in Slavic populations. Moreover, we were able to pinpoint some lineages which could possibly reflect the relatively recent contacts of Slavs with nomadic Altaic peoples (C4a1a, G2a, D5a2a1a1). PMID:23342138
RegPrecise 3.0--a resource for genome-scale exploration of transcriptional regulation in bacteria.

PubMed

Novichkov, Pavel S; Kazakov, Alexey E; Ravcheev, Dmitry A; Leyn, Semen A; Kovaleva, Galina Y; Sutormin, Roman A; Kazanov, Marat D; Riehl, William; Arkin, Adam P; Dubchak, Inna; Rodionov, Dmitry A

2013-11-01

Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in prokaryotes is one of the critical tasks of modern genomics. Bacteria from different taxonomic groups, whose lifestyles and natural environments are substantially different, possess highly diverged transcriptional regulatory networks. Comparative genomics approaches are useful for in silico reconstruction of bacterial regulons and networks operated by both transcription factors (TFs) and RNA regulatory elements (riboswitches). RegPrecise (http://regprecise.lbl.gov) is a web resource for collection, visualization and analysis of transcriptional regulons reconstructed by comparative genomics. We significantly expanded a reference collection of manually curated regulons we introduced earlier. RegPrecise 3.0 provides access to inferred regulatory interactions organized by phylogenetic, structural and functional properties. Taxonomy-specific collections include 781 TF regulogs inferred in more than 160 genomes representing 14 taxonomic groups of Bacteria. TF-specific collections include regulogs for a selected subset of 40 TFs reconstructed across more than 30 taxonomic lineages. Novel collections of regulons operated by RNA regulatory elements (riboswitches) include nearly 400 regulogs inferred in 24 bacterial lineages. RegPrecise 3.0 provides four classifications of the reference regulons implemented as controlled vocabularies: 55 TF protein families; 43 RNA motif families; ~150 biological processes or metabolic pathways; and ~200 effectors or environmental signals. Genome-wide visualization of regulatory networks and metabolic pathways covered by the reference regulons is available for all studied genomes. A separate section of RegPrecise 3.0 contains draft regulatory networks in 640 genomes obtained by a conservative propagation of the reference regulons to closely related genomes. RegPrecise 3.0 gives access to the transcriptional regulons reconstructed in
Seasonal influence of scallop culture on nutrient flux, bacterial pathogens and bacterioplankton diversity across estuaries off the Bohai Sea Coast of Northern China.

PubMed

He, Yaodong; Sen, Biswarup; Shang, Junyang; He, Yike; Xie, Ningdong; Zhang, Yongfeng; Zhang, Jianle; Johnson, Zackary I; Wang, Guangyi

2017-11-15

In this study, we investigated the environmental impacts of scallop culture on two coastal estuaries adjacent to the Bohai Sea, including developing a quantitative PCR assay to assess the abundance of the bacterial pathogens Escherichia coli and Vibrio parahaemolyticus. Scallop culture resulted in a significant reduction of nitrogen, chlorophyll a, and phosphorus levels in seawater during summer. The abundance of bacteria including V. parahaemolyticus varied significantly across estuaries and breeding seasons and was influenced by nitrate as well as nutrient ratios (Si/DIN, N/P). Bacterioplankton diversity varied across the two estuaries and seasons, and was dominated by Proteobacteria, Cyanobacteria, Actinobacteria, and Bacteroidetes. Overall, this study suggests a significant influence of scallop culture on the ecology of adjacent estuaries and offers a sensitive tool for monitoring scallop contamination. Copyright © 2017 Elsevier Ltd. All rights reserved.
Permanent draft genomes of the two Rhodopirellula europaea strains 6C and SH398.

PubMed

Richter-Heitmann, Tim; Richter, Michael; Klindworth, Anna; Wegner, Carl-Eric; Frank, Carsten S; Glöckner, Frank Oliver; Harder, Jens

2014-02-01

The genomes of two Rhodopirellula europaea strains were sequenced as permanent drafts to study the genomic diversity within this genus, especially in comparison with the closed genome of the type strain Rhodopirellula baltica SH1(T). The isolates are part of a larger study to infer the biogeography of Rhodopirellula species in European marine waters, as well as to amend the genus description of R. baltica. This genomics resource article is the second of a series of five publications describing a total of eight new permanent draft genomes of Rhodopirellula species. Copyright © 2013 Elsevier B.V. All rights reserved.
The mitochondrial genome of Protostrongylus rufescens – implications for population and systematic studies

PubMed Central

2013-01-01

Background Protostrongylus rufescens is a metastrongyloid nematode of small ruminants, such as sheep and goats, causing protostrongylosis. In spite of its importance, the ecology and epidemiology of this parasite are not entirely understood. In addition, genetic data are scant for P. rufescens and related metastrongyloids. Methods The mt genome was amplified from a single adult worm of P. rufescens (from sheep) by long-PCR, sequenced using 454-technology and annotated using bioinformatic tools. Amino acid sequences inferred from individual genes of the mt genomes were concatenated and subjected to phylogenetic analysis using Bayesian inference. Results The circular mitochondrial genome was 13,619 bp in length and contained two ribosomal RNA, 12 protein-coding and 22 transfer RNA genes, consistent with nematodes of the order Strongylida for which mt genomes have been determined. Phylogenetic analysis of the concatenated amino acid sequence data for the 12 mt proteins showed that P. rufescens was closely related to Aelurostrongylus abstrusus, Angiostrongylus vasorum, Angiostrongylus cantonensis and Angiostrongylus costaricensis. Conclusions The mt genome determined herein provides a source of markers for future investigations of P. rufescens. Molecular tools, employing such mt markers, are likely to find applicability in studies of the population biology of this parasite and the systematics of lungworms. PMID:24025317
Single-Genome Sequencing of Hepatitis C Virus in Donor-Recipient Pairs Distinguishes Modes and Models of Virus Transmission and Early Diversification.

PubMed

Li, Hui; Stoddard, Mark B; Wang, Shuyi; Giorgi, Elena E; Blair, Lily M; Learn, Gerald H; Hahn, Beatrice H; Alter, Harvey J; Busch, Michael P; Fierer, Daniel S; Ribeiro, Ruy M; Perelson, Alan S; Bhattacharya, Tanmoy; Shaw, George M

2016-01-01

Despite the recent development of highly effective anti-hepatitis C virus (HCV) drugs, the global burden of this pathogen remains immense. Control or eradication of HCV will likely require the broad application of antiviral drugs and development of an effective vaccine. A precise molecular identification of transmitted/founder (T/F) HCV genomes that lead to productive clinical infection could play a critical role in vaccine research, as it has for HIV-1. However, the replication schema of these two RNA viruses differ substantially, as do viral responses to innate and adaptive host defenses. These differences raise questions as to the certainty of T/F HCV genome inferences, particularly in cases where multiple closely related sequence lineages have been observed. To clarify these issues and distinguish between competing models of early HCV diversification, we examined seven cases of acute HCV infection in humans and chimpanzees, including three examples of virus transmission between linked donors and recipients. Using single-genome sequencing (SGS) of plasma vRNA, we found that inferred T/F sequences in recipients were identical to viral sequences in their respective donors. Early in infection, HCV genomes generally evolved according to a simple model of random evolution where the coalescent corresponded to the T/F sequence. Closely related sequence lineages could be explained by high multiplicity infection from a donor whose viral sequences had undergone a pretransmission bottleneck due to treatment, immune selection, or recent infection. These findings validate SGS, together with mathematical modeling and phylogenetic analysis, as a novel strategy to infer T/F HCV genome sequences. Despite the recent development of highly effective, interferon-sparing anti-hepatitis C virus (HCV) drugs, the global burden of this pathogen remains immense. Control or eradication of HCV will likely require the broad application of antiviral drugs and the development of an effective
Conflicting Evolutionary Histories of the Mitochondrial and Nuclear Genomes in New World Myotis Bats.

PubMed

Platt, Roy N; Faircloth, Brant C; Sullivan, Kevin A M; Kieran, Troy J; Glenn, Travis C; Vandewege, Michael W; Lee, Thomas E; Baker, Robert J; Stevens, Richard D; Ray, David A

2018-03-01

The rapid diversification of Myotis bats into more than 100 species is one of the most extensive mammalian radiations available for study. Efforts to understand relationships within Myotis have primarily utilized mitochondrial markers and trees inferred from nuclear markers lacked resolution. Our current understanding of relationships within Myotis is therefore biased towards a set of phylogenetic markers that may not reflect the history of the nuclear genome. To resolve this, we sequenced the full mitochondrial genomes of 37 representative Myotis, primarily from the New World, in conjunction with targeted sequencing of 3648 ultraconserved elements (UCEs). We inferred the phylogeny and explored the effects of concatenation and summary phylogenetic methods, as well as combinations of markers based on informativeness or levels of missing data, on our results. Of the 294 phylogenies generated from the nuclear UCE data, all are significantly different from phylogenies inferred using mitochondrial genomes. Even within the nuclear data, quartet frequencies indicate that around half of all UCE loci conflict with the estimated species tree. Several factors can drive such conflict, including incomplete lineage sorting, introgressive hybridization, or even phylogenetic error. Despite the degree of discordance between nuclear UCE loci and the mitochondrial genome and among UCE loci themselves, the most common nuclear topology is recovered in one quarter of all analyses with strong nodal support. Based on these results, we re-examine the evolutionary history of Myotis to better understand the phenomena driving their unique nuclear, mitochondrial, and biogeographic histories.
The Discovery of Single-Nucleotide Polymorphisms—and Inferences about Human Demographic History

PubMed Central

Wakeley, John; Nielsen, Rasmus; Liu-Cordero, Shau Neen; Ardlie, Kristin

2001-01-01

A method of historical inference that accounts for ascertainment bias is developed and applied to single-nucleotide polymorphism (SNP) data in humans. The data consist of 84 short fragments of the genome that were selected, from three recent SNP surveys, to contain at least two polymorphisms in their respective ascertainment samples and that were then fully resequenced in 47 globally distributed individuals. Ascertainment bias is the deviation, from what would be observed in a random sample, caused either by discovery of polymorphisms in small samples or by locus selection based on levels or patterns of polymorphism. The three SNP surveys from which the present data were derived differ both in their protocols for ascertainment and in the size of the samples used for discovery. We implemented a Monte Carlo maximum-likelihood method to fit a subdivided-population model that includes a possible change in effective size at some time in the past. Incorrectly assuming that ascertainment bias does not exist causes errors in inference, affecting both estimates of migration rates and historical changes in size. Migration rates are overestimated when ascertainment bias is ignored. However, the direction of error in inferences about changes in effective population size (whether the population is inferred to be shrinking or growing) depends on whether either the numbers of SNPs per fragment or the SNP-allele frequencies are analyzed. We use the abbreviation “SDL,” for “SNP-discovered locus,” in recognition of the genomic-discovery context of SNPs. When ascertainment bias is modeled fully, both the number of SNPs per SDL and their allele frequencies support a scenario of growth in effective size in the context of a subdivided population. If subdivision is ignored, however, the hypothesis of constant effective population size cannot be rejected. An important conclusion of this work is that, in demographic or other studies, SNP data are useful only to the extent that
Inferring the mode of origin of polyploid species from next-generation sequence data.

PubMed

Roux, Camille; Pannell, John R

2015-03-01

Many eukaryote organisms are polyploid. However, despite their importance, evolutionary inference of polyploid origins and modes of inheritance has been limited by a need for analyses of allele segregation at multiple loci using crosses. The increasing availability of sequence data for nonmodel species now allows the application of established approaches for the analysis of genomic data in polyploids. Here, we ask whether approximate Bayesian computation (ABC), applied to realistic traditional and next-generation sequence data, allows correct inference of the evolutionary and demographic history of polyploids. Using simulations, we evaluate the robustness of evolutionary inference by ABC for tetraploid species as a function of the number of individuals and loci sampled, and the presence or absence of an outgroup. We find that ABC adequately retrieves the recent evolutionary history of polyploid species on the basis of both old and new sequencing technologies. The application of ABC to sequence data from diploid and polyploid species of the plant genus Capsella confirms its utility. Our analysis strongly supports an allopolyploid origin of C. bursa-pastoris about 80 000 years ago. This conclusion runs contrary to previous findings based on the same data set but using an alternative approach and is in agreement with recent findings based on whole-genome sequencing. Our results indicate that ABC is a promising and powerful method for revealing the evolution of polyploid species, without the need to attribute alleles to a homeologous chromosome pair. The approach can readily be extended to more complex scenarios involving higher ploidy levels. © 2015 John Wiley & Sons Ltd.
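The rejection form of approximate Bayesian computation mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' polyploid model: the simulator below is a hypothetical toy (each locus is polymorphic independently with probability `theta`), and the names `simulate`, `abc_rejection`, `tolerance`, and the observed count of 30 are assumptions chosen only to show the accept/reject logic.

```python
import random


def simulate(theta, n_loci, rng):
    """Toy simulator (assumption): count loci that are polymorphic,
    each locus being polymorphic independently with probability theta."""
    return sum(rng.random() < theta for _ in range(n_loci))


def abc_rejection(observed, n_loci, n_draws=20000, tolerance=2, seed=1):
    """Rejection ABC: draw theta from the prior, simulate a summary statistic,
    and keep theta when the simulated value is within `tolerance` of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # uniform prior on [0, 1]
        if abs(simulate(theta, n_loci, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted


# Hypothetical data: 30 of 100 sampled loci are polymorphic.
posterior = abc_rejection(observed=30, n_loci=100)
estimate = sum(posterior) / len(posterior)  # approximate posterior mean of theta
```

The accepted draws approximate the posterior of `theta`. A realistic analysis would replace the toy simulator with a coalescent simulation and use several summary statistics.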
Similar Ratios of Introns to Intergenic Sequence across Animal Genomes

PubMed Central

Wörheide, Gert

2017-01-01

Abstract One central goal of genome biology is to understand how the usage of the genome differs between organisms. Our knowledge of genome composition, needed for downstream inferences, is critically dependent on gene annotations, yet problems associated with gene annotation and assembly errors are usually ignored in comparative genomics. Here, we analyze the genomes of 68 species across 12 animal phyla and some single-cell eukaryotes for general trends in genome composition and transcription, taking into account problems of gene annotation. We show that, regardless of genome size, the ratio of introns to intergenic sequence is comparable across essentially all animals, with nearly all deviations dominated by increased intergenic sequence. Genomes of model organisms have ratios much closer to 1:1, suggesting that the majority of published genomes of nonmodel organisms are underannotated and consequently omit substantial numbers of genes, with likely negative impact on evolutionary interpretations. Finally, our results also indicate that most animals transcribe half or more of their genomes arguing against differences in genome usage between animal groups, and also suggesting that the transcribed portion is more dependent on genome size than previously thought. PMID:28633296
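The intron-to-intergenic ratio discussed in this abstract can be sketched from a gene annotation as follows. This is a minimal illustration, not the authors' pipeline: the function name, the half-open coordinate convention, the assumption of non-overlapping genes, and the example coordinates are all hypothetical.

```python
def intron_intergenic_ratio(genome_length, genes):
    """Return total intronic bases divided by total intergenic bases.

    genes: list of (gene_start, gene_end, exons), where exons is a list of
    (start, end) pairs inside the gene. Coordinates are half-open, and genes
    are assumed not to overlap (a simplifying assumption).
    """
    genic = sum(end - start for start, end, _ in genes)
    exonic = sum(e - s for _, _, exons in genes for s, e in exons)
    intronic = genic - exonic            # bases inside genes but outside exons
    intergenic = genome_length - genic   # bases outside any gene
    return intronic / intergenic


# Hypothetical 10 kb genome with two annotated genes.
example_genes = [
    (1000, 3000, [(1000, 1200), (2500, 3000)]),
    (6000, 7000, [(6000, 6300), (6800, 7000)]),
]
ratio = intron_intergenic_ratio(10000, example_genes)
```

Under the abstract's argument, a ratio far below 1:1 in a nonmodel genome would be one hint that genes, and therefore introns, are missing from the annotation.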
Genomic resources and their influence on the detection of the signal of positive selection in genome scans.

PubMed

Manel, S; Perrier, C; Pratlong, M; Abi-Rached, L; Paganini, J; Pontarotti, P; Aurelle, D

2016-01-01

Genome scans represent powerful approaches to investigate the action of natural selection on the genetic variation of natural populations and to better understand local adaptation. This is very useful, for example, in the field of conservation biology and evolutionary biology. Thanks to Next Generation Sequencing, genomic resources are growing exponentially, improving genome scan analyses in non-model species. Thousands of SNPs called using Reduced Representation Sequencing are increasingly used in genome scans. Besides, genome sequences are also becoming increasingly available, allowing better processing of short-read data, offering physical localization of variants, and improving haplotype reconstruction and data imputation. Ultimately, genome sequences are also becoming the raw material for selection inferences. Here, we discuss how the increasing availability of such genomic resources, notably genome sequences, influences the detection of signals of selection. Mainly, increasing data density and having the information of physical linkage data expand genome scans by (i) improving the overall quality of the data, (ii) helping the reconstruction of demographic history for the population studied to decrease false-positive rates and (iii) improving the statistical power of methods to detect the signal of selection. Of particular importance, the availability of a high-quality reference genome can improve the detection of the signal of selection by (i) allowing matching the potential candidate loci to linked coding regions under selection, (ii) rapidly moving the investigation to the gene and function and (iii) ensuring that the highly variable regions of the genomes that include functional genes are also investigated. For all those reasons, using reference genomes in genome scan analyses is highly recommended. © 2015 John Wiley & Sons Ltd.
Children's and adults' evaluation of the certainty of deductive inferences, inductive inferences, and guesses.

PubMed

Pillow, Bradford H

2002-01-01

Two experiments investigated kindergarten through fourth-grade children's and adults' (N = 128) ability to (1) evaluate the certainty of deductive inferences, inductive inferences, and guesses; and (2) explain the origins of inferential knowledge. When judging their own cognitive state, children in first grade and older rated deductive inferences as more certain than guesses; but when judging another person's knowledge, children did not distinguish valid inferences from invalid inferences and guesses until fourth grade. By third grade, children differentiated their own deductive inferences from inductive inferences and guesses, but only adults both differentiated deductive inferences from inductive inferences and differentiated inductive inferences from guesses. Children's recognition of their own inferences may contribute to the development of knowledge about cognitive processes, scientific reasoning, and a constructivist epistemology.
Improving Microbial Genome Annotations in an Integrated Database Context

PubMed Central

Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Anderson, Iain; Mavromatis, Konstantinos; Kyrpides, Nikos C.; Ivanova, Natalia N.

2013-01-01

Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/. PMID:23424620
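The rule-based validation idea described in this abstract can be sketched as a comparison between predicted and observed phenotypes. The rules, pathway names, and function names below are hypothetical placeholders and do not reflect the actual IMG rule set or its interface.

```python
# Hypothetical rules: a phenotype is predicted when all required pathways are annotated.
RULES = {
    "phenotype_X": {"pathway_A", "pathway_B"},
    "phenotype_Y": {"pathway_C"},
}


def predict_phenotypes(annotated_pathways):
    """Return the phenotypes whose required pathways are all present."""
    return {
        phenotype
        for phenotype, required in RULES.items()
        if required <= annotated_pathways
    }


def annotation_discrepancies(annotated_pathways, observed_phenotypes):
    """Compare rule-based predictions with experimentally observed phenotypes."""
    predicted = predict_phenotypes(annotated_pathways)
    return {
        "predicted_not_observed": predicted - observed_phenotypes,
        "observed_not_predicted": observed_phenotypes - predicted,
    }


# Example genome: pathway_B is not annotated, yet phenotype_X was observed.
report = annotation_discrepancies({"pathway_A", "pathway_C"}, {"phenotype_X", "phenotype_Y"})
```

In this example, `phenotype_X` appears under `observed_not_predicted`, which flags the missing `pathway_B` annotation as a candidate gap to review.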
Inferring Recent Demography from Isolation by Distance of Long Shared Sequence Blocks

PubMed Central

Ringbauer, Harald; Coop, Graham

2017-01-01

Recently it has become feasible to detect long blocks of nearly identical sequence shared between pairs of genomes. These identity-by-descent (IBD) blocks are direct traces of recent coalescence events and, as such, contain ample signal to infer recent demography. Here, we examine sharing of such blocks in two-dimensional populations with local migration. Using a diffusion approximation to trace genetic ancestry, we derive analytical formulas for patterns of isolation by distance of IBD blocks, which can also incorporate recent population density changes. We introduce an inference scheme that uses a composite-likelihood approach to fit these formulas. We then extensively evaluate our theory and inference method on a range of scenarios using simulated data. We first validate the diffusion approximation by showing that the theoretical results closely match the simulated block-sharing patterns. We then demonstrate that our inference scheme can accurately and robustly infer dispersal rate and effective density, as well as bounds on recent dynamics of population density. To demonstrate an application, we use our estimation scheme to explore the fit of a diffusion model to Eastern European samples in the Population Reference Sample data set. We show that ancestry diffusing with a rate of σ ≈ 50–100 km/gen during the last centuries, combined with accelerating population growth, can explain the observed exponential decay of block sharing with increasing pairwise sample distance. PMID:28108588
Assigning protein functions by comparative genome analysis protein phylogenetic profiles

DOEpatents

Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.

2003-05-13

A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
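The phylogenetic-profile method described in this record can be sketched as a comparison of presence/absence vectors. The data, the names `presence`, `profile`, and `linked_pairs`, and the Hamming-distance threshold are illustrative assumptions, not the patented procedure itself.

```python
from itertools import combinations

# Hypothetical input: protein family -> genomes in which a homolog was found.
presence = {
    "P1": {"g1", "g2", "g4"},
    "P2": {"g1", "g2", "g4"},
    "P3": {"g2", "g3"},
    "P4": {"g1", "g3", "g4"},
}
genomes = ["g1", "g2", "g3", "g4"]


def profile(protein):
    """Presence (1) or absence (0) of the protein in each genome, in fixed order."""
    return tuple(1 if g in presence[protein] else 0 for g in genomes)


def hamming(a, b):
    """Number of genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(max_distance=0):
    """Protein pairs whose profiles differ in at most max_distance genomes."""
    profiles = {p: profile(p) for p in presence}
    return [
        (p, q)
        for p, q in combinations(sorted(profiles), 2)
        if hamming(profiles[p], profiles[q]) <= max_distance
    ]


candidate_links = linked_pairs()  # identical profiles only
```

With this toy data, only `P1` and `P2` share an identical profile, so they form the single candidate functional link. Raising `max_distance` admits pairs with near-matching profiles.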
The First Mitochondrial Genome for Caddisfly (Insecta: Trichoptera) with Phylogenetic Implications

PubMed Central

Wang, Yuyu; Liu, Xingyue; Yang, Ding

2014-01-01

The Trichoptera (caddisflies) is a holometabolous insect order with 14,300 described species forming the second most species-rich monophyletic group of animals in freshwater. Hitherto, there is no mitochondrial genome reported of this order. Herein, we describe the complete mitochondrial (mt) genome of a caddisfly species, Eubasilissa regina (McLachlan, 1871). A phylogenomic analysis was carried out based on the mt genomic sequences of 13 mt protein coding genes (PCGs) and two rRNA genes of 24 species belonging to eight holometabolous orders. Both maximum likelihood and Bayesian inference analyses highly support the sister relationship between Trichoptera and Lepidoptera. PMID:24391451
Genome-association analysis of Korean Holstein milk traits using genomic estimated breeding value

PubMed Central

Shin, Donghyun; Lee, Chul; Park, Kyoung-Do; Kim, Heebal; Cho, Kwang-hyeon

2017-01-01

Objective Holsteins are known as the world’s highest milk-producing dairy cattle. The purpose of this study was to identify genetic regions strongly associated with milk traits (milk production, fat, and protein) using Korean Holstein data. Methods This study was performed using single nucleotide polymorphism (SNP) chip data (Illumina BovineSNP50 Beadchip) of 911 Korean Holstein individuals. We inferred genomic estimated breeding values based on best linear unbiased prediction (BLUP) and ridge regression using BLUPF90 and R. We then performed a genome-wide association study and identified genetic regions related to milk traits. Results We identified 9, 6, and 17 significant genetic regions related to milk production, fat and protein, respectively. These genes are newly reported as genetically associated with milk traits in Holsteins. Conclusion This study complements recent Holstein genome-wide association studies that identified other SNPs and genes as the most significant variants. These results will help to expand the knowledge of the polygenic nature of milk production in Holsteins. PMID:26954162
Computational Prediction of the Global Functional Genomic Landscape: Applications, Methods and Challenges

PubMed Central

Zhou, Weiqiang; Sherwood, Ben; Ji, Hongkai

2017-01-01

Technological advances have led to an explosive growth of high-throughput functional genomic data. Exploiting the correlation among different data types, it is possible to predict one functional genomic data type from other data types. Prediction tools are valuable in understanding the relationship among different functional genomic signals. They also provide a cost-efficient solution to inferring the unknown functional genomic profiles when experimental data are unavailable due to resource or technological constraints. The predicted data may be used for generating hypotheses, prioritizing targets, interpreting disease variants, facilitating data integration, quality control, and many other purposes. This article reviews various applications of prediction methods in functional genomics, discusses analytical challenges, and highlights some common and effective strategies used to develop prediction methods for functional genomic data. PMID:28076869
GRIL: genome rearrangement and inversion locator.

PubMed

Darling, Aaron E; Mau, Bob; Blattner, Frederick R; Perna, Nicole T

2004-01-01

GRIL is a tool to automatically identify collinear regions in a set of bacterial-size genome sequences. GRIL uses three basic steps. First, regions of high sequence identity are located. Second, some of these regions are filtered based on user-specified criteria. Finally, the remaining regions of sequence identity are used to define significant collinear regions among the sequences. By locating collinear regions of sequence, GRIL provides a basis for multiple genome alignment using current alignment systems. GRIL also provides a basis for using current inversion distance tools to infer phylogeny. GRIL is implemented in C++ and runs on any x86-based Linux or Windows platform. It is available from http://asap.ahabs.wisc.edu/gril
Relationships between coastal bacterioplankton growth rates and biomass production: comparison of leucine and thymidine uptake with single-cell physiological characteristics.

PubMed

Franco-Vidal, Leticia; Morán, Xosé Anxelu G

2011-02-01

Specific growth rates of heterotrophic bacterioplankton have been frequently estimated from in situ bacterial production (BP) to biomass (BB) ratios, using a series of assumptions that may result in serious discrepancies with values obtained from predator-free cultures. Here, we used both types of approaches together with a comprehensive assessment of single-cell physiological characteristics (membrane integrity, nucleic acid content, and active respiration) of coastal bacterioplankton during a complete annual cycle (February 2007-January 2008) in the southern Bay of Biscay off Xixón, Spain. Both leucine and thymidine incorporation rates were used in conjunction with empirical tracer to carbon or cells conversion factors (eCFs) to accurately derive BP. Leu and TdR incorporation rates covaried year-round, as did the corresponding eCFs at 0 and 50 m depth. eCFs peaked in autumn, with mean annual values close to the theoretical ones (3.4 kg C mol Leu⁻¹ and 2.0 × 10¹⁸ cells mol TdR⁻¹). Bacterial abundance (0.2-1.5 × 10⁶ cells L⁻¹) showed a bimodal distribution with maxima in May and October and minima in March. Live (membrane-intact) cells dominated year-round (79-97%), with high nucleic acid cells (42-88%) and actively respiring bacteria (CTC+, 1-16%) showing distinct surface maxima in April and July, respectively. BB (557-1,558 mg C m⁻²) and BP (7-139 mg C m⁻² day⁻¹) presented two distinct peaks in spring and autumn, both of similar size due to a strong upwelling event observed in September. Specific growth rates (0.35-3.8 day⁻¹) were one order of magnitude higher in predator-free incubations than bacterial turnover rates derived from integrated BP:BB ratios (0.01-0.16 and 0.01-0.09 day⁻¹, for Leu and TdR, respectively) and were not correlated, probably due to a significant contribution of low activity cells to total standing stocks. The Leu:TdR molar ratio averaged for the water column (6.6-25.5) decreased significantly with higher integrated

Development of genome- and transcriptome-derived microsatellites in related species of snapping shrimps with highly duplicated genomes.

PubMed

Gaynor, Kaitlyn M; Solomon, Joseph W; Siller, Stefanie; Jessell, Linnet; Duffy, J Emmett; Rubenstein, Dustin R

2017-11-01

Molecular markers are powerful tools for studying patterns of relatedness and parentage within populations and for making inferences about social evolution. However, the development of molecular markers for simultaneous study of multiple species presents challenges, particularly when species exhibit genome duplication or polyploidy. We developed microsatellite markers for Synalpheus shrimp, a genus in which species exhibit not only great variation in social organization, but also interspecific variation in genome size and partial genome duplication. From the four primary clades within Synalpheus, we identified microsatellites in the genomes of four species and in the consensus transcriptome of two species. Ultimately, we designed and tested primers for 143 microsatellite markers across 25 species. Although the majority of markers were disomic, many markers were polysomic for certain species. Surprisingly, we found no relationship between genome size and the number of polysomic markers. As expected, markers developed for a given species amplified better for closely related species than for more distant relatives. Finally, the markers developed from the transcriptome were more likely to work successfully and to be disomic than those developed from the genome, suggesting that consensus transcriptomes are likely to be conserved across species. Our findings suggest that the transcriptome, particularly consensus sequences from multiple species, can be a valuable source of molecular markers for taxa with complex, duplicated genomes. © 2017 John Wiley & Sons Ltd.
Plant functional genomics

NASA Astrophysics Data System (ADS)

Holtorf, Hauke; Guitton, Marie-Christine; Reski, Ralf

2002-04-01

Functional genome analysis of plants has entered the high-throughput stage. The complete genome information from key species such as Arabidopsis thaliana and rice is now available and will further boost the application of a range of new technologies to functional plant gene analysis. To broadly assign functions to unknown genes, different fast and multiparallel approaches are currently used and developed. These new technologies are based on known methods but are adapted and improved to accommodate for comprehensive, large-scale gene analysis, i.e. such techniques are novel in the sense that their design allows researchers to analyse many genes at the same time and at an unprecedented pace. Such methods allow analysis of the different constituents of the cell that help to deduce gene function, namely the transcripts, proteins and metabolites. Similarly the phenotypic variations of entire mutant collections can now be analysed in a much faster and more efficient way than before. The different methodologies have developed to form their own fields within the functional genomics technological platform and are termed transcriptomics, proteomics, metabolomics and phenomics. Gene function, however, cannot solely be inferred by using only one such approach. Rather, it is only by bringing together all the information collected by different functional genomic tools that one will be able to unequivocally assign functions to unknown plant genes. This review focuses on current technical developments and their impact on the field of plant functional genomics. The lower plant Physcomitrella is introduced as a new model system for gene function analysis, owing to its high rate of homologous recombination.
Privacy-preserving genomic testing in the clinic: a model using HIV treatment.

PubMed

McLaren, Paul J; Raisaro, Jean Louis; Aouri, Manel; Rotger, Margalida; Ayday, Erman; Bartha, István; Delgado, Maria B; Vallet, Yannick; Günthard, Huldrych F; Cavassini, Matthias; Furrer, Hansjakob; Doco-Lecompte, Thanh; Marzolini, Catia; Schmid, Patrick; Di Benedetto, Caroline; Decosterd, Laurent A; Fellay, Jacques; Hubaux, Jean-Pierre; Telenti, Amalio

2016-08-01

The implementation of genomic-based medicine is hindered by unresolved questions regarding data privacy and delivery of interpreted results to health-care practitioners. We used DNA-based prediction of HIV-related outcomes as a model to explore critical issues in clinical genomics. We genotyped 4,149 markers in HIV-positive individuals. Variants allowed for prediction of 17 traits relevant to HIV medical care, inference of patient ancestry, and imputation of human leukocyte antigen (HLA) types. Genetic data were processed under a privacy-preserving framework using homomorphic encryption, and clinical reports describing potentially actionable results were delivered to health-care providers. A total of 230 patients were included in the study. We demonstrated the feasibility of encrypting a large number of genetic markers, inferring patient ancestry, computing monogenic and polygenic trait risks, and reporting results under privacy-preserving conditions. The average execution time of a multimarker test on encrypted data was 865 ms on a standard computer. The proportion of tests returning potentially actionable genetic results ranged from 0 to 54%. The model of implementation presented herein informs on strategies to deliver genomic test results for clinical care. Data encryption to ensure privacy helps to build patient trust, a key requirement on the road to genomic-based medicine. Genet Med 18 8, 814-822.
Genome Evolution in the Obligate but Environmentally Active Luminous Symbionts of Flashlight Fish

PubMed Central

Hendry, Tory A.; de Wet, Jeffrey R.; Dougan, Katherine E.; Dunlap, Paul V.

2016-01-01

The luminous bacterial symbionts of anomalopid flashlight fish are thought to be obligately dependent on their hosts for growth and share several aspects of genome evolution with unrelated obligate symbionts, including genome reduction. However, in contrast to most obligate bacteria, anomalopid symbionts have an active environmental phase that may be important for symbiont transmission. Here we investigated patterns of evolution between anomalopid symbionts compared with patterns in free-living relatives and unrelated obligate symbionts to determine if trends common to obligate symbionts are also found in anomalopid symbionts. Two symbionts, “Candidatus Photodesmus katoptron” and “Candidatus Photodesmus blepharus,” have genomes that are highly similar in gene content and order, suggesting genome stasis similar to ancient obligate symbionts present in insect lineages. This genome stasis exists in spite of the symbiont’s inferred ability to recombine, which is frequently lacking in obligate symbionts with stable genomes. Additionally, we used genome comparisons and tests of selection to infer which genes may be particularly important for the symbiont’s ecology compared with relatives. In keeping with obligate dependence, substitution patterns suggest that most symbiont genes are experiencing relaxed purifying selection compared with relatives. However, genes involved in motility and carbon storage, which are likely to be used outside the host, appear to be under increased purifying selection. Two chemoreceptor chemotaxis genes are retained by both species and show high conservation with amino acid sensing genes, suggesting that the bacteria may actively seek out hosts using chemotaxis toward amino acids, which the symbionts are not able to synthesize. PMID:27389687
Comparative Genomics of the Bacterial Genus Streptococcus Illuminates Evolutionary Implications of Species Groups

PubMed Central

Gao, Xiao-Yang; Zhi, Xiao-Yang; Li, Hong-Wei; Klenk, Hans-Peter; Li, Wen-Jun

2014-01-01

Members of the genus Streptococcus within the phylum Firmicutes are among the most diverse and significant zoonotic pathogens. This genus has gone through considerable taxonomic revision due to increasing improvements of chemotaxonomic approaches, DNA hybridization and 16S rRNA gene sequencing. It is proposed to place the majority of streptococci into “species groups”. However, the evolutionary implications of species groups are not clear presently. We use comparative genomic approaches to yield a better understanding of the evolution of Streptococcus through genome dynamics, population structure, phylogenies and virulence factor distribution of species groups. Genome dynamics analyses indicate that the pan-genome size increases with the addition of newly sequenced strains, while the core genome size decreases with sequential addition at the genus level and species group level. Population structure analysis reveals two distinct lineages, one including Pyogenic, Bovis, Mutans and Salivarius groups, and the other including Mitis, Anginosus and Unknown groups. Phylogenetic dendrograms show that species within the same species group cluster together, and infer two main clades in accordance with population structure analysis. Distribution of streptococcal virulence factors has no obvious patterns among the species groups; however, the evolution of some common virulence factors is congruous with the evolution of species groups, according to phylogenetic inference. We suggest that the proposed streptococcal species groups are reasonable from the viewpoints of comparative genomics; evolution of the genus is congruent with the individual evolutionary trajectories of different species groups. PMID:24977706
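The pan-genome and core-genome trends described above can be sketched with plain set operations. This is only an illustration under the assumption that each genome has already been reduced to a set of gene-family identifiers; the function name and the toy identifiers are hypothetical and are not data from the study.

```python
def pan_core_curve(genomes):
    """Return (pan_size, core_size) after each genome is added in order.

    genomes: list of sets of gene-family identifiers.
    The pan-genome is the running union, the core genome the running intersection.
    """
    pan = set()
    core = None
    curve = []
    for genes in genomes:
        pan |= genes
        core = set(genes) if core is None else core & genes
        curve.append((len(pan), len(core)))
    return curve

# Toy gene-family sets for three strains (illustrative only).
strains = [
    {"fam1", "fam2", "fam3"},
    {"fam1", "fam2", "fam4"},
    {"fam1", "fam2", "fam5", "fam6"},
]
for pan_size, core_size in pan_core_curve(strains):
    print(pan_size, core_size)
```

In the toy data the pan-genome grows with each added strain while the core genome can only stay the same size or shrink, which is the qualitative pattern the abstract reports.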
An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.

PubMed

Liang, Ying; Liao, Bo; Zhu, Wen

2017-01-01

Tumourigenesis is a mutation accumulation process, which is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution from genetic variation data. Copy number variation (CNV) is the major genetic marker of the genome with more genes, disease loci, and functional elements involved. Fluorescence in situ hybridization (FISH) accurately measures the copy number of multiple genes in hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic trees based on the FISH platform. Topology analysis of the tumor progression trees shows that the pathway of tumor subcell expansion varies greatly during different stages of tumor formation. The classification experiment shows that tree-based features are better than data-based features at distinguishing tumors. The constructed phylogenetic trees perform well in characterizing the tumor development process and outperform those of other similar algorithms.
Impact of the choice of reference genome on the ability of the core genome SNV methodology to distinguish strains of Salmonella enterica serovar Heidelberg.

PubMed

Usongo, Valentine; Berry, Chrystal; Yousfi, Khadidja; Doualla-Bell, Florence; Labbé, Genevieve; Johnson, Roger; Fournier, Eric; Nadon, Celine; Goodridge, Lawrence; Bekal, Sadjia

2018-01-01

Salmonella enterica serovar Heidelberg (S. Heidelberg) is one of the top serovars causing human salmonellosis. The core genome single nucleotide variant pipeline (cgSNV) is one of several whole genome based sequence typing methods used for the laboratory investigation of foodborne pathogens. SNV detection using this method requires a reference genome. The purpose of this study was to investigate the impact of the choice of the reference genome on the cgSNV-informed phylogenetic clustering and inferred isolate relationships. We found that using a draft or closed genome of S. Heidelberg as reference did not impact the ability of the cgSNV methodology to differentiate among 145 S. Heidelberg isolates involved in foodborne outbreaks. We also found that using a distantly related genome such as S. Dublin as choice of reference led to a loss in resolution since some sporadic isolates were found to cluster together with outbreak isolates. In addition, the genetic distances between outbreak isolates as well as between outbreak and sporadic isolates were overall reduced when S. Dublin was used as the reference genome as opposed to S. Heidelberg.
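To make the notion of core-genome SNV distance concrete, here is a minimal sketch. It is not the cgSNV pipeline itself, whose details the abstract does not give; it assumes the isolates have already been aligned to a common reference, and the function names and toy sequences are hypothetical.

```python
def core_positions(alignment):
    """Indices at which every isolate has an unambiguous base call."""
    length = len(next(iter(alignment.values())))
    return [
        i for i in range(length)
        if all(seq[i] in "ACGT" for seq in alignment.values())
    ]

def snv_distance_matrix(alignment):
    """Pairwise SNV counts restricted to the core positions."""
    core = core_positions(alignment)
    names = sorted(alignment)
    return {
        (a, b): sum(alignment[a][i] != alignment[b][i] for i in core)
        for a in names
        for b in names
    }

# Toy alignment: '-' marks a position with no call in iso3.
alignment = {
    "iso1": "ACGTAC",
    "iso2": "ACGTTC",
    "iso3": "AC-TTA",
}
distances = snv_distance_matrix(alignment)
print(distances[("iso1", "iso2")], distances[("iso1", "iso3")])
```

Only positions called in every isolate contribute to the distances, so anything that shrinks the shared core, such as mapping against a more distant reference, can be expected to lower the counts in a sketch like this one.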
Similar Ratios of Introns to Intergenic Sequence across Animal Genomes.

PubMed

Francis, Warren R; Wörheide, Gert

2017-06-01

One central goal of genome biology is to understand how the usage of the genome differs between organisms. Our knowledge of genome composition, needed for downstream inferences, is critically dependent on gene annotations, yet problems associated with gene annotation and assembly errors are usually ignored in comparative genomics. Here, we analyze the genomes of 68 species across 12 animal phyla and some single-cell eukaryotes for general trends in genome composition and transcription, taking into account problems of gene annotation. We show that, regardless of genome size, the ratio of introns to intergenic sequence is comparable across essentially all animals, with nearly all deviations dominated by increased intergenic sequence. Genomes of model organisms have ratios much closer to 1:1, suggesting that the majority of published genomes of nonmodel organisms are underannotated and consequently omit substantial numbers of genes, with likely negative impact on evolutionary interpretations. Finally, our results also indicate that most animals transcribe half or more of their genomes arguing against differences in genome usage between animal groups, and also suggesting that the transcribed portion is more dependent on genome size than previously thought. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
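The intron-to-intergenic ratio discussed above can be computed from a gene annotation in a few lines. The sketch below is an assumption-laden illustration rather than the authors' method: it assumes non-overlapping genes, 0-based half-open coordinates, and a hypothetical input structure.

```python
def intron_intergenic_ratio(genome_length, genes):
    """Ratio of intronic bases to intergenic bases.

    genes: list of dicts with 'start', 'end' and 'exons' (list of (start, end)),
    all 0-based and half-open. Genes are assumed not to overlap.
    """
    genic = sum(g["end"] - g["start"] for g in genes)
    exonic = sum(end - start for g in genes for start, end in g["exons"])
    intronic = genic - exonic
    intergenic = genome_length - genic
    return intronic / intergenic

# Toy annotation on a 10 kb sequence (illustrative only).
toy_genes = [
    {"start": 1000, "end": 3000, "exons": [(1000, 1200), (2800, 3000)]},
    {"start": 5000, "end": 6000, "exons": [(5000, 5300), (5700, 6000)]},
]
ratio = intron_intergenic_ratio(10_000, toy_genes)
print(round(ratio, 3))
```

A missing gene model moves its whole span, introns included, into the intergenic total, which is why under-annotation pushes the ratio below 1:1.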
The complete mitochondrial genome of rabbit pinworm Passalurus ambiguus: genome characterization and phylogenetic analysis.

PubMed

Liu, Guo-Hua; Li, Sheng; Zou, Feng-Cai; Wang, Chun-Ren; Zhu, Xing-Quan

2016-01-01

Passalurus ambiguus (Nematoda: Oxyuridae) is a common pinworm that parasitizes the caecum and colon of rabbits. Despite its significance as a pathogen, the epidemiology, genetics, systematics, and biology of this pinworm remain poorly understood. In the present study, we sequenced the complete mitochondrial (mt) genome of P. ambiguus. The circular mt genome is 14,023 bp in size and encodes 36 genes, including 12 protein-coding, two ribosomal RNA, and 22 transfer RNA genes. The mt gene order of P. ambiguus is the same as that of Wellcomia siamensis, but distinct from that of Enterobius vermicularis. Phylogenetic analyses based on concatenated amino acid sequences of 12 protein-coding genes by Bayesian inference (BI) showed that P. ambiguus was more closely related to W. siamensis than to E. vermicularis. This mt genome provides novel genetic markers for studying the molecular epidemiology, population genetics, and systematics of pinworms of animals and humans, and should have implications for the diagnosis, prevention, and control of passaluriasis in rabbits and other animals.
The Capsaspora genome reveals a complex unicellular prehistory of animals.

PubMed

Suga, Hiroshi; Chen, Zehua; de Mendoza, Alex; Sebé-Pedrós, Arnau; Brown, Matthew W; Kramer, Eric; Carr, Martin; Kerner, Pierre; Vervoort, Michel; Sánchez-Pons, Núria; Torruella, Guifré; Derelle, Romain; Manning, Gerard; Lang, B Franz; Russ, Carsten; Haas, Brian J; Roger, Andrew J; Nusbaum, Chad; Ruiz-Trillo, Iñaki

2013-01-01

To reconstruct the evolutionary origin of multicellular animals from their unicellular ancestors, the genome sequences of diverse unicellular relatives are essential. However, only the genome of the choanoflagellate Monosiga brevicollis has been reported to date. Here we completely sequence the genome of the filasterean Capsaspora owczarzaki, the closest known unicellular relative of metazoans besides choanoflagellates. Analyses of this genome alter our understanding of the molecular complexity of metazoans' unicellular ancestors showing that they had a richer repertoire of proteins involved in cell adhesion and transcriptional regulation than previously inferred only with the choanoflagellate genome. Some of these proteins were secondarily lost in choanoflagellates. In contrast, most intercellular signalling systems controlling development evolved later concomitant with the emergence of the first metazoans. We propose that the acquisition of these metazoan-specific developmental systems and the co-option of pre-existing genes drove the evolutionary transition from unicellular protists to metazoans.
Rapid sequencing of the bamboo mitochondrial genome using Illumina technology and parallel episodic evolution of organelle genomes in grasses.

PubMed

Ma, Peng-Fei; Guo, Zhen-Hua; Li, De-Zhu

2012-01-01

Compared to their counterparts in animals, the mitochondrial (mt) genomes of angiosperms exhibit a number of unique features. However, unravelling their evolution is hindered by the small number of completed genomes, essentially all of which were Sanger sequenced. While next-generation sequencing technologies have revolutionized chloroplast genome sequencing, they are just beginning to be applied to angiosperm mt genomes. Chloroplast genomes of grasses (Poaceae) have undergone episodic evolution and the evolutionary rate was suggested to be correlated between chloroplast and mt genomes in Poaceae. It is interesting to investigate whether correlated rate change also occurred in grass mt genomes as expected under lineage effects. A time-calibrated phylogenetic tree is needed to examine rate change. We determined a largely completed mt genome from a bamboo, Ferrocalamus rimosivaginus (Poaceae), through Illumina sequencing of total DNA. With a combination of de novo and reference-guided assembly, Illumina reads at 39.5-fold coverage were finally assembled into scaffolds totalling 432,839 bp. The assembled genome contains nearly the same genes as the completed mt genomes in Poaceae. To examine the evolutionary rate in grass mt genomes, we reconstructed a phylogenetic tree including 22 taxa based on 31 mt genes. The topology of the well-resolved tree was almost identical to that inferred from chloroplast genomes, with only minor differences. The inconsistency possibly derived from long branch attraction in the mtDNA tree. By calculating absolute substitution rates, we found a significant rate change (∼4-fold) in the mt genome before and after the diversification of Poaceae in both synonymous and nonsynonymous terms. Furthermore, the rate change was correlated with that of chloroplast genomes in grasses. Our result demonstrates that Illumina sequencing technology is a rapid and efficient approach to obtaining angiosperm mt genome sequences. The parallel episodic evolution of mt and chloroplast
Rapid Sequencing of the Bamboo Mitochondrial Genome Using Illumina Technology and Parallel Episodic Evolution of Organelle Genomes in Grasses

PubMed Central

Ma, Peng-Fei; Guo, Zhen-Hua; Li, De-Zhu

2012-01-01

Background: Compared to their counterparts in animals, the mitochondrial (mt) genomes of angiosperms exhibit a number of unique features. However, unravelling their evolution is hindered by the small number of completed genomes, essentially all of which were Sanger sequenced. While next-generation sequencing technologies have revolutionized chloroplast genome sequencing, they are just beginning to be applied to angiosperm mt genomes. Chloroplast genomes of grasses (Poaceae) have undergone episodic evolution and the evolutionary rate was suggested to be correlated between chloroplast and mt genomes in Poaceae. It is interesting to investigate whether correlated rate change also occurred in grass mt genomes as expected under lineage effects. A time-calibrated phylogenetic tree is needed to examine rate change. Methodology/Principal Findings: We determined a largely completed mt genome from a bamboo, Ferrocalamus rimosivaginus (Poaceae), through Illumina sequencing of total DNA. With a combination of de novo and reference-guided assembly, Illumina reads at 39.5-fold coverage were finally assembled into scaffolds totalling 432,839 bp. The assembled genome contains nearly the same genes as the completed mt genomes in Poaceae. To examine the evolutionary rate in grass mt genomes, we reconstructed a phylogenetic tree including 22 taxa based on 31 mt genes. The topology of the well-resolved tree was almost identical to that inferred from chloroplast genomes, with only minor differences. The inconsistency possibly derived from long branch attraction in the mtDNA tree. By calculating absolute substitution rates, we found a significant rate change (∼4-fold) in the mt genome before and after the diversification of Poaceae in both synonymous and nonsynonymous terms. Furthermore, the rate change was correlated with that of chloroplast genomes in grasses. Conclusions/Significance: Our result demonstrates that it is a rapid and efficient approach to obtain angiosperm mt genome sequences using Illumina sequencing
Network inference from multimodal data: A review of approaches from infectious disease transmission.

PubMed

Ray, Bisakha; Ghedin, Elodie; Chunara, Rumi

2016-12-01

Network inference problems are commonly found in multiple biomedical subfields such as genomics, metagenomics, neuroscience, and epidemiology. Networks are useful for representing a wide range of complex interactions ranging from those between molecular biomarkers, neurons, and microbial communities, to those found in human or animal populations. Recent technological advances have resulted in an increasing amount of healthcare data in multiple modalities, increasing the prevalence of network inference problems. Multi-domain data can now be used to improve the robustness and reliability of networks recovered from unimodal data. For infectious diseases in particular, there is a body of knowledge that has been focused on combining multiple pieces of linked information. Combining or analyzing disparate modalities in concert has yielded greater insight into disease transmission than could be obtained from any single modality in isolation. This has been particularly helpful in understanding incidence and transmission at early stages of infections that have pandemic potential. Novel pieces of linked information in the form of spatial, temporal, and other covariates including high-throughput sequence data, clinical visits, social network information, pharmaceutical prescriptions, and clinical symptoms (reported as free-text data) also encourage further investigation of these methods. The purpose of this review is to provide an in-depth analysis of multimodal infectious disease transmission network inference methods with a specific focus on Bayesian inference. We focus on analytical Bayesian inference-based methods because they enable recovering multiple parameters simultaneously, for example, not just the disease transmission network, but also parameters of epidemic dynamics. Our review studies their assumptions, key inference parameters and limitations, and ultimately provides insights about improving future network inference methods in multiple applications.
KGCAK: a K-mer based database for genome-wide phylogeny and complexity evaluation.

PubMed

Wang, Dapeng; Xu, Jiayue; Yu, Jun

2015-09-16

The K-mer approach, which treats genomic sequences as simple character strings and counts the relative abundance of each string for a fixed K, has been extensively applied to phylogeny inference for genome assembly, annotation, and comparison. To meet increasing demands for comparing large genome sequences and to promote the use of the K-mer approach, we develop a versatile database, KGCAK ( http://kgcak.big.ac.cn/KGCAK/ ), containing ~8,000 genomes that include genome sequences of diverse life forms (viruses, prokaryotes, protists, animals, and plants) and cellular organelles of eukaryotic lineages. It builds phylogeny based on genomic elements in an alignment-free fashion and provides in-depth data processing enabling users to compare the complexity of genome sequences based on K-mer distribution. We hope that KGCAK becomes a powerful tool for exploring relationships within and among groups of species in a tree of life based on genomic data.
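The K-mer counting step described above can be sketched in a few lines. The abstract does not state which distance measure KGCAK uses, so the Euclidean distance between frequency profiles below is an assumption, as are the function names and the toy sequences.

```python
import math
from collections import Counter

def kmer_frequencies(seq, k):
    """Relative abundance of each overlapping k-mer made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}

def kmer_distance(seq_a, seq_b, k=3):
    """Alignment-free distance: Euclidean distance between k-mer profiles."""
    fa = kmer_frequencies(seq_a, k)
    fb = kmer_frequencies(seq_b, k)
    return math.sqrt(
        sum((fa.get(x, 0.0) - fb.get(x, 0.0)) ** 2 for x in set(fa) | set(fb))
    )

# Toy sequences (illustrative only).
a = "ACGTACGTACGTTTGA"
b = "ACGTACGAACGTTTGA"
c = "GGGGCCCCGGGGCCCC"
print(kmer_distance(a, b), kmer_distance(a, c))
```

A matrix of such pairwise distances could then be passed to a standard clustering or tree-building method, with no sequence alignment needed at any step.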
Reconstructing relative genome size of vascular plants through geological time.

PubMed

Lomax, Barry H; Hilton, Jason; Bateman, Richard M; Upchurch, Garland R; Lake, Janice A; Leitch, Ilia J; Cromwell, Avery; Knight, Charles A

2014-01-01

The strong positive relationship evident between cell and genome size in both animals and plants forms the basis of using the size of stomatal guard cells as a proxy to track changes in plant genome size through geological time. We report for the first time a taxonomic fine-scale investigation into changes in stomatal guard-cell length and use these data to infer changes in genome size through the evolutionary history of land plants. Our data suggest that many of the earliest land plants had exceptionally large genome sizes and that a predicted overall trend of increasing genome size within individual lineages through geological time is not supported. However, maximum genome size steadily increases from the Mississippian (c. 360 million yr ago (Ma)) to the present. We hypothesise that the functional relationship between stomatal size, genome size and atmospheric CO2 may contribute to the dichotomy reported between preferential extinction of neopolyploids and the prevalence of palaeopolyploidy observed in DNA sequence data of extant vascular plants. © 2013 The Authors. New Phytologist © 2013 New Phytologist Trust.
Permanent draft genomes of the three Rhodopirellula baltica strains SH28, SWK14 and WH47.

PubMed

Richter, Michael; Richter-Heitmann, Tim; Klindworth, Anna; Wegner, Carl-Eric; Frank, Carsten S; Harder, Jens; Glöckner, Frank Oliver

2014-02-01

The genomes of three Rhodopirellula baltica strains were sequenced as permanent drafts to complement the full genome sequence of the type strain R. baltica SH1(T). The isolates are part of a larger study to infer the biogeography of Rhodopirellula species in European marine waters, as well as to amend the genus description of R. baltica. This genomics resource article is the first of a series of five publications reporting in total eight new permanent draft genomes of Rhodopirellula species. Copyright © 2013 Elsevier B.V. All rights reserved.
Inference or Observation?

ERIC Educational Resources Information Center

Finson, Kevin D.

2010-01-01

Learning about what inferences are, and what a good inference is, will help students become more scientifically literate and better understand the nature of science in inquiry. Students in K-4 should be able to give explanations about what they investigate (NSTA 1997) and that includes doing so through inferring. This article provides some tips…
Model-based analyses of whole-genome data reveal a complex evolutionary history involving archaic introgression in Central African Pygmies.

PubMed

Hsieh, PingHsun; Woerner, August E; Wall, Jeffrey D; Lachance, Joseph; Tishkoff, Sarah A; Gutenkunst, Ryan N; Hammer, Michael F

2016-03-01

Comparisons of whole-genome sequences from ancient and contemporary samples have pointed to several instances of archaic admixture through interbreeding between the ancestors of modern non-Africans and now extinct hominids such as Neanderthals and Denisovans. One implication of these findings is that some adaptive features in contemporary humans may have entered the population via gene flow with archaic forms in Eurasia. Within Africa, fossil evidence suggests that anatomically modern humans (AMH) and various archaic forms coexisted for much of the last 200,000 yr; however, the absence of ancient DNA in Africa has limited our ability to make a direct comparison between archaic and modern human genomes. Here, we use statistical inference based on high coverage whole-genome data (greater than 60×) from contemporary African Pygmy hunter-gatherers as an alternative means to study the evolutionary history of the genus Homo. Using whole-genome simulations that consider demographic histories that include both isolation and gene flow with neighboring farming populations, our inference method rejects the hypothesis that the ancestors of AMH were genetically isolated in Africa, thus providing the first whole genome-level evidence of African archaic admixture. Our inferences also suggest a complex human evolutionary history in Africa, which involves at least a single admixture event from an unknown archaic population into the ancestors of AMH, likely within the last 30,000 yr. © 2016 Hsieh et al.; Published by Cold Spring Harbor Laboratory Press.
A ddRAD-based genetic map and its integration with the genome assembly of Japanese eel (Anguilla japonica) provides insights into genome evolution after the teleost-specific genome duplication

PubMed Central

2014-01-01

Background: Recent advancements in next-generation sequencing technology have enabled cost-effective sequencing of whole or partial genomes, permitting the discovery and characterization of molecular polymorphisms. Double-digest restriction-site associated DNA sequencing (ddRAD-seq) is a powerful and inexpensive approach to developing numerous single nucleotide polymorphism (SNP) markers and constructing a high-density genetic map. To enrich genomic resources for Japanese eel (Anguilla japonica), we constructed a ddRAD-based genetic map using an Ion Torrent Personal Genome Machine and anchored scaffolds of the current genome assembly to 19 linkage groups of the Japanese eel. Furthermore, we compared the Japanese eel genome with genomes of model fishes to infer the history of genome evolution after the teleost-specific genome duplication. Results: We generated the ddRAD-based linkage map of the Japanese eel, where the maps for female and male spanned 1748.8 cM and 1294.5 cM, respectively, and were arranged into 19 linkage groups. A total of 2,672 SNP markers and 115 Simple Sequence Repeat markers provide anchor points to 1,252 scaffolds covering 151 Mb (13%) of the current genome assembly of the Japanese eel. Comparisons among the Japanese eel, medaka, zebrafish and spotted gar genomes showed highly conserved synteny among teleosts and revealed part of the eight major chromosomal rearrangement events that occurred soon after the teleost-specific genome duplication. Conclusions: The ddRAD-seq approach combined with the Ion Torrent Personal Genome Machine sequencing allowed us to conduct efficient and flexible SNP genotyping. The integration of the genetic map and the assembled sequence provides a valuable resource for fine mapping and positional cloning of quantitative trait loci associated with economically important traits and for investigating comparative genomics of the Japanese eel. PMID:24669946

A ddRAD-based genetic map and its integration with the genome assembly of Japanese eel (Anguilla japonica) provides insights into genome evolution after the teleost-specific genome duplication.

PubMed

Kai, Wataru; Nomura, Kazuharu; Fujiwara, Atushi; Nakamura, Yoji; Yasuike, Motoshige; Ojima, Nobuhiko; Masaoka, Tetsuji; Ozaki, Akiyuki; Kazeto, Yukinori; Gen, Koichiro; Nagao, Jiro; Tanaka, Hideki; Kobayashi, Takanori; Ototake, Mitsuru

2014-03-26

Recent advancements in next-generation sequencing technology have enabled cost-effective sequencing of whole or partial genomes, permitting the discovery and characterization of molecular polymorphisms. Double-digest restriction-site associated DNA sequencing (ddRAD-seq) is a powerful and inexpensive approach to developing numerous single nucleotide polymorphism (SNP) markers and constructing a high-density genetic map. To enrich genomic resources for Japanese eel (Anguilla japonica), we constructed a ddRAD-based genetic map using an Ion Torrent Personal Genome Machine and anchored scaffolds of the current genome assembly to 19 linkage groups of the Japanese eel. Furthermore, we compared the Japanese eel genome with genomes of model fishes to infer the history of genome evolution after the teleost-specific genome duplication. We generated the ddRAD-based linkage map of the Japanese eel, where the maps for female and male spanned 1748.8 cM and 1294.5 cM, respectively, and were arranged into 19 linkage groups. A total of 2,672 SNP markers and 115 Simple Sequence Repeat markers provide anchor points to 1,252 scaffolds covering 151 Mb (13%) of the current genome assembly of the Japanese eel. Comparisons among the Japanese eel, medaka, zebrafish and spotted gar genomes showed highly conserved synteny among teleosts and revealed part of the eight major chromosomal rearrangement events that occurred soon after the teleost-specific genome duplication. The ddRAD-seq approach combined with the Ion Torrent Personal Genome Machine sequencing allowed us to conduct efficient and flexible SNP genotyping. The integration of the genetic map and the assembled sequence provides a valuable resource for fine mapping and positional cloning of quantitative trait loci associated with economically important traits and for investigating comparative genomics of the Japanese eel.
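A few derived quantities follow directly from the figures quoted in this abstract. The short calculation below is only a back-of-the-envelope sketch: the derived values are not reported by the authors, and because the 13% figure is rounded, the implied assembly size is approximate. The variable names are hypothetical.

```python
# Figures quoted in the abstract
female_map_cm = 1748.8
male_map_cm = 1294.5
snp_markers = 2672
ssr_markers = 115
anchored_scaffolds = 1252
anchored_mb = 151.0
anchored_fraction = 0.13  # reported as a rounded percentage

# Derived quantities (not reported in the abstract; approximate)
implied_assembly_mb = anchored_mb / anchored_fraction
female_to_male_ratio = female_map_cm / male_map_cm
markers_per_scaffold = (snp_markers + ssr_markers) / anchored_scaffolds

print(f"Implied assembly size: about {implied_assembly_mb:.0f} Mb")
print(f"Female-to-male map length ratio: {female_to_male_ratio:.2f}")
print(f"Average markers per anchored scaffold: {markers_per_scaffold:.2f}")
```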
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae).

PubMed

Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren

2016-04-01

Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not yet been characterized. The present study determined the complete mt genome sequence of E. hortense and assessed its phylogenetic relationships with other digenean species for which complete mt genome sequences are available in GenBank, using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using the maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum clustered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequence of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae)

PubMed Central

Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren

2016-01-01

Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not yet been characterized. The present study determined the complete mt genome sequence of E. hortense and assessed its phylogenetic relationships with other digenean species for which complete mt genome sequences are available in GenBank, using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using the maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum clustered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequence of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans. PMID:27180575
Spatiotemporal genomic architecture informs precision oncology in glioblastoma

PubMed Central

Lee, Jin-Ku; Wang, Jiguang; Sa, Jason K.; Ladewig, Erik; Lee, Hae-Ock; Lee, In-Hee; Kang, Hyun Ju; Rosenbloom, Daniel S.; Camara, Pablo G.; Liu, Zhaoqi; van Nieuwenhuizen, Patrick; Jung, Sang Won; Choi, Seung Won; Kim, Junhyung; Chen, Andrew; Kim, Kyu-Tae; Shin, Sang; Seo, Yun Jee; Oh, Jin-Mi; Shin, Yong Jae; Park, Chul-Kee; Kong, Doo-Sik; Seol, Ho Jun; Blumberg, Andrew; Lee, Jung-Il; Iavarone, Antonio; Park, Woong-Yang; Rabadan, Raul; Nam, Do-Hyun

2017-01-01

Precision medicine in cancer proposes that genomic characterization of tumors can inform personalized targeted therapies [1–5]. This proposition, however, is complicated by spatial and temporal heterogeneity [6–14]. Here we study genomic and expression profiles across 127 multi-sector or longitudinal specimens from 52 glioblastoma (GBM) patients. Using bulk and single-cell data, we find that samples from the same tumor mass share genomic and expression signatures, while geographically separated multifocal tumors and/or long-term recurrent tumors are seeded from different clones. Chemical screening of patient-derived glioma cells (PDCs) shows that therapeutic response is associated with genetic similarity, and multifocal tumors enriched with PIK3CA mutations have a heterogeneous drug response pattern. Importantly, we show that targeting truncal events is more efficacious in reducing tumor burden. In summary, this work demonstrates that evolutionary inference from integrated genomic analysis in multi-sector biopsies can inform targeted therapeutic interventions for GBM patients. PMID:28263318
Temporal and vertical distributions of bacterioplankton at the Gray's Reef National Marine Sanctuary.

PubMed

Lu, Xinxin; Sun, Shulei; Zhang, Yu-Qin; Hollibaugh, James T; Mou, Xiaozhen

2015-02-01

Large spatial scales and long-term shifts of bacterial community composition (BCC) in the open ocean can often be reliably predicted based on the dynamics of physical-chemical variables. The power of abiotic factors in shaping BCC on shorter time scales in shallow estuarine mixing zones is less clear. We examined the diurnal variation in BCC at different water depths in the spring and fall of 2011 at a station in the Gray's Reef National Marine Sanctuary (GRNMS). This site is located in the transition zone between the estuarine plume and continental shelf waters of the South Atlantic Bight. A total of 234,516 pyrotag sequences of bacterial 16S rRNA genes were recovered; they were taxonomically affiliated with >200 families of 23 bacterial phyla. Nonmetric multidimensional scaling analysis revealed significant differences in BCC between spring and fall samples, likely due to seasonality in the concentrations of dissolved organic carbon and nitrate plus nitrite. Within each diurnal sampling, BCC differed significantly by depth only in the spring and differed significantly between day and night only in the fall. The former variation largely tracked changes in light availability, while the latter was most correlated with concentrations of polyamines and chlorophyll a. Our results suggest that at the GRNMS, a coastal mixing zone, diurnal variation in BCC is attributable to the mixing of local and imported bacterioplankton rather than to bacterial growth in response to environmental changes. Our results also indicate that, like members of the Roseobacter clade, SAR11 bacteria may play an important role in processing dissolved organic material in coastal oceans. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Entropic Inference

NASA Astrophysics Data System (ADS)

Caticha, Ariel

2011-03-01

In this tutorial we review the essential arguments behind entropic inference. We focus on the epistemological notion of information and its relation to the Bayesian beliefs of rational agents. The problem of updating from a prior to a posterior probability distribution is tackled through an eliminative induction process that singles out the logarithmic relative entropy as the unique tool for inference. The resulting method of Maximum relative Entropy (ME) includes as special cases both MaxEnt and Bayes' rule, and therefore unifies the two themes of these workshops, the Maximum Entropy and the Bayesian methods, into a single general inference scheme.
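As an illustrative aside, the logarithmic relative entropy named in this abstract can be written in discrete form as below. The constraint function f, its prescribed expectation F, and the multiplier λ are notation assumed for this sketch and are not taken from the tutorial itself.

```latex
S[p, q] = -\sum_i p_i \log \frac{p_i}{q_i},
\qquad
\text{maximize } S[p, q] \text{ subject to } \sum_i p_i = 1,\ \sum_i p_i f_i = F
\;\Longrightarrow\;
p_i = \frac{q_i \, e^{-\lambda f_i}}{Z(\lambda)},
\quad
Z(\lambda) = \sum_j q_j \, e^{-\lambda f_j}.
```

With a uniform prior q, the posterior reduces to the familiar MaxEnt exponential form, which is one way to see MaxEnt as a special case of the more general update.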
The Dimensionality of Inference Making: Are Local and Global Inferences Distinguishable?

ERIC Educational Resources Information Center

Muijselaar, Marloes M. L.

2018-01-01

We investigated the dimensionality of inference making in samples of 4- to 9-year-olds (Ns = 416-783) to determine if local and global coherence inferences could be distinguished. In addition, we examined the validity of our experimenter-developed inference measure by comparing with three additional measures of listening comprehension. Multitrait,…
Comparative Genomics of Flatworms (Platyhelminthes) Reveals Shared Genomic Features of Ecto- and Endoparastic Neodermata

PubMed Central

Hahn, Christoph; Fromm, Bastian; Bachmann, Lutz

2014-01-01

The ectoparasitic Monogenea comprise a major part of the obligate parasitic flatworm diversity. Although genomic adaptations to parasitism have been studied in the endoparasitic tapeworms (Cestoda) and flukes (Trematoda), no representative of the Monogenea has been investigated yet. We present the high-quality draft genome of Gyrodactylus salaris, an economically important monogenean ectoparasite of wild Atlantic salmon (Salmo salar). A total of 15,488 gene models were identified, of which 7,102 were functionally annotated. The controversial phylogenetic relationships within the obligate parasitic Neodermata were resolved in a phylogenomic analysis using 1,719 gene models (alignment length of >500,000 amino acids) for a set of 16 metazoan taxa. The Monogenea were found basal to the Cestoda and Trematoda, which implies ectoparasitism being plesiomorphic within the Neodermata and strongly supports a common origin of complex life cycles. Comparative analysis of seven parasitic flatworm genomes identified shared genomic features for the ecto- and endoparasitic lineages, such as a substantial reduction of the core bilaterian gene complement, including the homeodomain-containing genes, and a loss of the piwi and vasa genes, which are considered essential for animal development. Furthermore, the shared loss of functional fatty acid biosynthesis pathways and the absence of peroxisomes, the latter organelles presumed ubiquitous in eukaryotes except for parasitic protozoans, were inferred. The draft genome of G. salaris opens for future in-depth analyses of pathogenicity and host specificity of poorly characterized G. salaris strains, and will enhance studies addressing the genomics of host–parasite interactions and speciation in the highly diverse monogenean flatworms. PMID:24732282
Genome-wide association analysis based on multiple imputation with low-depth GBS data: application to biofuel traits in reed canarygrass

USDA-ARS's Scientific Manuscript database

Genotyping-by-sequencing allows for large-scale genetic analyses in plant species with no reference genome, creating the challenge of sound inference in the presence of uncertain genotypes. Here we report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundina...
Genome-wide association study based on multiple imputation with low-depth sequencing data: application to biofuel traits in reed canarygrass

USDA-ARS's Scientific Manuscript database

Genotyping by sequencing allows for large-scale genetic analyses in plant species with no reference genome, but sets the challenge of sound inference in presence of uncertain genotypes. We report an imputation-based genome-wide association study (GWAS) in reed canarygrass (Phalaris arundinacea L., P...
Prevalence of the Chloroflexi-Related SAR202 Bacterioplankton Cluster throughout the Mesopelagic Zone and Deep Ocean†

PubMed Central

Morris, R. M.; Rappé, M. S.; Urbach, E.; Connon, S. A.; Giovannoni, S. J.

2004-01-01

Since their initial discovery in samples from the north Atlantic Ocean, 16S rRNA genes related to the environmental gene clone cluster known as SAR202 have been recovered from pelagic freshwater, marine sediment, soil, and deep subsurface terrestrial environments. Together, these clones form a major, monophyletic subgroup of the phylum Chloroflexi. While members of this diverse group are consistently identified in the marine environment, there are currently no cultured representatives, and very little is known about their distribution or abundance in the world's oceans. In this study, published and newly identified SAR202-related 16S rRNA gene sequences were used to further resolve the phylogeny of this cluster and to design taxon-specific oligonucleotide probes for fluorescence in situ hybridization. Direct cell counts from the Bermuda Atlantic time series study site in the north Atlantic Ocean, the Hawaii ocean time series site in the central Pacific Ocean, and along the Newport hydroline in eastern Pacific coastal waters showed that SAR202 cluster cells were most abundant below the deep chlorophyll maximum and that they persisted to 3,600 m in the Atlantic Ocean and to 4,000 m in the Pacific Ocean, the deepest samples used in this study. On average, members of the SAR202 group accounted for 10.2% (±5.7%) of all DNA-containing bacterioplankton between 500 and 4,000 m. PMID:15128540
The Inference of Gene Trees with Species Trees

PubMed Central

Szöllősi, Gergely J.; Tannier, Eric; Daubin, Vincent; Boussau, Bastien

2015-01-01

This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree–species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree–species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. PMID:25070970
Extreme Recombination Frequencies Shape Genome Variation and Evolution in the Honeybee, Apis mellifera

PubMed Central

Wallberg, Andreas; Glémin, Sylvain; Webster, Matthew T.

2015-01-01

Meiotic recombination is a fundamental cellular process, with important consequences for evolution and genome integrity. However, we know little about how recombination rates vary across the genomes of most species and the molecular and evolutionary determinants of this variation. The honeybee, Apis mellifera, has extremely high rates of meiotic recombination, although the evolutionary causes and consequences of this are unclear. Here we use patterns of linkage disequilibrium in whole genome resequencing data from 30 diploid honeybees to construct a fine-scale map of rates of crossing over in the genome. We find that, in contrast to vertebrate genomes, the recombination landscape is not strongly punctate. Crossover rates strongly correlate with levels of genetic variation, but not divergence, which indicates a pervasive impact of selection on the genome. Germ-line methylated genes have reduced crossover rate, which could indicate a role of methylation in suppressing recombination. Controlling for the effects of methylation, we do not infer a strong association between gene expression patterns and recombination. The site frequency spectrum is strongly skewed from neutral expectations in honeybees: rare variants are dominated by AT-biased mutations, whereas GC-biased mutations are found at higher frequencies, indicative of a major influence of GC-biased gene conversion (gBGC), which we infer to generate an allele fixation bias 5 – 50 times the genomic average estimated in humans. We uncover further evidence that this repair bias specifically affects transitions and favours fixation of CpG sites. Recombination, via gBGC, therefore appears to have profound consequences on genome evolution in honeybees and interferes with the process of natural selection. These findings have important implications for our understanding of the forces driving molecular evolution. PMID:25902173
Genetic Competence Drives Genome Diversity in Bacillus subtilis

PubMed Central

Chevreux, Bastien; Serra, Cláudia R; Schyns, Ghislain; Henriques, Adriano O

2018-01-01

Prokaryote genomes are the result of a dynamic flux of genes, with increases achieved via horizontal gene transfer and reductions occurring through gene loss. The ecological and selective forces that drive this genomic flexibility vary across species. Bacillus subtilis is a naturally competent bacterium that occupies various environments, including plant-associated, soil, and marine niches, and the gut of both invertebrates and vertebrates. Here, we quantify the genomic diversity of B. subtilis and infer the genome dynamics that explain the high genetic and phenotypic diversity observed. Phylogenomic and comparative genomic analyses of 42 B. subtilis genomes uncover a remarkable genome diversity that translates into a core genome of 1,659 genes and an asymptotic pangenome growth rate of 57 new genes per new genome added. This diversity is due to a large proportion of low-frequency genes that are acquired from closely related species. We find no gene-loss bias among wild isolates, which explains why the cloud genome, 43% of the species pangenome, represents only a small proportion of each genome. We show that B. subtilis can acquire xenologous copies of core genes that propagate laterally among strains within a niche. While not excluding the contributions of other mechanisms, our results strongly suggest a process of gene acquisition that is largely driven by competence, where the long-term maintenance of acquired genes depends on local and global fitness effects. This competence-driven genomic diversity provides B. subtilis with its generalist character, enabling it to occupy a wide range of ecological niches and cycle through them. PMID:29272410
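The core, pan, and cloud genome terms used in this abstract can be made concrete with a small sketch. The strain names, gene labels, and the "present in exactly one genome" cloud threshold below are assumptions chosen for illustration; the study's own definitions and pipeline may differ.

```python
from collections import Counter


def pangenome_summary(genomes):
    """Summarize gene-family sharing across genomes.

    genomes: dict mapping a strain name to the set of gene families it carries.
    Returns the pangenome (union), the core genome (present in every strain),
    and a 'cloud' set (here assumed to mean present in exactly one strain).
    """
    n_strains = len(genomes)
    # Count in how many strains each gene family occurs.
    counts = Counter(gene for genes in genomes.values() for gene in genes)
    pan = set(counts)
    core = {gene for gene, c in counts.items() if c == n_strains}
    cloud = {gene for gene, c in counts.items() if c == 1}
    return {"pan": pan, "core": core, "cloud": cloud}


# Hypothetical toy data, not taken from the study.
toy_genomes = {
    "strainA": {"dnaA", "gyrB", "comK", "x1"},
    "strainB": {"dnaA", "gyrB", "comK", "x2"},
    "strainC": {"dnaA", "gyrB", "x2", "x3"},
}
summary = pangenome_summary(toy_genomes)
cloud_fraction = len(summary["cloud"]) / len(summary["pan"])
```

In this toy set the cloud genes make up a third of the pangenome while each strain carries at most one of them, which mirrors the abstract's point that a large cloud genome can still be a small share of any single genome.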
Genome size expansion and the relationship between nuclear DNA content and spore size in the Asplenium monanthes fern complex (Aspleniaceae)

PubMed Central

2013-01-01

Background: Homosporous ferns are distinctive amongst the land plant lineages for their high chromosome numbers and enigmatic genomes. Genome size measurements are an underexploited tool in homosporous ferns and show great potential to provide an overview of the mechanisms that define genome evolution in these ferns. The aim of this study is to investigate the evolution of genome size and the relationship between genome size and spore size within the apomictic Asplenium monanthes fern complex and related lineages. Results: Comparative analyses to test for a relationship between spore size and genome size show that they are not correlated. The data do however provide evidence for marked genome size variation between species in this group. These results indicate that Asplenium monanthes has undergone a two-fold expansion in genome size. Conclusions: Our findings challenge the widely held assumption that spore size can be used to infer ploidy levels within apomictic fern complexes. We argue that the observed genome size variation is likely to have arisen via increases in both chromosome number due to polyploidy and chromosome size due to amplification of repetitive DNA (e.g. transposable elements, especially retrotransposons). However, to date the latter has not been considered to be an important process of genome evolution within homosporous ferns. We infer that genome evolution, at least in some homosporous fern lineages, is a more dynamic process than existing studies would suggest. PMID:24354467
Phylogenic inference using alignment-free methods for applications in microbial community surveys using 16s rRNA gene

PubMed Central

2017-01-01

The diversity of microbiota is best explored by understanding the phylogenetic structure of the microbial communities. Traditionally, sequence alignment has been used for phylogenetic inference. However, alignment-based approaches come with significant challenges and limitations when massive amounts of data are analyzed. In the recent decade, alignment-free approaches have enabled genome-scale phylogenetic inference. Here we evaluate three alignment-free methods: ACS, CVTree, and Kr for phylogenetic inference with 16s rRNA gene data. We use a taxonomic gold standard to compare the accuracy of alignment-free phylogenetic inference with that of common microbiome-wide phylogenetic inference pipelines based on PyNAST and MUSCLE alignments with FastTree and RAxML. We re-simulate fecal communities from Human Microbiome Project data to evaluate the performance of the methods on datasets with properties of real data. Our comparisons show that alignment-free methods are not inferior to alignment-based methods in giving accurate and robust phylogenic trees. Moreover, consensus ensembles of alignment-free phylogenies are superior to those built from alignment-based methods in their ability to highlight community differences in low power settings. In addition, the overall running times of alignment-based and alignment-free phylogenetic inference are comparable. Taken together our empirical results suggest that alignment-free methods provide a viable approach for microbiome-wide phylogenetic inference. PMID:29136663
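To illustrate the general idea behind alignment-free comparison, the sketch below computes a cosine distance between k-mer count profiles of two sequences. This is a generic technique used here for illustration only; it is not the ACS, CVTree, or Kr method evaluated in the study, and the sequences, the choice k = 3, and the function names are assumptions.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Count every overlapping k-mer in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(p, q):
    """Return 1 - cosine similarity between two k-mer count profiles."""
    dot = sum(p[kmer] * q[kmer] for kmer in p.keys() & q.keys())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0  # an empty profile shares nothing with any other profile
    return 1.0 - dot / (norm_p * norm_q)


# Hypothetical short sequences; real 16S rRNA genes are roughly 1,500 bp long.
seq_a = "ACGTACGTACGTTAGC"
seq_b = "ACGTACGTACGATAGC"  # one substitution relative to seq_a
seq_c = "TTTTGGGGCCCCAAAA"  # unrelated composition

d_ab = cosine_distance(kmer_profile(seq_a), kmer_profile(seq_b))
d_ac = cosine_distance(kmer_profile(seq_a), kmer_profile(seq_c))
```

A matrix of such pairwise distances could then be passed to a distance-based tree builder such as neighbor joining, with no multiple sequence alignment step required.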
Evolutionary genomics of animal personality.

PubMed

van Oers, Kees; Mueller, Jakob C

2010-12-27

Research on animal personality can be approached from both a phenotypic and a genetic perspective. Using a phenotypic approach, one can measure present selection on personality traits and their combinations. However, this approach cannot reconstruct the historical trajectory that was taken by evolution. Therefore, it is essential for our understanding of the causes and consequences of personality diversity to link phenotypic variation in personality traits with polymorphisms in genomic regions that code for this trait variation. Identifying genes or genome regions that underlie personality traits will open exciting possibilities to study natural selection at the molecular level, gene-gene and gene-environment interactions, pleiotropic effects and how gene expression shapes personality phenotypes. In this paper, we will discuss how genome information revealed by already established approaches and some more recent techniques such as high-throughput sequencing of genomic regions in a large number of individuals can be used to infer micro-evolutionary processes, historical selection and finally the maintenance of personality trait variation. We will do this by reviewing recent advances in molecular genetics of animal personality, but will also use advanced human personality studies as case studies of how molecular information may be used in animal personality research in the near future.
Self-enforcing Private Inference Control

NASA Astrophysics Data System (ADS)

Yang, Yanjiang; Li, Yingjiu; Weng, Jian; Zhou, Jianying; Bao, Feng

Private inference control enables simultaneous enforcement of inference control and protection of users' query privacy. Private inference control is a useful tool for database applications, especially as users grow increasingly concerned about individual privacy. However, protection of query privacy on top of inference control is a double-edged sword: because the database server does not know the content of user queries, users can easily launch DoS attacks. To mitigate DoS attacks in private inference control, we propose the concept of self-enforcing private inference control, whose intuition is to force users to make only inference-free queries by enforcing inference control themselves; otherwise, a penalty is inflicted upon the violating users.
Signatures of natural selection and ecological differentiation in microbial genomes.

PubMed

Shapiro, B Jesse

2014-01-01

We live in a microbial world. Most of the genetic and metabolic diversity that exists on earth - and has existed for billions of years - is microbial. Making sense of this vast diversity is a daunting task, but one that can be approached systematically by analyzing microbial genome sequences. This chapter explores how the evolutionary forces of recombination and selection act to shape microbial genome sequences, leaving signatures that can be detected using comparative genomics and population-genetic tests for selection. I describe the major classes of tests, paying special attention to their relative strengths and weaknesses when applied to microbes. Specifically, I apply a suite of tests for selection to a set of closely-related bacterial genomes with different microhabitat preferences within the marine water column, shedding light on the genomic mechanisms of ecological differentiation in the wild. I will focus on the joint problem of simultaneously inferring the boundaries between microbial populations, and the selective forces operating within and between populations.
Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs

PubMed Central

LeGault, Laura H.; Dewey, Colin N.

2013-01-01

Motivation: Alternative splicing and other processes that allow for different transcripts to be derived from the same gene are significant forces in the eukaryotic cell. RNA-Seq is a promising technology for analyzing alternative transcripts, as it does not require prior knowledge of transcript structures or genome sequences. However, analysis of RNA-Seq data in the presence of genes with large numbers of alternative transcripts is currently challenging due to efficiency, identifiability and representation issues. Results: We present RNA-Seq models and associated inference algorithms based on the concept of probabilistic splice graphs, which alleviate these issues. We prove that our models are often identifiable and demonstrate that our inference methods for quantification and differential processing detection are efficient and accurate. Availability: Software implementing our methods is available at http://deweylab.biostat.wisc.edu/psginfer. Contact: cdewey@biostat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23846746
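One way to picture a probabilistic splice graph is as a directed acyclic graph whose edges carry the probability of being followed from their source vertex, so that a transcript's probability is the product of the edge probabilities along its path. The sketch below works under that assumption; the vertex names, edge weights, and function name are hypothetical and do not reproduce the authors' software.

```python
def transcript_probabilities(graph, start, end):
    """Enumerate start-to-end paths and the probability of each.

    graph: dict mapping a vertex to a list of (next_vertex, probability)
    pairs, where the probabilities leaving each vertex sum to 1.
    Returns a dict mapping each path (as a tuple of vertices) to the
    product of the edge probabilities along it.
    """
    results = {}

    def walk(vertex, path, prob):
        if vertex == end:
            results[tuple(path)] = prob
            return
        for nxt, p in graph.get(vertex, []):
            walk(nxt, path + [nxt], prob * p)

    walk(start, [start], 1.0)
    return results


# Hypothetical gene with one skippable exon (exon2).
toy_graph = {
    "start": [("exon1", 1.0)],
    "exon1": [("exon2", 0.7), ("exon3", 0.3)],
    "exon2": [("exon3", 1.0)],
    "exon3": [("end", 1.0)],
}
isoforms = transcript_probabilities(toy_graph, "start", "end")
```

The compactness is the appeal of this representation: a graph with a handful of edge weights can describe a number of isoforms that grows quickly with the number of alternative splicing events, which bears on the representation issue the abstract mentions.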

Using large-scale genome variation cohorts to decipher the molecular mechanism of cancer.

PubMed

Habermann, Nina; Mardin, Balca R; Yakneen, Sergei; Korbel, Jan O

2016-01-01

Characterizing genomic structural variations (SVs) in the human genome remains challenging, and there is a growing interest to understand somatic SVs occurring in cancer, a disease of the genome. A havoc-causing SV process known as chromothripsis scars the genome when localized chromosome shattering and repair occur in a one-off catastrophe. Recent efforts led to the development of a set of conceptual criteria for the inference of chromothripsis events in cancer genomes and to the development of experimental model systems for studying this striking DNA alteration process in vitro. We discuss these approaches, and additionally touch upon current "Big Data" efforts that employ hybrid cloud computing to enable studies of numerous cancer genomes in an effort to search for commonalities and differences in molecular DNA alteration processes in cancer. Copyright © 2016. Published by Elsevier SAS.
Evolution of the mitochondrial genome in snakes: Gene rearrangements and phylogenetic relationships

PubMed Central

Yan, Jie; Li, Hongdan; Zhou, Kaiya

2008-01-01

Background: Snakes as a major reptile group display a variety of morphological characteristics pertaining to their diverse behaviours. Despite abundant analyses of morphological characters, molecular studies using mitochondrial and nuclear genes are limited. As a result, the phylogeny of snakes remains controversial. Previous studies on mitochondrial genomes of snakes have demonstrated duplication of the control region and translocation of trnL to be two notable features of the alethinophidian (all serpents except blindsnakes and threadsnakes) mtDNAs. Our purpose is to further investigate the gene organizations, evolution of the snake mitochondrial genome, and phylogenetic relationships among several major snake families. Results: The mitochondrial genomes were sequenced for four taxa representing four different families, and each had a different gene arrangement. Comparative analyses with other snake mitochondrial genomes allowed us to summarize six types of mitochondrial gene arrangement in snakes. Phylogenetic reconstruction with commonly used methods of phylogenetic inference (BI, ML, MP, NJ) arrived at a similar topology, which was used to reconstruct the evolution of mitochondrial gene arrangements in snakes. Conclusion: The phylogenetic relationships among the major families of snakes are in accordance with the mitochondrial genomes in terms of gene arrangements. The gene arrangement in Ramphotyphlops braminus mtDNA is inferred to be ancestral for snakes. After the divergence of the early Ramphotyphlops lineage, three types of rearrangements occurred. These changes involve translocations within the IQM tRNA gene cluster and the duplication of the CR. All phylogenetic methods support the placement of Enhydris plumbea outside of the (Colubridae + Elapidae) cluster, providing mitochondrial genomic evidence for the familial rank of Homalopsidae. PMID:19038056
More than one kind of inference: re-examining what's learned in feature inference and classification.

PubMed

Sweller, Naomi; Hayes, Brett K

2010-08-01

Three studies examined how task demands that impact on attention to typical or atypical category features shape the category representations formed through classification learning and inference learning. During training categories were learned via exemplar classification or by inferring missing exemplar features. In the latter condition inferences were made about missing typical features alone (typical feature inference) or about both missing typical and atypical features (mixed feature inference). Classification and mixed feature inference led to the incorporation of typical and atypical features into category representations, with both kinds of features influencing inferences about familiar (Experiments 1 and 2) and novel (Experiment 3) test items. Those in the typical inference condition focused primarily on typical features. Together with formal modelling, these results challenge previous accounts that have characterized inference learning as producing a focus on typical category features. The results show that two different kinds of inference learning are possible and that these are subserved by different kinds of category representations.
Perceptual inference.

PubMed

Aggelopoulos, Nikolaos C

2015-08-01

Perceptual inference refers to the ability to infer sensory stimuli from predictions that result from internal neural representations built through prior experience. Methods of Bayesian statistical inference and decision theory model cognition adequately by using error sensing either in guiding action or in "generative" models that predict the sensory information. In this framework, perception can be seen as a process qualitatively distinct from sensation, a process of information evaluation using previously acquired and stored representations (memories) that is guided by sensory feedback. The stored representations can be utilised as internal models of sensory stimuli enabling long term associations, for example in operant conditioning. Evidence for perceptual inference is contributed by such phenomena as the cortical co-localisation of object perception with object memory, the response invariance in the responses of some neurons to variations in the stimulus, as well as from situations in which perception can be dissociated from sensation. In the context of perceptual inference, sensory areas of the cerebral cortex that have been facilitated by a priming signal may be regarded as comparators in a closed feedback loop, similar to the better known motor reflexes in the sensorimotor system. The adult cerebral cortex can be regarded as similar to a servomechanism, in using sensory feedback to correct internal models, producing predictions of the outside world on the basis of past experience. Copyright © 2015 Elsevier Ltd. All rights reserved.
Genomic Quantitative Genetics to Study Evolution in the Wild.

PubMed

Gienapp, Phillip; Fior, Simone; Guillaume, Frédéric; Lasky, Jesse R; Sork, Victoria L; Csilléry, Katalin

2017-12-01

Quantitative genetic theory provides a means of estimating the evolutionary potential of natural populations. However, this approach was previously only feasible in systems where the genetic relatedness between individuals could be inferred from pedigrees or experimental crosses. The genomic revolution opened up the possibility of obtaining the realized proportion of genome shared among individuals in natural populations of virtually any species, which could promise (more) accurate estimates of quantitative genetic parameters in virtually any species. Such a 'genomic' quantitative genetics approach relies on fewer assumptions, offers a greater methodological flexibility, and is thus expected to greatly enhance our understanding of evolution in natural populations, for example, in the context of adaptation to environmental change, eco-evolutionary dynamics, and biodiversity conservation. Copyright © 2017 Elsevier Ltd. All rights reserved.
Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information.

PubMed

Upadhyay, Atul Kumar; Sowdhamini, Ramanathan

2016-01-01

3D-domain swapping is one of the mechanisms of protein oligomerization, and the proteins exhibiting this phenomenon have many biological functions. These proteins, which undergo domain swapping, have attracted much attention owing to their involvement in human diseases, such as conformational diseases, amyloidosis, serpinopathies, proteinopathies, etc. Early identification of proteins in the whole human genome that retain a tendency to domain swap will enable many aspects of disease control management. Predictive models were developed by using machine learning approaches with an average accuracy of 78% (85.6% sensitivity, 87.5% specificity and an MCC value of 0.72) to predict putative domain swapping in protein sequences. These models were applied to many complete genomes with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from the human genome for their domain distribution, disease association and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to infer a better understanding of the functional importance of these sequences. Finally, we developed hinge region prediction, in the given putative domain swapped sequence, by using important physicochemical properties of amino acids.
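For readers unfamiliar with the performance measures quoted in this abstract, the sketch below shows how accuracy, sensitivity, specificity, and the Matthews correlation coefficient (MCC) follow from a binary confusion matrix. The counts are hypothetical and are not the study's data; the function name is likewise an assumption.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classifier metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts: 100 domain-swapping and 100 non-swapping test proteins.
metrics = binary_metrics(tp=86, fp=12, tn=88, fn=14)
```

Unlike accuracy, MCC stays informative when the two classes are imbalanced, which is one reason it is often reported alongside sensitivity and specificity.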
Genome alignment with graph data structures: a comparison

PubMed Central

2014-01-01

Background Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference. Graphs have proven to be a powerful tool for coping with the complexity of genome-scale sequence alignments. The potential of graphs to intuitively represent all aspects of genome alignments led to the development of graph-based approaches for genome alignment. These approaches construct a graph from a set of local alignments, and derive a genome alignment through identification and removal of graph substructures that indicate errors in the alignment. Results We compare the structures of commonly used graphs in terms of their abilities to represent alignment information. We describe how the graphs can be transformed into each other, and identify and classify graph substructures common to one or more graphs. Based on previous approaches, we compile a list of modifications that remove these substructures. Conclusion We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs. If we neglect vertex or edge labels, the graphs differ in their information content. Still, many ideas are shared among all graph-based approaches. Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools. PMID:24712884
Species tree inference by minimizing deep coalescences.

PubMed

Than, Cuong; Nakhleh, Luay

2009-09-01

In a seminal 1997 paper, W. Maddison proposed minimizing deep coalescences, or MDC, as an optimization criterion for inferring the species tree from a set of incongruent gene trees, assuming the incongruence is exclusively due to lineage sorting. In a subsequent paper, Maddison and Knowles provided and implemented a search heuristic for optimizing the MDC criterion, given a set of gene trees. However, the heuristic is not guaranteed to compute optimal solutions, and its hill-climbing search makes it slow in practice. In this paper, we provide two exact solutions to the problem of inferring the species tree from a set of gene trees under the MDC criterion. In other words, our solutions are guaranteed to find the tree that minimizes the total number of deep coalescences from a set of gene trees. One solution is based on a novel integer linear programming (ILP) formulation, and another is based on a simple dynamic programming (DP) approach. Powerful ILP solvers, such as CPLEX, make the first solution appealing, particularly for very large-scale instances of the problem, whereas the DP-based solution eliminates dependence on proprietary tools, and its simplicity makes it easy to integrate with other genomic events that may cause gene tree incongruence. Using the exact solutions, we analyze a data set of 106 loci from eight yeast species, a data set of 268 loci from eight Apicomplexan species, and several simulated data sets. We show that the MDC criterion provides very accurate estimates of the species tree topologies, and that our solutions are very fast, thus allowing for the accurate analysis of genome-scale data sets. Further, the efficiency of the solutions allows for quick exploration of sub-optimal solutions, which is important for a parsimony-based criterion such as MDC, as we show. We show that searching for the species tree in the compatibility graph of the clusters induced by the gene trees may be sufficient in practice, a finding that helps ameliorate the
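To make the MDC criterion concrete, the sketch below scores a candidate species tree by counting the extra gene lineages on each of its branches and summing over gene trees. It is a brute-force illustration, not the ILP or DP solutions described in the abstract. It assumes rooted trees written as nested tuples of species names, with exactly one sampled allele per species in every gene tree; all function names are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def species_clusters(tree):
    """Yield the leaf set below every node of the species tree."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from species_clusters(child)


def lineages_in_cluster(gene_tree, cluster):
    """Count maximal gene-tree clades whose leaves all lie inside `cluster`.

    This is the number of gene lineages leaving the species-tree branch
    above `cluster` when coalescences happen as recently as possible.
    """
    if leaves(gene_tree) <= cluster:
        return 1
    if isinstance(gene_tree, str):
        return 0
    return sum(lineages_in_cluster(child, cluster) for child in gene_tree)


def deep_coalescences(gene_tree, species_tree):
    """Sum the extra lineages (lineages minus one) over all species-tree branches."""
    return sum(
        max(0, lineages_in_cluster(gene_tree, cluster) - 1)
        for cluster in species_clusters(species_tree)
    )


def best_candidate(gene_trees, candidates):
    """Pick the candidate species tree with the lowest total MDC score."""
    return min(
        candidates,
        key=lambda sp: sum(deep_coalescences(g, sp) for g in gene_trees),
    )


# Toy data: two gene trees agree with ((A,B),(C,D)); one conflicts with it.
gene_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
]
candidates = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))]
print(best_candidate(gene_trees, candidates))
```

Enumerating candidate trees this way is only feasible for a handful of taxa, which is the motivation for exact ILP and DP formulations on larger data sets.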
The Cancer Genome Atlas Clinical Explorer: a web and mobile interface for identifying clinical-genomic driver associations.

PubMed

Lee, HoJoon; Palm, Jennifer; Grimes, Susan M; Ji, Hanlee P

2015-10-27

include clinical stage or smoking history. The Cancer Genome Atlas Clinical Explorer enables the cancer research community and others to explore clinically relevant associations inferred from TCGA data. With its accessible web and mobile interface, users can examine queries and test hypotheses regarding genomic/proteomic alterations across a broad spectrum of malignancies.
A draft annotation and overview of the human genome

PubMed Central

Wright, Fred A; Lemon, William J; Zhao, Wei D; Sears, Russell; Zhuo, Degen; Wang, Jian-Ping; Yang, Hee-Yung; Baer, Troy; Stredney, Don; Spitzner, Joe; Stutz, Al; Krahe, Ralf; Yuan, Bo

2001-01-01

Background The recent draft assembly of the human genome provides a unified basis for describing genomic structure and function. The draft is sufficiently accurate to provide useful annotation, enabling direct observations of previously inferred biological phenomena. Results We report here a functionally annotated human gene index placed directly on the genome. The index is based on the integration of public transcript, protein, and mapping information, supplemented with computational prediction. We describe numerous global features of the genome and examine the relationship of various genetic maps with the assembly. In addition, initial sequence analysis reveals highly ordered chromosomal landscapes associated with paralogous gene clusters and distinct functional compartments. Finally, these annotation data were synthesized to produce observations of gene density and number that accord well with historical estimates. Such a global approach had previously been described only for chromosomes 21 and 22, which together account for 2.2% of the genome. Conclusions We estimate that the genome contains 65,000-75,000 transcriptional units, with exon sequences comprising 4%. The creation of a comprehensive gene index requires the synthesis of all available computational and experimental evidence. PMID:11516338
Phylogenomics reveals an extensive history of genome duplication in diatoms (Bacillariophyta).

PubMed

Parks, Matthew B; Nakov, Teofil; Ruck, Elizabeth C; Wickett, Norman J; Alverson, Andrew J

2018-03-01

Diatoms are one of the most species-rich lineages of microbial eukaryotes. Similarities in clade age, species richness, and primary productivity motivate comparisons to angiosperms, whose genomes have been inordinately shaped by whole-genome duplication (WGD). WGDs have been linked to speciation, increased rates of lineage diversification, and identified as a principal driver of angiosperm evolution. We synthesized a large but scattered body of evidence that suggests polyploidy may be common in diatoms as well. We used gene counts, gene trees, and distributions of synonymous divergence to carry out a phylogenomic analysis of WGD across a diverse set of 37 diatom species. Several methods identified WGDs of varying age across diatoms. Determining the occurrence, exact number, and placement of events was greatly impacted by uncertainty in gene trees. WGDs inferred from synonymous divergence of paralogs varied depending on how redundancy in transcriptomes was assessed, gene families were assembled, and synonymous distances (Ks) were calculated. Our results highlighted a need for systematic evaluation of key methodological aspects of Ks-based approaches to WGD inference. Gene tree reconciliations supported allopolyploidy as the predominant mode of polyploid formation, with strong evidence for ancient allopolyploid events in the thalassiosiroid and pennate diatom clades. Our results suggest that WGD has played a major role in the evolution of diatom genomes. We outline challenges in reconstructing paleopolyploid events in diatoms that, together with these results, offer a framework for understanding the impact of genome duplication in a group that likely harbors substantial genomic diversity. © 2018 The Authors. American Journal of Botany is published by Wiley Periodicals, Inc. on behalf of the Botanical Society of America.
Inferring responses to climate dynamics from historical demography in neotropical forest lizards

PubMed Central

Xue, Alexander T.; Brown, Jason L.; Alvarado-Serrano, Diego F.; Rodrigues, Miguel T.; Hickerson, Michael J.; Carnaval, Ana C.

2016-01-01

We apply a comparative framework to test for concerted demographic changes in response to climate shifts in the neotropical lowland forests, learning from the past to inform projections of the future. Using reduced genomic (SNP) data from three lizard species codistributed in Amazonia and the Atlantic Forest (Anolis punctatus, Anolis ortonii, and Polychrus marmoratus), we first reconstruct former population history and test for assemblage-level responses to cycles of moisture transport recently implicated in changes of forest distribution during the Late Quaternary. We find support for population shifts within the time frame of inferred precipitation fluctuations (the last 250,000 y) but detect idiosyncratic responses across species and uniformity of within-species responses across forest regions. These results are incongruent with expectations of concerted population expansion in response to increased rainfall and fail to detect out-of-phase demographic syndromes (expansions vs. contractions) across forest regions. Using reduced genomic data to infer species-specific demographical parameters, we then model the plausible spatial distribution of genetic diversity in the Atlantic Forest into future climates (2080) under a medium carbon emission trajectory. The models forecast very distinct trajectories for the lizard species, reflecting unique estimated population densities and dispersal abilities. Ecological and demographic constraints seemingly lead to distinct and asynchronous responses to climatic regimes in the tropics, even among similarly distributed taxa. Incorporating such constraints is key to improve modeling of the distribution of biodiversity in the past and future. PMID:27432951
Inferring responses to climate dynamics from historical demography in neotropical forest lizards.

PubMed

Prates, Ivan; Xue, Alexander T; Brown, Jason L; Alvarado-Serrano, Diego F; Rodrigues, Miguel T; Hickerson, Michael J; Carnaval, Ana C

2016-07-19

We apply a comparative framework to test for concerted demographic changes in response to climate shifts in the neotropical lowland forests, learning from the past to inform projections of the future. Using reduced genomic (SNP) data from three lizard species codistributed in Amazonia and the Atlantic Forest (Anolis punctatus, Anolis ortonii, and Polychrus marmoratus), we first reconstruct former population history and test for assemblage-level responses to cycles of moisture transport recently implicated in changes of forest distribution during the Late Quaternary. We find support for population shifts within the time frame of inferred precipitation fluctuations (the last 250,000 y) but detect idiosyncratic responses across species and uniformity of within-species responses across forest regions. These results are incongruent with expectations of concerted population expansion in response to increased rainfall and fail to detect out-of-phase demographic syndromes (expansions vs. contractions) across forest regions. Using reduced genomic data to infer species-specific demographical parameters, we then model the plausible spatial distribution of genetic diversity in the Atlantic Forest into future climates (2080) under a medium carbon emission trajectory. The models forecast very distinct trajectories for the lizard species, reflecting unique estimated population densities and dispersal abilities. Ecological and demographic constraints seemingly lead to distinct and asynchronous responses to climatic regimes in the tropics, even among similarly distributed taxa. Incorporating such constraints is key to improve modeling of the distribution of biodiversity in the past and future.
SuperDCA for genome-wide epistasis analysis.

PubMed

Puranen, Santeri; Pesonen, Maiju; Pensar, Johan; Xu, Ying Ying; Lees, John A; Bentley, Stephen D; Croucher, Nicholas J; Corander, Jukka

2018-05-29

The potential for genome-wide modelling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10^4-10^5 polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here, we introduce a novel inference method (SuperDCA) that employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 10^5 polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA, thus, holds considerable potential in building understanding about numerous organisms at a systems biological level.
Mitochondrial genome deletions and minicircles are common in lice (Insecta: Phthiraptera)

PubMed Central

2011-01-01

Background The gene composition, gene order and structure of the mitochondrial genome are remarkably stable across bilaterian animals. Lice (Insecta: Phthiraptera) are a major exception to this genomic stability in that the canonical single chromosome with 37 genes found in almost all other bilaterians has been lost in multiple lineages in favour of multiple, minicircular chromosomes with fewer than 37 genes on each chromosome. Results Minicircular mt genomes are found in six of the ten louse species examined to date and three types of minicircles were identified: heteroplasmic minicircles which coexist with full sized mt genomes (type 1); multigene chromosomes with short, simple control regions, from which we infer that the genome consists of several such chromosomes (type 2); and multiple, single to three gene chromosomes with large, complex control regions (type 3). Mapping minicircle types onto a phylogenetic tree of lice fails to show a pattern of their occurrence consistent with an evolutionary series of minicircle types. Analysis of the nuclear-encoded, mitochondrially-targeted genes inferred from the body louse, Pediculus, suggests that the loss of mitochondrial single-stranded binding protein (mtSSB) may be responsible for the presence of minicircles in at least species with the most derived type 3 minicircles (Pediculus, Damalinia). Conclusions Minicircular mt genomes are common in lice and appear to have arisen multiple times within the group. Life history adaptive explanations which attribute minicircular mt genomes in lice to the adoption of blood-feeding in the Anoplura are not supported by this expanded data set as minicircles are found in multiple non-blood feeding louse groups but are not found in the blood-feeding genus Heterodoxus. In contrast, a mechanist explanation based on the loss of mtSSB suggests that minicircles may be selectively favoured due to the incapacity of the mt replisome to synthesize long replicative products without mtSSB and thus the
Mitochondrial genome deletions and minicircles are common in lice (Insecta: Phthiraptera).

PubMed

Cameron, Stephen L; Yoshizawa, Kazunori; Mizukoshi, Atsushi; Whiting, Michael F; Johnson, Kevin P

2011-08-04

The gene composition, gene order and structure of the mitochondrial genome are remarkably stable across bilaterian animals. Lice (Insecta: Phthiraptera) are a major exception to this genomic stability in that the canonical single chromosome with 37 genes found in almost all other bilaterians has been lost in multiple lineages in favour of multiple, minicircular chromosomes with fewer than 37 genes on each chromosome. Minicircular mt genomes are found in six of the ten louse species examined to date and three types of minicircles were identified: heteroplasmic minicircles which coexist with full sized mt genomes (type 1); multigene chromosomes with short, simple control regions, from which we infer that the genome consists of several such chromosomes (type 2); and multiple, single to three gene chromosomes with large, complex control regions (type 3). Mapping minicircle types onto a phylogenetic tree of lice fails to show a pattern of their occurrence consistent with an evolutionary series of minicircle types. Analysis of the nuclear-encoded, mitochondrially-targeted genes inferred from the body louse, Pediculus, suggests that the loss of mitochondrial single-stranded binding protein (mtSSB) may be responsible for the presence of minicircles in at least species with the most derived type 3 minicircles (Pediculus, Damalinia). Minicircular mt genomes are common in lice and appear to have arisen multiple times within the group. Life history adaptive explanations which attribute minicircular mt genomes in lice to the adoption of blood-feeding in the Anoplura are not supported by this expanded data set as minicircles are found in multiple non-blood feeding louse groups but are not found in the blood-feeding genus Heterodoxus. In contrast, a mechanist explanation based on the loss of mtSSB suggests that minicircles may be selectively favoured due to the incapacity of the mt replisome to synthesize long replicative products without mtSSB and thus the loss of this gene lead to the
Exploring Protein Function Using the Saccharomyces Genome Database.

PubMed

Wong, Edith D

2017-01-01

Elucidating the function of individual proteins will help to create a comprehensive picture of cell biology, as well as shed light on human disease mechanisms, possible treatments, and cures. Due to its compact genome, and extensive history of experimentation and annotation, the budding yeast Saccharomyces cerevisiae is an ideal model organism in which to determine protein function. This information can then be leveraged to infer functions of human homologs. Despite the large amount of research and biological data about S. cerevisiae, many proteins' functions remain unknown. Here, we explore ways to use the Saccharomyces Genome Database (SGD; http://www.yeastgenome.org ) to predict the function of proteins and gain insight into their roles in various cellular processes.
A Hierarchical Framework for State-Space Matrix Inference and Clustering.

PubMed

Zuo, Chandler; Chen, Kailei; Hewitt, Kyle J; Bresnick, Emery H; Keleş, Sündüz

2016-09-01

In recent years, a large number of genomic and epigenomic studies have been focusing on the integrative analysis of multiple experimental datasets measured over a large number of observational units. The objectives of such studies include not only inferring a hidden state of activity for each unit over individual experiments, but also detecting highly associated clusters of units based on their inferred states. Although there are a number of methods tailored for specific datasets, there is currently no state-of-the-art modeling framework for this general class of problems. In this paper, we develop the MBASIC (Matrix Based Analysis for State-space Inference and Clustering) framework. MBASIC consists of two parts: state-space mapping and state-space clustering. In state-space mapping, it maps observations onto a finite state-space, representing the activation states of units across conditions. In state-space clustering, MBASIC incorporates a finite mixture model to cluster the units based on their inferred state-space profiles across all conditions. Both the state-space mapping and clustering can be simultaneously estimated through an Expectation-Maximization algorithm. MBASIC flexibly adapts to a large number of parametric distributions for the observed data, as well as the heterogeneity in replicate experiments. It allows for imposing structural assumptions on each cluster, and enables model selection using information criterion. In our data-driven simulation studies, MBASIC showed significant accuracy in recovering both the underlying state-space variables and clustering structures. We applied MBASIC to two genome research problems using large numbers of datasets from the ENCODE project. The first application grouped genes based on transcription factor occupancy profiles of their promoter regions in two different cell types. The second application focused on identifying groups of loci that are similar to a GATA2 binding site that is functional at its
Arthropod phylogenetics in light of three novel millipede (Myriapoda: Diplopoda) mitochondrial genomes with comments on the appropriateness of mitochondrial genome sequence data for inferring deep level relationships.

PubMed

Brewer, Michael S; Swafford, Lynn; Spruill, Chad L; Bond, Jason E

2013-01-01

Arthropods are the most diverse group of eukaryotic organisms, but their phylogenetic relationships are poorly understood. Herein, we describe three mitochondrial genomes representing orders of millipedes for which complete genomes had not been characterized. Newly sequenced genomes are combined with existing data to characterize the protein coding regions of myriapods and to attempt to reconstruct the evolutionary relationships within the Myriapoda and Arthropoda. The newly sequenced genomes are similar to previously characterized millipede sequences in terms of synteny and length. Unique translocations occurred within the newly sequenced taxa, including one half of the Appalachioria falcifera genome, which is inverted with respect to other millipede genomes. Across myriapods, amino acid conservation levels are highly dependent on the gene region. Additionally, individual loci varied in the level of amino acid conservation. Overall, most gene regions showed low levels of conservation at many sites. Attempts to reconstruct the evolutionary relationships suffered from questionable relationships and low support values. Analyses of phylogenetic informativeness show the lack of signal deep in the trees (i.e., genes evolve too quickly). As a result, the myriapod tree resembles previously published results but lacks convincing support, and, within the arthropod tree, well established groups were recovered as polyphyletic. The novel genome sequences described herein provide useful genomic information concerning millipede groups that had not been investigated. Taken together with existing sequences, the variety of compositions and evolution of myriapod mitochondrial genomes are shown to be more complex than previously thought. Unfortunately, the use of mitochondrial protein-coding regions in deep arthropod phylogenetics appears problematic, a result consistent with previously published studies. Lack of phylogenetic signal renders the resulting tree topologies as suspect
Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3.

PubMed

Han, Mira V; Thomas, Gregg W C; Lugo-Martinez, Jose; Hahn, Matthew W

2013-08-01

Current sequencing methods produce large amounts of data, but genome assemblies constructed from these data are often fragmented and incomplete. Incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. This means that methods attempting to estimate rates of gene duplication and loss often will be misled by such errors and that rates of gene family evolution will be consistently overestimated. Here, we present a method that takes these errors into account, allowing one to accurately infer rates of gene gain and loss among genomes even with low assembly and annotation quality. The method is implemented in the newest version of the software package CAFE, along with several other novel features. We demonstrate the accuracy of the method with extensive simulations and reanalyze several previously published data sets. Our results show that errors in genome annotation do lead to higher inferred rates of gene gain and loss but that CAFE 3 sufficiently accounts for these errors to provide accurate estimates of important evolutionary parameters.

Optimal inference with suboptimal models: Addiction and active Bayesian inference

PubMed Central

Schwartenbeck, Philipp; FitzGerald, Thomas H.B.; Mathys, Christoph; Dolan, Ray; Wurst, Friedrich; Kronbichler, Martin; Friston, Karl

2015-01-01

When casting behaviour as active (Bayesian) inference, optimal inference is defined with respect to an agent’s beliefs – based on its generative model of the world. This contrasts with normative accounts of choice behaviour, in which optimal actions are considered in relation to the true structure of the environment – as opposed to the agent’s beliefs about worldly states (or the task). This distinction shifts an understanding of suboptimal or pathological behaviour away from aberrant inference as such, to understanding the prior beliefs of a subject that cause them to behave less ‘optimally’ than our prior beliefs suggest they should behave. Put simply, suboptimal or pathological behaviour does not speak against understanding behaviour in terms of (Bayes optimal) inference, but rather calls for a more refined understanding of the subject’s generative model upon which their (optimal) Bayesian inference is based. Here, we discuss this fundamental distinction and its implications for understanding optimality, bounded rationality and pathological (choice) behaviour. We illustrate our argument using addictive choice behaviour in a recently described ‘limited offer’ task. Our simulations of pathological choices and addictive behaviour also generate some clear hypotheses, which we hope to pursue in ongoing empirical work. PMID:25561321
Primate phylogenetic relationships and divergence dates inferred from complete mitochondrial genomes.

PubMed

Pozzi, Luca; Hodgson, Jason A; Burrell, Andrew S; Sterner, Kirstin N; Raaum, Ryan L; Disotell, Todd R

2014-06-01

The origins and the divergence times of the most basal lineages within primates have been difficult to resolve mainly due to the incomplete sampling of early fossil taxa. The main source of contention is related to the discordance between molecular and fossil estimates: while there are no crown primate fossils older than 56Ma, most molecule-based estimates extend the origins of crown primates into the Cretaceous. Here we present a comprehensive mitogenomic study of primates. We assembled 87 mammalian mitochondrial genomes, including 62 primate species representing all the families of the order. We newly sequenced eleven mitochondrial genomes, including eight Old World monkeys and three strepsirrhines. Phylogenetic analyses support a strong topology, confirming the monophyly for all the major primate clades. In contrast to previous mitogenomic studies, the positions of tarsiers and colugos relative to strepsirrhines and anthropoids are well resolved. In order to improve our understanding of how fossil calibrations affect age estimates within primates, we explore the effect of seventeen fossil calibrations across primates and other mammalian groups and we select a subset of calibrations to date our mitogenomic tree. The divergence date estimates of the Strepsirrhine/Haplorhine split support an origin of crown primates in the Late Cretaceous, at around 74Ma. This result supports a short-fuse model of primate origins, whereby relatively little time passed between the origin of the order and the diversification of its major clades. It also suggests that the early primate fossil record is likely poorly sampled. Copyright © 2014 Elsevier Inc. All rights reserved.
A Molecular Phylogeny of Hemiptera Inferred from Mitochondrial Genome Sequences

PubMed Central

Song, Nan; Liang, Ai-Ping; Bu, Cui-Ping

2012-01-01

Classically, Hemiptera is comprised of two suborders: Homoptera and Heteroptera. Homoptera includes Cicadomorpha, Fulgoromorpha and Sternorrhyncha. However, according to previous molecular phylogenetic studies based on 18S rDNA, Fulgoromorpha has a closer relationship to Heteroptera than to other hemipterans, leaving Homoptera as paraphyletic. Therefore, the position of Fulgoromorpha is important for studying phylogenetic structure of Hemiptera. We inferred the evolutionary affiliations of twenty-five superfamilies of Hemiptera using mitochondrial protein-coding genes and rRNAs. We sequenced three mitogenomes, from Pyrops candelaria, Lycorma delicatula and Ricania marginalis, representing two additional families in Fulgoromorpha. Pyrops and Lycorma are representatives of an additional major family Fulgoridae in Fulgoromorpha, whereas Ricania is a second representative of the highly derived clade Ricaniidae. The organization and size of these mitogenomes are similar to those of the sequenced fulgoroid species. Our consensus phylogeny of Hemiptera largely supported the relationships (((Fulgoromorpha,Sternorrhyncha),Cicadomorpha),Heteroptera), and thus supported the classic phylogeny of Hemiptera. Selection of optimal evolutionary models (exclusion and inclusion of two rRNA genes or of third codon positions of protein-coding genes) demonstrated that rapidly evolving and saturated sites should be removed from the analyses. PMID:23144967
Composition Influences the Pathway but not the Outcome of the Metabolic Response of Bacterioplankton to Resource Shifts

PubMed Central

Comte, Jérôme; del Giorgio, Paul A.

2011-01-01

Bacterioplankton community metabolism is central to the functioning of aquatic ecosystems, and strongly reactive to changes in the environment, yet the processes underlying this response remain unclear. Here we explore the role that community composition plays in shaping the bacterial metabolic response to resource gradients that occur along aquatic ecotones in a complex watershed in Québec. Our results show that the response is mediated by complex shifts in community structure, and structural equation analysis confirmed two main pathways, one involving adjustments in the level of activity of existing phylotypes, and the other the replacement of the dominant phylotypes. These contrasting response pathways were not determined by the type or the intensity of the gradients involved, as we had hypothesized, but rather it would appear that some compositional configurations may be intrinsically more plastic than others. Our results suggest that community composition determines this overall level of community plasticity, but that composition itself may be driven by factors independent of the environmental gradients themselves, such that the response of bacterial communities to a given type of gradient may alternate between the adjustment and replacement pathways. We conclude that community composition influences the pathways of response in these bacterial communities, but not the metabolic outcome itself, which is driven by the environment, and which can be attained through multiple alternative configurations. PMID:21980410
Privacy-preserving genomic testing in the clinic: a model using HIV treatment

PubMed Central

McLaren, Paul J.; Raisaro, Jean Louis; Aouri, Manel; Rotger, Margalida; Ayday, Erman; Bartha, István; Delgado, Maria B.; Vallet, Yannick; Günthard, Huldrych F.; Cavassini, Matthias; Furrer, Hansjakob; Doco-Lecompte, Thanh; Marzolini, Catia; Schmid, Patrick; Di Benedetto, Caroline; Decosterd, Laurent A.; Fellay, Jacques; Hubaux, Jean-Pierre; Telenti, Amalio

2016-01-01

Purpose: The implementation of genomic-based medicine is hindered by unresolved questions regarding data privacy and delivery of interpreted results to health-care practitioners. We used DNA-based prediction of HIV-related outcomes as a model to explore critical issues in clinical genomics. Genet Med 18 8, 814–822. Methods: We genotyped 4,149 markers in HIV-positive individuals. Variants allowed for prediction of 17 traits relevant to HIV medical care, inference of patient ancestry, and imputation of human leukocyte antigen (HLA) types. Genetic data were processed under a privacy-preserving framework using homomorphic encryption, and clinical reports describing potentially actionable results were delivered to health-care providers. Results: A total of 230 patients were included in the study. We demonstrated the feasibility of encrypting a large number of genetic markers, inferring patient ancestry, computing monogenic and polygenic trait risks, and reporting results under privacy-preserving conditions. The average execution time of a multimarker test on encrypted data was 865 ms on a standard computer. The proportion of tests returning potentially actionable genetic results ranged from 0 to 54%. Conclusions: The model of implementation presented herein informs on strategies to deliver genomic test results for clinical care. Data encryption to ensure privacy helps to build patient trust, a key requirement on the road to genomic-based medicine. PMID:26765343
Tissue-aware data integration approach for the inference of pathway interactions in metazoan organisms

PubMed Central

Park, Christopher Y.; Krishnan, Arjun; Zhu, Qian; Wong, Aaron K.; Lee, Young-Suk; Troyanskaya, Olga G.

2015-01-01

Motivation: Leveraging the large compendium of genomic data to predict biomedical pathways and specific mechanisms of protein interactions genome-wide in metazoan organisms has been challenging. In contrast to unicellular organisms, biological and technical variation originating from diverse tissues and cell-lineages is often the largest source of variation in metazoan data compendia. Therefore, a new computational strategy accounting for the tissue heterogeneity in the functional genomic data is needed to accurately translate the vast amount of human genomic data into specific interaction-level hypotheses. Results: We developed an integrated, scalable strategy for inferring multiple human gene interaction types that takes advantage of data from diverse tissue and cell-lineage origins. Our approach specifically predicts both the presence of a functional association and also the most likely interaction type among human genes or their protein products on a whole-genome scale. We demonstrate that directly incorporating tissue contextual information improves the accuracy of our predictions, and further, that such genome-wide results can be used to significantly refine regulatory interactions from primary experimental datasets (e.g. ChIP-Seq, mass spectrometry). Availability and implementation: An interactive website hosting all of our interaction predictions is publicly available at http://pathwaynet.princeton.edu. Software was implemented using the open-source Sleipnir library, which is available for download at https://bitbucket.org/libsleipnir/libsleipnir.bitbucket.org. Contact: ogt@cs.princeton.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25431329
The African Genome Variation Project shapes medical genetics in Africa

NASA Astrophysics Data System (ADS)

Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O.; Choudhury, Ananyo; Ritchie, Graham R. S.; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N.; Young, Elizabeth H.; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P.; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A.; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S.

2015-01-01

Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.
The African Genome Variation Project shapes medical genetics in Africa.

PubMed

Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O; Choudhury, Ananyo; Ritchie, Graham R S; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N; Young, Elizabeth H; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S

2015-01-15

Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.
Integrating evolutionary and functional approaches to infer adaptation at specific loci.

PubMed

Storz, Jay F; Wheat, Christopher W

2010-09-01

Inferences about adaptation at specific loci are often exclusively based on the static analysis of DNA sequence variation. Ideally, population-genetic evidence for positive selection serves as a stepping-off point for experimental studies to elucidate the functional significance of the putatively adaptive variation. We argue that inferences about adaptation at specific loci are best achieved by integrating the indirect, retrospective insights provided by population-genetic analyses with the more direct, mechanistic insights provided by functional experiments. Integrative studies of adaptive genetic variation may sometimes be motivated by experimental insights into molecular function, which then provide the impetus to perform population genetic tests to evaluate whether the functional variation is of adaptive significance. In other cases, studies may be initiated by genome scans of DNA variation to identify candidate loci for recent adaptation. Results of such analyses can then motivate experimental efforts to test whether the identified candidate loci do in fact contribute to functional variation in some fitness-related phenotype. Functional studies can provide corroborative evidence for positive selection at particular loci, and can potentially reveal specific molecular mechanisms of adaptation.
Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks

PubMed Central

Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E.; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A.; Kellis, Manolis

2012-01-01

Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level. PMID:22456606
The BioCyc collection of microbial genomes and metabolic pathways.

PubMed

Karp, Peter D; Billington, Richard; Caspi, Ron; Fulcher, Carol A; Latendresse, Mario; Kothari, Anamika; Keseler, Ingrid M; Krummenacker, Markus; Midford, Peter E; Ong, Quang; Ong, Wai Kit; Paley, Suzanne M; Subhraveti, Pallavi

2017-08-17

BioCyc.org is a microbial genome Web portal that combines thousands of genomes with additional information inferred by computer programs, imported from other databases and curated from the biomedical literature by biologist curators. BioCyc also provides an extensive range of query tools, visualization services and analysis software. Recent advances in BioCyc include an expansion in the content of BioCyc in terms of both the number of genomes and the types of information available for each genome; an expansion in the amount of curated content within BioCyc; and new developments in the BioCyc software tools including redesigned gene/protein pages and metabolite pages; new search tools; a new sequence-alignment tool; a new tool for visualizing groups of related metabolic pathways; and a facility called SmartTables, which enables biologists to perform analyses that previously would have required a programmer's assistance. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer.

PubMed

Bernard, Guillaume; Chan, Cheong Xin; Ragan, Mark A

2016-07-01

Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution.
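
As a rough illustration of the alignment-free idea described above, sequences can be compared through their k-mer count profiles rather than through an alignment. The abstract does not name the nine methods assessed, so the cosine distance below is an assumption chosen for simplicity, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count overlapping k-mers in a sequence (empty if len(seq) < k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(seq_a, seq_b, k=3):
    """Alignment-free distance: 1 - cosine similarity of k-mer count vectors.

    Illustrative only; not one of the specific AF methods evaluated in the study.
    """
    a, b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    dot = sum(a[m] * b[m] for m in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    if norm == 0:
        # At least one sequence is shorter than k, so no profile exists.
        return 1.0
    return 1.0 - dot / norm
```

Because only k-mer content is counted, moving a block of sequence to a different position changes just the k-mers that span the breakpoints, which gives some intuition for why such profiles can tolerate rearrangement better than they tolerate point-wise divergence.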
Mitochondrial genome sequences illuminate maternal lineages of conservation concern in a rare carnivore

Treesearch

Brian J. Knaus; Richard Cronn; Aaron Liston; Kristine Pilgrim; Michael K. Schwartz

2011-01-01

Science-based wildlife management relies on genetic information to infer population connectivity and identify conservation units. The most commonly used genetic marker for characterizing animal biodiversity and identifying maternal lineages is the mitochondrial genome. Mitochondrial genotyping figures prominently in conservation and management plans, with much of the...
Functional phylogenomics analysis of bacteria and archaea using consistent genome annotation with UniFam

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chai, Juanjuan; Kora, Guruprasad; Ahn, Tae-Hyuk

2014-10-09

To supply some background, phylogenetic studies have provided detailed knowledge on the evolutionary mechanisms of genes and species in Bacteria and Archaea. However, the evolution of cellular functions, represented by metabolic pathways and biological processes, has not been systematically characterized. Many clades in the prokaryotic tree of life have now been covered by sequenced genomes in GenBank. This enables a large-scale functional phylogenomics study of many computationally inferred cellular functions across all sequenced prokaryotes. Our results show a total of 14,727 GenBank prokaryotic genomes were re-annotated using a new protein family database, UniFam, to obtain consistent functional annotations for accurate comparison. The functional profile of a genome was represented by the biological process Gene Ontology (GO) terms in its annotation. The GO term enrichment analysis differentiated the functional profiles between selected archaeal taxa. 706 prokaryotic metabolic pathways were inferred from these genomes using Pathway Tools and MetaCyc. The consistency between the distribution of metabolic pathways in the genomes and the phylogenetic tree of the genomes was measured using parsimony scores and retention indices. The ancestral functional profiles at the internal nodes of the phylogenetic tree were reconstructed to track the gains and losses of metabolic pathways in evolutionary history. In conclusion, our functional phylogenomics analysis shows divergent functional profiles of taxa and clades. Such function-phylogeny correlation stems from a set of clade-specific cellular functions with low parsimony scores. On the other hand, many cellular functions are sparsely dispersed across many clades with high parsimony scores. These different types of cellular functions have distinct evolutionary patterns reconstructed from the prokaryotic tree.
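
To make the parsimony score mentioned above concrete, the sketch below counts the minimum number of gain/loss events needed to explain a pathway's presence or absence at the tips of a binary tree. It uses Fitch's small-parsimony algorithm, which is an assumption here: the abstract does not say which parsimony procedure was used. The nested-tuple tree encoding and the toy taxa are hypothetical.

```python
def fitch_score(tree, states):
    """Return (candidate_states, changes) for a rooted binary tree.

    tree   : a leaf name (str) or a 2-tuple of subtrees
    states : dict mapping leaf name -> 0 (pathway absent) or 1 (present)
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_score(left, states)
    right_set, right_changes = fitch_score(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_changes + right_changes
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical five-taxon tree.
tree = ((("A", "B"), "C"), ("D", "E"))
clade_specific = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0}
scattered = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 1}
print(fitch_score(tree, clade_specific)[1])  # 1
print(fitch_score(tree, scattered)[1])       # 2
```

The clade-specific pattern needs a single change, while the scattered pattern needs more, matching the abstract's contrast between low and high parsimony scores.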
Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences

DOEpatents

Eisenberg, David; Marcotte, Edward M.; Pellegrini, Matteo; Thompson, Michael J.; Yeates, Todd O.

2002-10-15

A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
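
The phylogenetic-profile idea described above can be sketched as follows: each protein gets a presence/absence vector across a fixed list of genomes, and proteins with matching vectors become candidate functional links. Linking only on identical profiles is a simplifying assumption made for this sketch, and the function names and toy data are hypothetical rather than taken from the patent.

```python
def phylogenetic_profile(genomes_with_homolog, genomes):
    """Presence (1) / absence (0) of a protein's homolog in each genome, in order."""
    return tuple(1 if g in genomes_with_homolog else 0 for g in genomes)


def link_by_profile(presence, genomes):
    """Group proteins that share an identical phylogenetic profile.

    presence : dict mapping protein name -> set of genomes containing a homolog
    Returns a list of sorted protein groups with at least two members.
    """
    groups = {}
    for protein, hits in presence.items():
        profile = phylogenetic_profile(hits, genomes)
        groups.setdefault(profile, []).append(protein)
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Hypothetical toy data: four genomes, four proteins.
genomes = ["g1", "g2", "g3", "g4"]
presence = {
    "P1": {"g1", "g3"},
    "P2": {"g1", "g3"},
    "P3": {"g2"},
    "P4": {"g1", "g2", "g3", "g4"},
}
print(link_by_profile(presence, genomes))  # [['P1', 'P2']]
```

A profile present in every genome (like P4 here) carries little information on its own, so a practical version would likely weight or filter such profiles.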
Inferring the demographic history of European Ficedula flycatcher populations

PubMed Central

2013-01-01

Background: Inference of population and species histories and population stratification using genetic data is important for discriminating between different speciation scenarios and for correct interpretation of genome scans for signs of adaptive evolution and trait association. Here we use data from 24 intronic loci re-sequenced in population samples of two closely related species, the pied flycatcher and the collared flycatcher. Results: We applied Isolation-Migration models, assignment analyses and estimated the genetic differentiation and diversity between species and between populations within species. The data indicate a divergence time between the species of <1 million years, significantly shorter than previous estimates using mtDNA, point to a scenario with unidirectional gene-flow from the pied flycatcher into the collared flycatcher and imply that barriers to hybridisation are still permeable in a recently established hybrid zone. Furthermore, we detect significant population stratification, predominantly between the Spanish population and other pied flycatcher populations. Conclusions: Our results provide further evidence for a divergence process where different genomic regions may be at different stages of speciation. We also conclude that forthcoming analyses of genotype-phenotype relations in these ecological model species should be designed to take population stratification into account. PMID:23282063
Genome-wide population structure and evolutionary history of the Frizarta dairy sheep.

PubMed

Kominakis, A; Hager-Theodorides, A L; Saridaki, A; Antonakos, G; Tsiamis, G

2017-10-01

In the present study, we used genomic data, generated with a medium density single nucleotide polymorphisms (SNP) array, to acquire more information on the population structure and evolutionary history of the synthetic Frizarta dairy sheep. First, two typical measures of linkage disequilibrium (LD) were estimated at various physical distances that were then used to make inferences on the effective population size at key past time points. Population structure was also assessed by both multidimensional scaling analysis and k-means clustering on the distance matrix obtained from the animals' genomic relationships. Wright's fixation index F_ST was also employed to assess herds' genetic homogeneity and to indirectly estimate past migration rates. Wright's fixation index F_IS and genomic inbreeding coefficients based on the genomic relationship matrix as well as on runs of homozygosity were also estimated. The Frizarta breed displays relatively low LD levels with r² and |D'| equal to 0.18 and 0.50, respectively, at an average inter-marker distance of 31 kb. Linkage disequilibrium decayed rapidly by distance and persisted over just a few thousand base pairs. Rate of LD decay (β) varied widely among the 26 autosomes with larger values estimated for shorter chromosomes (e.g. β=0.057, for OAR6) and smaller values for longer ones (e.g. β=0.022, for OAR2). The inferred effective population size at the beginning of the breed's formation was as high as 549, was then reduced to 463 in 1981 (end of the breed's formation) and further declined to 187, one generation ago. Multidimensional scaling analysis and k-means clustering suggested a genetically homogeneous population, F_ST estimates indicated relatively low genetic differentiation between herds, whereas a heat map of the animals' genomic kinship relationships revealed a stratified population, at a herd level. Estimates of genomic inbreeding coefficients suggested that most recent parental relatedness may have been a
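
For readers unfamiliar with the two LD measures reported in this record, the sketch below computes r² and |D'| for a pair of biallelic loci using their textbook definitions. It assumes phased haplotype frequencies are available, which the abstract does not state (the study worked from SNP array genotypes), and the function name and inputs are hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (r2, abs_d_prime) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    Both loci must be polymorphic (0 < p < 1).
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    # D: deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # D' rescales D by its maximum attainable value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    abs_d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2: squared correlation between alleles at the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return r2, abs_d_prime
```

With `p_ab=0.3, p_a=0.5, p_b=0.4`, D is 0.1, its maximum is 0.2, so |D'| is 0.5 and r² is about 0.167. This shows how the two measures can differ for the same pair of loci, as they do in the values reported above.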
Exploring lateral genetic transfer among microbial genomes using TF-IDF.

PubMed

Cong, Yingnan; Chan, Yao-Ban; Ragan, Mark A

2016-07-25

Many microbes can acquire genetic material from their environment and incorporate it into their genome, a process known as lateral genetic transfer (LGT). Computational approaches have been developed to detect genomic regions of lateral origin, but typically lack sensitivity, ability to distinguish donor from recipient, and scalability to very large datasets. To address these issues we have introduced an alignment-free method based on ideas from document analysis, term frequency-inverse document frequency (TF-IDF). Here we examine the performance of TF-IDF on three empirical datasets: 27 genomes of Escherichia coli and Shigella, 110 genomes of enteric bacteria, and 143 genomes across 12 bacterial and three archaeal phyla. We investigate the effect of k-mer size, gap size and delineation of groups on the inference of genomic regions of lateral origin, finding an interplay among these parameters and sequence divergence. Because TF-IDF identifies donor groups and delineates regions of lateral origin within recipient genomes, aggregating these regions by gene enables us to explore, for the first time, the mosaic nature of lateral genes including the multiplicity of biological sources, ancestry of transfer and over-writing by subsequent transfers. We carry out Gene Ontology enrichment tests to investigate which biological processes are potentially affected by LGT.
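
The document-analysis analogy above can be illustrated with the standard TF-IDF weighting applied to k-mers, treating each genome (or group of genomes) as a "document" and each k-mer as a "term". This is only the generic weighting, not the authors' LGT detection procedure, whose details (gap size, group delineation, donor assignment) are not given in the abstract. The function names and the choice of the natural logarithm are assumptions.

```python
from collections import Counter
from math import log


def kmers(seq, k):
    """List of overlapping k-mers in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tf_idf(docs, k):
    """Compute TF-IDF weights for k-mers.

    docs : dict mapping document name -> sequence
    Returns dict mapping name -> {kmer: weight}, where
    weight = (count / total k-mers in doc) * ln(n_docs / docs containing kmer).
    """
    counts = {name: Counter(kmers(seq, k)) for name, seq in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())  # each document counts once per k-mer
    weights = {}
    for name, c in counts.items():
        total = sum(c.values())
        weights[name] = {
            m: (n / total) * log(n_docs / doc_freq[m]) for m, n in c.items()
        }
    return weights
```

A k-mer found in every document receives weight zero, while one confined to a single document is weighted most heavily, which is the property that makes the scheme useful for flagging sequence characteristic of a particular group.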
Genomic legacy of the African cheetah, Acinonyx jubatus.

PubMed

Dobrynin, Pavel; Liu, Shiping; Tamazian, Gaik; Xiong, Zijun; Yurchenko, Andrey A; Krasheninnikova, Ksenia; Kliver, Sergey; Schmidt-Küntzel, Anne; Koepfli, Klaus-Peter; Johnson, Warren; Kuderna, Lukas F K; García-Pérez, Raquel; Manuel, Marc de; Godinez, Ricardo; Komissarov, Aleksey; Makunin, Alexey; Brukhin, Vladimir; Qiu, Weilin; Zhou, Long; Li, Fang; Yi, Jian; Driscoll, Carlos; Antunes, Agostinho; Oleksyk, Taras K; Eizirik, Eduardo; Perelman, Polina; Roelke, Melody; Wildt, David; Diekhans, Mark; Marques-Bonet, Tomas; Marker, Laurie; Bhak, Jong; Wang, Jun; Zhang, Guojie; O'Brien, Stephen J

2015-12-10

Patterns of genetic and genomic variance are informative in inferring population history for human, model species and endangered populations. Here the genome sequence of wild-born African cheetahs reveals extreme genomic depletion in SNV incidence, SNV density, SNVs of coding genes, MHC class I and II genes, and mitochondrial DNA SNVs. Cheetah genomes are on average 95 % homozygous compared to the genomes of the outbred domestic cat (24.08 % homozygous), Virunga Mountain Gorilla (78.12 %), inbred Abyssinian cat (62.63 %), Tasmanian devil, domestic dog and other mammalian species. Demographic estimators impute two ancestral population bottlenecks: one >100,000 years ago coincident with cheetah migrations out of the Americas and into Eurasia and Africa, and a second 11,084-12,589 years ago in Africa coincident with late Pleistocene large mammal extinctions. MHC class I gene loss and dramatic reduction in functional diversity of MHC genes would explain why cheetahs ablate skin graft rejection among unrelated individuals. Significant excess of non-synonymous mutations in AKAP4 (p<0.02), a gene mediating spermatozoon development, indicates cheetah fixation of five function-damaging amino acid variants distinct from AKAP4 homologues of other Felidae or mammals; AKAP4 dysfunction may cause the cheetah's extremely high (>80 %) pleiomorphic sperm. The study provides an unprecedented genomic perspective for the rare cheetah, with potential relevance to the species' natural history, physiological adaptations and unique reproductive disposition.
Integration of Steady-State and Temporal Gene Expression Data for the Inference of Gene Regulatory Networks

PubMed Central

Wang, Yi Kan; Hurley, Daniel G.; Schnell, Santiago; Print, Cristin G.; Crampin, Edmund J.

2013-01-01

We develop a new regression algorithm, cMIKANA, for inference of gene regulatory networks from combinations of steady-state and time-series gene expression data. Using simulated gene expression datasets to assess the accuracy of reconstructing gene regulatory networks, we show that steady-state and time-series data sets can successfully be combined to identify gene regulatory interactions using the new algorithm. Inferring gene networks from combined data sets was found to be advantageous when using noisy measurements collected with either lower sampling rates or a limited number of experimental replicates. We illustrate our method by applying it to a microarray gene expression dataset from human umbilical vein endothelial cells (HUVECs) which combines time series data from treatment with growth factor TNF and steady state data from siRNA knockdown treatments. Our results suggest that the combination of steady-state and time-series datasets may provide better prediction of RNA-to-RNA interactions, and may also reveal biological features that cannot be identified from dynamic or steady state information alone. Finally, we consider the experimental design of genomics experiments for gene regulatory network inference and show that network inference can be improved by incorporating steady-state measurements with time-series data. PMID:23967277

Exploring the ancestry differentiation and inference capacity of the 28-plex AISNPs.

PubMed

Hao, Wei-Qi; Liu, Jing; Jiang, Li; Han, Jun-Ping; Wang, Ling; Li, Jiu-Ling; Ma, Quan; Liu, Chao; Wang, Hui-Jun; Li, Cai-Xia

2018-06-07

Inferring an unknown DNA's ancestry using a set of ancestry-informative single nucleotide polymorphisms (SNPs) in forensic science is useful to provide investigative leads. This is especially true when there is no DNA database match or specified suspect. Thus, a set of SNPs with highly robust and balanced differential power is strongly demanded in forensic science. In addition, it is also necessary to build a genotyping database for estimating the ancestry of an individual or an unknown DNA. For the differentiation of Africans, Europeans, East Asians, Native Americans, and Oceanians, the Global Nano set that includes just 31 SNPs was developed by de la Puente et al. Its ability for differentiation and balance was evaluated using the genotype data of the 1000 Genomes Phase III project and the Stanford University HGDP-CEPH. Just 402 samples were genotyped and analyzed as a reference set based on statistical methods. To validate the differentiating capacity using more samples, we developed a single-tube 28-plex SNP assay in which the SNPs were chosen from the 31 allelic loci of the Global AIMs Nano set. Three tri-allelic SNPs used to differentiate mixed-source DNA contribute little to population differentiation and were excluded here. Then, 998 individuals from 21 populations were typed, and these genotypes were combined with the genotype data obtained from 1000 Genomes Phase III and the Stanford University HGDP-CEPH (3090 total samples, 43 populations) to estimate the power of this multiplex assay and build a database for the further inference of an individual or an unknown DNA sample in forensic practice.
Sauropod dinosaurs evolved moderately sized genomes unrelated to body size.

PubMed

Organ, Chris L; Brusatte, Stephen L; Stein, Koen

2009-12-22

Sauropodomorph dinosaurs include the largest land animals to have ever lived, some reaching up to 10 times the mass of an African elephant. Despite their status defining the upper range for body size in land animals, it remains unknown whether sauropodomorphs evolved larger-sized genomes than non-avian theropods, their sister taxon, or whether a relationship exists between genome size and body size in dinosaurs, two questions critical for understanding broad patterns of genome evolution in dinosaurs. Here we report inferences of genome size for 10 sauropodomorph taxa. The estimates are derived from a Bayesian phylogenetic generalized least squares approach that generates posterior distributions of regression models relating genome size to osteocyte lacunae volume in extant tetrapods. We estimate that the average genome size of sauropodomorphs was 2.02 pg (range of species means: 1.77-2.21 pg), a value in the upper range of extant birds (mean = 1.42 pg, range: 0.97-2.16 pg) and near the average for extant non-avian reptiles (mean = 2.24 pg, range: 1.05-5.44 pg). The results suggest that the variation in size and architecture of genomes in extinct dinosaurs was lower than the variation found in mammals. A substantial difference in genome size separates the two major clades within dinosaurs, Ornithischia (large genomes) and Saurischia (moderate to small genomes). We find no relationship between body size and estimated genome size in extinct dinosaurs, which suggests that neutral forces did not dominate the evolution of genome size in this group.
Sauropod dinosaurs evolved moderately sized genomes unrelated to body size

PubMed Central

Organ, Chris L.; Brusatte, Stephen L.; Stein, Koen

2009-01-01

Sauropodomorph dinosaurs include the largest land animals to have ever lived, some reaching up to 10 times the mass of an African elephant. Despite their status defining the upper range for body size in land animals, it remains unknown whether sauropodomorphs evolved larger-sized genomes than non-avian theropods, their sister taxon, or whether a relationship exists between genome size and body size in dinosaurs, two questions critical for understanding broad patterns of genome evolution in dinosaurs. Here we report inferences of genome size for 10 sauropodomorph taxa. The estimates are derived from a Bayesian phylogenetic generalized least squares approach that generates posterior distributions of regression models relating genome size to osteocyte lacunae volume in extant tetrapods. We estimate that the average genome size of sauropodomorphs was 2.02 pg (range of species means: 1.77–2.21 pg), a value in the upper range of extant birds (mean = 1.42 pg, range: 0.97–2.16 pg) and near the average for extant non-avian reptiles (mean = 2.24 pg, range: 1.05–5.44 pg). The results suggest that the variation in size and architecture of genomes in extinct dinosaurs was lower than the variation found in mammals. A substantial difference in genome size separates the two major clades within dinosaurs, Ornithischia (large genomes) and Saurischia (moderate to small genomes). We find no relationship between body size and estimated genome size in extinct dinosaurs, which suggests that neutral forces did not dominate the evolution of genome size in this group. PMID:19793755
Linking genotype to phenotype in a changing ocean: inferring the genomic architecture of a blue mussel stress response with genome-wide association.

PubMed

Kingston, S E; Martino, P; Melendy, M; Reed, F A; Carlon, D B

2018-03-01

A key component to understanding the evolutionary response to a changing climate is linking underlying genetic variation to phenotypic variation in stress response. Here, we use a genome-wide association approach (GWAS) to understand the genetic architecture of calcification rates under simulated climate stress. We take advantage of the genomic gradient across the blue mussel hybrid zone (Mytilus edulis and Mytilus trossulus) in the Gulf of Maine (GOM) to link genetic variation with variance in calcification rates in response to simulated climate change. Falling calcium carbonate saturation states are predicted to negatively impact many marine organisms that build calcium carbonate shells - like blue mussels. We sampled wild mussels and measured net calcification phenotypes after exposing mussels to a 'climate change' common garden, where we raised temperature by 3°C, decreased pH by 0.2 units and limited food supply by filtering out planktonic particles >5 μm, compared to ambient GOM conditions in the summer. This climate change exposure greatly increased phenotypic variation in net calcification rates compared to ambient conditions. We then used regression models to link the phenotypic variation with over 170 000 single nucleotide polymorphism loci (SNPs) generated by genotype by sequencing to identify genomic locations associated with calcification phenotype, and estimate heritability and architecture of the trait. We identified at least one of potentially 2-10 genomic regions responsible for 30% of the phenotypic variation in calcification rates that are potential targets of natural selection by climate change. Our simulations suggest a power of 13.7% with our study's average effective sample size of 118 individuals and rare alleles, but a power of >90% when effective sample size is 900. © 2017 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2017 European Society For Evolutionary Biology.
Characterization of the complete mitochondrial genome of the cloacal tapeworm Cloacotaenia megalops (Cestoda: Hymenolepididae).

PubMed

Guo, Aijiang

2016-09-05

The cloacal tapeworm Cloacotaenia megalops (Hymenolepididae) is one of the most common cestode parasites of domestic and wild ducks worldwide. However, limited information is available regarding its epidemiology, biology, genetics and systematics. This study provides characterisation of the complete mitochondrial (mt) genome of C. megalops. The complete mt genome of C. megalops was obtained by long PCR, sequenced and annotated. The length of the entire mt genome of C. megalops is 13,887 bp; it contains 12 protein-coding, 2 ribosomal RNA and 22 transfer RNA genes, but lacks an atp8 gene. The mt gene arrangement of C. megalops is identical to that observed in Anoplocephala magna and A. perfoliata (Anoplocephalidae), Dipylidium caninum (Dipylidiidae) and Hymenolepis diminuta (Hymenolepididae), but differs from that reported in taeniids owing to the position shift between the tRNA (L1) and tRNA (S2) genes. The phylogenetic position of C. megalops was inferred using Maximum likelihood and Bayesian inference methods based on the concatenated amino acid data for 12 protein-coding genes. Phylogenetic trees showed that C. megalops is sister to Anoplocephala spp. (Anoplocephalidae) + Pseudanoplocephala crawfordi + Hymenolepis spp. (Hymenolepididae) indicating that the family Hymenolepididae is paraphyletic. The complete mt genome of C. megalops is sequenced. Phylogenetic analyses provided an insight into the phylogenetic relationships among the families Anoplocephalidae, Hymenolepididae, Dipylidiidae and Taeniidae. This novel genomic information also provides the opportunity to develop useful genetic markers for studying the molecular epidemiology, biology, genetics and systematics of C. megalops.
Stan: Statistical inference

NASA Astrophysics Data System (ADS)

Stan Development Team

2018-01-01

Stan facilitates statistical inference at the frontiers of applied statistics and provides both a modeling language for specifying complex statistical models and a library of statistical algorithms for computing inferences with those models. These components are exposed through interfaces in environments such as R, Python, and the command line.
MIRA: An R package for DNA methylation-based inference of regulatory activity.

PubMed

Lawson, John T; Tomazou, Eleni M; Bock, Christoph; Sheffield, Nathan C

2018-03-01

DNA methylation contains information about the regulatory state of the cell. MIRA aggregates genome-scale DNA methylation data into a DNA methylation profile for independent region sets with shared biological annotation. Using this profile, MIRA infers and scores the collective regulatory activity for each region set. MIRA facilitates regulatory analysis in situations where classical regulatory assays would be difficult and allows public sources of open chromatin and protein binding regions to be leveraged for novel insight into the regulatory state of DNA methylation datasets. R package available on Bioconductor: http://bioconductor.org/packages/release/bioc/html/MIRA.html.
Reveal, A General Reverse Engineering Algorithm for Inference of Genetic Network Architectures

NASA Technical Reports Server (NTRS)

Liang, Shoudan; Fuhrman, Stefanie; Somogyi, Roland

1998-01-01

Given the imminent gene expression mapping covering whole genomes during development, health and disease, we seek computational methods to maximize functional inference from such large data sets. Is it possible, in principle, to completely infer a complex regulatory network architecture from input/output patterns of its variables? We investigated this possibility using binary models of genetic networks. Trajectories, or state transition tables of Boolean nets, resemble time series of gene expression. By systematically analyzing the mutual information between input states and output states, one is able to infer the sets of input elements controlling each element or gene in the network. This process is unequivocal and exact for complete state transition tables. We implemented this REVerse Engineering ALgorithm (REVEAL) in a C program, and found the problem to be tractable within the conditions tested so far. For n = 50 (elements) and k = 3 (inputs per element), the analysis of incomplete state transition tables (100 state transition pairs out of a possible 10^15) reliably produced the original rule and wiring sets. While this study is limited to synchronous Boolean networks, the algorithm is generalizable to include multi-state models, essentially allowing direct application to realistic biological data sets. The ability to adequately solve the inverse problem may enable in-depth analysis of complex dynamic systems in biology and other fields.
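The mutual-information idea described in this abstract can be illustrated with a short sketch. It is not the authors' C program: the function names, the three-element toy network and the search order (smallest input sets first) are assumptions made for illustration. The test used is that a set of inputs X fully determines an output element Y when M(X;Y) = H(X) + H(Y) - H(X,Y) equals H(Y).

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def entropy(symbols):
    """Shannon entropy in bits of a list of hashable symbols."""
    total = len(symbols)
    return -sum((c / total) * log2(c / total)
                for c in Counter(symbols).values())


def mutual_information(xs, ys):
    """M(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def infer_inputs(inputs, outputs, target, max_k=3, tol=1e-9):
    """Smallest tuple of input indices whose mutual information with
    output element `target` equals that element's entropy."""
    ys = [state[target] for state in outputs]
    h_y = entropy(ys)
    if h_y < tol:
        return ()  # constant element: no inputs needed
    n = len(inputs[0])
    for k in range(1, max_k + 1):
        for subset in combinations(range(n), k):
            xs = [tuple(state[i] for i in subset) for state in inputs]
            if abs(mutual_information(xs, ys) - h_y) < tol:
                return subset
    return None  # nothing found with up to max_k inputs


def step(state):
    """Hypothetical synchronous rules: a' = b, b' = a AND c, c' = NOT a."""
    a, b, c = state
    return (b, a & c, 1 - a)


inputs = list(product((0, 1), repeat=3))  # complete state transition table
outputs = [step(s) for s in inputs]
wiring = [infer_inputs(inputs, outputs, t) for t in range(3)]
```

On this complete table the recovered wiring is `[(1,), (0, 2), (0,)]`, matching the rules in `step`. With incomplete tables, as in the n = 50 experiment described above, the same test can return spurious input sets, so more state transition pairs are needed as k grows.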
The Genome of the Obligate Intracellular Parasite Trachipleistophora hominis: New Insights into Microsporidian Genome Dynamics and Reductive Evolution

PubMed Central

Heinz, Eva; Williams, Tom A.; Nakjang, Sirintra; Noël, Christophe J.; Swan, Daniel C.; Goldberg, Alina V.; Harris, Simon R.; Weinmaier, Thomas; Markert, Stephanie; Becher, Dörte; Bernhardt, Jörg; Dagan, Tal; Hacker, Christian; Lucocq, John M.; Schweder, Thomas; Rattei, Thomas; Hall, Neil; Hirt, Robert P.; Embley, T. Martin

2012-01-01

The dynamics of reductive genome evolution for eukaryotes living inside other eukaryotic cells are poorly understood compared to well-studied model systems involving obligate intracellular bacteria. Here we present 8.5 Mb of sequence from the genome of the microsporidian Trachipleistophora hominis, isolated from an HIV/AIDS patient, which is an outgroup to the smaller compacted-genome species that primarily inform ideas of evolutionary mode for these enormously successful obligate intracellular parasites. Our data provide detailed information on the gene content, genome architecture and intergenic regions of a larger microsporidian genome, while comparative analyses allowed us to infer genomic features and metabolism of the common ancestor of the species investigated. Gene length reduction and massive loss of metabolic capacity in the common ancestor was accompanied by the evolution of novel microsporidian-specific protein families, whose conservation among microsporidians, against a background of reductive evolution, suggests they may have important functions in their parasitic lifestyle. The ancestor had already lost many metabolic pathways but retained glycolysis and the pentose phosphate pathway to provide cytosolic ATP and reduced coenzymes, and it had a minimal mitochondrion (mitosome) making Fe-S clusters but not ATP. It possessed bacterial-like nucleotide transport proteins as a key innovation for stealing host-generated ATP, the machinery for RNAi, key elements of the early secretory pathway, canonical eukaryotic as well as microsporidian-specific regulatory elements, a diversity of repetitive and transposable elements, and relatively low average gene density. Microsporidian genome evolution thus appears to have proceeded in at least two major steps: an ancestral remodelling of the proteome upon transition to intracellular parasitism that involved reduction but also selective expansion, followed by a secondary compaction of genome architecture in some, but
Changes in bacterioplankton community structure during early lake ontogeny resulting from the retreat of the Greenland Ice Sheet

PubMed Central

Peter, Hannes; Jeppesen, Erik; De Meester, Luc; Sommaruga, Ruben

2018-01-01

Retreating glaciers and ice sheets are among the clearest signs of global climate change. One consequence of glacier retreat is the formation of new meltwater lakes in previously ice-covered terrain. These lakes provide unique opportunities to understand patterns in community organization during early lake ontogeny. Here, we analyzed the bacterial community structure and diversity in six lakes recently formed by the retreat of the Greenland Ice Sheet (GrIS). The lakes represented a turbidity gradient depending on their past and present connectivity to the GrIS meltwaters. Bulk (16S rRNA genes) and putatively active (16S rRNA) fractions of the bacterioplankton communities were structured by changes in environmental conditions associated with the turbidity gradient. Differences in community structure among lakes were attributed to both rare and abundant community members. Further, positive co-occurrence relationships among phylogenetically closely related community members dominate in these lakes. Our results show that environmental conditions along the turbidity gradient structure bacterial community composition, which shifts during lake ontogeny. Rare taxa contribute to these shifts, suggesting that the rare biosphere has an important ecological role during early lake ontogeny. Members of the rare biosphere may be adapted to the transient niches in these nutrient-poor lakes. The directionality and phylogenetic structure of co-occurrence relationships indicate that competitive interactions among closely related taxa may be important in the most turbid lakes. PMID:29087379
Characterization of polyploid wheat genomic diversity using a high-density 90 000 single nucleotide polymorphism array

USDA-ARS's Scientific Manuscript database

High-density single nucleotide polymorphism (SNP) genotyping chips are a powerful tool for studying genomic patterns of diversity, inferring ancestral relationships among individuals in populations and studying marker-trait associations in mapping experiments. We developed a genotyping array includ...
Assembling networks of microbial genomes using linear programming.

PubMed

Holloway, Catherine; Beiko, Robert G

2010-11-20

Microbial genomes exhibit complex sets of genetic affinities due to lateral genetic transfer. Assessing the relative contributions of parent-to-offspring inheritance and gene sharing is a vital step in understanding the evolutionary origins and modern-day function of an organism, but recovering and showing these relationships is a challenging problem. We have developed a new approach that uses linear programming to find between-genome relationships, by treating tables of genetic affinities (here, represented by transformed BLAST e-values) as an optimization problem. Validation trials on simulated data demonstrate the effectiveness of the approach in recovering and representing vertical and lateral relationships among genomes. Application of the technique to a set comprising Aquifex aeolicus and 75 other thermophiles showed an important role for large genomes as 'hubs' in the gene sharing network, and suggested that genes are preferentially shared between organisms with similar optimal growth temperatures. We were also able to discover distinct and common genetic contributors to each sequenced representative of genus Pseudomonas. The linear programming approach we have developed can serve as an effective inference tool in its own right, and can be an efficient first step in a more-intensive phylogenomic analysis.
Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks

NASA Astrophysics Data System (ADS)

Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

2016-01-01

The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named “DeepMethyl” to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.
Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks.

PubMed

Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

2016-01-22

The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named "DeepMethyl" to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.
Dynamics of Genome Rearrangement in Bacterial Populations

PubMed Central

Darling, Aaron E.; Miklós, István; Ragan, Mark A.

2008-01-01

Genome structure variation has profound impacts on phenotype in organisms ranging from microbes to humans, yet little is known about how natural selection acts on genome arrangement. Pathogenic bacteria such as Yersinia pestis, which causes bubonic and pneumonic plague, often exhibit a high degree of genomic rearrangement. The recent availability of several Yersinia genomes offers an unprecedented opportunity to study the evolution of genome structure and arrangement. We introduce a set of statistical methods to study patterns of rearrangement in circular chromosomes and apply them to the Yersinia. We constructed a multiple alignment of eight Yersinia genomes using Mauve software to identify 78 conserved segments that are internally free from genome rearrangement. Based on the alignment, we applied Bayesian statistical methods to infer the phylogenetic inversion history of Yersinia. The sampling of genome arrangement reconstructions contains seven parsimonious tree topologies, each having different histories of 79 inversions. Topologies with a greater number of inversions also exist, but were sampled less frequently. The inversion phylogenies agree with results suggested by SNP patterns. We then analyzed reconstructed inversion histories to identify patterns of rearrangement. We confirm an over-representation of “symmetric inversions”—inversions with endpoints that are equally distant from the origin of chromosomal replication. Ancestral genome arrangements demonstrate moderate preference for replichore balance in Yersinia. We found that all inversions are shorter than expected under a neutral model, whereas inversions acting within a single replichore are much shorter than expected. We also found evidence for a canonical configuration of the origin and terminus of replication. Finally, breakpoint reuse analysis reveals that inversions with endpoints proximal to the origin of DNA replication are nearly three times more frequent. Our findings represent the
EGenBio: A Data Management System for Evolutionary Genomics and Biodiversity

PubMed Central

Nahum, Laila A; Reynolds, Matthew T; Wang, Zhengyuan O; Faith, Jeremiah J; Jonna, Rahul; Jiang, Zhi J; Meyer, Thomas J; Pollock, David D

2006-01-01

Background: Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio) to begin to address this. Description: EGenBio is a system for manipulation and filtering of large numbers of sequences, integrating curated sequence alignments and phylogenetic trees, managing evolutionary analyses, and visualizing their output. EGenBio is organized into three conceptual divisions, Evolution, Genomics, and Biodiversity. The Genomics division includes tools for selecting pre-aligned sequences from different genes and species, and for modifying and filtering these alignments for further analysis. Species searches are handled through queries that can be modified based on a tree-based navigation system and saved. The Biodiversity division contains tools for analyzing individual sequences or sequence alignments, whereas the Evolution division contains tools involving phylogenetic trees. Alignments are annotated with analytical results and modification history using our PRAED format. A miscellaneous Tools section and Help framework are also available. EGenBio was developed around our comparative genomic research and a prototype database of mtDNA genomes. It utilizes MySQL-relational databases and dynamic page generation, and calls numerous custom programs. Conclusion: EGenBio was designed to serve as a platform for tools and resources to ease combined analysis in evolution, genomics, and biodiversity. PMID:17118150
Linking phytoplankton and bacterioplankton community dynamics to iron-binding ligand production in a microcosm experiment

NASA Astrophysics Data System (ADS)

Hogle, S. L.; Bundy, R.; Barbeau, K.

2016-02-01

Several significant lines of evidence implicate heterotrophic bacterioplankton as agents of iron cycling and sources of iron-binding ligands in seawater, but direct and mechanistic linkages have mostly remained elusive. Currently, it is unknown how microbial community composition varies during the course of biogenic particle remineralization and how shifts in community structure are related to sources and sinks of Fe-binding ligands. In order to simulate the rise, decline, and ultimate remineralization of a phytoplankton bloom, we followed the production of different classes of Fe-binding ligands as measured by electrochemical techniques, Fe concentrations, and macronutrient concentrations in a series of iron-amended whole seawater incubations over a period of six days during a California Current Ecosystem Long Term Ecological Research (CCE-LTER) process cruise. At the termination of the experiment, phytoplankton communities were similar across iron treatments, but high iron conditions generated greater phytoplankton biomass and increased nutrient drawdown, suggesting that phytoplankton communities were in different phases of bloom development. Strikingly, L1 ligands akin to siderophores in binding strength were only observed in high iron treatments, implicating phytoplankton bloom phase as an important control. Using high-throughput 16S rRNA gene surveys, we observed that the abundance of transiently dominant copiotroph bacteria was strongly correlated with L1 concentrations. However, incubations with similar L1 concentrations and binding strengths produced distinct copiotroph community profiles dominated by a few strains. We suggest that phytoplankton bloom maturity influences algal-associated heterotrophic community succession, and that L1 production is either directly or indirectly tied to the appearance and eventual dominance of rarely abundant copiotroph bacterial strains.
Reconstructing Past Admixture Processes from Local Genomic Ancestry Using Wavelet Transformation

PubMed Central

Sanderson, Jean; Sudoyo, Herawati; Karafet, Tatiana M.; Hammer, Michael F.; Cox, Murray P.

2015-01-01

Admixture between long-separated populations is a defining feature of the genomes of many species. The mosaic block structure of admixed genomes can provide information about past contact events, including the time and extent of admixture. Here, we describe an improved wavelet-based technique that better characterizes ancestry block structure from observed genomic patterns. Principal components analysis is first applied to genomic data to identify the primary population structure, followed by wavelet decomposition to develop a new characterization of local ancestry information along the chromosomes. For testing purposes, this method is applied to human genome-wide genotype data from Indonesia, as well as virtual genetic data generated using genome-scale sequential coalescent simulations under a wide range of admixture scenarios. Time of admixture is inferred using an approximate Bayesian computation framework, providing robust estimates of both admixture times and their associated levels of uncertainty. Crucially, we demonstrate that this revised wavelet approach, which we have released as the R package adwave, provides improved statistical power over existing wavelet-based techniques and can be used to address a broad range of admixture questions. PMID:25852078
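To make the wavelet step more concrete, the sketch below decomposes a toy ancestry signal along a chromosome with the Haar wavelet and reports the energy at each scale. The Haar wavelet is chosen here only because it is the simplest to write out; the abstract does not say which wavelet the adwave package uses, and the signal, function names and block length are all hypothetical. The point is that the scale carrying most of the energy tracks the typical ancestry block length, which is the quantity that carries information about admixture time.

```python
from math import sqrt


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length list."""
    approx = [(signal[i] + signal[i + 1]) / sqrt(2)
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / sqrt(2)
              for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal):
    """Full Haar decomposition; length must be a power of two.

    Returns (details, approx) with details ordered finest scale first.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    details = []
    approx = list(signal)
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.append(detail)
    return details, approx


def energy_by_scale(details):
    """Sum of squared detail coefficients at each scale."""
    return [sum(d * d for d in level) for level in details]


# Toy signal (hypothetical): ancestry switches every 4 markers, e.g. the
# sign of a sample's PC1 projection in successive genomic windows.
ancestry = [1, 1, 1, 1, -1, -1, -1, -1] * 2
details, approx = haar_decompose(ancestry)
energies = energy_by_scale(details)
```

For this signal all of the energy falls at the third scale (windows of 8 markers split into two halves of 4), which corresponds to the block length of 4. Longer blocks, as expected after more recent admixture, would move the energy to coarser scales.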
Extending information retrieval methods to personalized genomic-based studies of disease.

PubMed

Ye, Shuyun; Dawson, John A; Kendziorski, Christina

2014-01-01

Genomic-based studies of disease now involve diverse types of data collected on large groups of patients. A major challenge facing statistical scientists is how best to combine the data, extract important features, and comprehensively characterize the ways in which they affect an individual's disease course and likelihood of response to treatment. We have developed a survival-supervised latent Dirichlet allocation (survLDA) modeling framework to address these challenges. Latent Dirichlet allocation (LDA) models have proven extremely effective at identifying themes common across large collections of text, but applications to genomics have been limited. Our framework extends LDA to the genome by considering each patient as a "document" with "text" detailing his/her clinical events and genomic state. We then further extend the framework to allow for supervision by a time-to-event response. The model enables the efficient identification of collections of clinical and genomic features that co-occur within patient subgroups, and then characterizes each patient by those features. An application of survLDA to The Cancer Genome Atlas ovarian project identifies informative patient subgroups showing differential response to treatment, and validation in an independent cohort demonstrates the potential for patient-specific inference.
Is awareness necessary for true inference?

PubMed

Leo, Peter D; Greene, Anthony J

2008-09-01

In transitive inference, participants learn a set of context-dependent discriminations that can be organized into a hierarchy that supports inference. Several studies show that inference occurs with or without task awareness. However, some studies assert that without awareness, performance is attributable to pseudoinference. By this account, inference-like performance is achieved by differential stimulus weighting according to the stimuli's proximity to the end items of the hierarchy. We implement an inference task that cannot be based on differential stimulus weighting. The design itself rules out pseudoinference strategies. Success on the task without evidence of deliberative strategies would therefore suggest that true inference can be achieved implicitly. We found that accurate performance on the inference task was not dependent on explicit awareness. The finding is consistent with a growing body of evidence that indicates that forms of learning and memory supporting inference and flexibility do not necessarily depend on task awareness.

DOSE RESPONSE FROM HIGH THROUGHPUT GENE EXPRESSION STUDIES AND THE INFLUENCE OF TIME AND CELL LINE ON INFERRED MODE OF ACTION BY ONTOLOGIC ENRICHMENT (SOT)

EPA Science Inventory

Gene expression with ontologic enrichment and connectivity mapping tools is widely used to infer modes of action (MOA) for therapeutic drugs. Despite progress in high-throughput (HT) genomic systems, strategies suitable to identify industrial chemical MOA are needed. The L1000 is...
Simple Math is Enough: Two Examples of Inferring Functional Associations from Genomic Data

NASA Technical Reports Server (NTRS)

Liang, Shoudan

2003-01-01

Non-random features in the genomic data are usually biologically meaningful. The key is to choose the feature well. Having a p-value based score prioritizes the findings. If two proteins share an unusually large number of common interaction partners, they tend to be involved in the same biological process. We used this finding to predict the functions of 81 un-annotated proteins in yeast.
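One common way to turn "an unusually large number of common interaction partners" into a p-value is the hypergeometric tail probability. The abstract does not state the exact score used, so this is an assumed formulation, and the function name and numbers are hypothetical. With N candidate partners in total, protein A having n_a partners and protein B having n_b, the probability of sharing at least m partners by chance is the sum over i >= m of C(n_a, i) C(N - n_a, n_b - i) / C(N, n_b).

```python
from math import comb


def shared_partner_pvalue(total, n_a, n_b, shared):
    """P(at least `shared` common partners) under random partner choice.

    total : number of proteins that could be a partner (assumed pool)
    n_a   : number of interaction partners of protein A
    n_b   : number of interaction partners of protein B
    shared: observed number of partners common to A and B
    """
    if not (0 <= shared <= min(n_a, n_b) <= max(n_a, n_b) <= total):
        raise ValueError("inconsistent counts")
    denominator = comb(total, n_b)
    numerator = sum(comb(n_a, i) * comb(total - n_a, n_b - i)
                    for i in range(shared, min(n_a, n_b) + 1))
    return numerator / denominator


# Hypothetical example: 10 candidate partners, A and B each have 3,
# and all 3 are shared.
p_all_shared = shared_partner_pvalue(10, 3, 3, 3)
```

A small p-value ranks the protein pair as a strong candidate for sharing a biological process; ranking pairs by this score is the sense in which a p-value based score prioritizes the findings.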
Coalescent Inference Using Serially Sampled, High-Throughput Sequencing Data from Intrahost HIV Infection

PubMed Central

Dialdestoro, Kevin; Sibbesen, Jonas Andreas; Maretty, Lasse; Raghwani, Jayna; Gall, Astrid; Kellam, Paul; Pybus, Oliver G.; Hein, Jotun; Jenkins, Paul A.

2016-01-01

Human immunodeficiency virus (HIV) is a rapidly evolving pathogen that causes chronic infections, so genetic diversity within a single infection can be very high. High-throughput “deep” sequencing can now measure this diversity in unprecedented detail, particularly since it can be performed at different time points during an infection, and this offers a potentially powerful way to infer the evolutionary dynamics of the intrahost viral population. However, population genomic inference from HIV sequence data is challenging because of high rates of mutation and recombination, rapid demographic changes, and ongoing selective pressures. In this article we develop a new method for inference using HIV deep sequencing data, using an approach based on importance sampling of ancestral recombination graphs under a multilocus coalescent model. The approach further extends recent progress in the approximation of so-called conditional sampling distributions, a quantity of key interest when approximating coalescent likelihoods. The chief novelties of our method are that it is able to infer rates of recombination and mutation, as well as the effective population size, while handling sampling over different time points and missing data without extra computational difficulty. We apply our method to a data set of HIV-1, in which several hundred sequences were obtained from an infected individual at seven time points over 2 years. We find mutation rate and effective population size estimates to be comparable to those produced by the software BEAST. Additionally, our method is able to produce local recombination rate estimates. The software underlying our method, Coalescenator, is freely available. PMID:26857628
[Gene transfer agent--a novel and widespread occurrence mechanism of gene exchange in ocean-a review].

PubMed

Cai, Haiyuan

2012-01-01

Gene Transfer Agent (GTA) particles are released by bacteria and resemble small, tailed bacteriophages. GTA particles contain small, random pieces of host DNA rather than GTA structural genes or a phage genome. Based on the four best-studied GTAs, all produced by anaerobes, gene transfer mediated by GTA is efficient and species-specific. Genome sequencing projects have revealed a remarkable distribution of GTA gene clusters in the genomes of marine bacterioplankton, implying that GTA may be an important mechanism for horizontal gene transfer in the ocean. On the basis of the characterization of these four best-studied GTAs, this review describes GTAs released by numerically dominant marine bacteria, discusses their properties that are important for horizontal gene transfer in the ocean, and gives future perspectives to advance GTA research.
Genome-Based Taxonomic Classification of Bacteroidetes

PubMed Central

Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina; Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia N.; Woyke, Tanja; Kyrpides, Nikos C.; Klenk, Hans-Peter; Göker, Markus

2016-01-01

The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved. PMID:28066339
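The remark about G+C content calculated directly from genome sequences can be illustrated with a very small sketch. The function name and the handling of ambiguous bases (they are ignored) are assumptions for illustration, not the procedure used in the study.

```python
def gc_content(sequence):
    """Percentage of G and C among unambiguous bases (A, C, G, T).

    Ambiguity codes such as N are skipped, which is one possible
    convention and is assumed here.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * gc / (gc + at)


# Hypothetical toy fragments, not real genome data.
example_gc = gc_content("ATGCGCNNatgc")
```

For a full genome the same count would be taken over all contigs or replicons together, so that the value reflects the whole sequence rather than an average of per-contig percentages.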
Inferring evolution of gene duplicates using probabilistic models and nonparametric belief propagation.

PubMed

Zeng, Jia; Hannenhalli, Sridhar

2013-01-01

Gene duplication, followed by functional evolution of duplicate genes, is a primary engine of evolutionary innovation. In turn, gene expression evolution is a critical component of overall functional evolution of paralogs. Inferring evolutionary history of gene expression among paralogs is therefore a problem of considerable interest. It also presents significant challenges. The standard approaches of evolutionary reconstruction assume that at an internal node of the duplication tree, the two duplicates evolve independently. However, because of various selection pressures, functional evolution of the two paralogs may be coupled. The coupling of paralog evolution corresponds to three major fates of gene duplicates: subfunctionalization (SF), conserved function (CF) or neofunctionalization (NF). Quantitative analysis of these fates is of great interest and clearly influences evolutionary inference of expression. These two interrelated problems of inferring gene expression and evolutionary fates of gene duplicates have not been studied together previously and motivate the present study. Here we propose a novel probabilistic framework and algorithm to simultaneously infer (i) ancestral gene expression and (ii) the likely fate (SF, NF, CF) at each duplication event during the evolution of a gene family. Using tissue-specific gene expression data, we develop a nonparametric belief propagation (NBP) algorithm to predict the ancestral expression level as a proxy for function, and describe a novel probabilistic model that relates the predicted and known expression levels to the possible evolutionary fates. We validate our model using simulation and then apply it to a genome-wide set of gene duplicates in humans. Our results suggest that SF tends to be more frequent at the earlier stage of gene family expansion, while NF occurs more frequently later on.
Genome sequence diversity and clues to the evolution of variola (smallpox) virus.

PubMed

Esposito, Joseph J; Sammons, Scott A; Frace, A Michael; Osborne, John D; Olsen-Rasmussen, Melissa; Zhang, Ming; Govil, Dhwani; Damon, Inger K; Kline, Richard; Laker, Miriam; Li, Yu; Smith, Geoffrey L; Meyer, Hermann; Leduc, James W; Wohlhueter, Robert M

2006-08-11

Comparative genomics of 45 epidemiologically varied variola virus isolates from the past 30 years of the smallpox era indicate low sequence diversity, suggesting that there is probably little difference in the isolates' functional gene content. Phylogenetic clustering inferred three clades coincident with their geographical origin and case-fatality rate; the latter implicated putative proteins that mediate viral virulence differences. Analysis of the viral linear DNA genome suggests that its evolution involved direct descent and DNA end-region recombination events. Knowing the sequences will help understand the viral proteome and improve diagnostic test precision, therapeutics, and systems for their assessment.
Genome-wide nucleosome map and cytosine methylation levels of an ancient human genome.

PubMed

Pedersen, Jakob Skou; Valen, Eivind; Velazquez, Amhed M Vargas; Parker, Brian J; Rasmussen, Morten; Lindgreen, Stinus; Lilje, Berit; Tobin, Desmond J; Kelly, Theresa K; Vang, Søren; Andersson, Robin; Jones, Peter A; Hoover, Cindi A; Tikhonov, Alexei; Prokhortchouk, Egor; Rubin, Edward M; Sandelin, Albin; Gilbert, M Thomas P; Krogh, Anders; Willerslev, Eske; Orlando, Ludovic

2014-03-01

Epigenetic information is available from contemporary organisms, but is difficult to track back in evolutionary time. Here, we show that genome-wide epigenetic information can be gathered directly from next-generation sequence reads of DNA isolated from ancient remains. Using the genome sequence data generated from hair shafts of a 4000-yr-old Paleo-Eskimo belonging to the Saqqaq culture, we generate the first ancient nucleosome map coupled with a genome-wide survey of cytosine methylation levels. The validity of both the nucleosome map and the methylation levels was confirmed by the recovery of the expected signals at promoter regions, exon/intron boundaries, and CTCF sites. The top-scoring nucleosome calls revealed distinct DNA positioning biases, attesting to nucleotide-level accuracy. The ancient methylation levels exhibited high conservation over time, clustering closely with modern hair tissues. Using ancient methylation information, we estimated the age at death of the Saqqaq individual and illustrate how epigenetic information can be used to infer ancient gene expression. Similar epigenetic signatures were found in other fossil material, such as 110,000- to 130,000-yr-old bones, supporting the contention that ancient epigenomic information can be reconstructed from a deep past. Our findings lay the foundation for extracting epigenomic information from ancient samples, allowing shifts in epialleles to be tracked through evolutionary time, as well as providing an original window into modern epigenomics.
Genome-wide nucleosome map and cytosine methylation levels of an ancient human genome

PubMed Central

Pedersen, Jakob Skou; Valen, Eivind; Velazquez, Amhed M. Vargas; Parker, Brian J.; Rasmussen, Morten; Lindgreen, Stinus; Lilje, Berit; Tobin, Desmond J.; Kelly, Theresa K.; Vang, Søren; Andersson, Robin; Jones, Peter A.; Hoover, Cindi A.; Tikhonov, Alexei; Prokhortchouk, Egor; Rubin, Edward M.; Sandelin, Albin; Gilbert, M. Thomas P.; Krogh, Anders; Willerslev, Eske; Orlando, Ludovic

2014-01-01

Epigenetic information is available from contemporary organisms, but is difficult to track back in evolutionary time. Here, we show that genome-wide epigenetic information can be gathered directly from next-generation sequence reads of DNA isolated from ancient remains. Using the genome sequence data generated from hair shafts of a 4000-yr-old Paleo-Eskimo belonging to the Saqqaq culture, we generate the first ancient nucleosome map coupled with a genome-wide survey of cytosine methylation levels. The validity of both the nucleosome map and the methylation levels was confirmed by the recovery of the expected signals at promoter regions, exon/intron boundaries, and CTCF sites. The top-scoring nucleosome calls revealed distinct DNA positioning biases, attesting to nucleotide-level accuracy. The ancient methylation levels exhibited high conservation over time, clustering closely with modern hair tissues. Using ancient methylation information, we estimated the age at death of the Saqqaq individual and illustrate how epigenetic information can be used to infer ancient gene expression. Similar epigenetic signatures were found in other fossil material, such as 110,000- to 130,000-yr-old bones, supporting the contention that ancient epigenomic information can be reconstructed from a deep past. Our findings lay the foundation for extracting epigenomic information from ancient samples, allowing shifts in epialleles to be tracked through evolutionary time, as well as providing an original window into modern epigenomics. PMID:24299735
Diversity and genomic insights into the uncultured Chloroflexi from the human microbiota.

PubMed

Campbell, Alisha G; Schwientek, Patrick; Vishnivetskaya, Tatiana; Woyke, Tanja; Levy, Shawn; Beall, Clifford J; Griffen, Ann; Leys, Eugene; Podar, Mircea

2014-09-01

Many microbial phyla that are widely distributed in open environments have few or no representatives within animal-associated microbiota. Among them, the Chloroflexi comprises taxonomically and physiologically diverse lineages adapted to a wide range of aquatic and terrestrial habitats. A distinct group of uncultured chloroflexi related to free-living anaerobic Anaerolineae inhabits the mammalian gastrointestinal tract and includes low-abundance human oral bacteria that appear to proliferate in periodontitis. Using a single-cell genomics approach, we obtained the first draft genomic reconstruction for these organisms and compared their inferred metabolic potential with free-living chloroflexi. Genomic data suggest that oral chloroflexi are anaerobic heterotrophs, encoding abundant carbohydrate transport and metabolism functionalities, similar to those seen in environmental Anaerolineae isolates. The presence of genes for a unique phosphotransferase system and N-acetylglucosamine metabolism suggests an important ecological niche for oral chloroflexi in scavenging material from lysed bacterial cells and the human tissue. The inferred ability to produce sialic acid for cell membrane decoration may enable them to evade the host defence system and colonize the subgingival space. As with other low abundance but persistent members of the microbiota, discerning community and host factors that influence the proliferation of oral chloroflexi may help understand the emergence of oral pathogens and the microbiota dynamics in health and disease states. © 2014 Society for Applied Microbiology and John Wiley & Sons Ltd.
Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea.

PubMed

Makarova, Kira S; Sorokin, Alexander V; Novichkov, Pavel S; Wolf, Yuri I; Koonin, Eugene V

2007-11-27

An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes. New Archaeal Clusters of Orthologous Genes (arCOGs) were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon) using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover approximately 88% of the genes in a genome compared to approximately 76% coverage in COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; approximately 40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genomes) consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA) is conservatively estimated to possess 996 genes compared to 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively. It is inferred that LACA was a chemoautotrophic hyperthermophile that
Children's Category-Based Inferences Affect Classification

ERIC Educational Resources Information Center

Ross, Brian H.; Gelman, Susan A.; Rosengren, Karl S.

2005-01-01

Children learn many new categories and make inferences about these categories. Much work has examined how children make inferences on the basis of category knowledge. However, inferences may also affect what is learned about a category. Four experiments examine whether category-based inferences during category learning influence category knowledge…
Genome-Based Microbial Taxonomy Coming of Age.

PubMed

Hugenholtz, Philip; Skarshewski, Adam; Parks, Donovan H

2016-06-01

Reconstructing the complete evolutionary history of extant life on our planet will be one of the most fundamental accomplishments of scientific endeavor, akin to the completion of the periodic table, which revolutionized chemistry. The road to this goal is via comparative genomics because genomes are our most comprehensive and objective evolutionary documents. The genomes of plant and animal species have been systematically targeted over the past decade to provide coverage of the tree of life. However, multicellular organisms only emerged in the last 550 million years of more than three billion years of biological evolution and thus comprise a small fraction of total biological diversity. The bulk of biodiversity, both past and present, is microbial. We have only scratched the surface in our understanding of the microbial world, as most microorganisms cannot be readily grown in the laboratory and remain unknown to science. Ground-breaking, culture-independent molecular techniques developed over the past 30 years have opened the door to this so-called microbial dark matter with an accelerating momentum driven by exponential increases in sequencing capacity. We are on the verge of obtaining representative genomes across all life for the first time. However, historical use of morphology, biochemical properties, behavioral traits, and single-marker genes to infer organismal relationships means that the existing highly incomplete tree is riddled with taxonomic errors. Concerted efforts are now needed to synthesize and integrate the burgeoning genomic data resources into a coherent universal tree of life and genome-based taxonomy. Copyright © 2016 Cold Spring Harbor Laboratory Press; all rights reserved.
Phylogenetic relationship and virulence inference of Streptococcus Anginosus Group: curated annotation and whole-genome comparative analysis support distinct species designation

PubMed Central

2013-01-01

Background The Streptococcus Anginosus Group (SAG) represents three closely related species of the viridans group streptococci recognized as commensal bacteria of the oral, gastrointestinal and urogenital tracts. The SAG also cause severe invasive infections, and are pathogens during cystic fibrosis (CF) pulmonary exacerbation. Little genomic information or description of virulence mechanisms is currently available for SAG. We conducted intra- and inter-species whole-genome comparative analyses with 59 publicly available Streptococcus genomes and seven in-house closed high quality finished SAG genomes: S. constellatus (3), S. intermedius (2), and S. anginosus (2). For each SAG species, we sequenced at least one numerically dominant strain from CF airways recovered during acute exacerbation and an invasive, non-lung isolate. We also evaluated microevolution that occurred within two isolates that were cultured from one individual one year apart. Results The SAG genomes were most closely related to S. gordonii and S. sanguinis, based on shared orthologs, and harbor a similar number of proteins within each COG category as other Streptococcus species. Numerous characterized streptococcus virulence factor homologs were identified within the SAG genomes, including adherence, invasion, spreading factors, LPxTG cell wall proteins, and two component histidine kinases known to be involved in virulence gene regulation. Mobile elements, primarily integrative conjugative elements and bacteriophage, account for greater than 10% of the SAG genomes. S. anginosus was the most variable species sequenced in this study, yielding both the smallest and the largest SAG genomes containing multiple genomic rearrangements, insertions and deletions. In contrast, within the S. constellatus and S. intermedius species, there was extensive continuous synteny, with only slight differences in genome size between strains. Within S. constellatus we were able to determine important SNPs and changes in
Is there a hierarchy of social inferences? The likelihood and speed of inferring intentionality, mind, and personality.

PubMed

Malle, Bertram F; Holbrook, Jess

2012-04-01

People interpret behavior by making inferences about agents' intentionality, mind, and personality. Past research studied such inferences 1 at a time; in real life, people make these inferences simultaneously. The present studies therefore examined whether 4 major inferences (intentionality, desire, belief, and personality), elicited simultaneously in response to an observed behavior, might be ordered in a hierarchy of likelihood and speed. To achieve generalizability, the studies included a wide range of stimulus behaviors, presented them verbally and as dynamic videos, and assessed inferences both in a retrieval paradigm (measuring the likelihood and speed of accessing inferences immediately after they were made) and in an online processing paradigm (measuring the speed of forming inferences during behavior observation). Five studies provide evidence for a hierarchy of social inferences-from intentionality and desire to belief to personality-that is stable across verbal and visual presentations and that parallels the order found in developmental and primate research. (c) 2012 APA, all rights reserved.
flyDIVaS: A Comparative Genomics Resource for Drosophila Divergence and Selection

PubMed Central

Stanley, Craig E.; Kulathinal, Rob J.

2016-01-01

With arguably the best finished and expertly annotated genome assembly, Drosophila melanogaster is a formidable genetics model to study all aspects of biology. Nearly a decade ago, the 12 Drosophila genomes project expanded D. melanogaster’s breadth as a comparative model through the community-development of an unprecedented genus- and genome-wide comparative resource. However, since its inception, these datasets for evolutionary inference and biological discovery have become increasingly outdated, outmoded, and inaccessible. Here, we provide an updated and upgradable comparative genomics resource of Drosophila divergence and selection, flyDIVaS, based on the latest genomic assemblies, curated FlyBase annotations, and recent OrthoDB orthology calls. flyDIVaS is an online database containing D. melanogaster-centric orthologous gene sets, CDS and protein alignments, divergence statistics (% gaps, dN, dS, dN/dS), and codon-based tests of positive Darwinian selection. Out of 13,920 protein-coding D. melanogaster genes, ∼80% have one aligned ortholog in the closely related species, D. simulans, and ∼50% have 1–1 12-way alignments in the original 12 sequenced species that span over 80 million yr of divergence. Genes and their orthologs can be chosen from four different taxonomic datasets differing in phylogenetic depth and coverage density, and visualized via interactive alignments and phylogenetic trees. Users can also batch download entire comparative datasets. A functional survey finds conserved mitotic and neural genes, highly diverged immune and reproduction-related genes, more conspicuous signals of divergence across tissue-specific genes, and an enrichment of positive selection among highly diverged genes. flyDIVaS will be regularly updated and can be freely accessed at www.flydivas.info. We encourage researchers to regularly use this resource as a tool for biological inference and discovery, and in their classrooms to help train the next generation of
flyDIVaS: A Comparative Genomics Resource for Drosophila Divergence and Selection.

PubMed

Stanley, Craig E; Kulathinal, Rob J

2016-08-09

With arguably the best finished and expertly annotated genome assembly, Drosophila melanogaster is a formidable genetics model to study all aspects of biology. Nearly a decade ago, the 12 Drosophila genomes project expanded D. melanogaster's breadth as a comparative model through the community-development of an unprecedented genus- and genome-wide comparative resource. However, since its inception, these datasets for evolutionary inference and biological discovery have become increasingly outdated, outmoded, and inaccessible. Here, we provide an updated and upgradable comparative genomics resource of Drosophila divergence and selection, flyDIVaS, based on the latest genomic assemblies, curated FlyBase annotations, and recent OrthoDB orthology calls. flyDIVaS is an online database containing D. melanogaster-centric orthologous gene sets, CDS and protein alignments, divergence statistics (% gaps, dN, dS, dN/dS), and codon-based tests of positive Darwinian selection. Out of 13,920 protein-coding D. melanogaster genes, ∼80% have one aligned ortholog in the closely related species, D. simulans, and ∼50% have 1-1 12-way alignments in the original 12 sequenced species that span over 80 million yr of divergence. Genes and their orthologs can be chosen from four different taxonomic datasets differing in phylogenetic depth and coverage density, and visualized via interactive alignments and phylogenetic trees. Users can also batch download entire comparative datasets. A functional survey finds conserved mitotic and neural genes, highly diverged immune and reproduction-related genes, more conspicuous signals of divergence across tissue-specific genes, and an enrichment of positive selection among highly diverged genes. flyDIVaS will be regularly updated and can be freely accessed at www.flydivas.info. We encourage researchers to regularly use this resource as a tool for biological inference and discovery, and in their classrooms to help train the next generation of
Lateral gene transfers have polished animal genomes: lessons from nematodes

PubMed Central

Danchin, Etienne G. J.; Rosso, Marie-Noëlle

2012-01-01

It is now accepted that lateral gene transfers (LGT) have significantly contributed to the composition of bacterial genomes. The amplitude of the phenomenon is considered so high in prokaryotes that it challenges the traditional view of a binary hierarchical tree of life to correctly represent the evolutionary history of species. Given the plethora of transfers between prokaryotes, it is currently impossible to infer the last common ancestral gene set for any extant species. For these reasons, it has been proposed that the Darwinian binary tree of life may be inappropriate to correctly reflect the actual relations between species, at least in prokaryotes. In contrast, the contribution of LGT to the composition of animal genomes is less documented. In the light of recent analyses that reported series of LGT events in nematodes, we discuss the importance of this phenomenon in the evolutionary history and in the current composition of an animal genome. Far from being neutral, it appears that besides having contributed to nematode genome contents, LGT have favored the emergence of important traits such as plant-parasitism. PMID:22919619
A Secure Alignment Algorithm for Mapping Short Reads to Human Genome.

PubMed

Zhao, Yongan; Wang, Xiaofeng; Tang, Haixu

2018-05-09

Elastic and inexpensive computing resources such as clouds have been recognized as a useful solution for analyzing massive human genomic data (e.g., acquired by using next-generation sequencers) in biomedical research. However, outsourcing human genome computation to public or commercial clouds has been hindered by privacy concerns: even a small number of human genome sequences contain sufficient information for identifying the donor of the genomic data. This issue cannot be directly addressed by existing security and cryptographic techniques (such as homomorphic encryption), because they are too heavyweight to carry out practical genome computation tasks on massive data. In this article, we present a secure algorithm to accomplish read mapping, one of the most basic tasks in human genomic data analysis, based on a hybrid cloud computing model. Compared with the existing approaches, our algorithm delegates most computation to the public cloud, while only performing encryption and decryption on the private cloud, and thus makes the maximum use of the computing resource of the public cloud. Furthermore, our algorithm reports results similar to those of nonsecure read mapping algorithms, including the alignment between reads and the reference genome, which can be directly used in downstream analysis such as the inference of genomic variations. We implemented the algorithm in C++ and Python on a hybrid cloud system, in which the public cloud uses an Apache Spark system.
Genomic and Genetic Diversity within the Pseudomonas fluorescens Complex

PubMed Central

Garrido-Sanz, Daniel; Meier-Kolthoff, Jan P.; Göker, Markus; Martín, Marta; Rivilla, Rafael; Redondo-Nieto, Miguel

2016-01-01

The Pseudomonas fluorescens complex includes Pseudomonas strains that have been taxonomically assigned to more than fifty different species, many of which have been described as plant growth-promoting rhizobacteria (PGPR) with potential applications in biocontrol and biofertilization. So far the phylogeny of this complex has been analyzed according to phenotypic traits, 16S rDNA, MLSA and inferred by whole-genome analysis. However, since most of the type strains have not been fully sequenced and new species are frequently described, correlation between taxonomy and phylogenomic analysis is missing. In recent years, the genomes of a large number of strains have been sequenced, showing important genomic heterogeneity and providing information suitable for genomic studies that are important to understand the genomic and genetic diversity shown by strains of this complex. Based on MLSA and several whole-genome sequence-based analyses of 93 sequenced strains, we have divided the P. fluorescens complex into eight phylogenomic groups that agree with previous works based on type strains. Digital DDH (dDDH) identified 69 species and 75 subspecies within the 93 genomes. The eight groups corresponded to clustering with a threshold of 31.8% dDDH, in full agreement with our MLSA. The Average Nucleotide Identity (ANI) approach showed inconsistencies regarding the assignment to species and to the eight groups. The small core genome of 1,334 CDSs and the large pan-genome of 30,848 CDSs, show the large diversity and genetic heterogeneity of the P. fluorescens complex. However, a low number of strains were enough to explain most of the CDSs diversity at core and strain-specific genomic fractions. Finally, the identification and analysis of group-specific genome and the screening for distinctive characters revealed a phylogenomic distribution of traits among the groups that provided insights into biocontrol and bioremediation applications as well as their role as PGPR. PMID:26915094

Evolution of gastropod mitochondrial genome arrangements

PubMed Central

2008-01-01

Background Gastropod mitochondrial genomes exhibit an unusually great variety of gene orders compared to other metazoan mitochondrial genomes, such as those of vertebrates. Hence, gastropod mitochondrial genomes constitute a good model system to study patterns, rates, and mechanisms of mitochondrial genome rearrangement. However, this kind of evolutionary comparative analysis requires a robust phylogenetic framework of the group under study, which has been elusive so far for gastropods in spite of the efforts carried out during the last two decades. Here, we report the complete nucleotide sequence of five mitochondrial genomes of gastropods (Pyramidella dolabrata, Ascobulla fragilis, Siphonaria pectinata, Onchidella celtica, and Myosotella myosotis), and we analyze them together with another ten complete mitochondrial genomes of gastropods currently available in molecular databases in order to reconstruct the phylogenetic relationships among the main lineages of gastropods. Results Comparative analyses with other mollusk mitochondrial genomes allowed us to describe molecular features and general trends in the evolution of mitochondrial genome organization in gastropods. Phylogenetic reconstruction with commonly used methods of phylogenetic inference (ME, MP, ML, BI) arrived at a single topology, which was used to reconstruct the evolution of mitochondrial gene rearrangements in the group. Conclusion Four main lineages were identified within gastropods: Caenogastropoda, Vetigastropoda, Patellogastropoda, and Heterobranchia. Caenogastropoda and Vetigastropoda are sister taxa, as are Patellogastropoda and Heterobranchia. This result rejects the validity of the derived clade Apogastropoda (Caenogastropoda + Heterobranchia). The position of Patellogastropoda remains unclear, likely due to long-branch attraction biases. Within Heterobranchia, the most heterogeneous group of gastropods, neither Euthyneura (because of the inclusion of P. dolabrata) nor Pulmonata
Phylogenomic Reconstruction of the Oomycete Phylogeny Derived from 37 Genomes

PubMed Central

McCarthy, Charley G. P.

2017-01-01

ABSTRACT The oomycetes are a class of microscopic, filamentous eukaryotes within the Stramenopiles-Alveolata-Rhizaria (SAR) supergroup which includes ecologically significant animal and plant pathogens, most infamously the causative agent of potato blight Phytophthora infestans. Single-gene and concatenated phylogenetic studies both of individual oomycete genera and of members of the larger class have resulted in conflicting conclusions concerning species phylogenies within the oomycetes, particularly for the large Phytophthora genus. Genome-scale phylogenetic studies have successfully resolved many eukaryotic relationships by using supertree methods, which combine large numbers of potentially disparate trees to determine evolutionary relationships that cannot be inferred from individual phylogenies alone. With a sufficient amount of genomic data now available, we have undertaken the first whole-genome phylogenetic analysis of the oomycetes using data from 37 oomycete species and 6 SAR species. In our analysis, we used established supertree methods to generate phylogenies from 8,355 homologous oomycete and SAR gene families and have complemented those analyses with both phylogenomic network and concatenated supermatrix analyses. Our results show that a genome-scale approach to oomycete phylogeny resolves oomycete classes and individual clades within the problematic Phytophthora genus. Support for the resolution of the inferred relationships between individual Phytophthora clades varies depending on the methodology used. Our analysis represents an important first step in large-scale phylogenomic analysis of the oomycetes. IMPORTANCE The oomycetes are a class of eukaryotes and include ecologically significant animal and plant pathogens. Single-gene and multigene phylogenetic studies of individual oomycete genera and of members of the larger classes have resulted in conflicting conclusions concerning interspecies relationships among these species, particularly for the
Genome-driven evolutionary game theory helps understand the rise of metabolic interdependencies in microbial communities.

PubMed

Zomorrodi, Ali R; Segrè, Daniel

2017-11-16

Metabolite exchanges in microbial communities give rise to ecological interactions that govern ecosystem diversity and stability. It is unclear, however, how the rise of these interactions varies across metabolites and organisms. Here we address this question by integrating genome-scale models of metabolism with evolutionary game theory. Specifically, we use microbial fitness values estimated by metabolic models to infer evolutionarily stable interactions in multi-species microbial "games". We first validate our approach using a well-characterized yeast cheater-cooperator system. We next perform over 80,000 in silico experiments to infer how metabolic interdependencies mediated by amino acid leakage in Escherichia coli vary across 189 amino acid pairs. While most pairs display shared patterns of inter-species interactions, multiple deviations are caused by pleiotropy and epistasis in metabolism. Furthermore, simulated invasion experiments reveal possible paths to obligate cross-feeding. Our study provides genomically driven insight into the rise of ecological interactions, with implications for microbiome research and synthetic ecology.
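As a rough illustration of the game-theoretic step described in the abstract above, the sketch below checks which pure strategies are evolutionarily stable in a symmetric game. It is not the authors' method: in the study, fitness values are estimated by genome-scale metabolic models, whereas the payoff numbers here are invented for illustration, and the function name `pure_ess` is hypothetical.

```python
def pure_ess(payoff):
    """Return indices of pure strategies that are evolutionarily stable.

    payoff[i][j] is the fitness of strategy i when it meets strategy j.
    Strategy i is an ESS if, for every other strategy j, either
      payoff[i][i] > payoff[j][i]                      (strict Nash), or
      payoff[i][i] == payoff[j][i] and payoff[i][j] > payoff[j][j].
    """
    n = len(payoff)
    stable = []
    for i in range(n):
        resists_all = True
        for j in range(n):
            if j == i:
                continue
            if payoff[i][i] > payoff[j][i]:
                continue
            if payoff[i][i] == payoff[j][i] and payoff[i][j] > payoff[j][j]:
                continue
            resists_all = False
            break
        if resists_all:
            stable.append(i)
    return stable


# Illustrative (made-up) payoffs: index 0 = cooperator, index 1 = cheater.
prisoners_dilemma = [[3, 0],
                     [5, 1]]
snowdrift = [[3, 1],
             [5, 0]]

print(pure_ess(prisoners_dilemma))  # cheating is the only pure ESS
print(pure_ess(snowdrift))          # no pure ESS; coexistence is mixed
```

In the snowdrift-style matrix neither pure strategy resists invasion, which is the situation in which cooperators and cheaters can coexist at a mixed equilibrium.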
Methodology for the inference of gene function from phenotype data.

PubMed

Ascensao, Joao A; Dolan, Mary E; Hill, David P; Blake, Judith A

2014-12-12

Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and
Ensembl Plants: Integrating Tools for Visualizing, Mining, and Analyzing Plant Genomic Data.

PubMed

Bolser, Dan M; Staines, Daniel M; Perry, Emily; Kersey, Paul J

2017-01-01

Ensembl Plants (http://plants.ensembl.org) is an integrative resource presenting genome-scale information for 39 sequenced plant species. Available data includes genome sequence, gene models, functional annotation, and polymorphic loci; for the latter, additional information including population structure, individual genotypes, linkage, and phenotype data is available for some species. Comparative data is also available, including genomic alignments and "gene trees," which show the inferred evolutionary history of each gene family represented in the resource. Access to the data is provided through a genome browser, which incorporates many specialist interfaces for different data types, through a variety of programmatic interfaces, and via a specialist data mining tool supporting rapid filtering and retrieval of bulk data. Genomic data from many non-plant species, including those of plant pathogens, pests, and pollinators, is also available via the same interfaces through other divisions of Ensembl. Ensembl Plants is updated 4-6 times a year and is developed in collaboration with our international partners in the Gramene (http://www.gramene.org) and transPLANT projects (http://www.transplantdb.eu).
Network inference using informative priors.

PubMed

Mukherjee, Sach; Speed, Terence P

2008-09-23

Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of "network inference" is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling.
Inference of developmental gene regulatory networks beyond classical model systems: new approaches in the post-genomic era.

PubMed

Fernandez-Valverde, Selene L; Aguilera, Felipe; Ramos-Díaz, René Alexander

2018-06-18

The advent of high-throughput sequencing technologies has revolutionized the way we understand the transformation of genetic information into morphological traits. Elucidating the network of interactions between genes that govern cell differentiation through development is one of the core challenges in genome research. These networks are known as developmental gene regulatory networks (dGRNs) and consist largely of the functional linkage between developmental control genes, cis-regulatory modules and differentiation genes, which generate spatially and temporally refined patterns of gene expression. Over the last 20 years, great advances have been made in determining these gene interactions mainly in classical model systems, including human, mouse, sea urchin, fruit fly, and worm. This has brought about a radical transformation in the fields of developmental biology and evolutionary biology, allowing the generation of high-resolution gene regulatory maps to analyse cell differentiation during animal development. Such maps have enabled the identification of gene regulatory circuits and have led to the development of network inference methods that can recapitulate the differentiation of specific cell-types or developmental stages. In contrast, dGRN research in non-classical model systems has been limited to the identification of developmental control genes via the candidate gene approach and the characterization of their spatiotemporal expression patterns, as well as to the discovery of cis-regulatory modules via patterns of sequence conservation and/or predicted transcription-factor binding sites. However, thanks to the continuous advances in high-throughput sequencing technologies, this scenario is rapidly changing. Here, we give a historical overview on the architecture and elucidation of the dGRNs. Subsequently, we summarize the approaches available to unravel these regulatory networks, highlighting the vast range of possibilities of integrating multiple technical
Phylogeny and biogeography of highly diverged freshwater fish species (Leuciscinae, Cyprinidae, Teleostei) inferred from mitochondrial genome analysis.

PubMed

Imoto, Junichi M; Saitoh, Kenji; Sasaki, Takeshi; Yonezawa, Takahiro; Adachi, Jun; Kartavtsev, Yuri P; Miya, Masaki; Nishida, Mutsumi; Hanzawa, Naoto

2013-02-10

The distribution of freshwater taxa is a good biogeographic model to study pattern and process of vicariance and dispersal. The subfamily Leuciscinae (Cyprinidae, Teleostei) consists of many species distributed widely in Eurasia and North America. Leuciscinae have been divided into two phyletic groups, leuciscin and phoxinin. The phylogenetic relationships between major clades within the subfamily are poorly understood, largely because of the overwhelming diversity of the group. The origin of the Far Eastern phoxinin is an interesting question regarding the evolutionary history of Leuciscinae. Here we present phylogenetic analysis of 31 species of Leuciscinae and outgroups based on complete mitochondrial genome sequences to clarify the phylogenetic relationships and to infer the evolutionary history of the subfamily. Phylogenetic analysis suggests that the Far Eastern phoxinin species comprised the monophyletic clades Tribolodon, Pseudaspius, Oreoleuciscus and Far Eastern Phoxinus. The Far Eastern phoxinin clade was independent of other Leuciscinae lineages and was closer to North American phoxinins than European leuciscins. All of our analyses also suggested that leuciscins and phoxinins each constituted monophyletic groups. Divergence time estimation suggested that Leuciscinae species diverged from outgroups such as Tincinae 83.3 million years ago (Mya) in the Late Cretaceous, and that leuciscins and phoxinins shared a common ancestor 70.7 Mya. Radiation of Leuciscinae lineages occurred during the Late Cretaceous to Paleocene. This period also witnessed the radiation of tetrapods. Reconstruction of ancestral areas indicates Leuciscinae species originated within Europe. Leuciscin species evolved in Europe and the ancestor of phoxinin was distributed in North America. The Far Eastern phoxinins would have dispersed from North America to the Far East across the Beringia land bridge. The present study suggests important roles for the continental rearrangements during the
SAR202 Genomes from the Dark Ocean Predict Pathways for the Oxidation of Recalcitrant Dissolved Organic Matter.

PubMed

Landry, Zachary; Swan, Brandon K; Herndl, Gerhard J; Stepanauskas, Ramunas; Giovannoni, Stephen J

2017-04-18

Deep-ocean regions beyond the reach of sunlight contain an estimated 615 Pg of dissolved organic matter (DOM), much of which persists for thousands of years. It is thought that bacteria oxidize DOM until it is too dilute or refractory to support microbial activity. We analyzed five single-amplified genomes (SAGs) from the abundant SAR202 clade of dark-ocean bacterioplankton and found they encode multiple families of paralogous enzymes involved in carbon catabolism, including several families of oxidative enzymes that we hypothesize participate in the degradation of cyclic alkanes. The five partial genomes encoded 152 flavin mononucleotide/F420-dependent monooxygenases (FMNOs), many of which are predicted to be type II Baeyer-Villiger monooxygenases (BVMOs) that catalyze oxygen insertion into semilabile alicyclic alkanes. The large number of oxidative enzymes, as well as other families of enzymes that appear to play complementary roles in catabolic pathways, suggests that SAR202 might catalyze final steps in the biological oxidation of relatively recalcitrant organic compounds to refractory compounds that persist. IMPORTANCE Carbon in the ocean is massively sequestered in a complex mixture of biologically refractory molecules that accumulate as the chemical end member of biological oxidation and diagenetic change. However, few details are known about the biochemical machinery of carbon sequestration in the deep ocean. Reconstruction of the metabolism of a deep-ocean microbial clade, SAR202, led to postulation of new biochemical pathways that may be the penultimate stages of DOM oxidation to refractory forms that persist. These pathways are tied to a proliferation of oxidative enzymes. This research illuminates dark-ocean biochemistry that is broadly consequential for reconstructing the global carbon cycle. Copyright © 2017 Landry et al.
A Dynamic Tandem Repeat in Monocotyledons Inferred from a Comparative Analysis of Chloroplast Genomes in Melanthiaceae.

PubMed

Do, Hoang Dang Khoa; Kim, Joo-Hwan

2017-01-01

Chloroplast genomes (cpDNA) are highly valuable resources for evolutionary studies of angiosperms, since they are highly conserved, are small in size, and play critical roles in plants. Slipped-strand mispairing (SSM) was assumed to be a mechanism for generating repeat units in cpDNA. However, research on the employment of different small repeated sequences through SSM events, which may induce the accumulation of distinct types of repeats within the same region in cpDNA, has not been documented. Here, we sequenced two chloroplast genomes from the endemic species Heloniopsis tubiflora (Korea) and Xerophyllum tenax (USA) to cover the gap between molecular data and explore "hot spots" for genomic events in Melanthiaceae. Comparative analysis of 23 complete cpDNA sequences revealed that there were different stages of deletion in the rps16 region across the Melanthiaceae. Based on the partial or complete loss of the rps16 gene in cpDNA, we report for the first time potential molecular markers for recognizing two sections (Veratrum and Fuscoveratrum) of Veratrum. Melanthiaceae exhibits a significant change in the junction between large single copy and inverted repeat regions, ranging from trnH_GUG to a part of rps3. Our results show an accumulation of tandem repeats in the rpl23-ycf2 regions of cpDNAs. Small conserved sequences exist and flank tandem repeats in further observation of this region across most of the examined taxa of Liliales. Therefore, we propose three scenarios in which different small repeated sequences were used during SSM events to generate newly distinct types of repeats. Occasionally, prior to the SSM process, point mutation events and double-strand break repair occurred and induced the formation of initial repeat units which are indispensable in the SSM process. SSM likely occurred more frequently for short repeats than for long repeat sequences in tribe Parideae (Melanthiaceae, Liliales). Collectively, these findings add new evidence of dynamic
Genomic diversity and introgression in O. sativa reveal the impact of domestication and breeding on the rice genome.

PubMed

Zhao, Keyan; Wright, Mark; Kimball, Jennifer; Eizenga, Georgia; McClung, Anna; Kovach, Michael; Tyagi, Wricha; Ali, Md Liakat; Tung, Chih-Wei; Reynolds, Andy; Bustamante, Carlos D; McCouch, Susan R

2010-05-24

The domestication of Asian rice (Oryza sativa) was a complex process punctuated by episodes of introgressive hybridization among and between subpopulations. Deep genetic divergence between the two main varietal groups (Indica and Japonica) suggests domestication from at least two distinct wild populations. However, genetic uniformity surrounding key domestication genes across divergent subpopulations suggests cultural exchange of genetic material among ancient farmers. In this study, we utilize a novel 1,536 SNP panel genotyped across 395 diverse accessions of O. sativa to study genome-wide patterns of polymorphism, to characterize population structure, and to infer the introgression history of domesticated Asian rice. Our population structure analyses support the existence of five major subpopulations (indica, aus, tropical japonica, temperate japonica and GroupV) consistent with previous analyses. Our introgression analysis shows that most accessions exhibit some degree of admixture, with many individuals within a population sharing the same introgressed segment due to artificial selection. Admixture mapping and association analysis of amylose content and grain length illustrate the potential for dissecting the genetic basis of complex traits in domesticated plant populations. Genes in these regions control a myriad of traits including plant stature, blast resistance, and amylose content. These analyses highlight the power of population genomics in agricultural systems to identify functionally important regions of the genome and to decipher the role of human-directed breeding in refashioning the genomes of a domesticated species.
Organellar Genomes from a ∼5,000-Year-Old Archaeological Maize Sample Are Closely Related to NB Genotype

PubMed Central

Pérez-Zamorano, Bernardo; Vallebueno-Estrada, Miguel; Martínez González, Javier; García Cook, Angel; Montiel, Rafael; Vielle-Calzada, Jean-Philippe

2017-01-01

The story of how pre-Columbian civilizations developed goes hand-in-hand with the process of plant domestication by Mesoamerican inhabitants. Here, we present the almost complete sequence of a mitochondrial genome and a partial chloroplast genome from an archaeological maize sample collected at the Valley of Tehuacán, México. Accelerator mass spectrometry dated the maize sample to 5,040–5,300 years before present (95% probability). Phylogenetic analysis of the mitochondrial genome shows that the archaeological sample branches basal to the other Zea mays genomes, as expected. However, this analysis also indicates that fertile genotype NB is closely related to the archaeological maize sample and evolved before cytoplasmic male sterility genotypes (CMS-S, CMS-T, and CMS-C), thus contradicting previous phylogenetic analysis of mitochondrial genomes from maize. We show that maximum likelihood infers a tree where CMS genotypes branch at the base of the tree when including sites that have a relatively fast rate of evolution, thus suggesting long-branch attraction. We also show that Bayesian analysis infers a topology where NB and the archaeological maize sample are at the base of the tree even when including faster sites. We therefore suggest that previous trees suffered from long-branch attraction. We also show that the phylogenetic analysis of the ancient chloroplast is congruent with genotype NB being more closely related to the archaeological maize sample. As shown here, the inclusion of ancient genomes in phylogenetic trees greatly improves our understanding of the domestication process of maize, one of the most important crops worldwide. PMID:28338960
Bayes factors and multimodel inference

USGS Publications Warehouse

Link, W.A.; Barker, R.J.; Thomson, David L.; Cooch, Evan G.; Conroy, Michael J.

2009-01-01

Multimodel inference has two main themes: model selection, and model averaging. Model averaging is a means of making inference conditional on a model set, rather than on a selected model, allowing formal recognition of the uncertainty associated with model choice. The Bayesian paradigm provides a natural framework for model averaging, and provides a context for evaluation of the commonly used AIC weights. We review Bayesian multimodel inference, noting the importance of Bayes factors. Noting the sensitivity of Bayes factors to the choice of priors on parameters, we define and propose nonpreferential priors as offering a reasonable standard for objective multimodel inference.
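The model-averaging idea in the abstract above can be sketched in a few lines. The following is only an illustration, not the authors' method: the function names and the input numbers are hypothetical, equal prior model probabilities are assumed unless supplied, and the nonpreferential priors proposed in the paper are not implemented here.

```python
import math


def aic_weights(aic_values):
    """Akaike weights: w_i is proportional to exp(-0.5 * (AIC_i - AIC_min))."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


def posterior_model_probs(bayes_factors, prior_probs=None):
    """Posterior model probabilities from Bayes factors.

    bayes_factors[i] is the marginal likelihood of model i divided by that
    of a common reference model. Equal priors are assumed if none are given.
    """
    k = len(bayes_factors)
    if prior_probs is None:
        prior_probs = [1.0 / k] * k
    unnorm = [bf * p for bf, p in zip(bayes_factors, prior_probs)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def model_average(estimates, weights):
    """Weighted average of per-model estimates of the same quantity."""
    return sum(e * w for e, w in zip(estimates, weights))


# Hypothetical inputs: two candidate models.
weights = posterior_model_probs([1.0, 3.0])   # model 2 favoured 3:1
averaged = model_average([1.0, 2.0], weights)
print(weights, averaged)
print(aic_weights([100.0, 102.0]))
```

The averaged estimate is conditional on the model set, so a model missing from the set contributes nothing to it.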
The first mitochondrial genome for the butterfly family Riodinidae (Abisara fylloides) and its systematic implications.

PubMed

Zhao, Fang; Huang, Dun-Yuan; Sun, Xiao-Yan; Shi, Qing-Hui; Hao, Jia-Sheng; Zhang, Lan-Lan; Yang, Qun

2013-10-01

The Riodinidae is one of the lepidopteran butterfly families. This study describes the complete mitochondrial genome of the butterfly species Abisara fylloides, the first mitochondrial genome of the Riodinidae family. The results show that the entire mitochondrial genome of A. fylloides is 15 301 bp in length, and contains 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and a 423 bp A+T-rich region. The gene content, orientation and order are identical to the majority of other lepidopteran insects. Phylogenetic reconstruction was conducted using the concatenated 13 protein-coding gene (PCG) sequences of 19 available butterfly species covering all the five butterfly families (Papilionidae, Nymphalidae, Pieridae, Lycaenidae and Riodinidae). Both maximum likelihood and Bayesian inference analyses highly supported the monophyly of Lycaenidae+Riodinidae, which stood as the sister group of Nymphalidae. In addition, we propose that the riodinids be categorized into the family Lycaenidae as a subfamilial taxon.
Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution.

PubMed

2004-12-09

We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
CCLasso: correlation inference for compositional data through Lasso.

PubMed

Fang, Huaying; Huang, Chengcheng; Zhao, Hongyu; Deng, Minghua

2015-10-01

Direct analysis of microbial communities in the environment and human body has become more convenient and reliable owing to the advancements of high-throughput sequencing techniques for 16S rRNA gene profiling. Inferring the correlation relationship among members of microbial communities is of fundamental importance for genomic survey study. Traditional Pearson correlation analysis treating the observed data as absolute abundances of the microbes may lead to spurious results because the data only represent relative abundances. Special care and appropriate methods are required prior to correlation analysis for these compositional data. In this article, we first discuss the correlation definition of latent variables for compositional data. We then propose a novel method called CCLasso based on least squares with an $\ell_1$ penalty to infer the correlation network for latent variables of compositional data from metagenomic data. An effective alternating direction algorithm from the augmented Lagrangian method is used to solve the optimization problem. The simulation results show that CCLasso outperforms existing methods, e.g. SparCC, in edge recovery for compositional data. It also compares well with SparCC in estimating the correlation network of microbe species from the Human Microbiome Project. CCLasso is open source and freely available from https://github.com/huayingfang/CCLasso under GNU LGPL v3. Contact: dengmh@pku.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
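The warning in the abstract above, that Pearson correlation on relative abundances can be spurious, can be seen in a small simulation. This sketch is not CCLasso and does not use its code: it assumes three hypothetical taxa whose absolute abundances are independent gamma draws, converts each sample to proportions, and compares the two correlations. The seed, sample size, and distribution are arbitrary choices.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Independent absolute abundances for three hypothetical taxa.
absolute = [[rng.gammavariate(2.0, 1.0) for _ in range(3)] for _ in range(5000)]

# Closure: each sample is rescaled to sum to 1, as in relative-abundance data.
relative = [[v / sum(row) for v in row] for row in absolute]

r_abs = pearson([row[0] for row in absolute], [row[1] for row in absolute])
r_rel = pearson([row[0] for row in relative], [row[1] for row in relative])

print("correlation of absolute abundances:", round(r_abs, 3))
print("correlation of relative abundances:", round(r_rel, 3))
```

Because the proportions must sum to one, the relative abundances of taxa that are independent in absolute terms come out negatively correlated; with three equally abundant taxa the expected value is about -0.5. This is the kind of artifact that methods for compositional data aim to avoid.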
Inferring Admixture Histories of Human Populations Using Linkage Disequilibrium

PubMed Central

Loh, Po-Ru; Lipson, Mark; Patterson, Nick; Moorjani, Priya; Pickrell, Joseph K.; Reich, David; Berger, Bonnie

2013-01-01

Long-range migrations and the resulting admixtures between populations have been important forces shaping human genetic diversity. Most existing methods for detecting and reconstructing historical admixture events are based on allele frequency divergences or patterns of ancestry segments in chromosomes of admixed individuals. An emerging new approach harnesses the exponential decay of admixture-induced linkage disequilibrium (LD) as a function of genetic distance. Here, we comprehensively develop LD-based inference into a versatile tool for investigating admixture. We present a new weighted LD statistic that can be used to infer mixture proportions as well as dates with fewer constraints on reference populations than previous methods. We define an LD-based three-population test for admixture and identify scenarios in which it can detect admixture events that previous formal tests cannot. We further show that we can uncover phylogenetic relationships among populations by comparing weighted LD curves obtained using a suite of references. Finally, we describe several improvements to the computation and fitting of weighted LD curves that greatly increase the robustness and speed of the calculations. We implement all of these advances in a software package, ALDER, which we validate in simulations and apply to test for admixture among all populations from the Human Genome Diversity Project (HGDP), highlighting insights into the admixture history of Central African Pygmies, Sardinians, and Japanese. PMID:23410830
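The exponential decay of admixture LD described in the abstract above can be turned into a date estimate with a simple curve fit. The sketch below is a simplification and not the ALDER implementation: it assumes a single admixture pulse, a decay of the form A * exp(-n * d) with d in Morgans and n in generations, no additive offset term, and noiseless synthetic data. The function name and all numbers are hypothetical.

```python
import math


def fit_decay(distances_morgans, ld_values):
    """Least-squares fit of log(ld) = log(A) - n * d.

    Returns (A, n): the amplitude and the decay rate per Morgan, which under
    the single-pulse assumption is the number of generations since admixture.
    """
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic curve: amplitude 0.01, admixture 50 generations ago,
# sampled at genetic distances from 0.5 cM to 20 cM.
true_amplitude, true_generations = 0.01, 50.0
distances = [0.005 * i for i in range(1, 41)]
curve = [true_amplitude * math.exp(-true_generations * d) for d in distances]

amplitude_hat, generations_hat = fit_decay(distances, curve)
print(amplitude_hat, generations_hat)
```

A log-linear fit only works when every LD value is positive; real weighted LD curves are noisy and can cross zero, so a nonlinear fit with an offset term would be the more robust choice in practice.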
Genome-Based Taxonomic Classification of Bacteroidetes

DOE PAGES

Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina; ...

2016-12-20

The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.
Genome-Based Taxonomic Classification of Bacteroidetes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina

The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.
A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference.

PubMed

Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

2016-09-02

Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside with number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal

Complete genome sequences of cowpea polerovirus 1 and cowpea polerovirus 2 infecting cowpea plants in Burkina Faso.

PubMed

Palanga, Essowè; Martin, Darren P; Galzi, Serge; Zabré, Jean; Bouda, Zakaria; Neya, James Bouma; Sawadogo, Mahamadou; Traore, Oumar; Peterschmitt, Michel; Roumagnac, Philippe; Filloux, Denis

2017-07-01

The full-length genome sequences of two novel poleroviruses found infecting cowpea plants, cowpea polerovirus 1 (CPPV1) and cowpea polerovirus 2 (CPPV2), were determined using overlapping RT-PCR and RACE-PCR. Whereas the 5845-nt CPPV1 genome was most similar to chickpea chlorotic stunt virus (73% identity), the 5945-nt CPPV2 genome was most similar to phasey bean mild yellow virus (86% identity). The CPPV1 and CPPV2 genomes both have a typical polerovirus genome organization. Phylogenetic analysis of the inferred P1-P2 and P3 amino acid sequences confirmed that CPPV1 and CPPV2 are indeed poleroviruses. Four apparently unique recombination events were detected within a dataset of 12 full polerovirus genome sequences, including two events in the CPPV2 genome. Based on the current species demarcation criteria for the family Luteoviridae, we tentatively propose that CPPV1 and CPPV2 should be considered members of novel polerovirus species.
Low-coverage, whole-genome sequencing of Artocarpus camansi (Moraceae) for phylogenetic marker development and gene discovery

PubMed Central

Gardner, Elliot M.; Johnson, Matthew G.; Ragone, Diane; Wickett, Norman J.; Zerega, Nyree J. C.

2016-01-01

Premise of the study: We used moderately low-coverage (17×) whole-genome sequencing of Artocarpus camansi (Moraceae) to develop genomic resources for Artocarpus and Moraceae. Methods and Results: A de novo assembly of Illumina short reads (251,378,536 pairs, 2 × 100 bp) accounted for 93% of the predicted genome size. Predicted coding regions were used in a three-way orthology search with published genomes of Morus notabilis and Cannabis sativa. Phylogenetic markers for Moraceae were developed from 333 inferred single-copy exons. Ninety-eight putative MADS-box genes were identified. Analysis of all predicted coding regions resulted in preliminary annotation of 49,089 genes. An analysis of synonymous substitutions for pairs of orthologs (Ks analysis) in M. notabilis and A. camansi strongly suggested a lineage-specific whole-genome duplication in Artocarpus. Conclusions: This study substantially increases the genomic resources available for Artocarpus and Moraceae and demonstrates the value of low-coverage de novo assemblies for nonmodel organisms with moderately large genomes. PMID:27437173
Whole-genome sequencing reveals the extent of heterozygosity in a preferentially self-fertilizing hermaphroditic vertebrate.

PubMed

Lins, Luana S F; Trojahn, Shawn; Sockell, Alexandra; Yee, Muh-Ching; Tatarenkov, Andrey; Bustamante, Carlos D; Earley, Ryan L; Kelley, Joanna L

2018-04-01

The mangrove rivulus, Kryptolebias marmoratus, is one of only two self-fertilizing hermaphroditic fish species and inhabits mangrove forests. While selfing can be advantageous, it reduces heterozygosity and decreases genetic diversity. Studies using microsatellites found that there are variable levels of selfing among populations of K. marmoratus, but overall, there is a low rate of outcrossing and, therefore, low heterozygosity. In this study, we used whole-genome data to assess the levels of heterozygosity in different lineages of the mangrove rivulus and infer the phylogenetic relationships among those lineages. We sequenced whole genomes from 15 lineages that were completely homozygous at microsatellite loci and used single nucleotide polymorphisms (SNPs) to determine heterozygosity levels. More variation was uncovered than in studies using microsatellite data because of the resolution of full genome sequencing data. Moreover, missense polymorphisms were found most often in genes associated with immune function and reproduction. Inferred phylogenetic relationships suggest that lineages largely group by their geographic distribution. The use of whole-genome data provided further insight into genetic diversity in this unique species. Although this study was limited by the number of lineages that were available, these data suggest that there is previously undescribed variation within lineages of K. marmoratus that could have functional consequences and (or) inform us about the limits to selfing (e.g., genetic load, accumulation of deleterious mutations) and selection that might favor the maintenance of heterozygosity. These results highlight the need to sequence additional individuals within and among lineages.
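
The heterozygosity measure described above can be illustrated with a minimal sketch. It assumes diploid SNP genotypes encoded as allele pairs (0 = reference, 1 = alternate, -1 = missing); the function name and the encoding are hypothetical and are not taken from the study's pipeline.

```python
from typing import Sequence, Tuple


def observed_heterozygosity(genotypes: Sequence[Tuple[int, int]]) -> float:
    """Return the fraction of called sites whose two alleles differ.

    Each genotype is a pair of allele codes. A pair containing -1 is
    treated as a missing call and is excluded from the denominator.
    """
    called = [g for g in genotypes if -1 not in g]
    if not called:
        raise ValueError("no called genotypes")
    het_sites = sum(1 for a, b in called if a != b)
    return het_sites / len(called)


if __name__ == "__main__":
    # Hypothetical lineage: five sites, one of them missing.
    lineage = [(0, 0), (0, 1), (1, 1), (-1, -1), (0, 0)]
    print(observed_heterozygosity(lineage))  # 1 heterozygous site of 4 called
```

Excluding missing calls from the denominator is one common convention; a real analysis would also need to filter on depth and genotype quality.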
The Core and Accessory Genomes of Burkholderia pseudomallei: Implications for Human Melioidosis

PubMed Central

Lin, Chi Ho; Karuturi, R. Krishna M.; Wuthiekanun, Vanaporn; Tuanyok, Apichai; Chua, Hui Hoon; Ong, Catherine; Paramalingam, Sivalingam Suppiah; Tan, Gladys; Tang, Lynn; Lau, Gary; Ooi, Eng Eong; Woods, Donald; Feil, Edward; Peacock, Sharon J.; Tan, Patrick

2008-01-01

Natural isolates of Burkholderia pseudomallei (Bp), the causative agent of melioidosis, can exhibit significant ecological flexibility that is likely reflective of a dynamic genome. Using whole-genome Bp microarrays, we examined patterns of gene presence and absence across 94 South East Asian strains isolated from a variety of clinical, environmental, or animal sources. 86% of the Bp K96243 reference genome was common to all the strains, representing the Bp “core genome”, which comprises genes largely involved in essential functions (e.g., amino acid metabolism, protein translation). In contrast, 14% of the K96243 genome was variably present across the isolates. This Bp accessory genome encompassed multiple genomic islands (GIs), paralogous genes, and insertions/deletions, including three distinct lipopolysaccharide (LPS)-related gene clusters. Strikingly, strains recovered from cases of human melioidosis clustered on a tree based on accessory gene content, and were significantly more likely to harbor certain GIs compared to animal and environmental isolates. Consistent with the inference that the GIs may contribute to pathogenesis, experimental mutation of BPSS2053, a GI gene, reduced microbial adherence to human epithelial cells. Our results suggest that the Bp accessory genome is likely to play an important role in microbial adaptation and virulence. PMID:18927621
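
The core/accessory split described above can be sketched from a gene presence/absence table. This is an illustrative sketch only: the strain and gene names are hypothetical, and the study itself derived presence and absence from microarray hybridisation rather than from sets like these.

```python
from typing import Dict, Set, Tuple


def split_core_accessory(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split genes into core (present in every strain) and accessory (the rest).

    `presence` maps a strain name to the set of genes detected in it.
    """
    if not presence:
        raise ValueError("at least one strain is required")
    gene_sets = list(presence.values())
    core = set.intersection(*gene_sets)   # genes shared by all strains
    pan = set.union(*gene_sets)           # genes seen in any strain
    return core, pan - core


if __name__ == "__main__":
    strains = {
        "strain_A": {"g1", "g2", "g3"},
        "strain_B": {"g1", "g2"},
        "strain_C": {"g1", "g3"},
    }
    core, accessory = split_core_accessory(strains)
    print(sorted(core), sorted(accessory))
```

With real data, a tolerance for detection error (for example, "present in at least 99% of strains") is often used instead of a strict intersection.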
Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks

PubMed Central

Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui

2017-01-01

The PubMed database offers an extensive set of publication data that can be useful, yet inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining PubMed publication abstracts, we verify that regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions, but has the potential to hypothesize new ones as well. We show that our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS-Bcl-2 and RAS-AKT, and finds significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways. PMID:29049295
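
One simple way to picture a consensus network is as the set of edges that recur across many individually learned networks. The sketch below assumes that interpretation; the abstract does not say how network resolution is computed, so the `min_fraction` threshold and the function name are hypothetical stand-ins rather than the authors' method.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # directed edge (parent gene, child gene)


def consensus_edges(networks: Iterable[Set[Edge]], min_fraction: float) -> Set[Edge]:
    """Keep edges that appear in at least `min_fraction` of the networks."""
    nets = list(networks)
    if not nets:
        raise ValueError("at least one network is required")
    # Count in how many networks each edge occurs.
    counts = Counter(edge for net in nets for edge in net)
    total = len(nets)
    return {edge for edge, c in counts.items() if c / total >= min_fraction}


if __name__ == "__main__":
    # Three hypothetical networks learned under different orderings.
    learned = [
        {("JAK", "STAT"), ("RAS", "AKT")},
        {("JAK", "STAT")},
        {("JAK", "STAT"), ("PI3K", "AKT")},
    ]
    print(consensus_edges(learned, 0.6))  # only the edge found in all three
```

Raising the threshold yields a sparser, higher-confidence network; lowering it admits more speculative edges.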
Inferring gene and protein interactions using PubMed citations and consensus Bayesian networks.

PubMed

Deeter, Anthony; Dalman, Mark; Haddad, Joseph; Duan, Zhong-Hui

2017-01-01

The PubMed database offers an extensive set of publication data that can be useful, yet inherently complex to use without automated computational techniques. Data repositories such as the Genomic Data Commons (GDC) and the Gene Expression Omnibus (GEO) offer experimental data storage and retrieval as well as curated gene expression profiles. Genetic interaction databases, including Reactome and Ingenuity Pathway Analysis, offer pathway and experiment data analysis using data curated from these publications and data repositories. We have created a method to generate and analyze consensus networks, inferring potential gene interactions, using large numbers of Bayesian networks generated by data mining publications in the PubMed database. Through the concept of network resolution, these consensus networks can be tailored to represent possible genetic interactions. We designed a set of experiments to confirm that our method is stable across variation in both sample and topological input sizes. Using gene product interactions from the KEGG pathway database and data mining PubMed publication abstracts, we verify that regardless of the network resolution or the inferred consensus network, our method is capable of inferring meaningful gene interactions through consensus Bayesian network generation with multiple, randomized topological orderings. Our method can not only confirm the existence of currently accepted interactions, but has the potential to hypothesize new ones as well. We show that our method confirms the existence of known gene interactions such as JAK-STAT-PI3K-AKT-mTOR, infers novel gene interactions such as RAS-Bcl-2 and RAS-AKT, and finds significant pathway-pathway interactions between the JAK-STAT signaling and Cardiac Muscle Contraction KEGG pathways.
DESCARTES' RULE OF SIGNS AND THE IDENTIFIABILITY OF POPULATION DEMOGRAPHIC MODELS FROM GENOMIC VARIATION DATA.

PubMed

Bhaskar, Anand; Song, Yun S

2014-01-01

The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the "folded" SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes' rule of signs for polynomials to the Laplace transform of piecewise continuous functions.
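
As a concrete illustration of the summary statistic discussed above, the sketch below computes an unfolded and a folded SFS from a small haplotype matrix. It assumes biallelic sites coded 0 (ancestral) and 1 (derived); the function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def sample_frequency_spectrum(haplotypes: Sequence[Sequence[int]]) -> List[int]:
    """Unfolded SFS: entry i-1 counts sites with exactly i derived copies.

    `haplotypes` holds n sequences (rows) over the same sites (columns).
    Sites with 0 or n derived copies are not segregating and are skipped.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    sfs = [0] * (n - 1)
    for site in zip(*haplotypes):  # iterate over columns
        derived = sum(site)
        if 0 < derived < n:
            sfs[derived - 1] += 1
    return sfs


def fold(sfs: Sequence[int]) -> List[int]:
    """Folded SFS: merge classes i and n-i when the ancestral state is unknown."""
    n = len(sfs) + 1
    folded = [0] * (n // 2)
    for i, count in enumerate(sfs, start=1):
        minor = min(i, n - i)  # minor-allele count
        folded[minor - 1] += count
    return folded


if __name__ == "__main__":
    sample = [
        [1, 0, 1, 0, 1],
        [0, 0, 1, 0, 1],
        [0, 1, 1, 0, 1],
        [0, 0, 0, 0, 1],
    ]
    unfolded = sample_frequency_spectrum(sample)
    print(unfolded, fold(unfolded))
```

The identifiability results in the abstract concern the expected SFS under a demographic model; this sketch only shows how the observed statistic is tabulated from data.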
Genomic analyses inform on migration events during the peopling of Eurasia

NASA Astrophysics Data System (ADS)

Pagani, Luca; Lawson, Daniel John; Jagoda, Evelyn; Mörseburg, Alexander; Eriksson, Anders; Mitt, Mario; Clemente, Florian; Hudjashov, Georgi; Degiorgio, Michael; Saag, Lauri; Wall, Jeffrey D.; Cardona, Alexia; Mägi, Reedik; Sayres, Melissa A. Wilson; Kaewert, Sarah; Inchley, Charlotte; Scheib, Christiana L.; Järve, Mari; Karmin, Monika; Jacobs, Guy S.; Antao, Tiago; Iliescu, Florin Mircea; Kushniarevich, Alena; Ayub, Qasim; Tyler-Smith, Chris; Xue, Yali; Yunusbayev, Bayazit; Tambets, Kristiina; Mallick, Chandana Basu; Saag, Lehti; Pocheshkhova, Elvira; Andriadze, George; Muller, Craig; Westaway, Michael C.; Lambert, David M.; Zoraqi, Grigor; Turdikulova, Shahlo; Dalimova, Dilbar; Sabitov, Zhaxylyk; Sultana, Gazi Nurun Nahar; Lachance, Joseph; Tishkoff, Sarah; Momynaliev, Kuvat; Isakova, Jainagul; Damba, Larisa D.; Gubina, Marina; Nymadawa, Pagbajabyn; Evseeva, Irina; Atramentova, Lubov; Utevska, Olga; Ricaut, François-Xavier; Brucato, Nicolas; Sudoyo, Herawati; Letellier, Thierry; Cox, Murray P.; Barashkov, Nikolay A.; Škaro, Vedrana; Mulahasanović, Lejla; Primorac, Dragan; Sahakyan, Hovhannes; Mormina, Maru; Eichstaedt, Christina A.; Lichman, Daria V.; Abdullah, Syafiq; Chaubey, Gyaneshwer; Wee, Joseph T. S.; Mihailov, Evelin; Karunas, Alexandra; Litvinov, Sergei; Khusainova, Rita; Ekomasova, Natalya; Akhmetova, Vita; Khidiyatova, Irina; Marjanović, Damir; Yepiskoposyan, Levon; Behar, Doron M.; Balanovska, Elena; Metspalu, Andres; Derenko, Miroslava; Malyarchuk, Boris; Voevoda, Mikhail; Fedorova, Sardana A.; Osipova, Ludmila P.; Lahr, Marta Mirazón; Gerbault, Pascale; Leavesley, Matthew; Migliano, Andrea Bamberg; Petraglia, Michael; Balanovsky, Oleg; Khusnutdinova, Elza K.; Metspalu, Ene; Thomas, Mark G.; Manica, Andrea; Nielsen, Rasmus; Villems, Richard; Willerslev, Eske; Kivisild, Toomas; Metspalu, Mait

2016-10-01

High-coverage whole-genome sequence studies have so far focused on a limited number of geographically restricted populations, or been targeted at specific diseases, such as cancer. Nevertheless, the availability of high-resolution genomic data has led to the development of new methodologies for inferring population history and refuelled the debate on the mutation rate in humans. Here we present the Estonian Biocentre Human Genome Diversity Panel (EGDP), a dataset of 483 high-coverage human genomes from 148 populations worldwide, including 379 new genomes from 125 populations, which we group into diversity and selection sets. We analyse this dataset to refine estimates of continent-wide patterns of heterozygosity, long- and short-distance gene flow, archaic admixture, and changes in effective population size through time as well as for signals of positive or balancing selection. We find a genetic signature in present-day Papuans that suggests that at least 2% of their genome originates from an early and largely extinct expansion of anatomically modern humans (AMHs) out of Africa. Together with evidence from the western Asian fossil record, and admixture between AMHs and Neanderthals predating the main Eurasian expansion, our results contribute to the mounting evidence for the presence of AMHs out of Africa earlier than 75,000 years ago.
Genomic analyses inform on migration events during the peopling of Eurasia.

PubMed

Pagani, Luca; Lawson, Daniel John; Jagoda, Evelyn; Mörseburg, Alexander; Eriksson, Anders; Mitt, Mario; Clemente, Florian; Hudjashov, Georgi; DeGiorgio, Michael; Saag, Lauri; Wall, Jeffrey D; Cardona, Alexia; Mägi, Reedik; Wilson Sayres, Melissa A; Kaewert, Sarah; Inchley, Charlotte; Scheib, Christiana L; Järve, Mari; Karmin, Monika; Jacobs, Guy S; Antao, Tiago; Iliescu, Florin Mircea; Kushniarevich, Alena; Ayub, Qasim; Tyler-Smith, Chris; Xue, Yali; Yunusbayev, Bayazit; Tambets, Kristiina; Mallick, Chandana Basu; Saag, Lehti; Pocheshkhova, Elvira; Andriadze, George; Muller, Craig; Westaway, Michael C; Lambert, David M; Zoraqi, Grigor; Turdikulova, Shahlo; Dalimova, Dilbar; Sabitov, Zhaxylyk; Sultana, Gazi Nurun Nahar; Lachance, Joseph; Tishkoff, Sarah; Momynaliev, Kuvat; Isakova, Jainagul; Damba, Larisa D; Gubina, Marina; Nymadawa, Pagbajabyn; Evseeva, Irina; Atramentova, Lubov; Utevska, Olga; Ricaut, François-Xavier; Brucato, Nicolas; Sudoyo, Herawati; Letellier, Thierry; Cox, Murray P; Barashkov, Nikolay A; Skaro, Vedrana; Mulahasanovic, Lejla; Primorac, Dragan; Sahakyan, Hovhannes; Mormina, Maru; Eichstaedt, Christina A; Lichman, Daria V; Abdullah, Syafiq; Chaubey, Gyaneshwer; Wee, Joseph T S; Mihailov, Evelin; Karunas, Alexandra; Litvinov, Sergei; Khusainova, Rita; Ekomasova, Natalya; Akhmetova, Vita; Khidiyatova, Irina; Marjanović, Damir; Yepiskoposyan, Levon; Behar, Doron M; Balanovska, Elena; Metspalu, Andres; Derenko, Miroslava; Malyarchuk, Boris; Voevoda, Mikhail; Fedorova, Sardana A; Osipova, Ludmila P; Lahr, Marta Mirazón; Gerbault, Pascale; Leavesley, Matthew; Migliano, Andrea Bamberg; Petraglia, Michael; Balanovsky, Oleg; Khusnutdinova, Elza K; Metspalu, Ene; Thomas, Mark G; Manica, Andrea; Nielsen, Rasmus; Villems, Richard; Willerslev, Eske; Kivisild, Toomas; Metspalu, Mait

2016-10-13

High-coverage whole-genome sequence studies have so far focused on a limited number of geographically restricted populations, or been targeted at specific diseases, such as cancer. Nevertheless, the availability of high-resolution genomic data has led to the development of new methodologies for inferring population history and refuelled the debate on the mutation rate in humans. Here we present the Estonian Biocentre Human Genome Diversity Panel (EGDP), a dataset of 483 high-coverage human genomes from 148 populations worldwide, including 379 new genomes from 125 populations, which we group into diversity and selection sets. We analyse this dataset to refine estimates of continent-wide patterns of heterozygosity, long- and short-distance gene flow, archaic admixture, and changes in effective population size through time as well as for signals of positive or balancing selection. We find a genetic signature in present-day Papuans that suggests that at least 2% of their genome originates from an early and largely extinct expansion of anatomically modern humans (AMHs) out of Africa. Together with evidence from the western Asian fossil record, and admixture between AMHs and Neanderthals predating the main Eurasian expansion, our results contribute to the mounting evidence for the presence of AMHs out of Africa earlier than 75,000 years ago.
Consensus pan-genome assembly of the specialised wine bacterium Oenococcus oeni.

PubMed

Sternes, Peter R; Borneman, Anthony R

2016-04-27

Oenococcus oeni is a lactic acid bacterium that is specialised for growth in the ecological niche of wine, where it is noted for its ability to perform the secondary, malolactic fermentation that is often required for many types of wine. Expanding the understanding of strain-dependent genetic variations in its small and streamlined genome is important for realising its full potential in industrial fermentation processes. Whole genome comparison was performed on 191 strains of O. oeni; from this rich source of genomic information consensus pan-genome assemblies of the invariant (core) and variable (flexible) regions of this organism were established. Genetic variation in amino acid biosynthesis and sugar transport and utilisation was found to be common between strains. Furthermore, we characterised previously-unreported intra-specific genetic variations in the natural competence of this microbe. By assembling a consensus pan-genome from a large number of strains, this study provides a tool for researchers to readily compare protein-coding genes across strains and infer functional relationships between genes in conserved syntenic regions. This establishes a foundation for further genetic, and thus phenotypic, research of this industrially-important species.
Response of marine bacterioplankton to a massive under-ice phytoplankton bloom in the Chukchi Sea (Western Arctic Ocean)

NASA Astrophysics Data System (ADS)

Ortega-Retuerta, E.; Fichot, C. G.; Arrigo, K. R.; Van Dijken, G. L.; Joux, F.

2014-07-01

The activity of heterotrophic bacterioplankton and their response to changes in primary production in the Arctic Ocean are essential to understanding biogenic carbon flows in the area. In this study, we explored the patterns of bacterial abundance (BA) and bacterial production (BP) in waters coinciding with a massive under-ice phytoplankton bloom in the Chukchi Sea in summer 2011, where chlorophyll a (chl a) concentrations were up to 38.9 mg m⁻³. Contrary to our expectations, BA and BP did not show their highest values coinciding with the bloom. In fact, bacterial biomass was only 3.5% of phytoplankton biomass. Similarly, average DOC values were similar inside (average 57.2±3.1 μM) and outside (average 64.3±4.8 μM) the bloom patch. Regression analyses showed relatively weak couplings, in terms of slope values, between chl a or primary production and BA or BP. Multiple regression analyses indicated that both temperature and chl a explained BA and BP variability in the Chukchi Sea. This temperature dependence was confirmed experimentally, as higher incubation temperatures (6.6 °C vs. 2.2 °C) enhanced BA and BP, with Q10 values of BP up to 20.0. Together, these results indicate that low temperatures in conjunction with low dissolved organic matter release can prevent bacteria from efficiently processing a higher proportion of the carbon fixed by phytoplankton, with further consequences for carbon cycling in the area.
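
The Q10 values mentioned above are temperature coefficients. Under the usual definition, Q10 = (R2/R1)^(10/(T2 - T1)), where R1 and R2 are rates measured at temperatures T1 and T2. The sketch below assumes that definition; the abstract does not state how the authors computed their values, and the rates used in the example are hypothetical.

```python
def q10(rate_low: float, rate_high: float, temp_low: float, temp_high: float) -> float:
    """Temperature coefficient: factor by which a rate changes per 10 degree rise.

    Uses Q10 = (rate_high / rate_low) ** (10 / (temp_high - temp_low)).
    """
    if temp_high == temp_low:
        raise ValueError("temperatures must differ")
    if rate_low <= 0 or rate_high <= 0:
        raise ValueError("rates must be positive")
    return (rate_high / rate_low) ** (10.0 / (temp_high - temp_low))


if __name__ == "__main__":
    # Incubation temperatures from the abstract; the production rates
    # (arbitrary units) are made up for illustration.
    print(q10(rate_low=1.0, rate_high=2.0, temp_low=2.2, temp_high=6.6))
```

Because the exponent is 10/(T2 - T1), a modest rate increase over a narrow temperature interval such as 4.4 °C translates into a large Q10.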
Changes in bacterioplankton community structure during early lake ontogeny resulting from the retreat of the Greenland Ice Sheet.

PubMed

Peter, Hannes; Jeppesen, Erik; De Meester, Luc; Sommaruga, Ruben

2017-10-31

Retreating glaciers and ice sheets are among the clearest signs of global climate change. One consequence of glacier retreat is the formation of new meltwater lakes in previously ice-covered terrain. These lakes provide unique opportunities to understand patterns in community organization during early lake ontogeny. Here, we analyzed the bacterial community structure and diversity in six lakes recently formed by the retreat of the Greenland Ice Sheet (GrIS). The lakes represented a turbidity gradient depending on their past and present connectivity to the GrIS meltwaters. Bulk (16S rRNA genes) and putatively active (16S rRNA) fractions of the bacterioplankton communities were structured by changes in environmental conditions associated with the turbidity gradient. Differences in community structure among lakes were attributed to both rare and abundant community members. Further, positive co-occurrence relationships among phylogenetically closely related community members dominate in these lakes. Our results show that environmental conditions along the turbidity gradient structure bacterial community composition, which shifts during lake ontogeny. Rare taxa contribute to these shifts, suggesting that the rare biosphere has an important ecological role during early lake ontogeny. Members of the rare biosphere may be adapted to the transient niches in these nutrient-poor lakes. The directionality and phylogenetic structure of co-occurrence relationships indicate that competitive interactions among closely related taxa may be important in the most turbid lakes. The ISME Journal advance online publication, 31 October 2017; doi:10.1038/ismej.2017.191.
Finding the missing honey bee genes: lessons learned from a genome upgrade.

PubMed

Elsik, Christine G; Worley, Kim C; Bennett, Anna K; Beye, Martin; Camara, Francisco; Childers, Christopher P; de Graaf, Dirk C; Debyser, Griet; Deng, Jixin; Devreese, Bart; Elhaik, Eran; Evans, Jay D; Foster, Leonard J; Graur, Dan; Guigo, Roderic; Hoff, Katharina Jasmin; Holder, Michael E; Hudson, Matthew E; Hunt, Greg J; Jiang, Huaiyang; Joshi, Vandita; Khetani, Radhika S; Kosarev, Peter; Kovar, Christie L; Ma, Jian; Maleszka, Ryszard; Moritz, Robin F A; Munoz-Torres, Monica C; Murphy, Terence D; Muzny, Donna M; Newsham, Irene F; Reese, Justin T; Robertson, Hugh M; Robinson, Gene E; Rueppell, Olav; Solovyev, Victor; Stanke, Mario; Stolle, Eckart; Tsuruda, Jennifer M; Vaerenbergh, Matthias Van; Waterhouse, Robert M; Weaver, Daniel B; Whitfield, Charles W; Wu, Yuanqing; Zdobnov, Evgeny M; Zhang, Lan; Zhu, Dianhui; Gibbs, Richard A

2014-01-30

The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.
Finding the missing honey bee genes: lessons learned from a genome upgrade

PubMed Central

2014-01-01

Background: The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Results: Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Conclusions: Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination. PMID:24479613
Variations on Bayesian Prediction and Inference

DTIC Science & Technology

2016-05-09

inference 2.2.1 Background There are a number of statistical inference problems that are not generally formulated via a full probability model...problem of inference about an unknown parameter, the Bayesian approach requires a full probability...the problem of inference about an unknown parameter, the Bayesian approach requires a full probability model/likelihood which can be an obstacle
An Expanded Genomic Representation of the Phylum Cyanobacteria

PubMed Central

Soo, Rochelle M.; Skennerton, Connor T.; Sekiguchi, Yuji; Imelfort, Michael; Paech, Samuel J.; Dennis, Paul G.; Steen, Jason A.; Parks, Donovan H.; Tyson, Gene W.; Hugenholtz, Philip

2014-01-01

Molecular surveys of aphotic habitats have indicated the presence of major uncultured lineages phylogenetically classified as members of the Cyanobacteria. One of these lineages has recently been proposed as a nonphotosynthetic sister phylum to the Cyanobacteria, the Melainabacteria, based on recovery of population genomes from human gut and groundwater samples. Here, we expand the phylogenomic representation of the Melainabacteria through sequencing of six diverse population genomes from gut and bioreactor samples supporting the inference that this lineage is nonphotosynthetic, but not the assertion that they are strictly fermentative. We propose that the Melainabacteria is a class within the phylogenetically defined Cyanobacteria based on robust monophyly and shared ancestral traits with photosynthetic representatives. Our findings are consistent with theories that photosynthesis occurred late in the Cyanobacteria and involved extensive lateral gene transfer and extends the recognized functionality of members of this phylum. PMID:24709563
The genome of the Gulf pipefish enables understanding of evolutionary innovations.

PubMed

Small, C M; Bassham, S; Catchen, J; Amores, A; Fuiten, A M; Brown, R S; Jones, A G; Cresko, W A

2016-12-20

Evolutionary origins of derived morphologies ultimately stem from changes in protein structure, gene regulation, and gene content. A well-assembled, annotated reference genome is a central resource for pursuing these molecular phenomena underlying phenotypic evolution. We explored the genome of the Gulf pipefish (Syngnathus scovelli), which belongs to family Syngnathidae (pipefishes, seahorses, and seadragons). These fishes have dramatically derived bodies and a remarkable novelty among vertebrates, the male brood pouch. We produce a reference genome, condensed into chromosomes, for the Gulf pipefish. Gene losses and other changes have occurred in pipefish hox and dlx clusters and in the tbx and pitx gene families, candidate mechanisms for the evolution of syngnathid traits, including an elongated axis and the loss of ribs, pelvic fins, and teeth. We measure gene expression changes in pregnant versus non-pregnant brood pouch tissue and characterize the genomic organization of duplicated metalloprotease genes (patristacins) recruited into the function of this novel structure. Phylogenetic inference using ultraconserved sequences provides an alternative hypothesis for the relationship between orders Syngnathiformes and Scombriformes. Comparisons of chromosome structure among percomorphs show that chromosome number in a pipefish ancestor became reduced via chromosomal fusions. The collected findings from this first syngnathid reference genome open a window into the genomic underpinnings of highly derived morphologies, demonstrating that de novo production of high quality and useful reference genomes is within reach of even small research groups.
The mitochondrial genomes of the acoelomorph worms Paratomella rubra, Isodiametra pulchra and Archaphanostoma ylvae.

PubMed

Robertson, Helen E; Lapraz, François; Egger, Bernhard; Telford, Maximilian J; Schiffer, Philipp H

2017-05-12

Acoels are small, ubiquitous - but understudied - marine worms with a very simple body plan. Their internal phylogeny is still not fully resolved, and the position of their proposed phylum Xenacoelomorpha remains debated. Here we describe mitochondrial genome sequences from the acoels Paratomella rubra and Isodiametra pulchra, and the complete mitochondrial genome of the acoel Archaphanostoma ylvae. The P. rubra and A. ylvae sequences are typical for metazoans in size and gene content. The larger I. pulchra mitochondrial genome contains both ribosomal genes, 21 tRNAs, but only 11 protein-coding genes. We find evidence suggesting a duplicated sequence in the I. pulchra mitochondrial genome. The P. rubra, I. pulchra and A. ylvae mitochondria have a unique genome organisation in comparison to other metazoan mitochondrial genomes. We found a large degree of protein-coding gene and tRNA overlap with little non-coding sequence in the compact P. rubra genome. Conversely, the A. ylvae and I. pulchra genomes have many long non-coding sequences between genes, likely driving genome size expansion in the latter. Phylogenetic trees inferred from mitochondrial genes retrieve Xenacoelomorpha as an early branching taxon in the deuterostomes. Sequence divergence analysis between P. rubra sampled in England and Spain indicates cryptic diversity.
Global population genomics and comparisons of selective signatures from two invasions of melon fly, Zeugodacus cucurbitae (Diptera: Tephritidae)

USDA-ARS's Scientific Manuscript database

Population genetics is a powerful tool for invasion biology and pest management, from tracing invasion pathways to informing management decisions with inference of population demographics. Genomics greatly increases the resolution of population-scale analyses, yet outside of model species with exten...
Generic comparison of protein inference engines.

PubMed

Claassen, Manfred; Reiter, Lukas; Hengartner, Michael O; Buhmann, Joachim M; Aebersold, Ruedi

2012-04-01

Protein identifications, instead of peptide-spectrum matches, constitute the biologically relevant result of shotgun proteomics studies. How to appropriately infer and report protein identifications has triggered a still ongoing debate. This debate has so far suffered from the lack of appropriate performance measures that allow us to objectively assess protein inference approaches. This study describes an intuitive, generic and yet formal performance measure and demonstrates how it enables experimentalists to select an optimal protein inference strategy for a given collection of fragment ion spectra. We applied the performance measure to systematically explore the benefit of excluding possibly unreliable protein identifications, such as single-hit wonders. Therefore, we defined a family of protein inference engines by extending a simple inference engine by thousands of pruning variants, each excluding a different specified set of possibly unreliable identifications. We benchmarked these protein inference engines on several data sets representing different proteomes and mass spectrometry platforms. Optimally performing inference engines retained all high confidence spectral evidence, without posterior exclusion of any type of protein identifications. Despite the diversity of studied data sets consistently supporting this rule, other data sets might behave differently. In order to ensure maximal reliable proteome coverage for data sets arising in other studies we advocate abstaining from rigid protein inference rules, such as exclusion of single-hit wonders, and instead consider several protein inference approaches and assess these with respect to the presented performance measure in the specific application context.

Gene expression inference with deep learning

PubMed Central

Chen, Yifei; Li, Yi; Narayan, Rajiv; Subramanian, Aravind; Xie, Xiaohui

2016-01-01

Motivation: Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiling over thousands of samples is still very expensive. Recognizing that gene expressions are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ∼1000 carefully selected landmark genes and relying on computational methods to infer the expression of remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression (LR), limiting its accuracy since it does not capture complex nonlinear relationships between expressions of genes. Results: We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based Gene Expression Omnibus dataset, consisting of 111K expression profiles, to train our model and compare its performance to those from other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms LR with 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than LR in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2921 expression profiles. Deep learning still outperforms LR with 6.57% relative improvement, and achieves lower error in 81.31% of the target genes. Availability and implementation: D-GEX is available at https://github.com/uci-cbcl/D-GEX. Contact: xhx@ics.uci.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26873929
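The two evaluation quantities named in this abstract can be illustrated with a short sketch. It assumes that "relative improvement" means the percentage reduction in mean absolute error relative to the baseline; the abstract does not define the term, and the function names and toy values below are hypothetical rather than taken from D-GEX.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and inferred expression."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to the baseline (assumed definition)."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical toy values, not results from the paper.
measured = [1.0, 2.0, 3.0]
inferred = [1.5, 2.0, 2.0]
mae = mean_absolute_error(measured, inferred)
gain = relative_improvement(2.0, 1.5)
```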
Gene expression inference with deep learning.

PubMed

Chen, Yifei; Li, Yi; Narayan, Rajiv; Subramanian, Aravind; Xie, Xiaohui

2016-06-15

Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiling over thousands of samples is still very expensive. Recognizing that gene expressions are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ∼1000 carefully selected landmark genes and relying on computational methods to infer the expression of remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression (LR), limiting its accuracy since it does not capture complex nonlinear relationships between expressions of genes. We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based Gene Expression Omnibus dataset, consisting of 111K expression profiles, to train our model and compare its performance to those from other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms LR with 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than LR in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2921 expression profiles. Deep learning still outperforms LR with 6.57% relative improvement, and achieves lower error in 81.31% of the target genes. D-GEX is available at https://github.com/uci-cbcl/D-GEX. Contact: xhx@ics.uci.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Automated deconvolution of structured mixtures from heterogeneous tumor genomic data

PubMed Central

Roman, Theodore; Xie, Lu

2017-01-01

With increasing appreciation for the extent and importance of intratumor heterogeneity, much attention in cancer research has focused on profiling heterogeneity on a single patient level. Although true single-cell genomic technologies are rapidly improving, they remain too noisy and costly at present for population-level studies. Bulk sequencing remains the standard for population-scale tumor genomics, creating a need for computational tools to separate contributions of multiple tumor clones and assorted stromal and infiltrating cell populations to pooled genomic data. All such methods are limited to coarse approximations of only a few cell subpopulations, however. In prior work, we demonstrated the feasibility of improving cell type deconvolution by taking advantage of substructure in genomic mixtures via a strategy called simplicial complex unmixing. We improve on past work by introducing enhancements to automate learning of substructured genomic mixtures, with specific emphasis on genome-wide copy number variation (CNV) data, as well as the ability to process quantitative RNA expression data, and heterogeneous combinations of RNA and CNV data. We introduce methods for dimensionality estimation to better decompose mixture model substructure; fuzzy clustering to better identify substructure in sparse, noisy data; and automated model inference methods for other key model parameters. We further demonstrate their effectiveness in identifying mixture substructure in true breast cancer CNV data from the Cancer Genome Atlas (TCGA). Source code is available at https://github.com/tedroman/WSCUnmix PMID:29059177
A universe of dwarfs and giants: genome size and chromosome evolution in the monocot family Melanthiaceae.

PubMed

Pellicer, Jaume; Kelly, Laura J; Leitch, Ilia J; Zomlefer, Wendy B; Fay, Michael F

2014-03-01

• Since the occurrence of giant genomes in angiosperms is restricted to just a few lineages, identifying where shifts towards genome obesity have occurred is essential for understanding the evolutionary mechanisms triggering this process. • Genome sizes were assessed using flow cytometry in 79 species and new chromosome numbers were obtained. Phylogenetically based statistical methods were applied to infer ancestral character reconstructions of chromosome numbers and nuclear DNA contents. • Melanthiaceae are the most diverse family in terms of genome size, with C-values ranging more than 230-fold. Our data confirmed that giant genomes are restricted to tribe Parideae, with most extant species in the family characterized by small genomes. Ancestral genome size reconstruction revealed that the most recent common ancestor (MRCA) for the family had a relatively small genome (1C = 5.37 pg). Chromosome losses and polyploidy are recovered as the main evolutionary mechanisms generating chromosome number change. • Genome evolution in Melanthiaceae has been characterized by a trend towards genome size reduction, with just one episode of dramatic DNA accumulation in Parideae. Such extreme contrasting profiles of genome size evolution illustrate the key role of transposable elements and chromosome rearrangements in driving the evolution of plant genomes. © 2013 The Authors. New Phytologist © 2013 New Phytologist Trust.
Full-genome sequence and analysis of a novel human rhinovirus strain within a divergent HRV-A clade.

PubMed

Rathe, Jennifer A; Liu, Xinyue; Tallon, Luke J; Gern, James E; Liggett, Stephen B

2010-01-01

Genome sequences of human rhinoviruses (HRV) have primarily been from stocks collected in the 1960s, with genomes and phylogeny of modern HRVs remaining undefined. Here, two modern isolates (hrv-A101 and hrv-A101-v1) collected approximately 8 years apart were sequenced in their entirety. Incorporation into our full-genome HRV alignment with subsequent phylogenetic network inference indicated that these represent a unique HRV-A, localized within a distinct divergent clade. They appear to have resulted from recombination of the hrv-65 and hrv-78 lineages. These results support our contention that there are unrecognized distinct HRV-A strains, and that recombination is evident in currently circulating strains.
Children's and Adults' Evaluation of Their Own Inductive Inferences, Deductive Inferences, and Guesses

ERIC Educational Resources Information Center

Pillow, Bradford H.; Pearson, RaeAnne M.

2009-01-01

Adults' and kindergarten through fourth-grade children's evaluations and explanations of inductive inferences, deductive inferences, and guesses were assessed. Beginning in kindergarten, participants rated deductions as more certain than weak inductions or guesses. Beginning in third grade, deductions were rated as more certain than strong…
Whole genome sequencing data and de novo draft assemblies for 66 teleost species

PubMed Central

Malmstrøm, Martin; Matschiner, Michael; Tørresen, Ole K.; Jakobsen, Kjetill S.; Jentoft, Sissel

2017-01-01

Teleost fishes comprise more than half of all vertebrate species, yet genomic data are only available for 0.2% of their diversity. Here, we present whole genome sequencing data for 66 new species of teleosts, vastly expanding the availability of genomic data for this important vertebrate group. We report on de novo assemblies based on low-coverage (9–39×) sequencing and present detailed methodology for all analyses. To facilitate further utilization of this data set, we present statistical analyses of the gene space completeness and verify the expected phylogenetic position of the sequenced genomes in a large mitogenomic context. We further present a nuclear marker set used for phylogenetic inference and evaluate each gene tree in relation to the species tree to test for homogeneity in the phylogenetic signal. Collectively, these analyses illustrate the robustness of this highly diverse data set and enable extensive reuse of the selected phylogenetic markers and the genomic data in general. This data set covers all major teleost lineages and provides unprecedented opportunities for comparative studies of teleosts. PMID:28094797
Segmenting the human genome based on states of neutral genetic divergence.

PubMed

Kuruppumullage Don, Prabhani; Ananda, Guruprasad; Chiaromonte, Francesca; Makova, Kateryna D

2013-09-03

Many studies have demonstrated that divergence levels generated by different mutation types vary and covary across the human genome. To improve our still-incomplete understanding of the mechanistic basis of this phenomenon, we analyze several mutation types simultaneously, anchoring their variation to specific regions of the genome. Using hidden Markov models on insertion, deletion, nucleotide substitution, and microsatellite divergence estimates inferred from human-orangutan alignments of neutrally evolving genomic sequences, we segment the human genome into regions corresponding to different divergence states--each uniquely characterized by specific combinations of divergence levels. We then parsed the mutagenic contributions of various biochemical processes associating divergence states with a broad range of genomic landscape features. We find that high divergence states inhabit guanine- and cytosine (GC)-rich, highly recombining subtelomeric regions; low divergence states cover inner parts of autosomes; chromosome X forms its own state with lowest divergence; and a state of elevated microsatellite mutability is interspersed across the genome. These general trends are mirrored in human diversity data from the 1000 Genomes Project, and departures from them highlight the evolutionary history of primate chromosomes. We also find that genes and noncoding functional marks [annotations from the Encyclopedia of DNA Elements (ENCODE)] are concentrated in high divergence states. Our results provide a powerful tool for biomedical data analysis: segmentations can be used to screen personal genome variants--including those associated with cancer and other diseases--and to improve computational predictions of noncoding functional elements.
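Segmentation with a hidden Markov model amounts to decoding the most probable sequence of hidden states along the genome. The sketch below shows Viterbi decoding for a toy two-state model with discrete emissions. It is an illustration only: the study works with several divergence estimates jointly, and the state names, probabilities, and binned observations here are hypothetical.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for a discrete-emission HMM."""
    # Work in log space to avoid numerical underflow on long sequences.
    score = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }
    back = []  # one back-pointer table per transition
    for obs in observations[1:]:
        prev = score
        score, pointers = {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            score[s] = (
                prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][obs])
            )
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical model: windows are binned as low ("L") or high ("H") divergence.
states = ["low", "high"]
start_p = {"low": 0.5, "high": 0.5}
trans_p = {"low": {"low": 0.9, "high": 0.1}, "high": {"low": 0.1, "high": 0.9}}
emit_p = {"low": {"L": 0.8, "H": 0.2}, "high": {"L": 0.2, "H": 0.8}}
segments = viterbi(list("LLLHHH"), states, start_p, trans_p, emit_p)
```

Runs of identical states in the decoded path correspond to genomic segments; the high self-transition probabilities are what discourage spurious single-window switches.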
On the allopolyploid origin and genome structure of the closely related species Hordeum secalinum and Hordeum capense inferred by molecular karyotyping.

PubMed

Cuadrado, Ángeles; de Bustos, Alfredo; Jouve, Nicolás

2017-08-01

To provide additional information to the many phylogenetic analyses conducted within Hordeum, here the origin and interspecific affinities of the allotetraploids Hordeum secalinum and Hordeum capense were analysed by molecular karyotyping. Karyotypes were determined using genomic in situ hybridization (GISH) to distinguish the sub-genomes and , plus fluorescence in situ hybridization (FISH)/non-denaturing (ND)-FISH to determine the distribution of ten tandem repetitive DNA sequences and thus provide chromosome markers. Each chromosome pair in the six accessions analysed was identified, allowing the establishment of homologous and putative homeologous relationships. The low-level polymorphism observed among the H. secalinum accessions contrasted with the divergence recorded for the sub-genome of the H. capense accessions. Although accession H335 carries an intergenomic translocation, its chromosome structure was indistinguishable from that of H. secalinum. Hordeum secalinum and H. capense accession H335 share a hybrid origin involving Hordeum marinum subsp. gussoneanum as the genome donor and an unidentified genome progenitor. Hordeum capense accession BCC2062 either diverged, with remodelling of the sub-genome, or its genome was donated by a now extinct ancestor. A scheme of probable evolution shows the intricate pattern of relationships among the Hordeum species carrying the genome (including all H. marinum taxa and the hexaploid Hordeum brachyantherum). © The Author 2017. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Genome-Wide Typing of Clostridium difficile.

PubMed

Bletz, Stefan; Janezic, Sandra; Harmsen, Dag; Rupnik, Maja; Mellmann, Alexander

2018-06-01

Clostridium difficile, recently renamed Clostridioides difficile, is the most common cause of antibiotic-associated nosocomial gastrointestinal infections worldwide. To differentiate endogenous infections and transmission events, highly discriminatory subtyping is necessary. Today, methods based on whole-genome sequencing data are increasingly used to subtype bacterial pathogens; however, frequently a standardized methodology and typing nomenclature are missing. Here we report a core genome multilocus sequence typing (cgMLST) approach developed for C. difficile. Initially, we determined the breadth of the C. difficile population based on all available MLST sequence types with Bayesian inference (BAPS). The resulting BAPS partitions were used in combination with C. difficile clade information to select representative isolates that were subsequently used to define cgMLST target genes. Finally, we evaluated the novel cgMLST scheme with genomes from 3,025 isolates. BAPS grouping (n = 6 groups) together with the clade information led to a total of 11 representative isolates that were included for cgMLST definition and resulted in 2,270 cgMLST genes that were present in all isolates. Overall, 2,184 to 2,268 cgMLST targets were detected in the genome sequences of 70 outbreak-associated and reference strains, and on average 99.3% of cgMLST targets (1,116 to 2,270 targets) were present in 2,954 genomes downloaded from the NCBI database, underlining the representativeness of the cgMLST scheme. Moreover, reanalyses of different cluster scenarios with cgMLST were concordant with published single nucleotide variant analyses. In conclusion, the novel cgMLST is representative for the whole C. difficile population, is highly discriminatory in outbreak situations, and provides a unique nomenclature facilitating interlaboratory exchange. Copyright © 2018 American Society for Microbiology.
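Typing schemes of this kind are usually compared through pairwise allelic distances between isolates. The sketch below counts differing alleles across shared targets and skips targets that are missing in either profile. Ignoring missing targets pairwise is a common convention and an assumption here, not something the abstract confirms for this scheme; the function name and the toy profiles are hypothetical.

```python
def allelic_distance(profile_a, profile_b):
    """Count differing alleles between two cgMLST profiles.

    Each profile is a list of allele numbers, one per target gene, in the same
    target order. A target that was not detected is encoded as None and is
    skipped for that pair (assumed convention).
    Returns (number of differing targets, number of targets compared).
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same targets")
    compared = 0
    different = 0
    for allele_a, allele_b in zip(profile_a, profile_b):
        if allele_a is None or allele_b is None:
            continue  # missing target: no evidence either way
        compared += 1
        if allele_a != allele_b:
            different += 1
    return different, compared


# Hypothetical four-target profiles; a real scheme has thousands of targets.
isolate_1 = [1, 2, 3, None]
isolate_2 = [1, 5, 3, 7]
distance, shared_targets = allelic_distance(isolate_1, isolate_2)
```

Reporting the number of compared targets alongside the distance makes it visible when a small distance rests on few shared targets.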
Comparative Genomics Reveals the Core Gene Toolbox for the Fungus-Insect Symbiosis

PubMed Central

Stata, Matt; Wang, Wei; White, Merlin M.; Moncalvo, Jean-Marc

2018-01-01

Modern genomics has shed light on many entomopathogenic fungi and expanded our knowledge widely; however, little is known about the genomic features of the insect-commensal fungi. Harpellales are obligate commensals living in the digestive tracts of disease-bearing insects (black flies, midges, and mosquitoes). In this study, we produced and annotated whole-genome sequences of nine Harpellales taxa and conducted the first comparative analyses to infer the genomic diversity within the members of the Harpellales. The genomes of the insect gut fungi feature low (26% to 37%) GC content and large genome size variations (25 to 102 Mb). Further comparisons with insect-pathogenic fungi (from both Ascomycota and Zoopagomycota), as well as with free-living relatives (as negative controls), helped to identify a gene toolbox that is essential to the fungus-insect symbiosis. The results not only narrow the genomic scope of fungus-insect interactions from several thousands to eight core players but also distinguish host invasion strategies employed by insect pathogens and commensals. The genomic content suggests that insect commensal fungi rely mostly on adhesion protein anchors that target the digestive system, while entomopathogenic fungi have higher numbers of transmembrane helices, signal peptides, and pathogen-host interaction (PHI) genes across the whole genome and enrich genes as well as functional domains to inactivate the host inflammation system and suppress the host defense. Phylogenomic analyses have revealed that genome sizes of Harpellales fungi vary among lineages with an integer-multiple pattern, which implies that ancient genome duplications may have occurred within the gut of insects. PMID:29764946
Sensitivity to sequencing depth in single-cell cancer genomics.

PubMed

Alves, João M; Posada, David

2018-04-16

Querying cancer genomes at single-cell resolution is expected to provide a powerful framework to understand in detail the dynamics of cancer evolution. However, given the high costs currently associated with single-cell sequencing, together with the inevitable technical noise arising from single-cell genome amplification, cost-effective strategies that maximize the quality of single-cell data are critically needed. Taking advantage of previously published single-cell whole-genome and whole-exome cancer datasets, we studied the impact of sequencing depth and sampling effort towards single-cell variant detection. Five single-cell whole-genome and whole-exome cancer datasets were independently downscaled to 25, 10, 5, and 1× sequencing depth. For each depth level, ten technical replicates were generated, resulting in a total of 6280 single-cell BAM files. The sensitivity of variant detection, including structural and driver mutations, genotyping, clonal inference, and phylogenetic reconstruction to sequencing depth was evaluated using recent tools specifically designed for single-cell data. Altogether, our results suggest that for relatively large sample sizes (25 or more cells) sequencing single tumor cells at depths > 5× does not drastically improve somatic variant discovery, characterization of clonal genotypes, or estimation of single-cell phylogenies. We suggest that sequencing multiple individual tumor cells at a modest depth represents an effective alternative to explore the mutational landscape and clonal evolutionary patterns of cancer genomes.
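Downscaling a dataset to a target depth is typically done by keeping a random fraction of reads equal to the ratio of target to original depth. The sketch below computes those fractions for the four depths named in the abstract. The original depth of 50× and the function name are hypothetical, and the abstract does not state which downsampling tool or procedure was used.

```python
def downsampling_fraction(original_depth, target_depth):
    """Fraction of reads to keep so that mean depth drops to `target_depth`.

    The fraction is capped at 1.0 because subsampling cannot add reads.
    """
    if original_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    return min(1.0, target_depth / original_depth)


# Hypothetical cell sequenced at 50x, downscaled to the depths in the abstract.
original_depth = 50.0
fractions = {
    target: downsampling_fraction(original_depth, target)
    for target in (25, 10, 5, 1)
}
```

Technical replicates at one depth would reuse the same fraction with different random seeds.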
A Meta-Analysis of Multiple Matched Copy Number and Transcriptomics Data Sets for Inferring Gene Regulatory Relationships

PubMed Central

Newton, Richard; Wernisch, Lorenz

2014-01-01

Inferring gene regulatory relationships from observational data is challenging. Manipulation and intervention is often required to unravel causal relationships unambiguously. However, gene copy number changes, as they frequently occur in cancer cells, might be considered natural manipulation experiments on gene expression. An increasing number of data sets on matched array comparative genomic hybridisation and transcriptomics experiments from a variety of cancer pathologies are becoming publicly available. Here we explore the potential of a meta-analysis of thirty such data sets. The aim of our analysis was to assess the potential of in silico inference of trans-acting gene regulatory relationships from this type of data. We found sufficient correlation signal in the data to infer gene regulatory relationships, with interesting similarities between data sets. A number of genes had highly correlated copy number and expression changes in many of the data sets and we present predicted potential trans-acted regulatory relationships for each of these genes. The study also investigates to what extent heterogeneity between cell types and between pathologies determines the number of statistically significant predictions available from a meta-analysis of experiments. PMID:25148247
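The correlation signal referred to here can be illustrated with a plain Pearson correlation between the copy number of one gene and the expression of another across matched samples. This is a minimal sketch, not the meta-analysis procedure of the study; the function name and the toy values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical matched samples: copy number of gene A, expression of gene B.
copy_number = [1, 2, 2, 3, 4]
expression = [2.0, 4.0, 4.0, 6.0, 8.0]
r = pearson_r(copy_number, expression)
```

A high correlation between one gene's copy number and another gene's expression is only a candidate trans-acting relationship; it does not establish causation on its own.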
Integrated pipeline for inferring the evolutionary history of a gene family embedded in the species tree: a case study on the STIMATE gene family.

PubMed

Song, Jia; Zheng, Sisi; Nguyen, Nhung; Wang, Youjun; Zhou, Yubin; Lin, Kui

2017-10-03

Because phylogenetic inference is an important basis for answering many evolutionary problems, a large number of algorithms have been developed. Some of these algorithms have been improved by integrating gene evolution models with the expectation of accommodating the hierarchy of evolutionary processes. To the best of our knowledge, however, there still is no single unifying model or algorithm that can take all evolutionary processes into account through a stepwise or simultaneous method. On the basis of three existing phylogenetic inference algorithms, we built an integrated pipeline for inferring the evolutionary history of a given gene family; this pipeline can model gene sequence evolution, gene duplication-loss, gene transfer and multispecies coalescent processes. As a case study, we applied this pipeline to the STIMATE (TMEM110) gene family, which has recently been reported to play an important role in store-operated Ca2+ entry (SOCE) mediated by ORAI and STIM proteins. We inferred their phylogenetic trees in 69 sequenced chordate genomes. By integrating three tree reconstruction algorithms with diverse evolutionary models, a pipeline for inferring the evolutionary history of a gene family was developed, and its application was demonstrated.
Evolution of domain promiscuity in eukaryotic genomes—a perspective from the inferred ancestral domain architectures†

PubMed Central

Cohen-Gihon, Inbar; Fong, Jessica H.; Sharan, Roded; Nussinov, Ruth

2012-01-01

Most eukaryotic proteins are composed of two or more domains. These assemble in a modular manner to create new proteins usually by the acquisition of one or more domains to an existing protein. Promiscuous domains which are found embedded in a variety of proteins and co-exist with many other domains are of particular interest and were shown to have roles in signaling pathways and mediating network communication. The evolution of domain promiscuity is still an open problem, mostly due to the lack of sequenced ancestral genomes. Here we use inferred domain architectures of ancestral genomes to trace the evolution of domain promiscuity in eukaryotic genomes. We find an increase in average promiscuity along many branches of the eukaryotic tree. Moreover, domain promiscuity can proceed at almost a steady rate over long evolutionary time or exhibit lineage-specific acceleration. We also observe that many signaling and regulatory domains gained domain promiscuity around the Bilateria divergence. In addition we show that those domains that played a role in the creation of two body axes and existed before the divergence of the bilaterians from fungi/metazoan achieve a boost in their promiscuities during the bilaterian evolution. PMID:21127809
Analysis of adaptive evolution in Lyssavirus genomes reveals pervasive diversifying selection during species diversification.

PubMed

Voloch, Carolina M; Capellão, Renata T; Mello, Beatriz; Schrago, Carlos G

2014-11-19

Lyssavirus is a diverse genus of viruses that infect a variety of mammalian hosts, typically causing encephalitis. The evolution of this lineage, particularly the rabies virus, has been a focus of research because of the extensive occurrence of cross-species transmission, and the distinctive geographical patterns present throughout the diversification of these viruses. Although numerous studies have examined pattern-related questions concerning Lyssavirus evolution, analyses of the evolutionary processes acting on Lyssavirus diversification are scarce. To clarify the relevance of positive natural selection in Lyssavirus diversification, we conducted a comprehensive scan for episodic diversifying selection across all lineages and codon sites of the five coding regions in lyssavirus genomes. Although the genomes of these viruses are generally conserved, the glycoprotein (G), RNA-dependent RNA polymerase (L) and polymerase (P) genes were frequently targets of adaptive evolution during the diversification of the genus. Adaptive evolution is particularly manifest in the glycoprotein gene, which was inferred to have experienced the highest density of positively selected codon sites along branches. Substitutions in the L gene were found to be associated with the early diversification of phylogroups. A comparison between the number of positively selected sites inferred along RABV population branches and Lyssavirus interspecies branches suggested that the occurrence of positive selection was similar on the five coding regions of the genome in both groups.
Analysis of Adaptive Evolution in Lyssavirus Genomes Reveals Pervasive Diversifying Selection during Species Diversification

PubMed Central

Voloch, Carolina M.; Capellão, Renata T.; Mello, Beatriz; Schrago, Carlos G.

2014-01-01

Lyssavirus is a diverse genus of viruses that infect a variety of mammalian hosts, typically causing encephalitis. The evolution of this lineage, particularly the rabies virus, has been a focus of research because of the extensive occurrence of cross-species transmission, and the distinctive geographical patterns present throughout the diversification of these viruses. Although numerous studies have examined pattern-related questions concerning Lyssavirus evolution, analyses of the evolutionary processes acting on Lyssavirus diversification are scarce. To clarify the relevance of positive natural selection in Lyssavirus diversification, we conducted a comprehensive scan for episodic diversifying selection across all lineages and codon sites of the five coding regions in lyssavirus genomes. Although the genomes of these viruses are generally conserved, the glycoprotein (G), RNA-dependent RNA polymerase (L) and polymerase (P) genes were frequently targets of adaptive evolution during the diversification of the genus. Adaptive evolution is particularly manifest in the glycoprotein gene, which was inferred to have experienced the highest density of positively selected codon sites along branches. Substitutions in the L gene were found to be associated with the early diversification of phylogroups. A comparison between the number of positively selected sites inferred along RABV population branches and Lyssavirus interspecies branches suggested that the occurrence of positive selection was similar on the five coding regions of the genome in both groups. PMID:25415197
Complete Chloroplast Genome of the Wollemi Pine (Wollemia nobilis): Structure and Evolution.

PubMed

Yap, Jia-Yee S; Rohner, Thore; Greenfield, Abigail; Van Der Merwe, Marlien; McPherson, Hannah; Glenn, Wendy; Kornfeld, Geoff; Marendy, Elessa; Pan, Annie Y H; Wilton, Alan; Wilkins, Marc R; Rossetto, Maurizio; Delaney, Sven K

2015-01-01

The Wollemi pine (Wollemia nobilis) is a rare Southern conifer with striking morphological similarity to fossil pines. A small population of W. nobilis was discovered in 1994 in a remote canyon system in the Wollemi National Park (near Sydney, Australia). This population contains fewer than 100 individuals and is critically endangered. Previous genetic studies of the Wollemi pine have investigated its evolutionary relationship with other pines in the family Araucariaceae, and have suggested that the Wollemi pine genome contains little or no variation. However, these studies were performed prior to the widespread use of genome sequencing, and their conclusions were based on a limited fraction of the Wollemi pine genome. In this study, we address this problem by determining the entire sequence of the W. nobilis chloroplast genome. A detailed analysis of the structure of the genome is presented, and the evolution of the genome is inferred by comparison with the chloroplast sequences of other members of the Araucariaceae and the related family Podocarpaceae. Pairwise alignments of whole genome sequences, and the presence of unique pseudogenes, gene duplications and insertions in W. nobilis and Araucariaceae, indicate that the W. nobilis chloroplast genome is most similar to that of its sister taxon Agathis. However, the W. nobilis genome contains an unusually high number of repetitive sequences, and these could be used in future studies to investigate and conserve any remnant genetic diversity in the Wollemi pine.
Complete Chloroplast Genome of the Wollemi Pine (Wollemia nobilis): Structure and Evolution

PubMed Central

Yap, Jia-Yee S.; Rohner, Thore; Greenfield, Abigail; Van Der Merwe, Marlien; McPherson, Hannah; Glenn, Wendy; Kornfeld, Geoff; Marendy, Elessa; Pan, Annie Y. H.; Wilkins, Marc R.; Rossetto, Maurizio; Delaney, Sven K.

2015-01-01

The Wollemi pine (Wollemia nobilis) is a rare Southern conifer with striking morphological similarity to fossil pines. A small population of W. nobilis was discovered in 1994 in a remote canyon system in the Wollemi National Park (near Sydney, Australia). This population contains fewer than 100 individuals and is critically endangered. Previous genetic studies of the Wollemi pine have investigated its evolutionary relationship with other pines in the family Araucariaceae, and have suggested that the Wollemi pine genome contains little or no variation. However, these studies were performed prior to the widespread use of genome sequencing, and their conclusions were based on a limited fraction of the Wollemi pine genome. In this study, we address this problem by determining the entire sequence of the W. nobilis chloroplast genome. A detailed analysis of the structure of the genome is presented, and the evolution of the genome is inferred by comparison with the chloroplast sequences of other members of the Araucariaceae and the related family Podocarpaceae. Pairwise alignments of whole genome sequences, and the presence of unique pseudogenes, gene duplications and insertions in W. nobilis and Araucariaceae, indicate that the W. nobilis chloroplast genome is most similar to that of its sister taxon Agathis. However, the W. nobilis genome contains an unusually high number of repetitive sequences, and these could be used in future studies to investigate and conserve any remnant genetic diversity in the Wollemi pine. PMID:26061691
King penguin demography since the last glaciation inferred from genome-wide data

PubMed Central

Trucchi, Emiliano; Gratton, Paolo; Whittington, Jason D.; Cristofari, Robin; Le Maho, Yvon; Stenseth, Nils Chr; Le Bohec, Céline

2014-01-01

How natural climate cycles, such as past glacial/interglacial patterns, have shaped species distributions at the high-latitude regions of the Southern Hemisphere is still largely unclear. Here, we show how the post-glacial warming following the Last Glacial Maximum (ca 18 000 years ago) allowed the (re)colonization of the fragmented sub-Antarctic habitat by an upper-level marine predator, the king penguin Aptenodytes patagonicus. Using restriction site-associated DNA sequencing and standard mitochondrial data, we tested the behaviour of subsets of anonymous nuclear loci in inferring past demography through coalescent-based and allele frequency spectrum analyses. Our results show that the king penguin population breeding on the Crozet archipelago steeply increased in size, closely following the Holocene warming recorded in the Epica Dome C ice core. The subsequent population growth can be explained by a threshold model in which the ecological requirements of this species (year-round ice-free habitat for breeding and access to a major source of food such as the Antarctic Polar Front) were met on Crozet soon after the Pleistocene/Holocene climatic transition. PMID:24920481
