structural genomics expectations: Topics by Science.gov

Sample records for structural genomics expectations

Elevated Rate of Genome Rearrangements in Radiation-Resistant Bacteria.

PubMed

Repar, Jelena; Supek, Fran; Klanjscek, Tin; Warnecke, Tobias; Zahradka, Ksenija; Zahradka, Davor

2017-04-01

A number of bacterial, archaeal, and eukaryotic species are known for their resistance to ionizing radiation. One of the challenges these species face is a potent environmental source of DNA double-strand breaks, potential drivers of genome structure evolution. Efficient and accurate DNA double-strand break repair systems have been demonstrated in several unrelated radiation-resistant species and are putative adaptations to the DNA damaging environment. Such adaptations are expected to compensate for the genome-destabilizing effect of environmental DNA damage and may be expected to result in a more conserved gene order in radiation-resistant species. However, here we show that rates of genome rearrangements, measured as loss of gene order conservation with time, are higher in radiation-resistant species in multiple, phylogenetically independent groups of bacteria. Comparison of indicators of selection for genome organization between radiation-resistant and phylogenetically matched, nonresistant species argues against tolerance to disruption of genome structure as a strategy for radiation resistance. Interestingly, an important mechanism affecting genome rearrangements in prokaryotes, the symmetrical inversions around the origin of DNA replication, shapes genome structure of both radiation-resistant and nonresistant species. In conclusion, the opposing effects of environmental DNA damage and DNA repair result in elevated rates of genome rearrangements in radiation-resistant bacteria. Copyright © 2017 Repar et al.
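The abstract above measures rearrangement as loss of gene order conservation but does not spell out the statistic. The following is a minimal sketch of one common way to score it: the fraction of neighbouring gene pairs in one genome that are still neighbours in another. The function name, the treatment of the chromosome as circular, and the toy gene labels are illustrative assumptions, not the authors' method; genes are assumed to be unique, and strand orientation is ignored.

```python
def gene_order_conservation(order_a, order_b, circular=True):
    """Fraction of adjacent shared-gene pairs in order_a that are also adjacent in order_b.

    Hypothetical helper for illustration: genes are assumed unique within
    each genome, and strand orientation is ignored.
    """
    shared = set(order_a) & set(order_b)
    a = [g for g in order_a if g in shared]
    b = [g for g in order_b if g in shared]
    if len(a) < 2:
        return 0.0

    def adjacent_pairs(genes):
        # Unordered neighbour pairs; optionally close the circle.
        pairs = {frozenset(p) for p in zip(genes, genes[1:])}
        if circular and len(genes) > 2:
            pairs.add(frozenset((genes[-1], genes[0])))
        return pairs

    pairs_a = adjacent_pairs(a)
    pairs_b = adjacent_pairs(b)
    return len(pairs_a & pairs_b) / len(pairs_a)


# Toy example: one inversion of the segment C..F breaks two adjacencies out of eight.
reference = list("ABCDEFGH")
inverted = list("ABFEDCGH")
score = gene_order_conservation(reference, inverted)
```

A single inversion breaks only the two adjacencies at its endpoints, so in this toy case the score falls from 1.0 to 0.75. A rate would additionally require dividing the loss by an estimate of divergence time, which is not sketched here.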
The three-dimensional genome organization of Drosophila melanogaster through data integration.

PubMed

Li, Qingjiao; Tjong, Harianto; Li, Xiao; Gong, Ke; Zhou, Xianghong Jasmine; Chiolo, Irene; Alber, Frank

2017-07-31

Genome structures are dynamic and non-randomly organized in the nucleus of higher eukaryotes. To maximize the accuracy and coverage of three-dimensional genome structural models, it is important to integrate all available sources of experimental information about a genome's organization. It remains a major challenge to integrate such data from various complementary experimental methods. Here, we present an approach for data integration to determine a population of complete three-dimensional genome structures that are statistically consistent with data from both genome-wide chromosome conformation capture (Hi-C) and lamina-DamID experiments. Our structures resolve the genome at the resolution of topological domains, and reproduce simultaneously both sets of experimental data. Importantly, this data deconvolution framework allows for structural heterogeneity between cells, and hence accounts for the expected plasticity of genome structures. As a case study we choose Drosophila melanogaster embryonic cells, for which both data types are available. Our three-dimensional genome structures have strong predictive power for structural features not directly visible in the initial data sets, and reproduce experimental hallmarks of the D. melanogaster genome organization from independent and our own imaging experiments. Also they reveal a number of new insights about genome organization and its functional relevance, including the preferred locations of heterochromatic satellites of different chromosomes, and observations about homologous pairing that cannot be directly observed in the original Hi-C or lamina-DamID data. Our approach allows systematic integration of Hi-C and lamina-DamID data for complete three-dimensional genome structure calculation, while also explicitly considering genome structural variability.
First complete genome sequence of vanilla mosaic strain of Dasheen mosaic virus isolated from the Cook Islands.

PubMed

Puli'uvea, Christopher; Khan, Subuhi; Chang, Wee-Leong; Valmonte, Gardette; Pearson, Michael N; Higgins, Colleen M

2017-02-01

We present the first complete genome of vanilla mosaic virus (VanMV). The VanMV genomic structure is consistent with that of a potyvirus, containing a single open reading frame (ORF) encoding a polyprotein of 3139 amino acids. Motif analyses indicate the polyprotein can be cleaved into the expected ten individual proteins; other recognised potyvirus motifs are also present. As expected, the VanMV genome shows high sequence similarity to the published Dasheen mosaic virus (DsMV) genome sequences; comparisons with DsMV continue to support VanMV as a vanilla infecting strain of DsMV. Phylogenetic analyses indicate that VanMV and DsMV share a common ancestor, with VanMV having the closest relationship with DsMV strains from the South Pacific.
Genome structure of bdelloid rotifers: shaped by asexuality or desiccation?

PubMed

Gladyshev, Eugene A; Arkhipova, Irina R

2010-01-01

Bdelloid rotifers are microscopic invertebrate animals best known for their ancient asexuality and the ability to survive desiccation at any life stage. Both factors are expected to have a profound influence on their genome structure. Recent molecular studies demonstrated that, although the gene-rich regions of bdelloid genomes are organized as colinear pairs of closely related sequences and depleted in repetitive DNA, subtelomeric regions harbor diverse transposable elements and horizontally acquired genes of foreign origin. Although asexuality is expected to result in depletion of deleterious transposons, only desiccation appears to have the power to produce all the uncovered genomic peculiarities. Repair of desiccation-induced DNA damage would require the presence of a homologous template, maintaining colinear pairs in gene-rich regions and selecting against insertion of repetitive DNA that might cause chromosomal rearrangements. Desiccation may also induce a transient state of competence in recovering animals, allowing them to acquire environmental DNA. Even if bdelloids engage in rare or obscure forms of sexual reproduction, all these features could still be present. The relative contribution of asexuality and desiccation to genome organization may be clarified by analyzing whole-genome sequences and comparing foreign gene and transposon content in species which lost the ability to survive desiccation.
Multiple genome alignment for identifying the core structure among moderately related microbial genomes.

PubMed

Uchiyama, Ikuo

2008-10-31

Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes.
Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative

PubMed Central

Khafizov, Kamil; Madrid-Aliste, Carlos; Almo, Steven C.; Fiser, Andras

2014-01-01

The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins—including proteins for which reliable homology models can be generated—on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long. PMID:24567391
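The figures quoted in this abstract can be checked with simple arithmetic. The sketch below assumes a plain linear extrapolation, which is an assumption made here for illustration and not necessarily the projection method the authors used; the variable and function names are hypothetical.

```python
def years_to_target(current_pct, target_pct, rate_pct_per_year):
    """Years needed to reach target coverage at a constant (assumed linear) rate."""
    return (target_pct - current_pct) / rate_pct_per_year


# Figures from the abstract: coverage rose from 30% to 40% over 10 years.
observed_rate = (40.0 - 30.0) / 10.0  # percentage points per year

# Structural genomics is credited with ~50% of the new coverage.
sg_share = 0.5

with_sg = years_to_target(40.0, 55.0, observed_rate)
without_sg = years_to_target(40.0, 55.0, observed_rate * (1 - sg_share))
```

Under this linear assumption, `with_sg` comes out at 15 years and `without_sg` at 30 years, which is consistent with the abstract's "within 15 y" and "approximately twice as long".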
Trends in structural coverage of the protein universe and the impact of the Protein Structure Initiative.

PubMed

Khafizov, Kamil; Madrid-Aliste, Carlos; Almo, Steven C; Fiser, Andras

2014-03-11

The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins--including proteins for which reliable homology models can be generated--on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.
GAP Final Technical Report 12-14-04

DOE Office of Scientific and Technical Information (OSTI.GOV)

Andrew J. Bordner, PhD, Senior Research Scientist

2004-12-14

The Genomics Annotation Platform (GAP) was designed to develop new tools for high-throughput functional annotation and characterization of protein sequences and structures resulting from genomics and structural proteomics, and to benchmark and apply those tools. Furthermore, this platform integrated genomic-scale sequence and structural analysis and prediction tools with the advanced structure prediction and bioinformatics environment of ICM. The development of GAP was primarily oriented towards the annotation of new biomolecular structures using both structural and sequence data. Even though the amount of protein X-ray crystal data is growing exponentially, the volume of sequence data is growing even more rapidly. This trend was exploited by leveraging the wealth of sequence data to provide functional annotation for protein structures. The additional information provided by GAP is expected to assist the majority of the commercial users of ICM, who are involved in drug discovery, in identifying promising drug targets as well as in devising strategies for the rational design of therapeutics directed at the protein of interest. GAP also provided valuable tools for biochemistry education and structural genomics centers. In addition, GAP incorporates many novel prediction and analysis methods not available in other molecular modeling packages. This development led to the signing of the first Molsoft agreement in the structural genomics annotation area, with the University of Oxford Structural Genomics Center. This commercial agreement validated the Molsoft efforts under the GAP project and provided the basis for further development of the large-scale functional annotation platform.
Dynamics of Genome Rearrangement in Bacterial Populations

PubMed Central

Darling, Aaron E.; Miklós, István; Ragan, Mark A.

2008-01-01

Genome structure variation has profound impacts on phenotype in organisms ranging from microbes to humans, yet little is known about how natural selection acts on genome arrangement. Pathogenic bacteria such as Yersinia pestis, which causes bubonic and pneumonic plague, often exhibit a high degree of genomic rearrangement. The recent availability of several Yersinia genomes offers an unprecedented opportunity to study the evolution of genome structure and arrangement. We introduce a set of statistical methods to study patterns of rearrangement in circular chromosomes and apply them to the Yersinia. We constructed a multiple alignment of eight Yersinia genomes using Mauve software to identify 78 conserved segments that are internally free from genome rearrangement. Based on the alignment, we applied Bayesian statistical methods to infer the phylogenetic inversion history of Yersinia. The sampling of genome arrangement reconstructions contains seven parsimonious tree topologies, each having different histories of 79 inversions. Topologies with a greater number of inversions also exist, but were sampled less frequently. The inversion phylogenies agree with results suggested by SNP patterns. We then analyzed reconstructed inversion histories to identify patterns of rearrangement. We confirm an over-representation of “symmetric inversions”—inversions with endpoints that are equally distant from the origin of chromosomal replication. Ancestral genome arrangements demonstrate moderate preference for replichore balance in Yersinia. We found that all inversions are shorter than expected under a neutral model, whereas inversions acting within a single replichore are much shorter than expected. We also found evidence for a canonical configuration of the origin and terminus of replication. Finally, breakpoint reuse analysis reveals that inversions with endpoints proximal to the origin of DNA replication are nearly three times more frequent. 
Our findings represent the first characterization of genome arrangement evolution in a bacterial population evolving outside laboratory conditions. Insight into the process of genomic rearrangement may further the understanding of pathogen population dynamics and selection on the architecture of circular bacterial chromosomes. PMID:18650965
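The abstract defines symmetric inversions as those whose endpoints are equally distant from the origin of replication. A minimal sketch of that test on a circular chromosome follows. The function names, the tolerance parameter, and the example coordinates are illustrative assumptions; the paper's actual statistical treatment is not described in the abstract.

```python
def signed_offset(pos, origin, genome_len):
    """Signed circular offset of pos from the origin, in [-L/2, L/2)."""
    half = genome_len / 2
    return (pos - origin + half) % genome_len - half


def is_symmetric_inversion(start, end, origin, genome_len, tolerance=0.01):
    """True if the endpoints flank the origin at roughly equal distances.

    tolerance is a fraction of genome length (an assumed threshold,
    chosen here only for illustration).
    """
    a = signed_offset(start, origin, genome_len)
    b = signed_offset(end, origin, genome_len)
    on_opposite_sides = a * b < 0
    return on_opposite_sides and abs(abs(a) - abs(b)) <= tolerance * genome_len


GENOME_LEN = 4_600_000  # illustrative circular chromosome length in bp

# Endpoints 600 kb and 620 kb either side of an origin at position 0.
flanking = is_symmetric_inversion(4_000_000, 620_000, 0, GENOME_LEN)
# A short inversion wholly inside one replichore.
local = is_symmetric_inversion(100_000, 150_000, 0, GENOME_LEN)
# Endpoints on opposite sides but at very different distances.
lopsided = is_symmetric_inversion(4_000_000, 1_500_000, 0, GENOME_LEN)
```

The opposite-sides check matters: without it, any short inversion within a single replichore would have nearly equal endpoint distances and would be misclassified as symmetric.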
Different effects of the TAR structure on HIV-1 and HIV-2 genomic RNA translation

PubMed Central

Soto-Rifo, Ricardo; Limousin, Taran; Rubilar, Paulina S.; Ricci, Emiliano P.; Décimo, Didier; Moncorgé, Olivier; Trabaud, Mary-Anne; André, Patrice; Cimarelli, Andrea; Ohlmann, Théophile

2012-01-01

The 5′-untranslated region (5′-UTR) of the genomic RNA of human immunodeficiency viruses type-1 (HIV-1) and type-2 (HIV-2) is composed of highly structured RNA motifs essential for viral replication that are expected to interfere with Gag and Gag-Pol translation. Here, we have analyzed and compared the properties by which the viral 5′-UTR drives translation from the genomic RNA of both human immunodeficiency viruses. Our results showed that translation from the HIV-2 gRNA was very poor compared to that of HIV-1. This was due to the intrinsic structural motifs in their respective 5′-UTRs, without involvement of any viral protein. Further investigation pointed to a different role of TAR RNA, which was strongly inhibitory for HIV-2 translation. Altogether, these data highlight important structural and functional differences between these two human pathogens. PMID:22121214
The mitochondrial genomes of Campodea fragilis and C. lubbocki (Hexapoda: Diplura): high genetic divergence in a morphologically uniform taxon

DOE Office of Scientific and Technical Information (OSTI.GOV)

Podsiadlowski, L.; Carapelli, A.; Nardi, F.

2005-12-01

Mitochondrial genomes from two dipluran hexapods of the genus Campodea have been sequenced. Gene order is the same as in most other hexapods and crustaceans. Secondary structures of tRNAs reveal specific structural changes in tRNA-C, tRNA-R, tRNA-S1 and tRNA-S2. Comparative analyses of nucleotide and amino acid composition, as well as structural features of both ribosomal RNA subunits, reveal substantial differences among the analyzed taxa. Although the two Campodea species are morphologically highly uniform, genetic divergence is larger than expected, suggesting a long evolutionary history under stable ecological conditions.
GPCR-I-TASSER: A hybrid approach to G protein-coupled receptor structure modeling and the application to the human genome

PubMed Central

Zhang, Jian; Yang, Jianyi; Jang, Richard; Zhang, Yang

2015-01-01

Experimental structure determination remains very difficult for G protein-coupled receptors (GPCRs). We propose a new hybrid protocol to construct GPCR structure models that integrates experimental mutagenesis data with ab initio transmembrane (TM) helix assembly simulations. The method was tested on 24 known GPCRs where the ab initio TM-helix assembly procedure constructed the correct fold for 20 cases. When combined with weak-homology and sparse mutagenesis restraints, the method generated correct folds for all the tested cases with an average C-alpha RMSD of 2.4 Å in the TM-regions. The new hybrid protocol was applied to model all 1026 GPCRs in the human genome, where 923 have a high confidence score and are expected to have correct folds; these contain many pharmaceutically important families with no previously solved structures, including Trace amine, Prostanoids, Releasing hormones, Melanocortins, Vasopressin and Neuropeptide Y receptors. The results demonstrate new progress on genome-wide structure modeling of transmembrane proteins. PMID:26190572
Evolutionary dynamics of 3D genome architecture following polyploidization in cotton.

PubMed

Wang, Maojun; Wang, Pengcheng; Lin, Min; Ye, Zhengxiu; Li, Guoliang; Tu, Lili; Shen, Chao; Li, Jianying; Yang, Qingyong; Zhang, Xianlong

2018-02-01

The formation of polyploids significantly increases the complexity of transcriptional regulation, which is expected to be reflected in sophisticated higher-order chromatin structures. However, knowledge of three-dimensional (3D) genome structure and its dynamics during polyploidization remains poor. Here, we characterize 3D genome architectures for diploid and tetraploid cotton, and find the existence of A/B compartments and topologically associated domains (TADs). By comparing each subgenome in tetraploids with its extant diploid progenitor, we find that genome allopolyploidization has contributed to the switching of A/B compartments and the reorganization of TADs in both subgenomes. We also show that the formation of TAD boundaries during polyploidization preferentially occurs in open chromatin, coinciding with the deposition of active chromatin modification. Furthermore, analysis of inter-subgenomic chromatin interactions has revealed the spatial proximity of homoeologous genes, possibly associated with their coordinated expression. This study advances our understanding of chromatin organization in plants and sheds new light on the relationship between 3D genome evolution and transcriptional regulation.
Are there ergodic limits to evolution? Ergodic exploration of genome space and convergence

PubMed Central

McLeish, Tom C. B.

2015-01-01

We examine the analogy between evolutionary dynamics and statistical mechanics to include the fundamental question of ergodicity—the representative exploration of the space of possible states (in the case of evolution this is genome space). Several properties of evolutionary dynamics are identified that allow a generalization of the ergodic dynamics, familiar in dynamical systems theory, to evolution. Two classes of evolved biological structure then arise, differentiated by the qualitative duration of their evolutionary time scales. The first class has an ergodicity time scale (the time required for representative genome exploration) longer than available evolutionary time, and has incompletely explored the genotypic and phenotypic space of its possibilities. This case generates no expectation of convergence to an optimal phenotype or possibility of its prediction. The second, more interesting, class exhibits an evolutionary form of ergodicity—essentially all of the structural space within the constraints of slower evolutionary variables have been sampled; the ergodicity time scale for the system evolution is less than the evolutionary time. In this case, some convergence towards similar optima may be expected for equivalent systems in different species where both possess ergodic evolutionary dynamics. When the fitness maximum is set by physical, rather than co-evolved, constraints, it is additionally possible to make predictions of some properties of the evolved structures and systems. We propose four structures that emerge from evolution within genotypes whose fitness is induced from their phenotypes. Together, these result in an exponential speeding up of evolution, when compared with complete exploration of genomic space. We illustrate a possible case of application and a prediction of convergence together with attaining a physical fitness optimum in the case of invertebrate compound eye resolution. PMID:26640648
Are there ergodic limits to evolution? Ergodic exploration of genome space and convergence.

PubMed

McLeish, Tom C B

2015-12-06

We examine the analogy between evolutionary dynamics and statistical mechanics to include the fundamental question of ergodicity: the representative exploration of the space of possible states (in the case of evolution this is genome space). Several properties of evolutionary dynamics are identified that allow a generalization of the ergodic dynamics, familiar in dynamical systems theory, to evolution. Two classes of evolved biological structure then arise, differentiated by the qualitative duration of their evolutionary time scales. The first class has an ergodicity time scale (the time required for representative genome exploration) longer than available evolutionary time, and has incompletely explored the genotypic and phenotypic space of its possibilities. This case generates no expectation of convergence to an optimal phenotype or possibility of its prediction. The second, more interesting, class exhibits an evolutionary form of ergodicity: essentially all of the structural space within the constraints of slower evolutionary variables have been sampled; the ergodicity time scale for the system evolution is less than the evolutionary time. In this case, some convergence towards similar optima may be expected for equivalent systems in different species where both possess ergodic evolutionary dynamics. When the fitness maximum is set by physical, rather than co-evolved, constraints, it is additionally possible to make predictions of some properties of the evolved structures and systems. We propose four structures that emerge from evolution within genotypes whose fitness is induced from their phenotypes. Together, these result in an exponential speeding up of evolution, when compared with complete exploration of genomic space. We illustrate a possible case of application and a prediction of convergence together with attaining a physical fitness optimum in the case of invertebrate compound eye resolution.
GPCR-I-TASSER: A Hybrid Approach to G Protein-Coupled Receptor Structure Modeling and the Application to the Human Genome.

PubMed

Zhang, Jian; Yang, Jianyi; Jang, Richard; Zhang, Yang

2015-08-04

Experimental structure determination remains difficult for G protein-coupled receptors (GPCRs). We propose a new hybrid protocol to construct GPCR structure models that integrates experimental mutagenesis data with ab initio transmembrane (TM) helix assembly simulations. The method was tested on 24 known GPCRs where the ab initio TM-helix assembly procedure constructed the correct fold for 20 cases. When combined with weak homology and sparse mutagenesis restraints, the method generated correct folds for all the tested cases with an average Cα root-mean-square deviation 2.4 Å in the TM regions. The new hybrid protocol was applied to model all 1,026 GPCRs in the human genome, where 923 have a high confidence score and are expected to have correct folds; these contain many pharmaceutically important families with no previously solved structures, including Trace amine, Prostanoids, Releasing hormones, Melanocortins, Vasopressin, and Neuropeptide Y receptors. The results demonstrate new progress on genome-wide structure modeling of TM proteins. Copyright © 2015 Elsevier Ltd. All rights reserved.
How gene order is influenced by the biophysics of transcription regulation

PubMed Central

Kolesov, Grigory; Wunderlich, Zeba; Laikova, Olga N.; Gelfand, Mikhail S.; Mirny, Leonid A.

2007-01-01

What are the forces that shape the structure of prokaryotic genomes: the order of genes, their proximity, and their orientation? Coregulation and coordinated horizontal gene transfer are believed to promote the proximity of functionally related genes and the formation of operons. However, forces that influence the structure of the genome beyond the level of a single operon remain unknown. Here, we show that the biophysical mechanism by which regulatory proteins search for their sites on DNA can impose constraints on genome structure. Using simulations, we demonstrate that rapid and reliable gene regulation requires that the transcription factor (TF) gene be close to the site on DNA the TF has to bind, thus promoting the colocalization of TF genes and their targets on the genome. We use parameters that have been measured in recent experiments to estimate the relevant length and time scales of this process and demonstrate that the search for a cognate site may be prohibitively slow if a TF has a low copy number and is not colocalized. We also analyze TFs and their sites in a number of bacterial genomes, confirm that they are colocalized significantly more often than expected, and show that this observation cannot be attributed to the pressure for coregulation or formation of selfish gene clusters, thus supporting the role of the biophysical constraint in shaping the structure of prokaryotic genomes. Our results demonstrate how spatial organization can influence timing and noise in gene expression. PMID:17709750
Extensive sequencing of seven human genomes to characterize benchmark reference materials

PubMed Central

Zook, Justin M.; Catoe, David; McDaniel, Jennifer; Vang, Lindsay; Spies, Noah; Sidow, Arend; Weng, Ziming; Liu, Yuling; Mason, Christopher E.; Alexander, Noah; Henaff, Elizabeth; McIntyre, Alexa B.R.; Chandramohan, Dhruva; Chen, Feng; Jaeger, Erich; Moshrefi, Ali; Pham, Khoa; Stedman, William; Liang, Tiffany; Saghbini, Michael; Dzakula, Zeljko; Hastie, Alex; Cao, Han; Deikus, Gintaras; Schadt, Eric; Sebra, Robert; Bashir, Ali; Truty, Rebecca M.; Chang, Christopher C.; Gulbahce, Natali; Zhao, Keyan; Ghosh, Srinka; Hyland, Fiona; Fu, Yutao; Chaisson, Mark; Xiao, Chunlin; Trow, Jonathan; Sherry, Stephen T.; Zaranek, Alexander W.; Ball, Madeleine; Bobe, Jason; Estep, Preston; Church, George M.; Marks, Patrick; Kyriazopoulou-Panagiotopoulou, Sofia; Zheng, Grace X.Y.; Schnall-Levin, Michael; Ordonez, Heather S.; Mudivarti, Patrice A.; Giorda, Kristina; Sheng, Ying; Rypdal, Karoline Bjarnesdatter; Salit, Marc

2016-01-01

The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly. PMID:27271295
Clinical providers' experiences with returning results from genomic sequencing: an interview study.

PubMed

Wynn, Julia; Lewis, Katie; Amendola, Laura M; Bernhardt, Barbara A; Biswas, Sawona; Joshi, Manasi; McMullen, Carmit; Scollon, Sarah

2018-05-08

Current medical practice includes the application of genomic sequencing (GS) in clinical and research settings. Despite expanded use of this technology, the process of disclosure of genomic results to patients and research participants has not been thoroughly examined and there are no established best practices. We conducted semi-structured interviews with 21 genetic and non-genetic clinicians returning results of GS as part of the NIH funded Clinical Sequencing Exploratory Research (CSER) Consortium projects. Interviews focused on the logistics of sessions, participant/patient reactions and factors influencing them, how the sessions changed with experience, and resources and training recommended to return genomic results. The length of preparation and disclosure sessions varied depending on the type and number of results and their implications. Internal and external databases, online resources and result review meetings were used to prepare. Respondents reported that participants' reactions were variable and ranged from enthusiasm and relief to confusion and disappointment. Factors influencing reactions were types of results, expectations and health status. A recurrent challenge was managing inflated expectations about GS. Other challenges included returning multiple, unanticipated and/or uncertain results and navigating a rare diagnosis. Methods to address these challenges included traditional genetic counseling techniques and modifying practice over time in order to provide anticipatory guidance and modulate expectations. Respondents made recommendations to improve access to genomic resources and genetic referrals to prepare future providers as the uptake of GS increases in both genetic and non-genetic settings. These findings indicate that returning genomic results is similar to return of results in traditional genetic testing but is magnified by the additional complexity and potential uncertainty of the results. Managing patient expectations, initially identified in studies of informed consent, remains an ongoing challenge and highlights the need to address this issue throughout the testing process. The results of this study will help to guide future providers in the disclosure of genomic results and highlight educational needs and resources necessary to prepare providers. Future research on the patient experience, understanding and follow-up of recommendations is needed to more fully understand the disclosure process.
Structure, evolution, and comparative genomics of tetraploid cotton based on a high-density genetic linkage map

PubMed Central

Li, Ximei; Jin, Xin; Wang, Hantao; Zhang, Xianlong; Lin, Zhongxu

2016-01-01

A high-density linkage map was constructed using 1,885 newly obtained loci and 3,747 previously published loci, which included 5,152 loci with 4696.03 cM in total length and 0.91 cM in mean distance. Homology analysis in the cotton genome further confirmed the 13 expected homologous chromosome pairs and revealed an obvious inversion on Chr10 or Chr20 and repeated inversions on Chr07 or Chr16. In addition, two reciprocal translocations between Chr02 and Chr03 and between Chr04 and Chr05 were confirmed. Comparative genomics between the tetraploid cotton and the diploid cottons showed that no major structural changes exist between DT and D chromosomes but rather between AT and A chromosomes. Blast analysis between the tetraploid cotton genome and the mixed genome of two diploid cottons showed that most AD chromosomes, regardless of whether it is from the AT or DT genome, preferentially matched with the corresponding homologous chromosome in the diploid A genome, and then the corresponding homologous chromosome in the diploid D genome, indicating that the diploid D genome underwent converted evolution by the diploid A genome to form the DT genome during polyploidization. In addition, the results reflected that a series of chromosomal translocations occurred among Chr01/Chr15, Chr02/Chr14, Chr03/Chr17, Chr04/Chr22, and Chr05/Chr19. PMID:27084896

Great expectations: patient perspectives and anticipated utility of non-diagnostic genomic-sequencing results.

PubMed

Hylind, Robyn; Smith, Maureen; Rasmussen-Torvik, Laura; Aufox, Sharon

2018-01-01

The management of secondary findings is a challenge to health-care providers relaying clinical genomic-sequencing results to patients. Understanding patients' expectations from non-diagnostic genomic sequencing could help guide this management. This study interviewed 14 individuals enrolled in the eMERGE (Electronic Medical Records and Genomics) study. Participants in eMERGE consent to undergo non-diagnostic genomic sequencing, receive results, and have results returned to their physicians. The interviews assessed expectations and intended use of results. The majority of interviewees were male (64%) and 43% identified as non-Caucasian. A unique theme identified was that many participants expressed uncertainty about the type of diseases they expected to receive results on, what results they wanted to learn about, and how they intended to use results. Participant uncertainty highlights the complex nature of deciding to undergo genomic testing and a deficiency in genomic knowledge. These results could help improve how genomic sequencing and secondary findings are discussed with patients.
Developing eThread pipeline using SAGA-pilot abstraction for large-scale structural bioinformatics.

PubMed

Ragothaman, Anjani; Boddu, Sairam Chowdary; Kim, Nayong; Feinstein, Wei; Brylinski, Michal; Jha, Shantenu; Kim, Joohyun

2014-01-01

While most of computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequences from even small genomes such as prokaryotes is large typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread--a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly, amenable for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.
Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics

PubMed Central

Ragothaman, Anjani; Feinstein, Wei; Jha, Shantenu; Kim, Joohyun

2014-01-01

While most of computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because of predicted structural information that could uncover the underlying function. However, threading tools are generally compute-intensive and the number of protein sequences from even small genomes such as prokaryotes is large typically containing many thousands, prohibiting their application as a genome-wide structural systems biology tool. To leverage its utility, we have developed a pipeline for eThread—a meta-threading protein structure modeling tool, that can use computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present runtime analysis to characterize computational complexity of eThread and EC2 infrastructure. Based on results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly, amenable for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure. PMID:24995285
Is Whole Exome Sequencing an Ethically Disruptive Technology? Perspectives of Pediatric Oncologists and Parents of Pediatric Patients with Solid Tumors

PubMed Central

McCullough, Laurence B.; Slashinski, Melody J.; McGuire, Amy L.; Street, Richard L.; Eng, Christine M.; Gibbs, Richard A.; Parsons, D. Williams; Plon, Sharon E.

2016-01-01

Background: Some anticipate that physicians and parents will be ill-prepared or unprepared for the clinical introduction of genome sequencing, making it ethically disruptive. Procedure: As part of the Baylor Advancing Sequencing in Childhood Cancer Care (BASIC3) study, we conducted semi-structured interviews with 16 pediatric oncologists and 40 parents of pediatric patients with cancer prior to the return of sequencing results. We elicited expectations and attitudes concerning the impact of sequencing on clinical decision-making, clinical utility, and treatment expectations from both groups. Using accepted methods of qualitative research to analyze interview transcripts, we completed a thematic analysis to provide inductive insights into their views of sequencing. Results: Our major findings reveal that neither pediatric oncologists nor parents anticipate sequencing to be an ethically disruptive technology, because they expect to be prepared to integrate sequencing results into their existing approaches to learning and using new clinical information for care. Pediatric oncologists do not expect sequencing results to be more complex than other diagnostic information and plan simply to incorporate these data into their evidence-based approach to clinical practice, although they were concerned about impact on parents. For parents, there is an urgency to protect their child's health and in this context they expect genomic information to better prepare them to participate in decisions about their child's care. Conclusion: Our data do not support concern that introducing genome sequencing into childhood cancer care will be ethically disruptive, i.e., leave physicians or parents ill-prepared or unprepared to make responsible decisions about patient care. PMID:26505993
Structure, evolution, and comparative genomics of tetraploid cotton based on a high-density genetic linkage map.

PubMed

Li, Ximei; Jin, Xin; Wang, Hantao; Zhang, Xianlong; Lin, Zhongxu

2016-06-01

A high-density linkage map was constructed using 1,885 newly obtained loci and 3,747 previously published loci, which included 5,152 loci with 4696.03 cM in total length and 0.91 cM in mean distance. Homology analysis in the cotton genome further confirmed the 13 expected homologous chromosome pairs and revealed an obvious inversion on Chr10 or Chr20 and repeated inversions on Chr07 or Chr16. In addition, two reciprocal translocations between Chr02 and Chr03 and between Chr04 and Chr05 were confirmed. Comparative genomics between the tetraploid cotton and the diploid cottons showed that no major structural changes exist between DT and D chromosomes but rather between AT and A chromosomes. Blast analysis between the tetraploid cotton genome and the mixed genome of two diploid cottons showed that most AD chromosomes, regardless of whether it is from the AT or DT genome, preferentially matched with the corresponding homologous chromosome in the diploid A genome, and then the corresponding homologous chromosome in the diploid D genome, indicating that the diploid D genome underwent converted evolution by the diploid A genome to form the DT genome during polyploidization. In addition, the results reflected that a series of chromosomal translocations occurred among Chr01/Chr15, Chr02/Chr14, Chr03/Chr17, Chr04/Chr22, and Chr05/Chr19. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Using Full Genomic Information to Predict Disease: Breaking Down the Barriers Between Complex and Mendelian Diseases.

PubMed

Jordan, Daniel M; Do, Ron

2018-04-11

While sequence-based genetic tests have long been available for specific loci, especially for Mendelian disease, the rapidly falling costs of genome-wide genotyping arrays, whole-exome sequencing, and whole-genome sequencing are moving us toward a future where full genomic information might inform the prognosis and treatment of a variety of diseases, including complex disease. Similarly, the availability of large populations with full genomic information has enabled new insights about the etiology and genetic architecture of complex disease. Insights from the latest generation of genomic studies suggest that our categorization of diseases as complex may conceal a wide spectrum of genetic architectures and causal mechanisms that ranges from Mendelian forms of complex disease to complex regulatory structures underlying Mendelian disease. Here, we review these insights, along with advances in the prediction of disease risk and outcomes from full genomic information. Expected final online publication date for the Annual Review of Genomics and Human Genetics Volume 19 is August 31, 2018. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Is junk DNA bunk? A critique of ENCODE.

PubMed

Doolittle, W Ford

2013-04-02

Do data from the Encyclopedia Of DNA Elements (ENCODE) project render the notion of junk DNA obsolete? Here, I review older arguments for junk grounded in the C-value paradox and propose a thought experiment to challenge ENCODE's ontology. Specifically, what would we expect for the number of functional elements (as ENCODE defines them) in genomes much larger than our own genome? If the number were to stay more or less constant, it would seem sensible to consider the rest of the DNA of larger genomes to be junk or, at least, assign it a different sort of role (structural rather than informational). If, however, the number of functional elements were to rise significantly with C-value then, (i) organisms with genomes larger than our genome are more complex phenotypically than we are, (ii) ENCODE's definition of functional element identifies many sites that would not be considered functional or phenotype-determining by standard uses in biology, or (iii) the same phenotypic functions are often determined in a more diffuse fashion in larger-genomed organisms. Good cases can be made for propositions ii and iii. A larger theoretical framework, embracing informational and structural roles for DNA, neutral as well as adaptive causes of complexity, and selection as a multilevel phenomenon, is needed.
Is junk DNA bunk? A critique of ENCODE

PubMed Central

Doolittle, W. Ford

2013-01-01

Do data from the Encyclopedia Of DNA Elements (ENCODE) project render the notion of junk DNA obsolete? Here, I review older arguments for junk grounded in the C-value paradox and propose a thought experiment to challenge ENCODE’s ontology. Specifically, what would we expect for the number of functional elements (as ENCODE defines them) in genomes much larger than our own genome? If the number were to stay more or less constant, it would seem sensible to consider the rest of the DNA of larger genomes to be junk or, at least, assign it a different sort of role (structural rather than informational). If, however, the number of functional elements were to rise significantly with C-value then, (i) organisms with genomes larger than our genome are more complex phenotypically than we are, (ii) ENCODE’s definition of functional element identifies many sites that would not be considered functional or phenotype-determining by standard uses in biology, or (iii) the same phenotypic functions are often determined in a more diffuse fashion in larger-genomed organisms. Good cases can be made for propositions ii and iii. A larger theoretical framework, embracing informational and structural roles for DNA, neutral as well as adaptive causes of complexity, and selection as a multilevel phenomenon, is needed. PMID:23479647
Structure of a Trypanosoma Brucei Alpha/Beta-Hydrolase Fold Protein With Unknown Function

DOE Office of Scientific and Technical Information (OSTI.GOV)

Merritt, E.A.; Holmes, M.; Buckner, F.S.

2009-05-26

The structure of a structural genomics target protein, Tbru020260AAA from Trypanosoma brucei, has been determined to a resolution of 2.2 Å using multiple-wavelength anomalous diffraction at the Se K edge. This protein belongs to Pfam sequence family PF08538 and is only distantly related to previously studied members of the α/β-hydrolase fold family. Structural superposition onto representative α/β-hydrolase fold proteins of known function indicates that a possible catalytic nucleophile, Ser116 in the T. brucei protein, lies at the expected location. However, the present structure and by extension the other trypanosomatid members of this sequence family have neither sequence nor structural similarity at the location of other active-site residues typical for proteins with this fold. Together with the presence of an additional domain between strands β6 and β7 that is conserved in trypanosomatid genomes, this suggests that the function of these homologs has diverged from other members of the fold family.
Chloroplast DNA Structural Variation, Phylogeny, and Age of Divergence among Diploid Cotton Species.

PubMed

Chen, Zhiwen; Feng, Kun; Grover, Corrinne E; Li, Pengbo; Liu, Fang; Wang, Yumei; Xu, Qin; Shang, Mingzhao; Zhou, Zhongli; Cai, Xiaoyan; Wang, Xingxing; Wendel, Jonathan F; Wang, Kunbo; Hua, Jinping

2016-01-01

The cotton genus (Gossypium spp.) contains 8 monophyletic diploid genome groups (A, B, C, D, E, F, G, K) and a single allotetraploid clade (AD). To gain insight into the phylogeny of Gossypium and molecular evolution of the chloroplast genome in this group, we performed a comparative analysis of 19 Gossypium chloroplast genomes, six reported here for the first time. Nucleotide distance in non-coding regions was about three times that of coding regions. As expected, distances were smaller within than among genome groups. Phylogenetic topologies based on nucleotide and indel data support for the resolution of the 8 genome groups into 6 clades. Phylogenetic analysis of indel distribution among the 19 genomes demonstrates contrasting evolutionary dynamics in different clades, with a parallel genome downsizing in two genome groups and a biased accumulation of insertions in the clade containing the cultivated cottons leading to large (for Gossypium) chloroplast genomes. Divergence time estimates derived from the cpDNA sequence suggest that the major diploid clades had diverged approximately 10 to 11 million years ago. The complete nucleotide sequences of 6 cpDNA genomes are provided, offering a resource for cytonuclear studies in Gossypium.
Chloroplast DNA Structural Variation, Phylogeny, and Age of Divergence among Diploid Cotton Species

PubMed Central

Li, Pengbo; Liu, Fang; Wang, Yumei; Xu, Qin; Shang, Mingzhao; Zhou, Zhongli; Cai, Xiaoyan; Wang, Xingxing; Wendel, Jonathan F.; Wang, Kunbo

2016-01-01

The cotton genus (Gossypium spp.) contains 8 monophyletic diploid genome groups (A, B, C, D, E, F, G, K) and a single allotetraploid clade (AD). To gain insight into the phylogeny of Gossypium and molecular evolution of the chloroplast genome in this group, we performed a comparative analysis of 19 Gossypium chloroplast genomes, six reported here for the first time. Nucleotide distance in non-coding regions was about three times that of coding regions. As expected, distances were smaller within than among genome groups. Phylogenetic topologies based on nucleotide and indel data support for the resolution of the 8 genome groups into 6 clades. Phylogenetic analysis of indel distribution among the 19 genomes demonstrates contrasting evolutionary dynamics in different clades, with a parallel genome downsizing in two genome groups and a biased accumulation of insertions in the clade containing the cultivated cottons leading to large (for Gossypium) chloroplast genomes. Divergence time estimates derived from the cpDNA sequence suggest that the major diploid clades had diverged approximately 10 to 11 million years ago. The complete nucleotide sequences of 6 cpDNA genomes are provided, offering a resource for cytonuclear studies in Gossypium. PMID:27309527
Deconstruction of the (Paleo)Polyploid Grapevine Genome Based on the Analysis of Transposition Events Involving NBS Resistance Genes

PubMed Central

Cestaro, Alessandro; Sterck, Lieven; Fontana, Paolo; Van de Peer, Yves; Viola, Roberto; Velasco, Riccardo; Salamini, Francesco

2012-01-01

Plants have followed a reticulate type of evolution and taxa have frequently merged via allopolyploidization. A polyploid structure of sequenced genomes has often been proposed, but the chromosomes belonging to putative component genomes are difficult to identify. The 19 grapevine chromosomes are evolutionary stable structures: their homologous triplets have strongly conserved gene order, interrupted by rare translocations. The aim of this study is to examine how the grapevine nucleotide-binding site (NBS)-encoding resistance (NBS-R) genes have evolved in the genomic context and to understand mechanisms for the genome evolution. We show that, in grapevine, i) helitrons have significantly contributed to transposition of NBS-R genes, and ii) NBS-R gene cluster similarity indicates the existence of two groups of chromosomes (named as Va and Vc) that may have evolved independently. Chromosome triplets consist of two Va and one Vc chromosomes, as expected from the tetraploid and diploid conditions of the two component genomes. The hexaploid state could have been derived from either allopolyploidy or the separation of the Va and Vc component genomes in the same nucleus before fusion, as known for Rosaceae species. Time estimation indicates that grapevine component genomes may have fused about 60 mya, having had at least 40–60 mya to evolve independently. Chromosome number variation in the Vitaceae and related families, and the gap between the time of eudicot radiation and the age of Vitaceae fossils, are accounted for by our hypothesis. PMID:22253773
Enriching public descriptions of marine phages using the Genomic Standards Consortium MIGS standard

PubMed Central

Duhaime, Melissa Beth; Kottmann, Renzo; Field, Dawn; Glöckner, Frank Oliver

2011-01-01

In any sequencing project, the possible depth of comparative analysis is determined largely by the amount and quality of the accompanying contextual data. The structure, content, and storage of this contextual data should be standardized to ensure consistent coverage of all sequenced entities and facilitate comparisons. The Genomic Standards Consortium (GSC) has developed the “Minimum Information about Genome/Metagenome Sequences (MIGS/MIMS)” checklist for the description of genomes and here we annotate all 30 publicly available marine bacteriophage sequences to the MIGS standard. These annotations build on existing International Nucleotide Sequence Database Collaboration (INSDC) records, and confirm, as expected that current submissions lack most MIGS fields. MIGS fields were manually curated from the literature and placed in XML format as specified by the Genomic Contextual Data Markup Language (GCDML). These “machine-readable” reports were then analyzed to highlight patterns describing this collection of genomes. Completed reports are provided in GCDML. This work represents one step towards the annotation of our complete collection of genome sequences and shows the utility of capturing richer metadata along with raw sequences. PMID:21677864
Expanding the proteome: disordered and alternatively-folded proteins

PubMed Central

Dyson, H. Jane

2011-01-01

Proteins provide much of the scaffolding for life, as well as undertaking a variety of essential catalytic reactions. These characteristic functions have led us to presuppose that proteins are in general functional only when well-structured and correctly folded. As we begin to explore the repertoire of possible protein sequences inherent in the human and other genomes, two stark facts that belie this supposition become clear: firstly, the number of apparent open reading frames in the human genome is significantly smaller than appears to be necessary to code for all of the diverse proteins in higher organisms, and secondly that a significant proportion of the protein sequences that would be coded by the genome would not be expected to form stable three-dimensional structures. Clearly the genome must include coding for a multitude of alternative forms of proteins, some of which may be partly or fully disordered or incompletely structured in their functional states. At the same time as this likelihood was recognized, experimental studies also began to uncover examples of important protein molecules and domains that were incompletely structured or completely disordered in solution, yet remained perfectly functional. In the ensuing years, we have seen an explosion of experimental and genome-annotation studies that have mapped the extent of the intrinsic disorder phenomenon and explored the possible biological rationales for its widespread occurrence. Answers to the question “why would a particular domain need to be unstructured?” are as varied as the systems where such domains are found. This review provides a survey of recent new directions in this field, and includes an evaluation of the role not only of intrinsically disordered proteins but of partially structured and highly dynamic members of the disorder-order continuum. PMID:21729349
Alternative Splice Variants in TIM Barrel Proteins from Human Genome Correlate with the Structural and Evolutionary Modularity of this Versatile Protein Fold

PubMed Central

Ochoa-Leyva, Adrián; Montero-Morán, Gabriela; Saab-Rincón, Gloria; Brieba, Luis G.; Soberón, Xavier

2013-01-01

After the surprisingly low number of genes identified in the human genome, alternative splicing emerged as a major mechanism to generate protein diversity in higher eukaryotes. However, it is still not known if its prevalence along the genome evolution has contributed to the overall functional protein diversity or if it simply reflects splicing noise. The (βα)8 barrel or TIM barrel is one of the most frequent, versatile, and ancient folds encountered among enzymes. Here, we analyze the structural modifications present in TIM barrel proteins from the human genome that are products of alternative splicing events. We found that 87% of all splicing events involved deletions; most of these events resulted in protein fragments that corresponded to the (βα)2, (βα)4, (βα)5, (βα)6, and (βα)7 subdomains of TIM barrels. Because approximately 7% of all the splicing events involved internal β-strand substitutions, we decided, based on the genomic data, to design β-strand and α-helix substitutions in a well-studied TIM barrel enzyme. The biochemical characterization of one of the chimeric variants suggests that some of the splice variants in the human genome with β-strand substitutions may be evolving novel functions via either the oligomeric state or substrate specificity. We provide results of how the splice variants represent subdomains that correlate with the independently folding and evolving structural units previously reported. This work is the first to observe a link between the structural features of the barrel and a recurrent genetic mechanism. Our results suggest that it is reasonable to expect that a sizeable fraction of splice variants found in the human genome represent structurally viable functional proteins. Our data provide additional support for the hypothesis of the origin of the TIM barrel fold through the assembly of smaller subdomains. We suggest a model of how nature explores new proteins through alternative splicing as a mechanism to diversify the proteins encoded in the human genome. PMID:23950966
Precise detection of chromosomal translocation or inversion breakpoints by whole-genome sequencing.

PubMed

Suzuki, Toshifumi; Tsurusaki, Yoshinori; Nakashima, Mitsuko; Miyake, Noriko; Saitsu, Hirotomo; Takeda, Satoru; Matsumoto, Naomichi

2014-12-01

Structural variations (SVs), including translocations, inversions, deletions and duplications, are potentially associated with Mendelian diseases and contiguous gene syndromes. Determination of SV-related breakpoints at the nucleotide level is important to reveal the genetic causes for diseases. Whole-genome sequencing (WGS) by next-generation sequencers is expected to determine structural abnormalities more directly and efficiently than conventional methods. In this study, 14 SVs (9 balanced translocations, 1 inversion and 4 microdeletions) in 9 patients were analyzed by WGS with a shallow (5×) to moderate read coverage (20×). Among 28 breakpoints (as each SV has two breakpoints), 19 SV breakpoints had been determined previously at the nucleotide level by any other methods and 9 were uncharacterized. BreakDancer and Integrative Genomics Viewer determined 20 breakpoints (16 translocation, 2 inversion and 2 deletion breakpoints), but did not detect 8 breakpoints (2 translocation and 6 deletion breakpoints). These data indicate the efficacy of WGS for the precise determination of translocation and inversion breakpoints.
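The breakpoint counts in this abstract can be checked with a short tally. The sketch below is only an illustration: the counts come from the abstract, while the dictionary layout and the helper name `tally` are assumptions made here, not anything from the study.

```python
# Counts are taken from the abstract; the data layout and the helper
# name `tally` are hypothetical, chosen only for this illustration.
BREAKPOINTS_PER_SV = 2  # the abstract states each SV has two breakpoints

sv_counts = {"translocation": 9, "inversion": 1, "deletion": 4}
detected = {"translocation": 16, "inversion": 2, "deletion": 2}


def tally(sv_counts, detected):
    """Return {class: (total, detected, missed)} breakpoint counts."""
    summary = {}
    for sv_class, n_sv in sv_counts.items():
        total = n_sv * BREAKPOINTS_PER_SV
        found = detected[sv_class]
        summary[sv_class] = (total, found, total - found)
    return summary


summary = tally(sv_counts, detected)
total_breakpoints = sum(total for total, _, _ in summary.values())
total_detected = sum(found for _, found, _ in summary.values())

for sv_class, (total, found, missed) in summary.items():
    print(f"{sv_class}: {found}/{total} detected, {missed} missed")
print(f"overall: {total_detected}/{total_breakpoints} detected")
```

The per-class totals reproduce the reported figures: 28 breakpoints overall, 20 detected, with the 8 misses split into 2 translocation and 6 deletion breakpoints.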
Constructing failure in big biology: The socio-technical anatomy of Japan's Protein 3000 Project.

PubMed

Fukushima, Masato

2016-02-01

This study focuses on the 5-year Protein 3000 Project launched in 2002, the largest biological project in Japan. The project aimed to overcome Japan's alleged failure to contribute fully to the Human Genome Project, by determining 3000 protein structures, 30 percent of the global target. Despite its achievement of this goal, the project was fiercely criticized in various sectors of society and was often branded an awkward failure. This article tries to solve the mystery of why such failure discourse was prevalent. Three explanatory factors are offered: first, because some goals were excluded during project development, there was a dynamic of failed expectations; second, structural genomics, while promoting collaboration with the international community, became an 'anti-boundary object', only the absence of which bound heterogeneous domestic actors; third, there developed an urgent sense of international competition in order to obtain patents on such structural information.
Comparative Pan-Genome Analysis of Piscirickettsia salmonis Reveals Genomic Divergences within Genogroups.

PubMed

Nourdin-Galindo, Guillermo; Sánchez, Patricio; Molina, Cristian F; Espinoza-Rojas, Daniela A; Oliver, Cristian; Ruiz, Pamela; Vargas-Chacoff, Luis; Cárcamo, Juan G; Figueroa, Jaime E; Mancilla, Marcos; Maracaja-Coutinho, Vinicius; Yañez, Alejandro J

2017-01-01

Piscirickettsia salmonis is the etiological agent of salmonid rickettsial septicemia, a disease that seriously affects the salmonid industry. Despite efforts to genomically characterize P. salmonis, functional information on the life cycle, pathogenesis mechanisms, diagnosis, treatment, and control of this fish pathogen remain lacking. To address this knowledge gap, the present study conducted an in silico pan-genome analysis of 19 P. salmonis strains from distinct geographic locations and genogroups. Results revealed an expected open pan-genome of 3,463 genes and a core-genome of 1,732 genes. Two marked genogroups were identified, as confirmed by phylogenetic and phylogenomic relationships to the LF-89 and EM-90 reference strains, as well as by assessments of genomic structures. Different structural configurations were found for the six identified copies of the ribosomal operon in the P. salmonis genome, indicating translocation throughout the genetic material. Chromosomal divergences in genomic localization and quantity of genetic cassettes were also found for the Dot/Icm type IVB secretion system. To determine divergences between core-genomes, additional pan-genome descriptions were compiled for the so-termed LF and EM genogroups. Open pan-genomes composed of 2,924 and 2,778 genes and core-genomes composed of 2,170 and 2,228 genes were respectively found for the LF and EM genogroups. The core-genomes were functionally annotated using the Gene Ontology, KEGG, and Virulence Factor databases, revealing the presence of several shared groups of genes related to basic function of intracellular survival and bacterial pathogenesis. Additionally, the specific pan-genomes for the LF and EM genogroups were defined, resulting in the identification of 148 and 273 exclusive proteins, respectively. Notably, specific virulence factors linked to adherence, colonization, invasion factors, and endotoxins were established. The obtained data suggest that these genes could be directly associated with inter-genogroup differences in pathogenesis and host-pathogen interactions, information that could be useful in designing novel strategies for diagnosing and controlling P. salmonis infection.
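The pan-genome, core-genome, and genogroup-exclusive terms used in this abstract can be illustrated with plain set operations. This is a minimal sketch under stated assumptions: the strain and gene identifiers are invented toy data, the function names are hypothetical, and the study's actual pipeline (ortholog clustering, thresholds, software) is not described in the abstract and is not reproduced here.

```python
# Toy illustration only: strain names, gene IDs, and function names are
# hypothetical. Real pan-genome tools first cluster genes into ortholog
# families; here each gene ID stands in for one such family.

def pan_and_core(strain_genes):
    """Return (pan, core): the union and the intersection of gene sets.

    `strain_genes` maps strain name -> iterable of gene-family IDs and
    must contain at least one strain.
    """
    gene_sets = [set(genes) for genes in strain_genes.values()]
    pan = set().union(*gene_sets)         # present in any strain
    core = set.intersection(*gene_sets)   # present in every strain
    return pan, core


def exclusive_to(group_a, group_b):
    """Gene families seen in group_a strains but in no group_b strain."""
    pan_a, _ = pan_and_core(group_a)
    pan_b, _ = pan_and_core(group_b)
    return pan_a - pan_b


lf_group = {"LF_1": {"g1", "g2", "g3"}, "LF_2": {"g1", "g2", "g4"}}
em_group = {"EM_1": {"g1", "g2", "g5"}, "EM_2": {"g1", "g5", "g6"}}

pan_all, core_all = pan_and_core({**lf_group, **em_group})
_, core_lf = pan_and_core(lf_group)
_, core_em = pan_and_core(em_group)
lf_only = exclusive_to(lf_group, em_group)
em_only = exclusive_to(em_group, lf_group)

print(len(pan_all), len(core_all), sorted(lf_only), sorted(em_only))
```

In the toy data each genogroup's core-genome is larger than the core-genome of all strains together, which is the same direction as the reported figures (2,170 and 2,228 genes per genogroup versus 1,732 overall): requiring a gene in fewer strains can only keep the core the same size or enlarge it.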
Comparative Pan-Genome Analysis of Piscirickettsia salmonis Reveals Genomic Divergences within Genogroups

PubMed Central

Nourdin-Galindo, Guillermo; Sánchez, Patricio; Molina, Cristian F.; Espinoza-Rojas, Daniela A.; Oliver, Cristian; Ruiz, Pamela; Vargas-Chacoff, Luis; Cárcamo, Juan G.; Figueroa, Jaime E.; Mancilla, Marcos; Maracaja-Coutinho, Vinicius; Yañez, Alejandro J.

2017-01-01

Piscirickettsia salmonis is the etiological agent of salmonid rickettsial septicemia, a disease that seriously affects the salmonid industry. Despite efforts to genomically characterize P. salmonis, functional information on the life cycle, pathogenesis mechanisms, diagnosis, treatment, and control of this fish pathogen remains lacking. To address this knowledge gap, the present study conducted an in silico pan-genome analysis of 19 P. salmonis strains from distinct geographic locations and genogroups. Results revealed an expected open pan-genome of 3,463 genes and a core-genome of 1,732 genes. Two marked genogroups were identified, as confirmed by phylogenetic and phylogenomic relationships to the LF-89 and EM-90 reference strains, as well as by assessments of genomic structures. Different structural configurations were found for the six identified copies of the ribosomal operon in the P. salmonis genome, indicating translocation throughout the genetic material. Chromosomal divergences in genomic localization and quantity of genetic cassettes were also found for the Dot/Icm type IVB secretion system. To determine divergences between core-genomes, additional pan-genome descriptions were compiled for the so-termed LF and EM genogroups. Open pan-genomes composed of 2,924 and 2,778 genes and core-genomes composed of 2,170 and 2,228 genes were respectively found for the LF and EM genogroups. The core-genomes were functionally annotated using the Gene Ontology, KEGG, and Virulence Factor databases, revealing the presence of several shared groups of genes related to basic function of intracellular survival and bacterial pathogenesis. Additionally, the specific pan-genomes for the LF and EM genogroups were defined, resulting in the identification of 148 and 273 exclusive proteins, respectively. Notably, specific virulence factors linked to adherence, colonization, invasion factors, and endotoxins were established. The obtained data suggest that these genes could be directly associated with inter-genogroup differences in pathogenesis and host-pathogen interactions, information that could be useful in designing novel strategies for diagnosing and controlling P. salmonis infection. PMID:29164068
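The pan-genome, core-genome and genogroup-exclusive gene sets described in the abstract above can be illustrated with plain set operations. The following is a minimal sketch, not the authors' pipeline: the strain names and gene-family identifiers are hypothetical, and "exclusive" is assumed here to mean "present in at least one strain of the group and in no strain outside it", which may differ from the definition used in the study.

```python
from typing import Dict, Set, Tuple


def pan_and_core(strain_genes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (pan_genome, core_genome) for a mapping strain -> gene-family IDs.

    pan-genome  = union of all gene families seen in any strain
    core-genome = gene families shared by every strain
    """
    gene_sets = list(strain_genes.values())
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return pan, core


def exclusive_to_group(strain_genes: Dict[str, Set[str]], group: Set[str]) -> Set[str]:
    """Gene families found in the given group of strains and nowhere else."""
    inside = set().union(*(g for s, g in strain_genes.items() if s in group))
    outside = set().union(*(g for s, g in strain_genes.items() if s not in group))
    return inside - outside


# Hypothetical toy data: two "LF-like" strains and one "EM-like" strain.
toy = {
    "LF_a": {"g1", "g2", "g3"},
    "LF_b": {"g1", "g2", "g4"},
    "EM_a": {"g1", "g5"},
}
pan, core = pan_and_core(toy)
lf_only = exclusive_to_group(toy, {"LF_a", "LF_b"})
```

In this toy case the pan-genome has five gene families, the core-genome has one, and three families are exclusive to the hypothetical LF group. A pan-genome is usually called "open" when the union keeps growing as further strains are added.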
Genetic Structure of the Han Chinese Population Revealed by Genome-wide SNP Variation

PubMed Central

Chen, Jieming; Zheng, Houfeng; Bei, Jin-Xin; Sun, Liangdan; Jia, Wei-hua; Li, Tao; Zhang, Furen; Seielstad, Mark; Zeng, Yi-Xin; Zhang, Xuejun; Liu, Jianjun

2009-01-01

Population stratification is a potential problem for genome-wide association studies (GWAS), confounding results and causing spurious associations. Hence, understanding how allele frequencies vary across geographic regions or among subpopulations is an important prelude to analyzing GWAS data. Using over 350,000 genome-wide autosomal SNPs in over 6000 Han Chinese samples from ten provinces of China, our study revealed a one-dimensional “north-south” population structure and a close correlation between geography and the genetic structure of the Han Chinese. The north-south population structure is consistent with the historical migration pattern of the Han Chinese population. Metropolitan cities in China were, however, more diffused “outliers,” probably because of the impact of modern migration of peoples. At a very local scale within the Guangdong province, we observed evidence of population structure among dialect groups, probably on account of endogamy within these dialects. Via simulation, we show that empirical levels of population structure observed across modern China can cause spurious associations in GWAS if not properly handled. In the Han Chinese, geographic matching is a good proxy for genetic matching, particularly in validation and candidate-gene studies in which population stratification cannot be directly accessed and accounted for because of the lack of genome-wide data, with the exception of the metropolitan cities, where geographical location is no longer a good indicator of ancestral origin. Our findings are important for designing GWAS in the Chinese population, an activity that is expected to intensify greatly in the near future. PMID:19944401

Clusters of ancestrally related genes that show paralogy in whole or in part are a major feature of the genomes of humans and other species.

PubMed

Walker, Michael B; King, Benjamin L; Paigen, Kenneth

2012-01-01

Arrangements of genes along chromosomes are a product of evolutionary processes, and we can expect that preferable arrangements will prevail over the span of evolutionary time, often being reflected in the non-random clustering of structurally and/or functionally related genes. Such non-random arrangements can arise by two distinct evolutionary processes: duplications of DNA sequences that give rise to clusters of genes sharing both sequence similarity and common sequence features and the migration together of genes related by function, but not by common descent. To provide a background for distinguishing between the two, which is important for future efforts to unravel the evolutionary processes involved, we here provide a description of the extent to which ancestrally related genes are found in proximity. Towards this purpose, we combined information from five genomic datasets, InterPro, SCOP, PANTHER, Ensembl protein families, and Ensembl gene paralogs. The results are provided in publicly available datasets (http://cgd.jax.org/datasets/clustering/paraclustering.shtml) describing the extent to which ancestrally related genes are in proximity beyond what is expected by chance (i.e. form paraclusters) in the human and nine other vertebrate genomes, as well as the D. melanogaster, C. elegans, A. thaliana, and S. cerevisiae genomes. With the exception of Saccharomyces, paraclusters are a common feature of the genomes we examined. In the human genome they are estimated to include at least 22% of all protein coding genes. Paraclusters are far more prevalent among some gene families than others, are highly species or clade specific and can evolve rapidly, sometimes in response to environmental cues. Altogether, they account for a large portion of the functional clustering previously reported in several genomes.
Complete mitochondrial genome sequence of Urechis caupo, a representative of the phylum Echiura

PubMed Central

Boore, Jeffrey L

2004-01-01

Background Mitochondria contain small genomes that are physically separate from those of nuclei. Their comparison serves as a model system for understanding the processes of genome evolution. Although hundreds of these genome sequences have been reported, the taxonomic sampling is highly biased toward vertebrates and arthropods, with many whole phyla remaining unstudied. This is the first description of a complete mitochondrial genome sequence of a representative of the phylum Echiura, that of the fat innkeeper worm, Urechis caupo. Results This mtDNA is 15,113 nts in length and 62% A+T. It contains the 37 genes that are typical for animal mtDNAs in an arrangement somewhat similar to that of annelid worms. All genes are encoded by the same DNA strand which is rich in A and C relative to the opposite strand. Codons ending with the dinucleotide GG are more frequent than would be expected from apparent mutational biases. The largest non-coding region is only 282 nts long, is 71% A+T, and has potential for secondary structures. Conclusions Urechis caupo mtDNA shares many features with those of the few studied annelids, including the common usage of ATG start codons, unusual among animal mtDNAs, as well as gene arrangements, tRNA structures, and codon usage biases. PMID:15369601
Whole genome sequencing of the monomorphic pathogen Mycobacterium bovis reveals local differentiation of cattle clinical isolates.

PubMed

Lasserre, Moira; Fresia, Pablo; Greif, Gonzalo; Iraola, Gregorio; Castro-Ramos, Miguel; Juambeltz, Arturo; Nuñez, Álvaro; Naya, Hugo; Robello, Carlos; Berná, Luisa

2018-01-02

Bovine tuberculosis (bTB) poses serious risks to animal welfare and economy, as well as to public health as a zoonosis. Its etiological agent, Mycobacterium bovis, belongs to the Mycobacterium tuberculosis complex (MTBC), a group of genetically monomorphic organisms featured by a remarkably high overall nucleotide identity (99.9%). Indeed, this characteristic is of major concern for correct typing and determination of strain-specific traits based on sequence diversity. Due to its historical economic dependence on cattle production, Uruguay is deeply affected by the prevailing incidence of Mycobacterium bovis. With the world's highest number of cattle per human, and its intensive cattle production, Uruguay represents a particularly suited setting to evaluate genomic variability among isolates, and the diversity traits associated with this pathogen. We compared 186 genomes from MTBC strains isolated worldwide, and found a highly structured population in M. bovis. The analysis of 23 new M. bovis genomes, belonging to strains isolated in Uruguay, evidenced three groups present in the country. Despite presenting an expected highly conserved genomic structure and sequence, these strains segregate in a clustered manner within the worldwide phylogeny. Analysis of the non-pe/ppe differential areas against a reference genome defined four main sources of variability, namely: regions of difference (RD), variable genes, duplications and novel genes. RDs and variant analysis segregated the strains into clusters that are concordant with their spoligotype identities. Due to its high homoplasy rate, spoligotyping failed to reflect the true genomic diversity among worldwide representative strains; however, it remains a good indicator for closely related populations. This study introduces a comprehensive population structure analysis of worldwide M. bovis isolates. The incorporation and analysis of 23 novel Uruguayan M. bovis genomes sheds light onto the genomic diversity of this pathogen, evidencing the existence of greater genetic variability among strains than previously contemplated.
Nullomers and High Order Nullomers in Genomic Sequences

PubMed Central

Vergni, Davide; Santoni, Daniele

2016-01-01

A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is now debated in the literature. Here, we investigated the nature of nullomers, whether their absence in genomes has just a statistical explanation or it is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied different aspects of them: comparison with nullomers of random sequences, CpG distribution and mean helical rise. In agreement with previous results we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless antithetical results were found when considering a random DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content but it also highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built on eleven species based on both the similarities between the dinucleotide frequencies and the number of nullomers two species share, showing that nullomers are fairly conserved among close species. Finally the study of mean helical rise of nullomer sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have some peculiar structural features. The obtained results show that nullomers are the consequence of the peculiar structure of DNA (also including biased CpG frequency and CpG islands), so that the hypermutability model, also taking into account CpG islands, seems insufficient to explain the nullomer phenomenon. Finally, high order nullomers could emphasize those features that already make simple nullomers useful in several applications. PMID:27906971
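The definitions in the abstract above (a nullomer is a k-mer absent from a sequence; a high order nullomer is a nullomer whose mutated versions are also nullomers) can be made concrete with a short enumeration. This is a minimal sketch under stated assumptions, not the authors' method: only the given strand is scanned (no reverse complement), "mutated" is taken to mean a single-base substitution, and the function names are hypothetical. Exhaustive enumeration is only practical for small k.

```python
from itertools import product
from typing import List


def nullomers(sequence: str, k: int) -> List[str]:
    """Return all k-mers over A/C/G/T that never occur in `sequence`."""
    seq = sequence.upper()
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return [kmer for kmer in all_kmers if kmer not in present]


def high_order_nullomers(sequence: str, k: int) -> List[str]:
    """Nullomers whose every single-substitution mutant is also a nullomer.

    Assumption: 'mutated sequences' means one-base substitutions only.
    """
    absent = set(nullomers(sequence, k))
    result = []
    for kmer in sorted(absent):
        mutants = (
            kmer[:i] + base + kmer[i + 1:]
            for i in range(k)
            for base in "ACGT"
            if base != kmer[i]
        )
        if all(m in absent for m in mutants):
            result.append(kmer)
    return result


# Toy example: "ACGT" contains only the 2-mers AC, CG and GT.
missing = nullomers("ACGT", 2)
robust = high_order_nullomers("ACGT", 2)
```

For the toy sequence, 13 of the 16 possible 2-mers are nullomers, but only `TA` stays absent under every single substitution, which shows how much stricter the high order condition is.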
Understanding patient and provider perceptions and expectations of genomic medicine

PubMed Central

Hall, Michael J; Forman, Andrea; Montgomery, Susan; Rainey, Kim; Daly, Mary B

2014-01-01

Advances in genome sequencing technology have fostered a new era of clinical genomic medicine. Genetic counselors, who have begun to support patients undergoing multi-gene panel testing for hereditary cancer risk, will review brief clinical vignettes, and discuss early experiences with clinical genomic testing. Their experiences will frame a discussion about how current testing may challenge patient understanding and expectations toward the evaluation of cancer risk and downstream preventive behaviors. PMID:24992205
Effect of ancient population structure on the degree of polymorphism shared between modern human populations and ancient hominins.

PubMed

Eriksson, Anders; Manica, Andrea

2012-08-28

Recent comparisons between anatomically modern humans and ancient genomes of other hominins have raised the tantalizing, and hotly debated, possibility of hybridization. Although several tests of hybridization have been devised, they all rely on the degree to which different modern populations share genetic polymorphisms with the ancient genomes of other hominins. However, spatial population structure is expected to generate genetic patterns similar to those that might be attributed to hybridization. To investigate this problem, we take Neanderthals as a case study, and build a spatially explicit model of the shared history of anatomically modern humans and this hominin. We show that the excess polymorphism shared between Eurasians and Neanderthals is compatible with scenarios in which no hybridization occurred, and is strongly linked to the strength of population structure in ancient populations. Thus, we recommend caution in inferring admixture from geographic patterns of shared polymorphisms, and argue that future attempts to investigate ancient hybridization between humans and other hominins should explicitly account for population structure.
Genome re-assignment of Arachis trinitensis (Sect. Arachis, Leguminosae) and its implications for the genetic origin of cultivated peanut

PubMed Central

2010-01-01

The karyotype structure of Arachis trinitensis was studied by conventional Feulgen staining, CMA/DAPI banding and rDNA loci detection by fluorescence in situ hybridization (FISH) in order to establish its genome status and test the hypothesis that this species is a genome donor of cultivated peanut. Conventional staining revealed that the karyotype lacked the small “A chromosomes” characteristic of the A genome. In agreement with this, chromosomal banding showed that none of the chromosomes had the large centromeric bands expected for A chromosomes. FISH revealed one pair each of 5S and 45S rDNA loci, located in different medium-sized metacentric chromosomes. Collectively, these results suggest that A. trinitensis should be removed from the A genome and be considered as a B or non-A genome species. The pattern of heterochromatic bands and rDNA loci of A. trinitensis differ markedly from any of the complements of A. hypogaea, suggesting that the former species is unlikely to be one of the wild diploid progenitors of the latter. PMID:21637581
Understanding patient and provider perceptions and expectations of genomic medicine.

PubMed

Hall, Michael J; Forman, Andrea D; Montgomery, Susan V; Rainey, Kim L; Daly, Mary B

2015-01-01

Advances in genome sequencing technology have fostered a new era of clinical genomic medicine. Genetic counselors, who have begun to support patients undergoing multi-gene panel testing for hereditary cancer risk, will review brief clinical vignettes, and discuss early experiences with clinical genomic testing. Their experiences will frame a discussion about how current testing may challenge patient understanding and expectations toward the evaluation of cancer risk and downstream preventive behaviors. © 2014 Wiley Periodicals, Inc.
Hemipteran Mitochondrial Genomes: Features, Structures and Implications for Phylogeny

PubMed Central

Wang, Yuan; Chen, Jing; Jiang, Li-Yun; Qiao, Ge-Xia

2015-01-01

The study of Hemipteran mitochondrial genomes (mitogenomes) began with the Chagas disease vector, Triatoma dimidiata, in 2001. At present, 90 complete Hemipteran mitogenomes have been sequenced and annotated. This review examines the history of Hemipteran mitogenome research and summarizes their main features, including genome organization, nucleotide composition, protein-coding genes, tRNAs and rRNAs, and non-coding regions. Special attention is given to the comparative analysis of repeat regions. Gene rearrangements are an additional data type for a few families, and most mitogenomes are arranged in the same order as that of the proposed ancestral insect. We also discuss and provide insights on the phylogenetic analyses of a variety of taxonomic levels. This review is expected to further expand our understanding of research in this field and serve as a valuable reference resource. PMID:26039239
Closing the gap between knowledge and clinical application: challenges for genomic translation.

PubMed

Burke, Wylie; Korngiebel, Diane M

2015-01-01

Despite early predictions and rapid progress in research, the introduction of personal genomics into clinical practice has been slow. Several factors contribute to this translational gap between knowledge and clinical application. The evidence available to support genetic test use is often limited, and implementation of new testing programs can be challenging. In addition, the heterogeneity of genomic risk information points to the need for strategies to select and deliver the information most appropriate for particular clinical needs. Accomplishing these tasks also requires recognition that some expectations for personal genomics are unrealistic, notably expectations concerning the clinical utility of genomic risk assessment for common complex diseases. Efforts are needed to improve the body of evidence addressing clinical outcomes for genomics, apply implementation science to personal genomics, and develop realistic goals for genomic risk assessment. In addition, translational research should emphasize the broader benefits of genomic knowledge, including applications of genomic research that provide clinical benefit outside the context of personal genomic risk.
Genomic understanding of dinoflagellates.

PubMed

Lin, Senjie

2011-01-01

The phylum of dinoflagellates is characterized by many unusual and interesting genomic and physiological features, the imprint of which, in its immense genome, remains elusive. Much novel understanding has been achieved in the last decade on various aspects of dinoflagellate biology, but most remarkably about the structure, expression pattern and epigenetic modification of protein-coding genes in the nuclear and organellar genomes. Major findings include: 1) the great diversity of dinoflagellates, especially at the base of the dinoflagellate tree of life; 2) mini-circularization of the genomes of typical dinoflagellate plastids (with three membranes, chlorophylls a, c1 and c2, and carotenoid peridinin), the scrambled mitochondrial genome and the extensive mRNA editing occurring in both systems; 3) ubiquitous spliced leader trans-splicing of nuclear-encoded mRNA and demonstrated potential as a novel tool for studying dinoflagellate transcriptomes in mixed cultures and natural assemblages; 4) existence and expression of histones and other nucleosomal proteins; 5) a ribosomal protein set expected of typical eukaryotes; 6) genetic potential of non-photosynthetic solar energy utilization via proton-pump rhodopsin; 7) gene candidates in the toxin synthesis pathways; and 8) evidence of a highly redundant, high gene number and highly recombined genome. Despite this progress, much more work awaits genome-wide transcriptome and whole genome sequencing in order to unfold the molecular mechanisms underlying the numerous mysterious attributes of dinoflagellates. Copyright © 2011 Institut Pasteur. Published by Elsevier SAS. All rights reserved.
Enhancing genomic laboratory reports from the patients' view: A qualitative analysis.

PubMed

Stuckey, Heather; Williams, Janet L; Fan, Audrey L; Rahm, Alanna Kulchak; Green, Jamie; Feldman, Lynn; Bonhag, Michele; Zallen, Doris T; Segal, Michael M; Williams, Marc S

2015-10-01

The purpose of this study was to develop a family genomic laboratory report designed to communicate genome sequencing results to parents of children who were participating in a whole genome sequencing clinical research study. Semi-structured interviews were conducted with parents of children who participated in a whole genome sequencing clinical research study to address the elements, language and format of a sample family-directed genome laboratory report. The qualitative interviews were followed by two focus groups aimed at evaluating example presentations of information about prognosis and next steps related to the whole genome sequencing result. Three themes emerged from the qualitative data: (i) Parents described a continual search for valid information and resources regarding their child's condition, a need that prior reports did not meet for parents; (ii) Parents believed that the Family Report would help facilitate communication with physicians and family members; and (iii) Parents identified specific items they appreciated in a genomics Family Report: simplicity of language, logical flow, visual appeal, information on what to expect in the future and recommended next steps. Parents affirmed their desire for a family genomic results report designed for their use and reference. They articulated the need for clear, easy to understand language that provided information with temporal detail and specific recommendations regarding relevant findings consistent with that available to clinicians. © 2015 Wiley Periodicals, Inc.
Enhancing genomic laboratory reports from the patients' view: A qualitative analysis

PubMed Central

Stuckey, Heather; Fan, Audrey L.; Rahm, Alanna Kulchak; Green, Jamie; Feldman, Lynn; Bonhag, Michele; Zallen, Doris T.; Segal, Michael M.; Williams, Marc S.

2015-01-01

The purpose of this study was to develop a family genomic laboratory report designed to communicate genome sequencing results to parents of children who were participating in a whole genome sequencing clinical research study. Semi‐structured interviews were conducted with parents of children who participated in a whole genome sequencing clinical research study to address the elements, language and format of a sample family‐directed genome laboratory report. The qualitative interviews were followed by two focus groups aimed at evaluating example presentations of information about prognosis and next steps related to the whole genome sequencing result. Three themes emerged from the qualitative data: (i) Parents described a continual search for valid information and resources regarding their child's condition, a need that prior reports did not meet for parents; (ii) Parents believed that the Family Report would help facilitate communication with physicians and family members; and (iii) Parents identified specific items they appreciated in a genomics Family Report: simplicity of language, logical flow, visual appeal, information on what to expect in the future and recommended next steps. Parents affirmed their desire for a family genomic results report designed for their use and reference. They articulated the need for clear, easy to understand language that provided information with temporal detail and specific recommendations regarding relevant findings consistent with that available to clinicians. PMID:26086630
The PiGeOn project: protocol of a longitudinal study examining psychosocial and ethical issues and outcomes in germline genomic sequencing for cancer.

PubMed

Best, Megan; Newson, Ainsley J; Meiser, Bettina; Juraskova, Ilona; Goldstein, David; Tucker, Kathy; Ballinger, Mandy L; Hess, Dominique; Schlub, Timothy E; Biesecker, Barbara; Vines, Richard; Vines, Kate; Thomas, David; Young, Mary-Anne; Savard, Jacqueline; Jacobs, Chris; Butow, Phyllis

2018-04-23

Advances in genomics offer promise for earlier detection or prevention of cancer, by personalisation of medical care tailored to an individual's genomic risk status. However, genome sequencing can generate an unprecedented volume of results for the patient to process with potential implications for their families and reproductive choices. This paper describes a protocol for a study (PiGeOn) that aims to explore how patients and their blood relatives experience germline genomic sequencing, to help guide the appropriate future implementation of genome sequencing into routine clinical practice. We have designed a mixed-methods, prospective, cohort sub-study of a germline genomic sequencing study that targets adults with cancer suggestive of a genetic aetiology. One thousand probands and 2000 of their blood relatives will undergo germline genomic sequencing as part of the parent study in Sydney, Australia between 2016 and 2020. Test results are expected within 12-15 months of recruitment. For the PiGeOn sub-study, participants will be invited to complete surveys at baseline, three months and twelve months after baseline using self-administered questionnaires, to assess the experience of long waits for results (despite being informed that results may not be returned) and expectations of receiving them. Subsets of both probands and blood relatives will be purposively sampled and invited to participate in three semi-structured qualitative interviews (at baseline and each follow-up) to triangulate the data. Ethical themes identified in the data will be used to inform critical revisions of normative ethical concepts or frameworks. This will be one of the first studies internationally to follow the psychosocial impact on probands and their blood relatives who undergo germline genome sequencing, over time. Study results will inform ongoing ethical debates on issues such as informed consent for genomic sequencing, and informing participants and their relatives of specific results. The study will also provide important outcome data concerning the psychological impact of prolonged waiting for germline genomic sequencing. These data are needed to ensure that when germline genomic sequencing is introduced into standard clinical settings, ethical concepts are embedded, and patients and their relatives are adequately prepared and supported during and after the testing process.
Protein Chaperones Q8ZP25_SALTY from Salmonella Typhimurium and HYAE_ECOLI from Escherichia coli Exhibit Thioredoxin-like Structures Despite Lack of Canonical Thioredoxin Active Site Sequence Motif

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parish, D.; Benach, J; Liu, G

2008-01-01

The structure of the 142-residue protein Q8ZP25_SALTY encoded in the genome of Salmonella typhimurium LT2 was determined independently by NMR and X-ray crystallography, and the structure of the 140-residue protein HYAE_ECOLI encoded in the genome of Escherichia coli was determined by NMR. The two proteins belong to Pfam (Finn et al. 34:D247-D251, 2006) PF07449, which currently comprises 50 members, and belongs itself to the 'thioredoxin-like clan'. However, protein HYAE_ECOLI and the other proteins of Pfam PF07449 do not contain the canonical Cys-X-X-Cys active site sequence motif of thioredoxin. Protein HYAE_ECOLI was previously classified as a (NiFe) hydrogenase-1 specific chaperone interacting with the twin-arginine translocation (Tat) signal peptide. The structures presented here exhibit the expected thioredoxin-like fold and support the view that members of Pfam family PF07449 specifically interact with Tat signal peptides.
Rapid diversification of five Oryza AA genomes associated with rice adaptation.

PubMed

Zhang, Qun-Jie; Zhu, Ting; Xia, En-Hua; Shi, Chao; Liu, Yun-Long; Zhang, Yun; Liu, Yuan; Jiang, Wen-Kai; Zhao, You-Jie; Mao, Shu-Yan; Zhang, Li-Ping; Huang, Hui; Jiao, Jun-Ying; Xu, Ping-Zhen; Yao, Qiu-Yang; Zeng, Fan-Chun; Yang, Li-Li; Gao, Ju; Tao, Da-Yun; Wang, Yue-Ju; Bennetzen, Jeffrey L; Gao, Li-Zhi

2014-11-18

Comparative genomic analyses among closely related species can greatly enhance our understanding of plant gene and genome evolution. We report de novo-assembled AA-genome sequences for Oryza nivara, Oryza glaberrima, Oryza barthii, Oryza glumaepatula, and Oryza meridionalis. Our analyses reveal massive levels of genomic structural variation, including segmental duplication and rapid gene family turnover, with particularly high instability in defense-related genes. We show, on a genomic scale, how lineage-specific expansion or contraction of gene families has led to their morphological and reproductive diversification, thus enlightening the evolutionary process of speciation and adaptation. Despite strong purifying selective pressures on most Oryza genes, we documented a large number of positively selected genes, especially those genes involved in flower development, reproduction, and resistance-related processes. These diversifying genes are expected to have played key roles in adaptations to their ecological niches in Asia, South America, Africa and Australia. Extensive variation in noncoding RNA gene numbers, function enrichment, and rates of sequence divergence might also help account for the different genetic adaptations of these rice species. Collectively, these resources provide new opportunities for evolutionary genomics, numerous insights into recent speciation, a valuable database of functional variation for crop improvement, and tools for efficient conservation of wild rice germplasm.
Rapid diversification of five Oryza AA genomes associated with rice adaptation

PubMed Central

Zhang, Qun-Jie; Zhu, Ting; Xia, En-Hua; Shi, Chao; Liu, Yun-Long; Zhang, Yun; Liu, Yuan; Jiang, Wen-Kai; Zhao, You-Jie; Mao, Shu-Yan; Zhang, Li-Ping; Huang, Hui; Jiao, Jun-Ying; Xu, Ping-Zhen; Yao, Qiu-Yang; Zeng, Fan-Chun; Yang, Li-Li; Gao, Ju; Tao, Da-Yun; Wang, Yue-Ju; Bennetzen, Jeffrey L.; Gao, Li-Zhi

2014-01-01

Comparative genomic analyses among closely related species can greatly enhance our understanding of plant gene and genome evolution. We report de novo-assembled AA-genome sequences for Oryza nivara, Oryza glaberrima, Oryza barthii, Oryza glumaepatula, and Oryza meridionalis. Our analyses reveal massive levels of genomic structural variation, including segmental duplication and rapid gene family turnover, with particularly high instability in defense-related genes. We show, on a genomic scale, how lineage-specific expansion or contraction of gene families has led to their morphological and reproductive diversification, thus enlightening the evolutionary process of speciation and adaptation. Despite strong purifying selective pressures on most Oryza genes, we documented a large number of positively selected genes, especially those genes involved in flower development, reproduction, and resistance-related processes. These diversifying genes are expected to have played key roles in adaptations to their ecological niches in Asia, South America, Africa and Australia. Extensive variation in noncoding RNA gene numbers, function enrichment, and rates of sequence divergence might also help account for the different genetic adaptations of these rice species. Collectively, these resources provide new opportunities for evolutionary genomics, numerous insights into recent speciation, a valuable database of functional variation for crop improvement, and tools for efficient conservation of wild rice germplasm. PMID:25368197
Amino acid changes in disease-associated variants differ radically from variants observed in the 1000 genomes project dataset.

PubMed

de Beer, Tjaart A P; Laskowski, Roman A; Parks, Sarah L; Sipos, Botond; Goldman, Nick; Thornton, Janet M

2013-01-01

The 1000 Genomes Project data provides a natural background dataset for amino acid germline mutations in humans. Since the direction of mutation is known, the amino acid exchange matrix generated from the observed nucleotide variants is asymmetric and the mutabilities of the different amino acids are very different. These differences predominantly reflect preferences for nucleotide mutations in the DNA (especially the high mutation rate of the CpG dinucleotide, which makes arginine mutability very much higher than other amino acids) rather than selection imposed by protein structure constraints, although there is evidence for the latter as well. The variants occur predominantly on the surface of proteins (82%), with a slight preference for sites which are more exposed and less well conserved than random. Mutations to functional residues occur about half as often as expected by chance. The disease-associated amino acid variant distributions in OMIM are radically different from those expected on the basis of the 1000 Genomes dataset. The disease-associated variants preferentially occur in more conserved sites, compared to 1000 Genomes mutations. Many of the amino acid exchange profiles appear to exhibit an anti-correlation, with common exchanges in one dataset being rare in the other. Disease-associated variants exhibit more extreme differences in amino acid size and hydrophobicity. More modelling of the mutational processes at the nucleotide level is needed, but these observations should contribute to an improved prediction of the effects of specific variants in humans.
Genome-Assisted Prediction of Quantitative Traits Using the R Package sommer.

PubMed

Covarrubias-Pazaran, Giovanny

2016-01-01

Most traits of agronomic importance are quantitative in nature, and genetic markers have been used for decades to dissect such traits. Recently, genomic selection has earned attention as next generation sequencing technologies became feasible for major and minor crops. Mixed models have become a key tool for fitting genomic selection models, but most current genomic selection software can only include a single variance component other than the error, making hybrid prediction using additive, dominance and epistatic effects unfeasible for species displaying heterotic effects. Moreover, likelihood-based software for fitting mixed models with multiple random effects that allows the user to specify the variance-covariance structure of random effects has not been fully exploited. A new open-source R package called sommer is presented to facilitate the use of mixed models for genomic selection and hybrid prediction purposes using more than one variance component and allowing specification of covariance structures. The use of sommer for genomic prediction is demonstrated through several examples using maize and wheat genotypic and phenotypic data. At its core, the program contains three algorithms for estimating variance components: Average information (AI), Expectation-Maximization (EM) and Efficient Mixed Model Association (EMMA). Kernels for calculating the additive, dominance and epistatic relationship matrices are included, along with other useful functions for genomic analysis. Results from sommer were comparable to other software, but the analysis was faster than Bayesian counterparts by a margin of hours to days. In addition, the ability to deal with missing data, combined with greater flexibility and speed than other REML-based software, was achieved by putting together some of the most efficient algorithms to fit models in a gentle environment such as R.
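The additive relationship kernel mentioned in this abstract can be illustrated with a minimal sketch. This is not sommer's API or code. It assumes a widely used formulation (often attributed to VanRaden) in which genotypes are coded as 0/1/2 counts of one allele, centered by twice the allele frequency, and scaled by 2 Σ p(1 − p). The function name `additive_relationship` and the toy genotypes are hypothetical.

```python
def additive_relationship(markers):
    """Return an additive genomic relationship matrix as nested lists.

    markers: one row per individual, one column per marker, each entry
    a 0/1/2 count of the reference allele (an assumed coding; sommer
    itself may expect a different one).
    """
    n_ind = len(markers)
    n_mark = len(markers[0])
    # Frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in markers) / (2.0 * n_ind)
             for j in range(n_mark)]
    # Center every genotype by its expected value, 2p.
    centered = [[row[j] - 2.0 * freqs[j] for j in range(n_mark)]
                for row in markers]
    # Scaling term: summed expected genotype variance across markers.
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic")
    return [[sum(a * b for a, b in zip(ri, rj)) / scale for rj in centered]
            for ri in centered]


# Hypothetical toy data: three individuals scored at two markers.
toy_genotypes = [[0, 2], [2, 0], [1, 1]]
G = additive_relationship(toy_genotypes)
```

In this toy case the two opposite homozygotes receive a negative off-diagonal value and the double heterozygote sits at the population mean, which is the behavior one would expect from a centered kernel.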
Genomic mid-range inhomogeneity correlates with an abundance of RNA secondary structures

PubMed Central

Bechtel, Jason M; Wittenschlaeger, Thomas; Dwyer, Trisha; Song, Jun; Arunachalam, Sasi; Ramakrishnan, Sadeesh K; Shepard, Samuel; Fedorov, Alexei

2008-01-01

Background: Genomes possess different levels of non-randomness, in particular, an inhomogeneity in their nucleotide composition. Inhomogeneity is manifest from the short-range where neighboring nucleotides influence the choice of base at a site, to the long-range, commonly known as isochores, where a particular base composition can span millions of nucleotides. A separate genomic issue that has yet to be thoroughly elucidated is the role that RNA secondary structure (SS) plays in gene expression. Results: We present novel data and approaches that show that a mid-range inhomogeneity (~30 to 1000 nt) not only exists in mammalian genomes but is also significantly associated with strong RNA SS. A whole-genome bioinformatics investigation of local SS in a set of 11,315 non-redundant human pre-mRNA sequences has been carried out. Four distinct components of these molecules (5'-UTRs, exons, introns and 3'-UTRs) were considered separately, since they differ in overall nucleotide composition, sequence motifs and periodicities. For each pre-mRNA component, the abundance of strong local SS (< -25 kcal/mol) was a factor of two to ten greater than a random expectation model. The randomization process preserves the short-range inhomogeneity of the corresponding natural sequences, thus eliminating short-range signals as possible contributors to any observed phenomena. Conclusion: We demonstrate that the excess of strong local SS in pre-mRNAs is linked to the little explored phenomenon of genomic mid-range inhomogeneity (MRI). MRI is an interdependence between nucleotide choice and base composition over a distance of 20–1000 nt. Additionally, we have created a public computational resource to support further study of genomic MRI. PMID:18549495

Diversity and Evolution of Mycobacterium tuberculosis: Moving to Whole-Genome-Based Approaches

PubMed Central

Niemann, Stefan; Supply, Philip

2014-01-01

Genotyping of clinical Mycobacterium tuberculosis complex (MTBC) strains has become a standard tool for epidemiological tracing and for the investigation of the local and global strain population structure. Of special importance is the analysis of the expansion of multidrug (MDR) and extensively drug-resistant (XDR) strains. Classical genotyping and, more recently, whole-genome sequencing have revealed that the strains of the MTBC are more diverse than previously anticipated. Globally, several phylogenetic lineages can be distinguished whose geographical distribution is markedly variable. Strains of particular (sub)lineages, such as Beijing, seem to be more virulent and associated with enhanced resistance levels and fitness, likely fueling their spread in certain world regions. The upcoming generalization of whole-genome sequencing approaches will expectedly provide more comprehensive insights into the molecular and epidemiological mechanisms involved and lead to better diagnostic and therapeutic tools. PMID:25190252
A protocol for generating a high-quality genome-scale metabolic reconstruction.

PubMed

Thiele, Ines; Palsson, Bernhard Ø

2010-01-01

Network reconstructions are a common denominator in systems biology. Bottom-up metabolic network reconstructions have been developed over the last 10 years. These reconstructions represent structured knowledge bases that abstract pertinent information on the biochemical transformations taking place within specific target organisms. The conversion of a reconstruction into a mathematical format facilitates a myriad of computational biological studies, including evaluation of network content, hypothesis testing and generation, analysis of phenotypic characteristics and metabolic engineering. To date, genome-scale metabolic reconstructions for more than 30 organisms have been published and this number is expected to increase rapidly. However, these reconstructions differ in quality and coverage that may minimize their predictive potential and use as knowledge bases. Here we present a comprehensive protocol describing each step necessary to build a high-quality genome-scale metabolic reconstruction, as well as the common trials and tribulations. Therefore, this protocol provides a helpful manual for all stages of the reconstruction process.
A protocol for generating a high-quality genome-scale metabolic reconstruction

PubMed Central

Thiele, Ines; Palsson, Bernhard Ø.

2011-01-01

Network reconstructions are a common denominator in systems biology. Bottom-up metabolic network reconstructions have developed over the past 10 years. These reconstructions represent structured knowledge-bases that abstract pertinent information on the biochemical transformations taking place within specific target organisms. The conversion of a reconstruction into a mathematical format facilitates myriad computational biological studies including evaluation of network content, hypothesis testing and generation, analysis of phenotypic characteristics, and metabolic engineering. To date, genome-scale metabolic reconstructions for more than 30 organisms have been published and this number is expected to increase rapidly. However, these reconstructions differ in quality and coverage that may minimize their predictive potential and use as knowledge-bases. Here, we present a comprehensive protocol describing each step necessary to build a high-quality genome-scale metabolic reconstruction as well as common trials and tribulations. Therefore, this protocol provides a helpful manual for all stages of the reconstruction process. PMID:20057383
The UK’s 100,000 Genomes Project: manifesting policymakers’ expectations

PubMed Central

Samuel, Gabrielle Natalie; Farsides, Bobbie

2017-01-01

The UK’s 100,000 Genomes Project has the aim of sequencing 100,000 genomes from UK National Health Service (NHS) patients while concomitantly transforming clinical care such that whole genome sequencing becomes routine clinical practice in the UK. Policymakers claim that the project will revolutionize NHS care. We wished to explore the 100,000 Genomes Project, and in particular, the extent to which policymaker claims have helped or hindered the work of those associated with Genomics England – the company established by the Department of Health to deliver the project. We interviewed 20 individuals linked to, or working for Genomics England. Interviewees had double-edged views about the context within which they were working. On the one hand, policymakers’ expectations attached to the venture were considered vacuous “genohype”; on the other hand, they were considered the impetus needed for those trying to advance genomic research into clinical practice. Findings should be considered for future genomes projects. PMID:29238265
Using deep RNA sequencing for the structural annotation of the Laccaria bicolor mycorrhizal transcriptome.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Larsen, P. E.; Trivedi, G.; Sreedasyam, A.

2010-07-06

Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derived from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. 69% of expressed mycorrhizal JGI 'best' gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. The method described here can be applied to improve gene structural annotation in other species, provided that there is a sequenced genome and a set of gene models.
Accounting for discovery bias in genomic EPD

USDA-ARS's Scientific Manuscript database

Genomics has contributed substantially to genetic improvement of beef cattle. The implementation is through computation of genomically enhanced expected progeny differences (GE-EPD), which are predictions of genetic merit of individual animals based on genomic information, pedigree, and data on the ...
Discrimination of candidate subgenome-specific loci by linkage map construction with an S1 population of octoploid strawberry (Fragaria × ananassa).

PubMed

Nagano, Soichiro; Shirasawa, Kenta; Hirakawa, Hideki; Maeda, Fumi; Ishikawa, Masami; Isobe, Sachiko N

2017-05-12

The strawberry, Fragaria × ananassa, is an allo-octoploid (2n = 8x = 56) and outcrossing species. Although it is the most widely consumed berry crop in the world, its complex genome structure has hindered its genetic and genomic analysis, and thus discrimination of subgenome-specific loci among the homoeologous chromosomes is needed. In the present study, we identified candidate subgenome-specific single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) loci, and constructed a linkage map using an S1 mapping population of the cultivar 'Reikou' with an IStraw90 Axiom® SNP array and previously published SSR markers. The 'Reikou' linkage map consisted of 11,574 loci (11,002 SNPs and 572 SSR loci) spanning 2816.5 cM of 31 linkage groups. The 11,574 loci were located on 4738 unique positions (bin) on the linkage map. Of the mapped loci, 8999 (8588 SNPs and 411 SSR loci) showed a 1:2:1 segregation ratio of AA:AB:BB allele, which suggested the possibility of deriving loci from candidate subgenome-specific sequences. In addition, 2575 loci (2414 SNPs and 161 SSR loci) showed a 3:1 segregation of AB:BB allele, indicating they were derived from homoeologous genomic sequences. Comparative analysis of the homoeologous linkage groups revealed differences in genome structure among the subgenomes. Our results suggest that candidate subgenome-specific loci are randomly located across the genomes, and that there are small- to large-scale structural variations among the subgenomes. The mapped SNPs and SSR loci on the linkage map are expected to be seed points for the construction of pseudomolecules in the octoploid strawberry.
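The 1:2:1 and 3:1 segregation classes described in this abstract are the kind of result usually checked with a chi-square goodness-of-fit test. The sketch below shows one plain way to do that. It is an illustration only, not the authors' pipeline: the genotype counts are hypothetical, the function names are invented for this example, and the 5% critical values (3.841 for 1 degree of freedom, 5.991 for 2) are the standard tabulated ones.

```python
# Standard chi-square critical values at the 5% significance level.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991}


def chi_square_stat(observed, ratio):
    """Chi-square statistic of observed class counts against a ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_ratio(observed, ratio):
    """True when the counts do not deviate significantly from the ratio."""
    degrees_of_freedom = len(ratio) - 1
    return chi_square_stat(observed, ratio) <= CRITICAL_5_PERCENT[degrees_of_freedom]


# Hypothetical counts for one locus scored in 100 S1 progeny.
codominant_counts = [24, 52, 24]   # AA, AB, BB classes
dominant_counts = [75, 25]         # AB, BB classes

looks_like_1_2_1 = fits_ratio(codominant_counts, (1, 2, 1))
looks_like_3_1 = fits_ratio(dominant_counts, (3, 1))
```

With thousands of loci tested at once, a real analysis would also need a multiple-testing correction, which this sketch leaves out.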
Genetic variation maintained in multilocus models of additive quantitative traits under stabilizing selection.

PubMed Central

Bürger, R; Gimelfarb, A

1999-01-01

Stabilizing selection for an intermediate optimum is generally considered to deplete genetic variation in quantitative traits. However, conflicting results from various types of models have been obtained. While classical analyses assuming a large number of independent additive loci with individually small effects indicated that no genetic variation is preserved under stabilizing selection, several analyses of two-locus models showed the contrary. We perform a complete analysis of a generalization of Wright's two-locus quadratic-optimum model and investigate numerically the ability of quadratic stabilizing selection to maintain genetic variation in additive quantitative traits controlled by up to five loci. A statistical approach is employed by choosing randomly 4000 parameter sets (allelic effects, recombination rates, and strength of selection) for a given number of loci. For each parameter set we iterate the recursion equations that describe the dynamics of gamete frequencies starting from 20 randomly chosen initial conditions until an equilibrium is reached, record the quantities of interest, and calculate their corresponding mean values. As the number of loci increases from two to five, the fraction of the genome expected to be polymorphic declines surprisingly rapidly, and the loci that are polymorphic increasingly are those with small effects on the trait. As a result, the genetic variance expected to be maintained under stabilizing selection decreases very rapidly with increased number of loci. The equilibrium structure expected under stabilizing selection on an additive trait differs markedly from that expected under selection with no constraints on genotypic fitness values. The expected genetic variance, the expected polymorphic fraction of the genome, as well as other quantities of interest, are only weakly dependent on the selection intensity and the level of recombination. PMID:10353920
Evaluating phylogenetic congruence in the post-genomic era.

PubMed

Leigh, Jessica W; Lapointe, François-Joseph; Lopez, Philippe; Bapteste, Eric

2011-01-01

Congruence is a broadly applied notion in evolutionary biology used to justify multigene phylogeny or phylogenomics, as well as in studies of coevolution, lateral gene transfer, and as evidence for common descent. Existing methods for identifying incongruence or heterogeneity using character data were designed for data sets that are both small and expected to be rarely incongruent. At the same time, methods that assess incongruence using comparison of trees test a null hypothesis of uncorrelated tree structures, which may be inappropriate for phylogenomic studies. As such, they are ill-suited for the growing number of available genome sequences, most of which are from prokaryotes and viruses, either for phylogenomic analysis or for studies of the evolutionary forces and events that have shaped these genomes. Specifically, many existing methods scale poorly with large numbers of genes, cannot accommodate high levels of incongruence, and do not adequately model patterns of missing taxa for different markers. We propose the development of novel incongruence assessment methods suitable for the analysis of the molecular evolution of the vast majority of life and support the investigation of homogeneity of evolutionary process in cases where markers do not share identical tree structures.
Evaluating Phylogenetic Congruence in the Post-Genomic Era

PubMed Central

Leigh, Jessica W.; Lapointe, François-Joseph; Lopez, Philippe; Bapteste, Eric

2011-01-01

Congruence is a broadly applied notion in evolutionary biology used to justify multigene phylogeny or phylogenomics, as well as in studies of coevolution, lateral gene transfer, and as evidence for common descent. Existing methods for identifying incongruence or heterogeneity using character data were designed for data sets that are both small and expected to be rarely incongruent. At the same time, methods that assess incongruence using comparison of trees test a null hypothesis of uncorrelated tree structures, which may be inappropriate for phylogenomic studies. As such, they are ill-suited for the growing number of available genome sequences, most of which are from prokaryotes and viruses, either for phylogenomic analysis or for studies of the evolutionary forces and events that have shaped these genomes. Specifically, many existing methods scale poorly with large numbers of genes, cannot accommodate high levels of incongruence, and do not adequately model patterns of missing taxa for different markers. We propose the development of novel incongruence assessment methods suitable for the analysis of the molecular evolution of the vast majority of life and support the investigation of homogeneity of evolutionary process in cases where markers do not share identical tree structures. PMID:21712432
Environmental Adaptation Contributes to Gene Polymorphism across the Arabidopsis thaliana Genome

PubMed Central

Lee, Cheng-Ruei

2012-01-01

The level of within-species polymorphism differs greatly among genes in a genome. Many genomic studies have investigated the relationship between gene polymorphism and factors such as recombination rate or expression pattern. However, the polymorphism of a gene is affected not only by its physical properties or functional constraints but also by natural selection on organisms in their environments. Specifically, if functionally divergent alleles enable adaptation to different environments, locus-specific polymorphism may be maintained by spatially heterogeneous natural selection. To test this hypothesis and estimate the extent to which environmental selection shapes the pattern of genome-wide polymorphism, we define the "environmental relevance" of a gene as the proportion of genetic variation explained by environmental factors, after controlling for population structure. We found substantial effects of environmental relevance on patterns of polymorphism among genes. In addition, the correlation between environmental relevance and gene polymorphism is positive, consistent with the expectation that balancing selection among heterogeneous environments maintains genetic variation at ecologically important genes. Comparison of the gene ontology annotations shows that genes with high environmental relevance are enriched in unknown function categories. These results suggest an important role for environmental factors in shaping genome-wide patterns of polymorphism and indicate another direction of genomic study. PMID:22798389
The complete chloroplast genome of salt cress (Eutrema salsugineum).

PubMed

Guo, Xinyi; Hao, Guoqian; Ma, Tao

2016-07-01

The complete chloroplast (cp) sequence of the salt cress (Eutrema salsugineum), a plant well-adapted to salt stress, is presented in this study. The circular molecule is 153,407 bp in length and exhibits a typical quadripartite structure containing an 83,894 bp large single copy (LSC) region, a 17,607 bp small single copy (SSC) region, and two 25,953 bp inverted repeats (IRs). The salt cress cp genome contains 135 known genes, including 87 protein-coding genes, 8 ribosomal RNA genes, and 40 tRNA genes; 21 of these are located in the inverted repeat region. As expected, phylogenetic analysis supports the idea that E. salsugineum is sister to Brassiceae species within the Brassicaceae family.
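The region lengths reported in this abstract can be cross-checked against the stated total, since a quadripartite chloroplast genome consists of one LSC, one SSC and two IR copies. The short sketch below does that arithmetic with the figures from the abstract. The function name `quadripartite_length` is made up for this illustration.

```python
def quadripartite_length(lsc_bp, ssc_bp, ir_bp):
    """Total length of a circular genome with one LSC, one SSC and two IRs."""
    return lsc_bp + ssc_bp + 2 * ir_bp


# Figures as reported in the abstract.
total_bp = quadripartite_length(lsc_bp=83_894, ssc_bp=17_607, ir_bp=25_953)
gene_total = 87 + 8 + 40  # protein-coding + rRNA + tRNA genes

print(total_bp)    # 153407, matching the reported 153,407 bp
print(gene_total)  # 135, matching the reported gene count
```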
Patterns of admixture and population structure in native populations of Northwest North America.

PubMed

Verdu, Paul; Pemberton, Trevor J; Laurent, Romain; Kemp, Brian M; Gonzalez-Oliver, Angelica; Gorodezky, Clara; Hughes, Cris E; Shattuck, Milena R; Petzelt, Barbara; Mitchell, Joycelynn; Harry, Harold; William, Theresa; Worl, Rosita; Cybulski, Jerome S; Rosenberg, Noah A; Malhi, Ripan S

2014-08-01

The initial contact of European populations with indigenous populations of the Americas produced diverse admixture processes across North, Central, and South America. Recent studies have examined the genetic structure of indigenous populations of Latin America and the Caribbean and their admixed descendants, reporting on the genomic impact of the history of admixture with colonizing populations of European and African ancestry. However, relatively little genomic research has been conducted on admixture in indigenous North American populations. In this study, we analyze genomic data at 475,109 single-nucleotide polymorphisms sampled in indigenous peoples of the Pacific Northwest in British Columbia and Southeast Alaska, populations with a well-documented history of contact with European and Asian traders, fishermen, and contract laborers. We find that the indigenous populations of the Pacific Northwest have higher gene diversity than Latin American indigenous populations. Among the Pacific Northwest populations, interior groups provide more evidence for East Asian admixture, whereas coastal groups have higher levels of European admixture. In contrast with many Latin American indigenous populations, the variance of admixture is high in each of the Pacific Northwest indigenous populations, as expected for recent and ongoing admixture processes. The results reveal some similarities but notable differences between admixture patterns in the Pacific Northwest and those in Latin America, contributing to a more detailed understanding of the genomic consequences of European colonization events throughout the Americas.
Patterns of Admixture and Population Structure in Native Populations of Northwest North America

PubMed Central

Verdu, Paul; Pemberton, Trevor J.; Laurent, Romain; Kemp, Brian M.; Gonzalez-Oliver, Angelica; Gorodezky, Clara; Hughes, Cris E.; Shattuck, Milena R.; Petzelt, Barbara; Mitchell, Joycelynn; Harry, Harold; William, Theresa; Worl, Rosita; Cybulski, Jerome S.; Rosenberg, Noah A.; Malhi, Ripan S.

2014-01-01

The initial contact of European populations with indigenous populations of the Americas produced diverse admixture processes across North, Central, and South America. Recent studies have examined the genetic structure of indigenous populations of Latin America and the Caribbean and their admixed descendants, reporting on the genomic impact of the history of admixture with colonizing populations of European and African ancestry. However, relatively little genomic research has been conducted on admixture in indigenous North American populations. In this study, we analyze genomic data at 475,109 single-nucleotide polymorphisms sampled in indigenous peoples of the Pacific Northwest in British Columbia and Southeast Alaska, populations with a well-documented history of contact with European and Asian traders, fishermen, and contract laborers. We find that the indigenous populations of the Pacific Northwest have higher gene diversity than Latin American indigenous populations. Among the Pacific Northwest populations, interior groups provide more evidence for East Asian admixture, whereas coastal groups have higher levels of European admixture. In contrast with many Latin American indigenous populations, the variance of admixture is high in each of the Pacific Northwest indigenous populations, as expected for recent and ongoing admixture processes. The results reveal some similarities but notable differences between admixture patterns in the Pacific Northwest and those in Latin America, contributing to a more detailed understanding of the genomic consequences of European colonization events throughout the Americas. PMID:25122539
The membrane skeleton in Paramecium: Molecular characterization of a novel epiplasmin family and preliminary GFP expression results.

PubMed

Pomel, Sébastien; Diogon, Marie; Bouchard, Philippe; Pradel, Lydie; Ravet, Viviane; Coffe, Gérard; Viguès, Bernard

2006-02-01

Previous attempts to identify the membrane skeleton of Paramecium cells have revealed a protein pattern that is both complex and specific. The most prominent structural elements, epiplasmic scales, are centered around ciliary units and are closely apposed to the cytoplasmic side of the inner alveolar membrane. We sought to characterize epiplasmic scale proteins (epiplasmins) at the molecular level. PCR approaches enabled the cloning and sequencing of two closely related genes by amplifications of sequences from a macronuclear genomic library. Using these two genes (EPI-1 and EPI-2), we have contributed to the annotation of the Paramecium tetraurelia macronuclear genome and identified 39 additional (paralogous) sequences. Two orthologous sequences were found in the Tetrahymena thermophila genome. Structural analysis of the 43 sequences indicates that the hallmark of this new multigenic family is a 79 aa domain flanked by two Q-, P- and V-rich stretches of sequence that are much more variable in amino-acid composition. Such features clearly distinguish members of the multigenic family from epiplasmic proteins previously sequenced in other ciliates. The expression of Green Fluorescent Protein (GFP)-tagged epiplasmin showed significant labeling of epiplasmic scales as well as oral structures. We expect that the GFP construct described herein will prove to be a useful tool for comparative subcellular localization of different putative epiplasmins in Paramecium.
Nutrigenomics and the stewardship of scientific promises.

PubMed

Penders, Bart; Goven, Joanna

2010-09-01

Here we analyze the rise and establishment of nutrigenomics versus nutrition science from a political perspective. We argue that the exceptionalist status of nutrigenomics has been brought about by a carefully orchestrated economy of expectation, enabling the nutrigenomics community to develop its own research agenda that differs significantly from that of nutrition science. Nutrigenomics promotes research specifically directed towards the heterogeneity of dietary guidelines, while nutrition science pursues a public health goal dominated by homogeneous health messages. Through the development of genomic technology and the protective niche created by large global funding initiatives, this heterogeneity-research agenda has been able to develop itself. Those pursuing and supporting it have, through nutrigenomics' economy of expectation, influenced public opinion, and regulatory and political structures dealing with food and health. With many big global nutrigenomics initiatives slowly approaching their end, this article hints at some of the possible political consequences of its economy of expectation and suggests that a "stewardship" of promises and expectations is in order.
Molecular Diversity and Population Structure of a Worldwide Collection of Cultivated Tetraploid Alfalfa (Medicago sativa subsp. sativa L.) Germplasm as Revealed by Microsatellite Markers.

PubMed

Qiang, Haiping; Chen, Zhihong; Zhang, Zhengli; Wang, Xuemin; Gao, Hongwen; Wang, Zan

2015-01-01

Information on genetic diversity and population structure of a tetraploid alfalfa collection might be valuable in effective use of the genetic resources. A set of 336 worldwide genotypes of tetraploid alfalfa (Medicago sativa subsp. sativa L.) was genotyped using 85 genome-wide distributed SSR markers to reveal the genetic diversity and population structure in the alfalfa. Genetic diversity analysis identified a total of 1056 alleles across 85 marker loci. The average expected heterozygosity and polymorphism information content values were 0.677 and 0.638, respectively, showing high levels of genetic diversity in the cultivated tetraploid alfalfa germplasm. Comparison of genetic characteristics across chromosomes indicated regions of chromosomes 2 and 3 had the highest genetic diversity. Higher genetic diversity was detected in alfalfa landraces than in wild materials and cultivars. Two populations were identified by the model-based population structure, principal coordinate and neighbor-joining analyses, corresponding to China and other parts of the world. However, the lack of strict correlation between clustering and geographic origins suggested extensive exchanges of alfalfa germplasm across diverse geographic regions. The quantitative analysis of the genetic diversity and population structure in this study could be useful for genetic and genomic analysis and utilization of the genetic variation in alfalfa breeding.
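Expected heterozygosity and polymorphism information content (PIC), the two diversity statistics quoted in this abstract, are both simple functions of the allele frequencies at a locus. The sketch below uses the textbook definitions: He = 1 − Σ pᵢ², and the Botstein et al. form PIC = 1 − Σ pᵢ² − Σ_{i<j} 2 pᵢ² pⱼ². Whether the study applied exactly these forms to its tetraploid data is an assumption, and the function names and example frequencies are hypothetical.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * p) * (q * q)
                     for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical SSR locus with four alleles.
allele_freqs = [0.4, 0.3, 0.2, 0.1]
he = expected_heterozygosity(allele_freqs)
pic = polymorphism_information_content(allele_freqs)
```

PIC is never larger than He for the same frequencies, which is consistent with the averages reported above (0.638 versus 0.677).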
Reducing assembly complexity of microbial genomes with single-molecule sequencing.

PubMed

Koren, Sergey; Harhay, Gregory P; Smith, Timothy P L; Bono, James L; Harhay, Dayna M; Mcvey, Scott D; Radune, Diana; Bergman, Nicholas H; Phillippy, Adam M

2013-01-01

The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Third-generation, single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem. To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These single-library assemblies are also more accurate than typical short-read assemblies and hybrid assemblies of short and long reads. Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to $1,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of completed genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.
Sensitivity to sequencing depth in single-cell cancer genomics.

PubMed

Alves, João M; Posada, David

2018-04-16

Querying cancer genomes at single-cell resolution is expected to provide a powerful framework to understand in detail the dynamics of cancer evolution. However, given the high costs currently associated with single-cell sequencing, together with the inevitable technical noise arising from single-cell genome amplification, cost-effective strategies that maximize the quality of single-cell data are critically needed. Taking advantage of previously published single-cell whole-genome and whole-exome cancer datasets, we studied the impact of sequencing depth and sampling effort towards single-cell variant detection. Five single-cell whole-genome and whole-exome cancer datasets were independently downscaled to 25, 10, 5, and 1× sequencing depth. For each depth level, ten technical replicates were generated, resulting in a total of 6280 single-cell BAM files. The sensitivity of variant detection, including structural and driver mutations, genotyping, clonal inference, and phylogenetic reconstruction to sequencing depth was evaluated using recent tools specifically designed for single-cell data. Altogether, our results suggest that for relatively large sample sizes (25 or more cells) sequencing single tumor cells at depths > 5× does not drastically improve somatic variant discovery, characterization of clonal genotypes, or estimation of single-cell phylogenies. We suggest that sequencing multiple individual tumor cells at a modest depth represents an effective alternative to explore the mutational landscape and clonal evolutionary patterns of cancer genomes.
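The downscaling design described in this abstract can be pictured as random read subsampling: each read is kept with probability equal to the ratio of target depth to original depth, and the draw is repeated with different seeds to obtain replicates. The sketch below is only an illustration under stated assumptions. The original depth of 30×, the read identifiers, and the function names are hypothetical, and the tools actually used by the authors are not named in the abstract.

```python
import random


def downsample_fraction(original_depth, target_depth):
    """Fraction of reads to keep so that mean depth drops to target_depth."""
    if target_depth > original_depth:
        raise ValueError("target depth exceeds original depth")
    return target_depth / original_depth


def subsample_reads(reads, fraction, seed):
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


# Hypothetical cell sequenced at 30x, represented by 10,000 read identifiers.
reads = [f"read_{i}" for i in range(10_000)]
original_depth = 30.0

# Four target depths and ten replicates per depth, as in the abstract.
replicates = {
    depth: [
        subsample_reads(reads, downsample_fraction(original_depth, depth), seed)
        for seed in range(10)
    ]
    for depth in (25, 10, 5, 1)
}
```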
Comparative Analysis of Repetitive DNA between the Main Vectors of Chagas Disease: Triatoma infestans and Rhodnius prolixus.

PubMed

Pita, Sebastián; Mora, Pablo; Vela, Jesús; Palomeque, Teresa; Sánchez, Antonio; Panzera, Francisco; Lorite, Pedro

2018-04-24

Chagas disease or American trypanosomiasis affects six to seven million people worldwide, mostly in Latin America. This disease is transmitted by hematophagous insects known as "kissing bugs" (Hemiptera, Triatominae), with Triatoma infestans and Rhodnius prolixus being the two most important vector species. Despite the fact that both species present the same diploid chromosome number (2n = 22), they have remarkable differences in their total DNA content, chromosome structure and genome organization. Variations in genome size are expected to be due to differences in the amount of repetitive DNA sequences. The T. infestans genome-wide analysis revealed the existence of 42 satellite DNA families. BLAST searches of these sequences against the R. prolixus genome assembly revealed that only four of these satellite DNA families are shared between both species, suggesting a great differentiation between the Triatoma and Rhodnius genomes. Fluorescence in situ hybridization (FISH) location of these repetitive DNAs in both species showed that they are dispersed on the euchromatic regions of all autosomes and the X chromosome. Regarding the Y chromosome, these common satellite DNAs are absent in T. infestans but they are present in the R. prolixus Y chromosome. These results support a different origin and/or evolution in the Y chromosome of both species.

Persistency of accuracy of genomic breeding values for different simulated pig breeding programs in developing countries.

PubMed

Akanno, E C; Schenkel, F S; Sargolzaei, M; Friendship, R M; Robinson, J A B

2014-10-01

Genetic improvement of pigs in tropical developing countries has focused on imported exotic populations which have been subjected to intensive selection with attendant high population-wide linkage disequilibrium (LD). Presently, indigenous pig populations with limited selection and low LD are being considered for improvement. Given that the infrastructure for genetic improvement using conventional BLUP selection methods is lacking, a genome-wide selection (GS) program was proposed for developing countries. A simulation study was conducted to evaluate the option of using a 60 K SNP panel and the observed amount of LD in the exotic and indigenous pig populations. Several scenarios were evaluated, including different sizes and structures of training and validation populations, different selection methods, and long-term accuracy of GS in different population/breeding structures and traits. The training set included a previously selected exotic population, an unselected indigenous population and their crossbreds. Traits studied included number born alive (NBA), average daily gain (ADG) and back fat thickness (BFT). The ridge regression method was used to train the prediction model. The results showed that accuracies of genomic breeding values (GBVs) in the range of 0.30 (NBA) to 0.86 (BFT) in the validation population are expected if high-density marker panels are utilized. The GS method improved the accuracy of breeding values more than the pedigree-based approach for traits with low heritability and in young animals with no performance data. A crossbred training population performed better than purebreds when validation was in populations with a structure similar to or different from that of the training set. Genome-wide selection holds promise for genetic improvement of pigs in the tropics. © 2014 Blackwell Verlag GmbH.
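The abstract names ridge regression as the method used to train the prediction model and reports accuracy in a validation population. The sketch below shows one way such a calculation can look, under loud assumptions: the genotypes, marker effects, heritability, population sizes and shrinkage parameter are all simulated or hypothetical, and accuracy is taken to mean the correlation between predicted and true breeding values. It is not the authors' simulation.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Solve (X'X + lam * I) b = X'y for the vector of marker effects b."""
    n_markers = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_markers), X.T @ y)


rng = np.random.default_rng(0)
n_train, n_valid, n_markers = 300, 100, 50  # hypothetical sizes

# Simulated genotypes coded 0/1/2, then centered per marker.
geno = rng.integers(0, 3, size=(n_train + n_valid, n_markers)).astype(float)
geno -= geno.mean(axis=0)

# Simulated true marker effects and breeding values.
true_effects = rng.normal(0.0, 1.0, n_markers)
true_bv = geno @ true_effects

# Environmental noise with the same spread as the breeding values,
# which gives a heritability of roughly 0.5 (an assumption).
pheno = true_bv + rng.normal(0.0, true_bv.std(), n_train + n_valid)

X_train, y_train = geno[:n_train], pheno[:n_train]
X_valid = geno[n_train:]

# lam = 10.0 is an arbitrary illustrative shrinkage value.
effects = ridge_marker_effects(X_train, y_train - y_train.mean(), lam=10.0)
gbv = X_valid @ effects

# Accuracy: correlation of predicted and true breeding values in validation.
accuracy = np.corrcoef(gbv, true_bv[n_train:])[0, 1]
print(f"validation accuracy: {accuracy:.2f}")
```

In this toy setting a lower simulated heritability would be expected to reduce the accuracy, which is at least in the same direction as the range reported above from NBA to BFT.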
Analysis of Strand-Specific RNA-Seq Data Using Machine Learning Reveals the Structures of Transcription Units in Clostridium thermocellum

DOE PAGES

Chou, Wen-Chi; Ma, Qin; Yang, Shihui; ...

2015-03-12

The identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Moreover, among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.
Confluence of genes, environment, development, and behavior in a post Genome-Wide Association Study world.

PubMed

Vrieze, Scott I; Iacono, William G; McGue, Matt

2012-11-01

This article serves to outline a research paradigm to investigate main effects and interactions of genes, environment, and development on behavior and psychiatric illness. We provide a historical context for candidate gene studies and genome-wide association studies, including benefits, limitations, and expected payoffs. Using substance use and abuse as our driving example, we then turn to the importance of etiological psychological theory in guiding genetic, environmental, and developmental research, as well as the utility of refined phenotypic measures, such as endophenotypes, in the pursuit of etiological understanding and focused tests of genetic and environmental associations. Phenotypic measurement has received considerable attention in the history of psychology and is informed by psychometrics, whereas the environment remains relatively poorly measured and is often confounded with genetic effects (i.e., gene-environment correlation). Genetically informed designs, which are no longer limited to twin and adoption studies thanks to ever-cheaper genotyping, are required to understand environmental influences. Finally, we outline the vast amount of individual difference in structural genomic variation, most of which remains to be leveraged in genetic association tests. Although the genetic data can be massive and burdensome (tens of millions of variants per person), we argue that improved understanding of genomic structure and function will provide investigators with new tools to test specific a priori hypotheses derived from etiological psychological theory, much like current candidate gene research but with less confusion and more payoff than candidate gene research has to date.
On the Power and the Systematic Biases of the Detection of Chromosomal Inversions by Paired-End Genome Sequencing

PubMed Central

Lucas Lledó, José Ignacio; Cáceres, Mario

2013-01-01

One of the most used techniques to study structural variation at a genome level is paired-end mapping (PEM). PEM has the advantage of being able to detect balanced events, such as inversions and translocations. However, inversions are still quite difficult to predict reliably, especially from high-throughput sequencing data. We simulated realistic PEM experiments with different combinations of read and library fragment lengths, including sequencing errors and meaningful base-qualities, to quantify and track down the origin of false positives and negatives along sequencing, mapping, and downstream analysis. We show that PEM is very appropriate to detect a wide range of inversions, even with low coverage data. However, % of inversions located between segmental duplications are expected to go undetected by the most common sequencing strategies. In general, longer DNA libraries improve the detectability of inversions far better than increments of the coverage depth or the read length. Finally, we review the performance of three algorithms to detect inversions —SVDetect, GRIAL, and VariationHunter—, identify common pitfalls, and reveal important differences in their breakpoint precisions. These results stress the importance of the sequencing strategy for the detection of structural variants, especially inversions, and offer guidelines for the design of future genome sequencing projects. PMID:23637806
Recognition of the murine coronavirus genomic RNA packaging signal depends on the second RNA-binding domain of the nucleocapsid protein.

PubMed

Kuo, Lili; Koetzner, Cheri A; Hurst, Kelley R; Masters, Paul S

2014-04-01

The coronavirus nucleocapsid (N) protein forms a helical ribonucleoprotein with the viral positive-strand RNA genome and binds to the principal constituent of the virion envelope, the membrane (M) protein, to facilitate assembly and budding. Besides these structural roles, N protein associates with a component of the replicase-transcriptase complex, nonstructural protein 3, at a critical early stage of infection. N protein has also been proposed to participate in the replication and selective packaging of genomic RNA and the transcription and translation of subgenomic mRNA. Coronavirus N proteins contain two structurally distinct RNA-binding domains, an unusual characteristic among RNA viruses. To probe the functions of these domains in the N protein of the model coronavirus mouse hepatitis virus (MHV), we constructed mutants in which each RNA-binding domain was replaced by its counterpart from the N protein of severe acute respiratory syndrome coronavirus (SARS-CoV). Mapping of revertants of the resulting chimeric viruses provided evidence for extensive intramolecular interactions between the two RNA-binding domains. Through analysis of viral RNA that was packaged into virions we identified the second of the two RNA-binding domains as a principal determinant of MHV packaging signal recognition. As expected, the interaction of N protein with M protein was not affected in either of the chimeric viruses. Moreover, the SARS-CoV N substitutions did not alter the fidelity of leader-body junction formation during subgenomic mRNA synthesis. These results more clearly delineate the functions of N protein and establish a basis for further exploration of the mechanism of genomic RNA packaging. This work describes the interactions of the two RNA-binding domains of the nucleocapsid protein of a model coronavirus, mouse hepatitis virus. 
The main finding is that the second of the two domains plays an essential role in recognizing the RNA structure that allows the selective packaging of genomic RNA into assembled virions.
Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole-genome sequencing.

PubMed

Wong, Lai-Ping; Lai, Jason Kuan-Han; Saw, Woei-Yuh; Ong, Rick Twee-Hee; Cheng, Anthony Youzhi; Pillai, Nisha Esakimuthu; Liu, Xuanyao; Xu, Wenting; Chen, Peng; Foo, Jia-Nee; Tan, Linda Wei-Lin; Koo, Seok-Hwee; Soong, Richie; Wenk, Markus Rene; Lim, Wei-Yen; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

2014-05-01

South Asia possesses a significant amount of genetic diversity due to considerable intergroup differences in culture and language. There have been numerous reports on the genetic structure of Asian Indians, although these have mostly relied on genotyping microarrays or targeted sequencing of the mitochondria and Y chromosomes. Asian Indians in Singapore are primarily descendants of immigrants from Dravidian-language-speaking states in south India, and 38 individuals from the general population underwent deep whole-genome sequencing with a target coverage of 30X as part of the Singapore Sequencing Indian Project (SSIP). The genetic structure and diversity of these samples were compared against samples from the Singapore Sequencing Malay Project and populations in Phase 1 of the 1,000 Genomes Project (1 KGP). SSIP samples exhibited greater intra-population genetic diversity and possessed higher heterozygous-to-homozygous genotype ratio than other Asian populations. When compared against a panel of well-defined Asian Indians, the genetic makeup of the SSIP samples was closely related to South Indians. However, even though the SSIP samples clustered distinctly from the Europeans in the global population structure analysis with autosomal SNPs, eight samples were assigned to mitochondrial haplogroups that were predominantly present in Europeans and possessed higher European admixture than the remaining samples. An analysis of the relative relatedness between SSIP with two archaic hominins (Denisovan, Neanderthal) identified higher ancient admixture in East Asian populations than in SSIP. The data resource for these samples is publicly available and is expected to serve as a valuable complement to the South Asian samples in Phase 3 of 1 KGP.
Biochemical studies of the Saccharomyces cerevisiae Mph1 helicase on junction-containing DNA structures

PubMed Central

Kang, Young-Hoon; Munashingha, Palinda Ruvan; Lee, Chul-Hwan; Nguyen, Tuan Anh; Seo, Yeon-Soo

2012-01-01

Saccharomyces cerevisiae Mph1 is a 3′–5′ DNA helicase required for the maintenance of genome integrity. In order to understand the ATPase/helicase role of Mph1 in genome stability, we characterized its helicase activity with a variety of DNA substrates, focusing on its action on junction structures containing three or four DNA strands. Consistent with its 3′ to 5′ directionality, Mph1 displaced 3′-flap substrates in double-fixed or equilibrating flap substrates. Surprisingly, Mph1 displaced the 5′-flap strand more efficiently than the 3′-flap strand from double-flap substrates, which is not expected for a 3′–5′ DNA helicase. For this to occur, Mph1 required a threshold size (>5 nt) of 5′ single-stranded DNA flap. Based on the unique substrate requirements of Mph1 defined in this study, we propose that the helicase/ATPase activity of Mph1 plays a role in converting multiple-stranded DNA structures into structures cleavable by processing enzymes such as Fen1. We also found that the helicase activity of Mph1 was used to cause structural alterations required for restoration of replication forks stalled due to damaged template. The helicase properties of Mph1 reported here could explain how it resolves D-loop structures, and are in keeping with a model proposed for the error-free damage avoidance pathway. PMID:22090425
Novel origins of copy number variation in the dog genome

PubMed Central

2012-01-01

Background: Copy number variants (CNVs) account for substantial variation between genomes and are a major source of normal and pathogenic phenotypic differences. The dog is an ideal model to investigate mutational mechanisms that generate CNVs as its genome lacks a functional ortholog of the PRDM9 gene implicated in recombination and CNV formation in humans. Here we comprehensively assay CNVs using high-density array comparative genomic hybridization in 50 dogs from 17 dog breeds and 3 gray wolves. Results: We use a stringent new method to identify a total of 430 high-confidence CNV loci, which range in size from 9 kb to 1.6 Mb and span 26.4 Mb, or 1.08%, of the assayed dog genome, overlapping 413 annotated genes. Of CNVs observed in each breed, 98% are also observed in multiple breeds. CNVs predicted to disrupt gene function are significantly less common than expected by chance. We identify a significant overrepresentation of peaks of GC content, previously shown to be enriched in dog recombination hotspots, in the vicinity of CNV breakpoints. Conclusions: A number of the CNVs identified by this study are candidates for generating breed-specific phenotypes. Purifying selection seems to be a major factor shaping structural variation in the dog genome, suggesting that many CNVs are deleterious. Localized peaks of GC content appear to be novel sites of CNV formation in the dog genome by non-allelic homologous recombination, potentially activated by the loss of PRDM9. These sequence features may have driven genome instability and chromosomal rearrangements throughout canid evolution. PMID:22916802
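The span and percentage quoted in this abstract together imply the size of the assayed portion of the genome. A quick arithmetic sketch using only those two reported figures follows. The variable names are illustrative, and the resulting size is an inference from the rounded numbers, not a value stated in the record.

```python
# Figures reported in the abstract.
cnv_span_mb = 26.4        # total span of the 430 CNV loci, in Mb
cnv_fraction = 0.0108     # 1.08% of the assayed dog genome

# Implied size of the assayed genome (rounded inputs, so approximate).
assayed_genome_mb = cnv_span_mb / cnv_fraction
print(f"implied assayed genome: about {assayed_genome_mb / 1000:.2f} Gb")
```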
Detection of somatic, subclonal and mosaic CNVs from sequencing | Division of Cancer Prevention

Cancer.gov

Progress in technology has made individual genome sequencing a clinical reality, with partial genome sequencing already in use in clinical care. In fact, it is expected that within a few years whole genome sequencing will be a standard procedure that will allow discovering personal genomic variants of all types and thus greatly facilitate individualized medicine. However, fast
The molecular diversity of α-gliadin genes in the tribe Triticeae.

PubMed

Qi, Peng-Fei; Chen, Qing; Ouellet, Thérèse; Wang, Zhao; Le, Cheng-Xing; Wei, Yu-Ming; Lan, Xiu-Jin; Zheng, You-Liang

2013-09-01

Many of the unique properties of wheat flour are derived from seed storage proteins such as the α-gliadins. In this study, α-gliadin genes from diploid Triticeae species were systematically characterized and divided into 3 classes according to the distinct organization of their protein domains. Our analyses indicated that these α-gliadins varied in the number of cysteine residues they contained. Most of the α-gliadin genes were grouped according to their genomic origins within the phylogenetic tree. As expected, sequence alignments suggested that the repetitive domain and the two polyglutamine regions were responsible for length variations of α-gliadins, as were the insertion/deletion of structural domains within the three different classes (I, II, and III) of α-gliadins. A screening of celiac disease (CD) toxic epitopes indicated that the class II α-gliadins, derived from the Ns genome, contain no epitope, and that some other genomes contain far fewer epitopes than the A, S(B) and D genomes of wheat. Our results suggest that the genetic differences observed among the α-gliadins of Triticeae could be exploited for breeding less CD-toxic wheat varieties.
Evolution of Functional Diversification within Quasispecies

PubMed Central

Colizzi, Enrico Sandro; Hogeweg, Paulien

2014-01-01

According to quasispecies theory, high mutation rates limit the amount of information genomes can store (Eigen’s Paradox), whereas genomes with higher degrees of neutrality may be selected even at the expense of higher replication rates (the “survival of the flattest” effect). Introducing a complex genotype to phenotype map, such as RNA folding, epitomizes this effect because of the existence of neutral networks and their exploitation by evolution, affecting both population structure and genome composition. We reexamine these classical results in light of an RNA-based system that can evolve its own ecology. Contrary to expectations, we find that quasispecies evolving at high mutation rates are steep and characterized by one master sequence. Importantly, the analysis of the system and the characterization of the evolved quasispecies reveal the emergence of functionalities as phenotypes of nonreplicating genotypes, whose presence is crucial for the overall viability and stability of the system. In other words, the master sequence codes for the information of the entire ecosystem, whereas the decoding happens, stochastically, through mutations. We show that this solution quickly outcompetes strategies based on genomes with a high degree of neutrality. In conclusion, individually coded but ecosystem-based diversity evolves and persists indefinitely close to the Information Threshold. PMID:25056399
Understanding regulatory networks requires more than computing a multitude of graph statistics. Comment on "Drivers of structural features in gene regulatory networks: From biophysical constraints to biological function" by O.C. Martin et al.

NASA Astrophysics Data System (ADS)

Tkačik, Gašper

2016-07-01

The article by O. Martin and colleagues provides a much needed systematic review of a body of work that relates the topological structure of genetic regulatory networks to evolutionary selection for function. This connection is very important. Using the current wealth of genomic data, statistical features of regulatory networks (e.g., degree distributions, motif composition, etc.) can be quantified rather easily; it is, however, often unclear how to interpret the results. On a graph theoretic level the statistical significance of the results can be evaluated by comparing observed graphs to "randomized" ones (bravely ignoring the issue of how precisely to randomize!) and comparing the frequency of appearance of a particular network structure relative to a randomized null expectation. While this is a convenient operational test for statistical significance, its biological meaning is questionable. In contrast, an in-silico genotype-to-phenotype model makes explicit the assumptions about the network function, and thus clearly defines the expected network structures that can be compared to the case of no selection for function and, ultimately, to data.
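The operational test described in this comment, comparing how often a structure appears in the observed graph with its frequency in randomized graphs, can be sketched in a few lines. The example below is a toy illustration under explicit assumptions: the structure counted is a reciprocal pair of regulatory edges, and the null model is a uniform random directed graph with the same numbers of nodes and edges. Both choices, the toy network and all function names are hypothetical. A degree-preserving randomization would be another common choice and may give a different answer, which is exactly the ambiguity the comment points to.

```python
import random
from statistics import mean, pstdev


def count_reciprocal_pairs(edges):
    """Number of node pairs {a, b} with both a->b and b->a present."""
    edge_set = set(edges)
    return sum(1 for a, b in edge_set if a < b and (b, a) in edge_set)


def random_digraph(n_nodes, n_edges, rng):
    """Assumed null model: uniform random directed graph with the same node
    and edge counts, without self-loops or duplicate edges."""
    possible = [(a, b) for a in range(n_nodes) for b in range(n_nodes) if a != b]
    return rng.sample(possible, n_edges)


def z_score(observed_edges, n_nodes, n_random=1000, seed=0):
    """Z-score of the observed count against the randomized null expectation."""
    rng = random.Random(seed)
    observed = count_reciprocal_pairs(observed_edges)
    null_counts = [
        count_reciprocal_pairs(random_digraph(n_nodes, len(observed_edges), rng))
        for _ in range(n_random)
    ]
    spread = pstdev(null_counts)
    return (observed - mean(null_counts)) / spread if spread > 0 else float("nan")


# Toy network of 6 genes with three mutually regulating pairs (hypothetical).
toy_network = [(0, 1), (1, 0), (2, 3), (3, 2), (4, 5), (5, 4), (0, 2), (1, 4)]
z = z_score(toy_network, n_nodes=6)
print(f"z-score for reciprocal pairs: {z:.2f}")
```

A large positive z-score here only says that the count is unusual under this particular null model. As the comment argues, it does not by itself carry a biological interpretation.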
Curated eutherian third party data gene data sets.

PubMed

Premzl, Marko

2016-03-01

The freely available eutherian genomic sequence data sets advanced the scientific field of genomics. Of note, future revisions of gene data sets were expected, owing to the incompleteness of public eutherian genomic sequence assemblies and potential genomic sequence errors. The eutherian comparative genomic analysis protocol was proposed as guidance in protection against potential genomic sequence errors in public eutherian genomic sequences. The protocol was applicable in updates of 7 major eutherian gene data sets, including 812 complete coding sequences deposited in the European Nucleotide Archive as curated third party data gene data sets.
Sim3C: simulation of Hi-C and Meta3C proximity ligation sequencing technologies.

PubMed

DeMaere, Matthew Z; Darling, Aaron E

2018-02-01

Chromosome conformation capture (3C) and Hi-C DNA sequencing methods have rapidly advanced our understanding of the spatial organization of genomes and metagenomes. Many variants of these protocols have been developed, each with their own strengths. Currently there is no systematic means for simulating sequence data from this family of sequencing protocols, potentially hindering the advancement of algorithms to exploit this new datatype. We describe a computational simulator that, given simple parameters and reference genome sequences, will simulate Hi-C sequencing on those sequences. The simulator models the basic spatial structure in genomes that is commonly observed in Hi-C and 3C datasets, including the distance-decay relationship in proximity ligation, differences in the frequency of interaction within and across chromosomes, and the structure imposed by cells. A means to model the 3D structure of randomly generated topologically associating domains is provided. The simulator considers several sources of error common to 3C and Hi-C library preparation and sequencing methods, including spurious proximity ligation events and sequencing error. We have introduced the first comprehensive simulator for 3C and Hi-C sequencing protocols. We expect the simulator to have use in testing of Hi-C data analysis algorithms, as well as more general value for experimental design, where questions such as the required depth of sequencing, enzyme choice, and other decisions can be made in advance in order to ensure adequate statistical power with respect to experimental hypothesis testing.
Perspectives for genomic selection applications and research in plants

USDA-ARS's Scientific Manuscript database

Genomic selection (GS) has created a lot of excitement and expectations in the animal and plant breeding research communities. In this review, we briefly describe how genomic prediction can be integrated into breeding efforts and point out achievements and areas where more research is needed. GS pro...
Genome editing: progress and challenges for medical applications.

PubMed

Carroll, Dana

2016-11-15

The development of the CRISPR-Cas platform for genome editing has greatly simplified the process of making targeted genetic modifications. Applications of genome editing are expected to have a substantial impact on human therapies through the development of better animal models, new target discovery, and direct therapeutic intervention.
Accounting for discovery bias in genomic prediction

USDA-ARS's Scientific Manuscript database

Our objective was to evaluate an approach to mitigating discovery bias in genomic prediction. Accuracy may be improved by placing greater emphasis on regions of the genome expected to be more influential on a trait. Methods emphasizing regions result in a phenomenon known as “discovery bias” if info...
Genomic Variation of Inbreeding and Ancestry in the Remaining Two Isle Royale Wolves.

PubMed

Hedrick, Philip W; Kardos, Marty; Peterson, Rolf O; Vucetich, John A

2017-03-01

Inbreeding, relatedness, and ancestry have traditionally been estimated with pedigree information; however, molecular genomic data can provide a more detailed examination of these properties. For example, pedigree information provides an estimate of the expected value of these measures, but molecular genomic data can estimate the realized values of these measures in individuals. Here, we generate the theoretical distribution of inbreeding, relatedness, and ancestry for the individuals in the pedigree of the Isle Royale wolves, the first examination of such variation in a wild population with a known pedigree. We use the 38 autosomes of the dog genome and their estimated map lengths in our genomic analysis. Although it is known that the remaining wolves are highly inbred, closely related, and descend from only 3 ancestors, our analyses suggest that there is significant variation in the realized inbreeding and relatedness around pedigree expectations. For example, the expected inbreeding in a hypothetical offspring from the 2 remaining wolves is 0.438 but the realized 95% genomic confidence interval is from 0.311 to 0.565. For individual chromosomes, a substantial proportion of the whole chromosomes are completely identical by descent. This examination provides a background to use when analyzing molecular genomic data for individual levels of inbreeding, relatedness, and ancestry. The level of variation in these measures is a function of the time to the common ancestor(s), the number of chromosomes, and the rate of recombination. In the Isle Royale wolf population, the few generations to a common ancestor result in the high variance in genomic inbreeding. © The American Genetic Association 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
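The reported interval can be read against the pedigree expectation with a little arithmetic. The sketch below uses only the three numbers given in the abstract. The implied standard deviation additionally assumes that the realized inbreeding is approximately normally distributed, which the abstract does not state, so that value is an illustration and not a result of the study.

```python
# Values reported in the abstract.
expected_f = 0.438            # pedigree expectation for the hypothetical offspring
ci_low, ci_high = 0.311, 0.565

# The interval is centered on the pedigree expectation.
center = (ci_low + ci_high) / 2
half_width = (ci_high - ci_low) / 2

# Assumption: approximate normality, so a 95% interval spans +/- 1.96 SD.
implied_sd = half_width / 1.96

print(f"center = {center:.3f}, half-width = {half_width:.3f}, "
      f"implied SD = {implied_sd:.3f}")
```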
Assessment of Genetic Diversity and Structure of Large Garlic (Allium sativum) Germplasm Bank, by Diversity Arrays Technology “Genotyping-by-Sequencing” Platform (DArTseq)

PubMed Central

Egea, Leticia A.; Mérida-García, Rosa; Kilian, Andrzej; Hernandez, Pilar; Dorado, Gabriel

2017-01-01

Garlic (Allium sativum) is used worldwide in cooking and industry, including pharmacology/medicine and cosmetics, for its interesting properties. Identifying redundancies in germplasm banks to generate core collections is a major concern, mostly in large stocks, in order to reduce space and maintenance costs. Yet, the similar appearance and phenotypic plasticity of garlic varieties hinder their morphological classification. Molecular studies are challenging, due to the large and expectedly complex genome of this species, which reproduces asexually. Classical molecular markers, like isozymes, RAPD, SSR, or AFLP, are not convenient for generating germplasm core collections for this species. The recent emergence of high-throughput genotyping-by-sequencing (GBS) approaches, like DArTseq, makes it possible to overcome such limitations in order to characterize and protect genetic diversity. Therefore, such technology was used in this work to: (i) assess genetic diversity and structure of a large garlic-germplasm bank (417 accessions); (ii) create a core collection; (iii) relate genotype to agronomical features; and (iv) describe a cost-effective method to manage genetic diversity in garlic-germplasm banks. Hierarchical-cluster analysis, principal-coordinates analysis and STRUCTURE showed general consistency, generating three main garlic groups, mostly determined by variety and geographical origin. In addition, high-resolution genotyping identified 286 unique and 131 redundant accessions, used to select a reduced-size germplasm-bank core collection. This demonstrates that DArTseq is a cost-effective method to analyze species with large and expectedly complex genomes, like garlic. To the best of our knowledge, this is the first report of high-throughput genotyping of a large garlic germplasm. This is particularly interesting for garlic adaptation and improvement, to fight biotic and abiotic stresses, in the current context of climate change and global warming. PMID:28775737
Assessment of Genetic Diversity and Structure of Large Garlic (Allium sativum) Germplasm Bank, by Diversity Arrays Technology "Genotyping-by-Sequencing" Platform (DArTseq).

PubMed

Egea, Leticia A; Mérida-García, Rosa; Kilian, Andrzej; Hernandez, Pilar; Dorado, Gabriel

2017-01-01

Garlic (Allium sativum) is used worldwide in cooking and industry, including pharmacology/medicine and cosmetics, for its interesting properties. Identifying redundancies in germplasm banks to generate core collections is a major concern, mostly in large stocks, in order to reduce space and maintenance costs. Yet, similar appearance and phenotypic plasticity of garlic varieties hinder their morphological classification. Molecular studies are challenging, due to the large and expected complex genome of this species, with asexual reproduction. Classical molecular markers, like isozymes, RAPD, SSR, or AFLP, are not convenient to generate germplasm core-collections for this species. The recent emergence of high-throughput genotyping-by-sequencing (GBS) approaches, like DArTseq, makes it possible to overcome such limitations in order to characterize and protect genetic diversity. Therefore, such technology was used in this work to: (i) assess genetic diversity and structure of a large garlic-germplasm bank (417 accessions); (ii) create a core collection; (iii) relate genotype to agronomical features; and (iv) describe a cost-effective method to manage genetic diversity in garlic-germplasm banks. Hierarchical-cluster analysis, principal-coordinates analysis and STRUCTURE showed general consistency, generating three main garlic-groups, mostly determined by variety and geographical origin. In addition, high-resolution genotyping identified 286 unique and 131 redundant accessions, used to select a reduced size germplasm-bank core collection. This demonstrates that DArTseq is a cost-effective method to analyze species with large and expected complex genomes, like garlic. To the best of our knowledge, this is the first report of high-throughput genotyping of a large garlic germplasm. This is particularly interesting for garlic adaptation and improvement, to fight biotic and abiotic stresses, in the current context of climate change and global warming.

Mutations that Cause Human Disease: A Computational/Experimental Approach

DOE Office of Scientific and Technical Information (OSTI.GOV)

Beernink, P; Barsky, D; Pesavento, B

International genome sequencing projects have produced billions of nucleotides (letters) of DNA sequence data, including the complete genome sequences of 74 organisms. These genome sequences have created many new scientific opportunities, including the ability to identify sequence variations among individuals within a species. These genetic differences, which are known as single nucleotide polymorphisms (SNPs), are particularly important in understanding the genetic basis for disease susceptibility. Since the report of the complete human genome sequence, over two million human SNPs have been identified, including a large-scale comparison of an entire chromosome from twenty individuals. Of the protein coding SNPs (cSNPs), approximately half lead to a single amino acid change in the encoded protein (non-synonymous coding SNPs). Most of these changes are functionally silent, while the remainder negatively impact the protein and sometimes cause human disease. To date, over 550 SNPs have been found to cause single locus (monogenic) diseases and many others have been associated with polygenic diseases. SNPs have been linked to specific human diseases, including late-onset Parkinson disease, autism, rheumatoid arthritis and cancer. The ability to predict accurately the effects of these SNPs on protein function would represent a major advance toward understanding these diseases. To date, several attempts have been made toward predicting the effects of such mutations. The most successful of these is a computational approach called "Sorting Intolerant From Tolerant" (SIFT). This method uses sequence conservation among many similar proteins to predict which residues in a protein are functionally important. However, this method suffers from several limitations. First, a query sequence must have a sufficient number of relatives to infer sequence conservation. Second, this method does not make use of or provide any information on protein structure, which can be used to understand how an amino acid change affects the protein. The experimental methods that provide the most detailed structural information on proteins are X-ray crystallography and NMR spectroscopy. However, these methods are labor intensive and currently cannot be carried out on a genomic scale. Nonetheless, Structural Genomics projects are being pursued by more than a dozen groups and consortia worldwide and as a result the number of experimentally determined structures is rising exponentially. Based on the expectation that protein structures will continue to be determined at an ever-increasing rate, reliable structure prediction schemes will become increasingly valuable, leading to information on protein function and disease for many different proteins. Given known genetic variability and experimentally determined protein structures, can we accurately predict the effects of single amino acid substitutions? An objective assessment of this question would involve comparing predicted and experimentally determined structures, which thus far has not been rigorously performed. The completed research leveraged existing expertise at LLNL in computational and structural biology, as well as significant computing resources, to address this question.
Using secondary structure to identify ribosomal numts: cautionary examples from the human genome.

PubMed

Olson, Link E; Yoder, Anne D

2002-01-01

The identification of inadvertently sequenced mitochondrial pseudogenes (numts) is critical to any study employing mitochondrial DNA sequence data. Failure to discriminate numts correctly can confound phylogenetic reconstruction and studies of molecular evolution. This is especially problematic for ribosomal mtDNA genes. Unlike protein-coding loci, whose pseudogenes tend to accumulate diagnostic frameshift or premature stop mutations, functional ribosomal genes are not constrained to maintain a reading frame and can accumulate insertion-deletion events of varying length, particularly in nonpairing regions. Several authors have advocated using structural features of the transcribed rRNA molecule to differentiate functional mitochondrial rRNA genes from their nuclear paralogs. We explored this approach using the mitochondrial 12S rRNA gene and three known 12S numts from the human genome in the context of anthropoid phylogeny and the inferred secondary structure of primate 12S rRNA. Contrary to expectation, each of the three human numts exhibits striking concordance with secondary structure models, with little, if any, indication of their pseudogene status, and would likely escape detection based on structural criteria alone. Furthermore, we show that the unwitting inclusion of a particularly ancient (18-25 Myr old) and surprisingly cryptic human numt in a phylogenetic analysis would yield a well-supported but dramatically incorrect conclusion regarding anthropoid relationships. Though we endorse the use of secondary structure models for inferring positional homology wholeheartedly, we caution against reliance on structural criteria for the discrimination of rRNA numts, given the potential fallibility of this approach.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.

PubMed

Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H

2017-04-15

Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 
Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. Copyright © 2017 Sinclair et al.
Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

PubMed Central

Sinclair, Robert M.; Ravantti, Janne J.

2017-01-01

ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. 
Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. PMID:28122979
Molecular Diversity and Population Structure of a Worldwide Collection of Cultivated Tetraploid Alfalfa (Medicago sativa subsp. sativa L.) Germplasm as Revealed by Microsatellite Markers

PubMed Central

Qiang, Haiping; Chen, Zhihong; Zhang, Zhengli; Wang, Xuemin; Gao, Hongwen; Wang, Zan

2015-01-01

Information on genetic diversity and population structure of a tetraploid alfalfa collection might be valuable in effective use of the genetic resources. A set of 336 worldwide genotypes of tetraploid alfalfa (Medicago sativa subsp. sativa L.) was genotyped using 85 genome-wide distributed SSR markers to reveal the genetic diversity and population structure in the alfalfa. Genetic diversity analysis identified a total of 1056 alleles across 85 marker loci. The average expected heterozygosity and polymorphism information content values were 0.677 and 0.638, respectively, showing high levels of genetic diversity in the cultivated tetraploid alfalfa germplasm. Comparison of genetic characteristics across chromosomes indicated regions of chromosomes 2 and 3 had the highest genetic diversity. A higher genetic diversity was detected in alfalfa landraces than in wild materials and cultivars. Two populations were identified by the model-based population structure, principal coordinate and neighbor-joining analyses, corresponding to China and other parts of the world. However, the lack of strict correlation between clustering and geographic origins suggested extensive exchange of alfalfa germplasm across diverse geographic regions. The quantitative analysis of the genetic diversity and population structure in this study could be useful for genetic and genomic analysis and utilization of the genetic variation in alfalfa breeding. PMID:25901573
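Note: the two summary statistics quoted in the record above (expected heterozygosity and polymorphism information content) can be illustrated with a minimal sketch. It assumes the textbook definitions, He = 1 - sum(p_i^2) and PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, applied to allele frequencies at a single locus. The function names and the example frequencies are hypothetical, and the authors' own software and any tetraploid-specific corrections are not described in the abstract.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical locus with two equally frequent alleles.
example_freqs = [0.5, 0.5]
he_value = expected_heterozygosity(example_freqs)
pic_value = polymorphism_information_content(example_freqs)
```

Averaging such per-locus values over all marker loci would give study-level figures of the kind reported; whether the authors averaged in exactly this way is an assumption.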
Functional and genomic analyses of alpha-solenoid proteins.

PubMed

Fournier, David; Palidwor, Gareth A; Shcherbinin, Sergey; Szengel, Angelika; Schaefer, Martin H; Perez-Iratxeta, Carol; Andrade-Navarro, Miguel A

2013-01-01

Alpha-solenoids are flexible protein structural domains formed by ensembles of alpha-helical repeats (Armadillo and HEAT repeats among others). While homology can be used to detect many of these repeats, some alpha-solenoids have very little sequence homology to proteins of known structure and we expect that many remain undetected. We previously developed a method for detection of alpha-helical repeats based on a neural network trained on a dataset of protein structures. Here we improved the detection algorithm and updated the training dataset using recently solved structures of alpha-solenoids. Unexpectedly, we identified occurrences of alpha-solenoids in solved protein structures that escaped attention, for example within the core of the catalytic subunit of PI3KC. Our results expand the current set of known alpha-solenoids. Application of our tool to the protein universe allowed us to detect their significant enrichment in proteins interacting with many proteins, confirming that alpha-solenoids are generally involved in protein-protein interactions. We then studied the taxonomic distribution of alpha-solenoids to discuss an evolutionary scenario for the emergence of this type of domain, speculating that alpha-solenoids have emerged in multiple taxa in independent events by convergent evolution. We observe a higher rate of alpha-solenoids in eukaryotic genomes and in some prokaryotic families, such as Cyanobacteria and Planctomycetes, which could be associated with increased cellular complexity. The method is available at http://cbdm.mdc-berlin.de/~ard2/.
Identification of genomic indels and structural variations using split reads

PubMed Central

2011-01-01

Background Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs) in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC), a sequence-based method for SV detection. Results We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read). All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions). A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models). This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions). We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events) allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs. 
Conclusions Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole size spectrum for deletions. Moreover, with the advent of the third-generation sequencing technologies that produce longer reads, we expect our method to be even more useful. PMID:21787423
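Note: the calibration idea in the record above (sensitivity and positive predictive value measured on simulated data, then combined with observed call counts) can be illustrated with a minimal sketch. The correction formula used here, true count ≈ observed calls × PPV / sensitivity, is a standard bias adjustment and an assumption on our part; the abstract does not state how SRiC combines these quantities. All function names and numbers are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real events that the caller recovers."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos, false_pos):
    """Fraction of reported calls that correspond to real events."""
    return true_pos / (true_pos + false_pos)


def corrected_event_count(observed_calls, sens, ppv):
    """Estimate the true number of events from the raw call count.

    observed_calls * ppv approximates the number of true calls;
    dividing by sensitivity accounts for real events that were missed.
    """
    return observed_calls * ppv / sens


# Hypothetical simulation: 100 planted events, 80 recovered, 10 false calls.
sim_sens = sensitivity(true_pos=80, false_neg=20)
sim_ppv = positive_predictive_value(true_pos=80, false_pos=10)
estimate = corrected_event_count(observed_calls=90, sens=sim_sens, ppv=sim_ppv)
```

In practice the calibration would be done separately per event class and length bin (for example long deletions versus short insertions), as the abstract indicates.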
Understanding our Genetic Inheritance: The U.S. Human Genome Project, The First Five Years FY 1991--1995

DOE R&D Accomplishments Database

1990-04-01

The Human Genome Initiative is a worldwide research effort with the goal of analyzing the structure of human DNA and determining the location of the estimated 100,000 human genes. In parallel with this effort, the DNA of a set of model organisms will be studied to provide the comparative information necessary for understanding the functioning of the human genome. The information generated by the human genome project is expected to be the source book for biomedical science in the 21st century and will be of immense benefit to the field of medicine. It will help us to understand and eventually treat many of the more than 4000 genetic diseases that affect mankind, as well as the many multifactorial diseases in which genetic predisposition plays an important role. A centrally coordinated project focused on specific objectives is believed to be the most efficient and least expensive way of obtaining this information. The basic data produced will be collected in electronic databases that will make the information readily accessible in convenient form to all who need it. This report describes the plans for the U.S. human genome project and updates those originally prepared by the Office of Technology Assessment (OTA) and the National Research Council (NRC) in 1988. In the intervening two years, improvements in technology for almost every aspect of genomics research have taken place. As a result, more specific goals can now be set for the project.
Human inversions and their functional consequences

PubMed Central

Puig, Marta; Casillas, Sònia; Villatoro, Sergi

2015-01-01

Polymorphic inversions are a type of structural variant that is difficult to analyze owing to their balanced nature and the location of breakpoints within complex repeated regions. So far, only a handful of inversions have been studied in detail in humans and current knowledge about their possible functional effects is still limited. However, inversions have been related to phenotypic changes and adaptation in multiple species. In this review, we summarize the evidence of the functional impact of inversions in the human genome. First, given that inversions have been shown to inhibit recombination in heterokaryotypes, chromosomes displaying different orientation are expected to evolve independently and this may lead to distinct gene-expression patterns. Second, inversions have a role as disease-causing mutations both by directly affecting gene structure or regulation in different ways, and by predisposing to other secondary arrangements in the offspring of inversion carriers. Finally, several inversions show signals of being selected during human evolution. These findings illustrate the potential of inversions to have phenotypic consequences also in humans and emphasize the importance of their inclusion in genome-wide association studies. PMID:25998059
Comparative genomics of Eucalyptus and Corymbia reveals low rates of genome structural rearrangement.

PubMed

Butler, J B; Vaillancourt, R E; Potts, B M; Lee, D J; King, G J; Baten, A; Shepherd, M; Freeman, J S

2017-05-22

Previous studies suggest genome structure is largely conserved between Eucalyptus species. However, it is unknown if this conservation extends to more divergent eucalypt taxa. We performed comparative genomics between the eucalypt genera Eucalyptus and Corymbia. Our results will facilitate transfer of genomic information between these important taxa and provide further insights into the rate of structural change in tree genomes. We constructed three high density linkage maps for two Corymbia species (Corymbia citriodora subsp. variegata and Corymbia torelliana) which were used to compare genome structure between both species and Eucalyptus grandis. Genome structure was highly conserved between the Corymbia species. However, the comparison of Corymbia and E. grandis suggests large (from 1-13 MB) intra-chromosomal rearrangements have occurred on seven of the 11 chromosomes. Most rearrangements were supported through comparisons of the three independent Corymbia maps to the E. grandis genome sequence, and to other independently constructed Eucalyptus linkage maps. These are the first large scale chromosomal rearrangements discovered between eucalypts. Nonetheless, in the general context of plants, the genomic structure of the two genera was remarkably conserved; adding to a growing body of evidence that conservation of genome structure is common amongst woody angiosperms.
Complete Chloroplast Genome Sequence of Tartary Buckwheat (Fagopyrum tataricum) and Comparative Analysis with Common Buckwheat (F. esculentum)

PubMed Central

Cho, Kwang-Soo; Yun, Bong-Kyoung; Yoon, Young-Ho; Hong, Su-Young; Mekapogu, Manjulatha; Kim, Kyung-Hee; Yang, Tae-Jin

2015-01-01

We report the chloroplast (cp) genome sequence of tartary buckwheat (Fagopyrum tataricum) obtained by next-generation sequencing technology and compare it with the previously reported common buckwheat (F. esculentum ssp. ancestrale) cp genome. The cp genome of F. tataricum has a total sequence length of 159,272 bp, which is 327 bp shorter than the common buckwheat cp genome. The cp gene content, order, and orientation are similar to those of common buckwheat, but with some structural variation at tandem and palindromic repeat frequencies and junction areas. A total of seven InDels (around 100 bp) were found within the intergenic sequences and the ycf1 gene. The copy number of the 21-bp tandem repeat differed between F. tataricum (four repeats) and F. esculentum (one repeat), and the InDel of the ycf1 gene was 63 bp long. Coding sequences are highly conserved at the nucleotide and amino acid levels, with about 98% homology, and four genes—rpoC2, ycf3, accD, and clpP—have high synonymous (Ks) values. PCR based InDel markers were applied to diverse genetic resources of F. tataricum and F. esculentum, and the amplicon size was identical to that expected in silico. Therefore, these InDel markers are informative biomarkers to practically distinguish raw or processed buckwheat products derived from F. tataricum and F. esculentum. PMID:25966355
Parents' Experience with Pediatric Microarray: Transferrable Lessons in the Era of Genomic Counseling.

PubMed

Hayeems, R Z; Babul-Hirji, R; Hoang, N; Weksberg, R; Shuman, C

2016-04-01

Advances in genome-based microarray and sequencing technologies hold tremendous promise for understanding, better-managing and/or preventing disease and disease-related risk. Chromosome microarray technology (array based comparative genomic hybridization [aCGH]) is widely utilized in pediatric care to inform diagnostic etiology and medical management. Less clear is how parents experience and perceive the value of this technology. This study explored parents' experiences with aCGH in the pediatric setting, focusing on how they make meaning of various types of test results. We conducted in-person or telephone-based semi-structured interviews with parents of 21 children who underwent aCGH testing in 2010. Transcripts were coded and analyzed thematically according to the principles of interpretive description. We learned that parents expect genomic tests to be of personal use; their experiences with aCGH results characterize this use as intrinsic in the test's ability to provide a much sought-after answer for their child's condition, and instrumental in its ability to guide care, access to services, and family planning. In addition, parents experience uncertainty regardless of whether aCGH results are of pathogenic, uncertain, or benign significance; this triggers frustration, fear, and hope. Findings reported herein better characterize the notion of personal utility and highlight the pervasive nature of uncertainty in the context of genomic testing. Empiric research that links pre-test counseling content and psychosocial outcomes is warranted to optimize patient care.
Exploration of the Drosophila buzzatii transposable element content suggests underestimation of repeats in Drosophila genomes.

PubMed

Rius, Nuria; Guillén, Yolanda; Delprat, Alejandra; Kapusta, Aurélie; Feschotte, Cédric; Ruiz, Alfredo

2016-05-10

Many new Drosophila genomes have been sequenced in recent years using new-generation sequencing platforms and assembly methods. Transposable elements (TEs), being repetitive sequences, are often misassembled, especially in the genomes sequenced with short reads. Consequently, the mobile fraction of many of the new genomes has not been analyzed in detail or compared with that of other genomes sequenced with different methods, which could shed light on genome and TE evolution. Here we compare the TE content of three genomes: D. buzzatii st-1, j-19, and D. mojavensis. We have sequenced a new D. buzzatii genome (j-19) that complements the D. buzzatii reference genome (st-1) already published, and compared their TE contents with that of D. mojavensis. We found an underestimation of TE sequences in Drosophila genus NGS-genomes when compared to Sanger-genomes. To be able to compare genomes sequenced with different technologies, we developed a coverage-based method and applied it to the D. buzzatii st-1 and j-19 genomes. Between 10.85 and 11.16 % of the D. buzzatii st-1 genome is made up of TEs, between 7 and 7.5 % of the D. buzzatii j-19 genome, while TEs represent 15.35 % of the D. mojavensis genome. Helitrons are the most abundant order in the three genomes. TEs in D. buzzatii are less abundant than in D. mojavensis, as expected according to the genome size and TE content positive correlation. However, TEs alone do not explain the genome size difference. TEs accumulate in the dot chromosomes and proximal regions of D. buzzatii and D. mojavensis chromosomes. We also report a significantly higher TE density in D. buzzatii and D. mojavensis X chromosomes, which is not expected under the current models. Our easy-to-use correction method allowed us to identify recently active families in D. buzzatii st-1 belonging to the LTR-retrotransposon superfamily Gypsy.
Genome-Wide Structural Variation Detection by Genome Mapping on Nanochannel Arrays.

PubMed

Mak, Angel C Y; Lai, Yvonne Y Y; Lam, Ernest T; Kwok, Tsz-Piu; Leung, Alden K Y; Poon, Annie; Mostovoy, Yulia; Hastie, Alex R; Stedman, William; Anantharaman, Thomas; Andrews, Warren; Zhou, Xiang; Pang, Andy W C; Dai, Heng; Chu, Catherine; Lin, Chin; Wu, Jacob J K; Li, Catherine M L; Li, Jing-Woei; Yim, Aldrin K Y; Chan, Saki; Sibert, Justin; Džakula, Željko; Cao, Han; Yiu, Siu-Ming; Chan, Ting-Fung; Yip, Kevin Y; Xiao, Ming; Kwok, Pui-Yan

2016-01-01

Comprehensive whole-genome structural variation detection is challenging with current approaches. With diploid cells as DNA source and the presence of numerous repetitive elements, short-read DNA sequencing cannot be used to detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. While whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, as molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against that derived from the reference human genome, and identified structural variations that are >5 kb in size. We find that these individuals have many more structural variants than those published, including some with the potential of disrupting gene function or regulation. Copyright © 2016 by the Genetics Society of America.
Substantial genome synteny preservation among woody angiosperm species: comparative genomics of Chinese chestnut (Castanea mollissima) and plant reference genomes.

PubMed

Staton, Margaret; Zhebentyayeva, Tetyana; Olukolu, Bode; Fang, Guang Chen; Nelson, Dana; Carlson, John E; Abbott, Albert G

2015-10-05

Chinese chestnut (Castanea mollissima) has emerged as a model species for the Fagaceae family with extensive genomic resources including a physical map, a dense genetic map and quantitative trait loci (QTLs) for chestnut blight resistance. These resources enable comparative genomics analyses relative to model plants. We assessed the degree of conservation between the chestnut genome and other well annotated and assembled plant genomic sequences, focusing on the QTL regions of most interest to the chestnut breeding community. The integrated physical and genetic map of Chinese chestnut has been improved to now include 858 shared sequence-based markers. The utility of the integrated map has also been improved through the addition of 42,970 BAC (bacterial artificial chromosome) end sequences spanning over 26 million bases of the estimated 800 Mb chestnut genome. Synteny analysis between chestnut and ten model plant species was conducted on a macro-syntenic scale using sequences from both individual probes and BAC end sequences across the chestnut physical map. Blocks of synteny with chestnut were found in all ten reference species, with the percent of the chestnut physical map that could be aligned ranging from 10 to 39 %. The integrated genetic and physical map was utilized to identify BACs that spanned the three previously identified QTL regions conferring blight resistance. The clones were pooled and sequenced, yielding 396 sequence scaffolds covering 13.9 Mbp. Comparative genomic analysis on a micro-syntenic scale, using the QTL-associated genomic sequence, identified synteny from chestnut to other plant genomes, with 5.4 to 12.9 % of the genome sequences aligning. On both the macro- and micro-synteny levels, the peach, grape and poplar genomes were found to be the most structurally conserved with chestnut. 
Interestingly, these results did not strictly follow the expectation that decreased phylogenetic distance would correspond to increased levels of genome preservation, but rather suggest the additional influence of life-history traits on preservation of synteny. The regions of synteny that were detected provide an important tool for defining and cataloging genes in the QTL regions for advancing chestnut blight resistance research.
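As a rough check on the map statistics quoted in this abstract, the following sketch works out what fraction of the estimated genome the BAC end sequences represent and their approximate mean length. The function and variable names are illustrative and do not come from the paper, and the inputs are the rounded figures from the abstract ("over 26 million bases" is treated as exactly 26 million), so the results are only approximate.

```python
# Rounded figures quoted in the abstract; treat the results as approximations.
bac_end_sequences = 42_970
bases_covered = 26_000_000      # "over 26 million bases" (assumed lower bound)
genome_size = 800_000_000       # "estimated 800 Mb"


def coverage_fraction(covered: int, total: int) -> float:
    """Fraction of the genome represented by the sequenced bases."""
    return covered / total


def mean_sequence_length(covered: int, n_sequences: int) -> float:
    """Average number of bases per BAC end sequence."""
    return covered / n_sequences


print(f"genome fraction: {coverage_fraction(bases_covered, genome_size):.2%}")
print(f"mean BAC end length: {mean_sequence_length(bases_covered, bac_end_sequences):.0f} bp")
```

Under these assumptions the BAC end sequences sample only a few percent of the genome, which is consistent with the abstract describing the comparison as macro-syntenic rather than a whole-genome alignment.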
Genomic Hypomethylation in the Human Germline Associates with Selective Structural Mutability in the Human Genome

PubMed Central

Li, Jian; Harris, R. Alan; Cheung, Sau Wai; Coarfa, Cristian; Jeong, Mira; Goodell, Margaret A.; White, Lisa D.; Patel, Ankita; Kang, Sung-Hae; Shaw, Chad; Chinault, A. Craig; Gambin, Tomasz; Gambin, Anna; Lupski, James R.; Milosavljevic, Aleksandar

2012-01-01

The hotspots of structural polymorphisms and structural mutability in the human genome remain to be explained mechanistically. We examine associations of structural mutability with germline DNA methylation and with non-allelic homologous recombination (NAHR) mediated by low-copy repeats (LCRs). Combined evidence from four human sperm methylome maps, human genome evolution, structural polymorphisms in the human population, and previous genomic and disease studies consistently points to a strong association of germline hypomethylation and genomic instability. Specifically, methylation deserts, the ∼1% fraction of the human genome with the lowest methylation in the germline, show a tenfold enrichment for structural rearrangements that occurred in the human genome since the branching of chimpanzee and are highly enriched for fast-evolving loci that regulate tissue-specific gene expression. Analysis of copy number variants (CNVs) from 400 human samples identified using a custom-designed array comparative genomic hybridization (aCGH) chip, combined with publicly available structural variation data, indicates that association of structural mutability with germline hypomethylation is comparable in magnitude to the association of structural mutability with LCR–mediated NAHR. Moreover, rare CNVs occurring in the genomes of individuals diagnosed with schizophrenia, bipolar disorder, and developmental delay and de novo CNVs occurring in those diagnosed with autism are significantly more concentrated within hypomethylated regions. These findings suggest a new connection between the epigenome, selective mutability, evolution, and human disease. PMID:22615578
Laser microtreatment for genetic manipulations and DNA diagnostics by a combination of microbeam and photonic tweezers (laser microbeam trap)

NASA Astrophysics Data System (ADS)

Greulich, Karl-Otto; Monajembashi, Shamci; Celeda, D.; Endlich, N.; Eickhoff, Holger; Hoyer, Carsten; Leitz, G.; Weber, Gerd; Scheef, J.; Rueterjans, H.

1994-12-01

Genomes of higher organisms are larger than one typically expects. For example, the DNA of a single human cell is almost two meters long; the DNA in the human body covers the Earth-Sun distance approximately 140 times. This is often not considered in typical molecular biological approaches for DNA diagnostics, where usually only DNA of the length of a gene is investigated. Also, one basic aspect of sequencing the human genome is not really solved: the problem of how to prepare the huge amounts of DNA required. Approaches from biomedical optics combined with new developments in single molecule biotechnology may at least contribute some parts of the puzzle. A large genome can be partitioned into portions comprising approximately 1% of the whole DNA using a laser microbeam. The single DNA fragment can be amplified by the polymerase chain reaction in order to obtain a sufficient amount of molecules for conventional DNA diagnostics or for analysis by octanucleotide hybridization. When not amplified by biotechnological processes, the individual DNA molecule can be visualized in the light microscope and can be manipulated and dissected with the laser microbeam trap. The DNA probes obtained by single molecule biotechnology can be employed for fluorescence in situ introduced into plant cells and subcellular structures even when other techniques fail. Since the laser microbeam trap makes it possible to work in the interior of a cell without opening it, subcellular structures can be manipulated. For example, in algae, such structures can be moved out of their original position and used to study intracellular viscosities.
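The two length figures in this abstract can be related by simple arithmetic. The sketch below back-calculates the number of cells implied by "almost two meters" of DNA per cell and a total of about 140 Earth-Sun distances. The cell count is an inference from those two numbers, not a value stated in the abstract; the names are illustrative, and the Earth-Sun distance is taken as one astronomical unit.

```python
# One astronomical unit in meters (IAU 2012 definition).
AU_M = 149_597_870_700
DNA_PER_CELL_M = 2.0        # "almost two meters" per cell, rounded up (assumption)
EARTH_SUN_MULTIPLES = 140   # "approximately 140 times"


def total_dna_length_m(multiples: float, au_m: float) -> float:
    """Total DNA length in meters for the given number of Earth-Sun distances."""
    return multiples * au_m


def implied_cell_count(total_length_m: float, per_cell_m: float) -> float:
    """Number of cells needed to reach the total length at a fixed length per cell."""
    return total_length_m / per_cell_m


total = total_dna_length_m(EARTH_SUN_MULTIPLES, AU_M)
print(f"total DNA length: {total:.3e} m")
print(f"implied cell count: {implied_cell_count(total, DNA_PER_CELL_M):.2e}")
```

With these rounded inputs the implied count is on the order of ten trillion cells, so the two figures in the abstract are mutually consistent for a body of that cell number.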
Metabolome-genome-wide association study dissects genetic architecture for generating natural variation in rice secondary metabolism

PubMed Central

Matsuda, Fumio; Nakabayashi, Ryo; Yang, Zhigang; Okazaki, Yozo; Yonemaru, Jun-ichi; Ebana, Kaworu; Yano, Masahiro; Saito, Kazuki

2015-01-01

Plants produce structurally diverse secondary (specialized) metabolites to increase their fitness for survival under adverse environments. Several bioactive compounds for new drugs have been identified through screening of plant extracts. In this study, genome-wide association studies (GWAS) were conducted to investigate the genetic architecture behind the natural variation of rice secondary metabolites. GWAS using the metabolome data of 175 rice accessions successfully identified 323 associations among 143 single nucleotide polymorphisms (SNPs) and 89 metabolites. The data analysis highlighted that levels of many metabolites are tightly associated with a small number of strong quantitative trait loci (QTLs). The tight association may be a mechanism generating strains with distinct metabolic composition through the crossing of two different strains. The results indicate that one plant species produces more diverse phytochemicals than previously expected, and plants still contain many useful compounds for human applications. PMID:25267402
Within- and across-breed genomic predictions and genomic relationships for Western Pyrenees dairy sheep breeds Latxa, Manech, and Basco-Béarnaise.

PubMed

Legarra, A; Baloche, G; Barillet, F; Astruc, J M; Soulas, C; Aguerre, X; Arrese, F; Mintegi, L; Lasarte, M; Maeztu, F; Beltrán de Heredia, I; Ugarte, E

2014-05-01

Genotypes, phenotypes and pedigrees of 6 breeds of dairy sheep (including subdivisions of Latxa, Manech, and Basco-Béarnaise) from the Spanish and French Western Pyrenees were used to estimate genetic relationships across breeds (together with genotypes from the Lacaune dairy sheep) and to verify by forward cross-validation single-breed or multiple-breed genetic evaluations. The number of rams genotyped fluctuated between 100 and 1,300 but generally represented the last 10 cohorts of progeny-tested rams within each breed. Genetic relationships were assessed by principal components analysis of the genomic relationship matrices and also by the conservation of linkage disequilibrium patterns at given physical distances in the genome. Genomic and pedigree-based evaluations used daughter yield performances of all rams, although some of them were not genotyped. A pseudo-single step method was used in this case for genomic predictions. Results showed a clear structure in blond and black breeds for Manech and Latxa, reflecting historical exchanges, and isolation of Basco-Béarnaise and Lacaune. Relatedness between any 2 breeds was, however, lower than expected. Single-breed genomic predictions had accuracies comparable with other breeds of dairy sheep or small breeds of dairy cattle. They were more accurate than pedigree predictions for 5 out of 6 breeds, with absolute increases in accuracy ranging from 0.05 to 0.30 points. They were significantly better, as assessed by bootstrapping of candidates, for 2 of the breeds. Predictions using multiple populations only marginally increased the accuracy for a couple of breeds. Pooling populations does not increase the accuracy of genomic evaluations in dairy sheep; however, single-breed genomic predictions are more accurate, even for small breeds, and make the consideration of genomic schemes in dairy sheep interesting. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
The dinoflagellates Durinskia baltica and Kryptoperidinium foliaceum retain functionally overlapping mitochondria from two evolutionarily distinct lineages

PubMed Central

Imanian, Behzad; Keeling, Patrick J

2007-01-01

Background: The dinoflagellates Durinskia baltica and Kryptoperidinium foliaceum are distinguished by the presence of a tertiary plastid derived from a diatom endosymbiont. The diatom is fully integrated with the host cell cycle and is so altered in structure as to be difficult to recognize as a diatom, and yet it retains a number of features normally lost in tertiary and secondary endosymbionts, most notably mitochondria. The dinoflagellate host is also reported to retain mitochondrion-like structures, making these cells unique in retaining two evolutionarily distinct mitochondria. This redundancy raises the question of whether the organelles share any functions in common or have distributed functions between them. Results: We show that both host and endosymbiont mitochondrial genomes encode genes for electron transport proteins. We have characterized cytochrome c oxidase 1 (cox1), cytochrome oxidase 2 (cox2), cytochrome oxidase 3 (cox3), cytochrome b (cob), and large subunit of ribosomal RNA (LSUrRNA) of endosymbiont mitochondrial ancestry, and cox1 and cob of host mitochondrial ancestry. We show that all genes are transcribed and that those ascribed to the host mitochondrial genome are extensively edited at the RNA level, as expected for a dinoflagellate mitochondrion-encoded gene. We also found evidence for extensive recombination in the host mitochondrial genes and that recombination products are also transcribed, as expected for a dinoflagellate. Conclusion: Durinskia baltica and K. foliaceum retain two mitochondria from evolutionarily distinct lineages, and the functions of these organelles are at least partially overlapping, since both express genes for proteins in electron transport. PMID:17892581

Evidence for nonallopatric speciation among closely related sympatric Heliotropium species in the Atacama Desert

PubMed Central

Luebert, Federico; Jacobs, Pit; Hilger, Hartmut H; Muller, Ludo A H

2014-01-01

The genetic structure of populations of closely related, sympatric species may hold the signature of the geographical mode of the speciation process. In fully allopatric speciation, it is expected that genetic differentiation between species is homogeneously distributed across the genome. In nonallopatric speciation, the genomes may remain undifferentiated to a large extent. In this article, we analyzed the genetic structure of five sympatric species from the plant genus Heliotropium in the Atacama Desert. We used amplified fragment length polymorphisms (AFLPs) to characterize the genetic structure of these species and evaluate their genetic differentiation as well as the number of loci subject to positive selection using divergence outlier analysis (DOA). The five species form distinguishable groups in the genetic space, with zones of overlap, indicating that they are possibly not completely isolated. Among-species differentiation accounts for 35% of the total genetic differentiation (FST = 0.35), and FST between species pairs is positively correlated with phylogenetic distance. DOA suggests that few loci are subject to positive selection, which is in line with a scenario of nonallopatric speciation. These results support the idea that sympatric species of Heliotropium sect. Cochranea are under an ongoing speciation process, characterized by a fluctuation of population ranges in response to pulses of arid and humid periods during Quaternary times. PMID:24558582
Evidence for nonallopatric speciation among closely related sympatric Heliotropium species in the Atacama Desert.

PubMed

Luebert, Federico; Jacobs, Pit; Hilger, Hartmut H; Muller, Ludo A H

2014-02-01

The genetic structure of populations of closely related, sympatric species may hold the signature of the geographical mode of the speciation process. In fully allopatric speciation, it is expected that genetic differentiation between species is homogeneously distributed across the genome. In nonallopatric speciation, the genomes may remain undifferentiated to a large extent. In this article, we analyzed the genetic structure of five sympatric species from the plant genus Heliotropium in the Atacama Desert. We used amplified fragment length polymorphisms (AFLPs) to characterize the genetic structure of these species and evaluate their genetic differentiation as well as the number of loci subject to positive selection using divergence outlier analysis (DOA). The five species form distinguishable groups in the genetic space, with zones of overlap, indicating that they are possibly not completely isolated. Among-species differentiation accounts for 35% of the total genetic differentiation (FST = 0.35), and FST between species pairs is positively correlated with phylogenetic distance. DOA suggests that few loci are subject to positive selection, which is in line with a scenario of nonallopatric speciation. These results support the idea that sympatric species of Heliotropium sect. Cochranea are under an ongoing speciation process, characterized by a fluctuation of population ranges in response to pulses of arid and humid periods during Quaternary times.
Draft Genome Sequence of Lactobacillus crispatus EM-LC1, an Isolate with Antimicrobial Activity Cultured from an Elderly Subject

PubMed Central

Power, Susan E.; Harris, Hugh M. B.; Bottacini, Francesca; Ross, R. Paul; O’Toole, Paul W.

2013-01-01

Here we report the 1.86-Mb draft genome sequence of Lactobacillus crispatus EM-LC1, a fecal isolate with antimicrobial activity. This genome sequence is expected to provide insights into the antimicrobial activity of L. crispatus and improve our knowledge of its potential probiotic traits. PMID:24356836
De novo assembling and primary analysis of genome and transcriptome of gray whale Eschrichtius robustus.

PubMed

Moskalev, Alexey A; Kudryavtseva, Anna V; Graphodatsky, Alexander S; Beklemisheva, Violetta R; Serdyukova, Natalya A; Krutovsky, Konstantin V; Sharov, Vadim V; Kulakovskiy, Ivan V; Lando, Andrey S; Kasianov, Artem S; Kuzmin, Dmitry A; Putintseva, Yuliya A; Feranchuk, Sergey I; Shaposhnikov, Mikhail V; Fraifeld, Vadim E; Toren, Dmitri; Snezhkina, Anastasia V; Sitnik, Vasily V

2017-12-28

The gray whale, Eschrichtius robustus (E. robustus), is the single member of the family Eschrichtiidae, which is considered to be the most primitive in the class Cetacea. The gray whale is often described as a "living fossil". It is adapted to extreme marine conditions and has a high life expectancy (77 years). The assembly of a gray whale genome and transcriptome will enable further studies of whale evolution, longevity, and resistance to extreme environments. In this work, we report the first de novo assembly and primary analysis of the E. robustus genome and transcriptome based on kidney and liver samples. The presented draft genome assembly is 55% complete in terms of total genome length, but only 24% complete in terms of the BUSCO complete gene groups, although 10,895 genes were identified. Transcriptome annotation and comparison with other whale species revealed robust expression of DNA repair and hypoxia-response genes, which is expected for whales. This preliminary study of the gray whale genome and transcriptome provides new data to better understand whale evolution and the mechanisms of whale adaptation to hypoxic conditions.
Cell-free protein synthesis for structure determination by X-ray crystallography.

PubMed

Watanabe, Miki; Miyazono, Ken-ichi; Tanokura, Masaru; Sawasaki, Tatsuya; Endo, Yaeta; Kobayashi, Ichizo

2010-01-01

Structure determination has been difficult for those proteins that are toxic to the cells and cannot be prepared in a large amount in vivo. These proteins, even when biologically very interesting, tend to be left uncharacterized in the structural genomics projects. Their cell-free synthesis can bypass the toxicity problem. Among the various cell-free systems, the wheat-germ-based system is of special interest due to the following points: (1) Because the gene is placed under a plant translational signal, its toxic expression in a bacterial host is reduced. (2) It has only little codon preference and, especially, little discrimination between methionine and selenomethionine (SeMet), which allows easy preparation of selenomethionylated proteins for crystal structure determination by SAD and MAD methods. (3) Translation is uncoupled from transcription, so that the toxicity of the translation product on DNA and its transcription, if any, can be bypassed. We have shown that the wheat-germ-based cell-free protein synthesis is useful for X-ray crystallography of one of the 4-bp cutter restriction enzymes, which are expected to be very toxic to all forms of cells retaining the genome. Our report on its structure represents the first report of structure determination by X-ray crystallography using protein overexpressed with the wheat-germ-based cell-free protein expression system. This will be a method of choice for cytotoxic proteins when its cost is not a problem. Its use will become popular when the crystal structure determination technology has evolved to require only a tiny amount of protein.
Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains.

PubMed

Lewis, Tony E; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Chothia, Cyrus; Cuff, Alison; Dana, Jose M; Filippis, Ioannis; Gough, Julian; Hunter, Sarah; Jones, David T; Kelley, Lawrence A; Kleywegt, Gerard J; Minneci, Federico; Mitchell, Alex; Murzin, Alexey G; Ochoa-Montaño, Bernardo; Rackham, Owen J L; Smith, James; Sternberg, Michael J E; Velankar, Sameer; Yeats, Corin; Orengo, Christine

2013-01-01

Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).
Selective forces and mutational biases drive stop codon usage in the human genome: a comparison with sense codon usage.

PubMed

Trotta, Edoardo

2016-05-17

The three stop codons UAA, UAG, and UGA signal the termination of mRNA translation. As a result of a mechanism that is not adequately understood, they are normally used with unequal frequencies. In this work, we showed that selective forces and mutational biases drive stop codon usage in the human genome. We found that, with respect to sense codons, stop codon usage was affected by stronger selective forces but was less influenced by neutral mutational biases. UGA is the most frequent termination codon in the human genome. However, UAA was the preferred stop codon in genes with high breadth of expression, high level of expression, AT-rich coding sequences, housekeeping functions, and in gene ontology categories with the largest deviation from expected stop codon usage. Selective forces associated with the breadth and the level of expression favoured AT-rich sequences in the mRNA region including the stop site and its proximal 3'-UTR, but had little effect on sense codons, generating two regions, upstream and downstream of the stop codon, with strongly different base composition. By favouring low levels of GC-content, selection promoted labile local secondary structures at the stop site and its proximal 3'-UTR. The compositional and structural context favoured by selection was surprisingly emphasized in the class of ribosomal proteins and was consistent with sequence elements that increase the efficiency of translational termination. Stop codons were also heterogeneously distributed among chromosomes by a mechanism that was strongly correlated with the GC-content of coding sequences. In the human genome, the nucleotide composition and the thermodynamic stability of the stop codon site and its proximal 3'-UTR are correlated with the GC-content of coding sequences and with the breadth and the level of gene expression. In highly expressed genes, stop codon usage is compositionally and structurally consistent with highly efficient translation termination signals.
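To make the notion of stop codon usage concrete, here is a minimal sketch of how relative usage frequencies could be tallied from a set of coding sequences. It is not the method used in the study: the function names and the toy sequences are hypothetical, and the code only inspects the final three bases of each sequence, converting DNA notation (T) to RNA notation (U).

```python
from collections import Counter

STOP_CODONS = {"UAA", "UAG", "UGA"}


def stop_codon_counts(coding_sequences):
    """Count the terminal stop codon of each coding sequence.

    Sequences may be given as DNA or RNA; those that do not end
    in a stop codon are skipped.
    """
    counts = Counter()
    for seq in coding_sequences:
        codon = seq.upper().replace("T", "U")[-3:]
        if codon in STOP_CODONS:
            counts[codon] += 1
    return counts


def relative_usage(counts):
    """Convert raw counts into fractions of all observed stop codons."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: counts[codon] / total for codon in sorted(STOP_CODONS)}


# Hypothetical toy sequences, for illustration only (not real genes).
toy_cds = ["ATGGCCTGA", "ATGAAATAA", "ATGCCCTGA", "ATGGGGTAG", "ATGTTTGGG"]
print(relative_usage(stop_codon_counts(toy_cds)))
```

Comparing such fractions between gene classes (for example, highly versus weakly expressed genes) is one simple way to look for the kind of unequal usage the abstract describes.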
Atlas2 Cloud: a framework for personal genome analysis in the cloud

PubMed Central

2012-01-01

Background: Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data, to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation remain a serious bottleneck. Cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues. Results: We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using a Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set. Conclusions: We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms. PMID:23134663
Atlas2 Cloud: a framework for personal genome analysis in the cloud.

PubMed

Evani, Uday S; Challis, Danny; Yu, Jin; Jackson, Andrew R; Paithankar, Sameer; Bainbridge, Matthew N; Jakkamsetti, Adinarayana; Pham, Peter; Coarfa, Cristian; Milosavljevic, Aleksandar; Yu, Fuli

2012-01-01

Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data, to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation remain a serious bottleneck. Cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues. We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using a Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set. We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms.
Functional and Genomic Analyses of Alpha-Solenoid Proteins

PubMed Central

Fournier, David; Palidwor, Gareth A.; Shcherbinin, Sergey; Szengel, Angelika; Schaefer, Martin H.; Perez-Iratxeta, Carol; Andrade-Navarro, Miguel A.

2013-01-01

Alpha-solenoids are flexible protein structural domains formed by ensembles of alpha-helical repeats (Armadillo and HEAT repeats among others). While homology can be used to detect many of these repeats, some alpha-solenoids have very little sequence homology to proteins of known structure and we expect that many remain undetected. We previously developed a method for detection of alpha-helical repeats based on a neural network trained on a dataset of protein structures. Here we improved the detection algorithm and updated the training dataset using recently solved structures of alpha-solenoids. Unexpectedly, we identified occurrences of alpha-solenoids in solved protein structures that escaped attention, for example within the core of the catalytic subunit of PI3KC. Our results expand the current set of known alpha-solenoids. Application of our tool to the protein universe allowed us to detect their significant enrichment in proteins interacting with many proteins, confirming that alpha-solenoids are generally involved in protein-protein interactions. We then studied the taxonomic distribution of alpha-solenoids to discuss an evolutionary scenario for the emergence of this type of domain, speculating that alpha-solenoids have emerged in multiple taxa in independent events by convergent evolution. We observe a higher rate of alpha-solenoids in eukaryotic genomes and in some prokaryotic families, such as Cyanobacteria and Planctomycetes, which could be associated to increased cellular complexity. The method is available at http://cbdm.mdc-berlin.de/~ard2/. PMID:24278209
A little bit of sex matters for genome evolution in asexual plants.

PubMed

Hojsgaard, Diego; Hörandl, Elvira

2015-01-01

Genome evolution in asexual organisms is theoretically expected to be shaped by various factors: first, hybrid origin, and polyploidy confer a genomic constitution of highly heterozygous genotypes with multiple copies of genes; second, asexuality confers a lack of recombination and variation in populations, which reduces the efficiency of selection against deleterious mutations; hence, the accumulation of mutations and a gradual increase in mutational load (Muller's ratchet) would lead to rapid extinction of asexual lineages; third, allelic sequence divergence is expected to result in rapid divergence of lineages (Meselson effect). Recent transcriptome studies on the asexual polyploid complex Ranunculus auricomus using single-nucleotide polymorphisms confirmed neutral allelic sequence divergence within a short time frame, but rejected a hypothesis of a genome-wide accumulation of mutations in asexuals compared to sexuals, except for a few genes related to reproductive development. We discuss a general model that the observed incidence of facultative sexuality in plants may unmask deleterious mutations with partial dominance and expose them efficiently to purging selection. A little bit of sex may help to avoid genomic decay and extinction.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Simon, Horst D.; Zorn, Manfred D.; Spengler, Sylvia J.

The pace of extraordinary advances in molecular biology has accelerated in the past decade due in large part to discoveries coming from genome projects on human and model organisms. The advances in the genome project so far, happening well ahead of schedule and under budget, have exceeded any dreams by its protagonists, let alone formal expectations. Biologists expect the next phase of the genome project to be even more startling in terms of dramatic breakthroughs in our understanding of human biology, the biology of health and of disease. Only today can biologists begin to envision the necessary experimental, computational and theoretical steps necessary to exploit genome sequence information for its medical impact, its contribution to biotechnology and economic competitiveness, and its ultimate contribution to environmental quality. High performance computing has become one of the critical enabling technologies, which will help to translate this vision of future advances in biology into reality. Biologists are increasingly becoming aware of the potential of high performance computing. The goal of this tutorial is to introduce the exciting new developments in computational biology and genomics to the high performance computing community.
Developmental Stability Covaries with Genome-Wide and Single-Locus Heterozygosity in House Sparrows

PubMed Central

Vangestel, Carl; Mergeay, Joachim; Dawson, Deborah A.; Vandomme, Viki; Lens, Luc

2011-01-01

Fluctuating asymmetry (FA), a measure of developmental instability, has been hypothesized to increase with genetic stress. Despite numerous studies providing empirical evidence for associations between FA and genome-wide properties such as multi-locus heterozygosity, support for single-locus effects remains scant. Here we test if, and to what extent, FA co-varies with single- and multilocus markers of genetic diversity in house sparrow (Passer domesticus) populations along an urban gradient. In line with theoretical expectations, FA was inversely correlated with genetic diversity estimated at genome level. However, this relationship was largely driven by variation at a single key locus. Contrary to our expectations, relationships between FA and genetic diversity were not stronger in individuals from urban populations that experience higher nutritional stress. We conclude that loss of genetic diversity adversely affects developmental stability in P. domesticus, and more generally, that the molecular basis of developmental stability may involve complex interactions between local and genome-wide effects. Further study on the relative effects of single-locus and genome-wide effects on the developmental stability of populations with different genetic properties is therefore needed. PMID:21747940
Characteristics of the nuclear (18S, 5.8S, 28S and 5S) and mitochondrial (12S and 16S) rRNA genes of Apis mellifera (Insecta: Hymenoptera): structure, organization, and retrotransposable elements

PubMed Central

Gillespie, J J; Johnston, J S; Cannone, J J; Gutell, R R

2006-01-01

As an accompanying manuscript to the release of the honey bee genome, we report the entire sequence of the nuclear (18S, 5.8S, 28S and 5S) and mitochondrial (12S and 16S) ribosomal RNA (rRNA)-encoding gene sequences (rDNA) and related internally and externally transcribed spacer regions of Apis mellifera (Insecta: Hymenoptera: Apocrita). Additionally, we predict secondary structures for the mature rRNA molecules based on comparative sequence analyses with other arthropod taxa and reference to recently published crystal structures of the ribosome. In general, the structures of honey bee rRNAs are in agreement with previously predicted rRNA models from other arthropods in core regions of the rRNA, with little additional expansion in non-conserved regions. Our multiple sequence alignments are made available on several public databases and provide a preliminary establishment of a global structural model of all rRNAs from the insects. Additionally, we provide conserved stretches of sequences flanking the rDNA cistrons that comprise the externally transcribed spacer regions (ETS) and part of the intergenic spacer region (IGS), including several repetitive motifs. Finally, we report the occurrence of retrotransposition in the nuclear large subunit rDNA, as R2 elements are present in the usual insertion points found in other arthropods. Interestingly, functional R1 elements usually present in the genomes of insects were not detected in the honey bee rRNA genes. The reverse transcriptase products of the R2 elements are deduced from their putative open reading frames and structurally aligned with those from another hymenopteran insect, the jewel wasp Nasonia (Pteromalidae). Stretches of conserved amino acids shared between Apis and Nasonia are illustrated and serve as potential sites for primer design, as target amplicons within these R2 elements may serve as novel phylogenetic markers for Hymenoptera. Given the impending completion of the sequencing of the Nasonia genome, we expect our report eventually to shed light on the evolution of the hymenopteran genome within higher insects, particularly regarding the relative maintenance of conserved rDNA genes, related variable spacer regions and retrotransposable elements. PMID:17069639
Global Organization of a Positive-strand RNA Virus Genome

PubMed Central

Wu, Baodong; Grigull, Jörg; Ore, Moriam O.; Morin, Sylvie; White, K. Andrew

2013-01-01

The genomes of plus-strand RNA viruses contain many regulatory sequences and structures that direct different viral processes. The traditional view of these RNA elements is as local structures present in non-coding regions. However, this view is changing due to the discovery of regulatory elements in coding regions and functional long-range intra-genomic base pairing interactions. The ∼4.8 kb long RNA genome of the tombusvirus tomato bushy stunt virus (TBSV) contains these types of structural features, including six different functional long-distance interactions. We hypothesized that to achieve these multiple interactions this viral genome must utilize a large-scale organizational strategy and, accordingly, we sought to assess the global conformation of the entire TBSV genome. Atomic force micrographs of the genome indicated a mostly condensed structure composed of interconnected protrusions extending from a central hub. This configuration was consistent with the genomic secondary structure model generated using high-throughput selective 2′-hydroxyl acylation analysed by primer extension (i.e. SHAPE), which predicted different sized RNA domains originating from a central region. Known RNA elements were identified in both domain and inter-domain regions, and novel structural features were predicted and functionally confirmed. Interestingly, only two of the six long-range interactions known to form were present in the structural model. However, for those interactions that did not form, complementary partner sequences were positioned relatively close to each other in the structure, suggesting that the secondary structure level of viral genome structure could provide a basic scaffold for the formation of different long-range interactions. The higher-order structural model for the TBSV RNA genome provides a snapshot of the complex framework that allows multiple functional components to operate in concert within a confined context. PMID:23717202
Low-pass sequencing for microbial comparative genomics

PubMed Central

Goo, Young Ah; Roach, Jared; Glusman, Gustavo; Baliga, Nitin S; Deutsch, Kerry; Pan, Min; Kennedy, Sean; DasSarma, Shiladitya; Victor Ng, Wailap; Hood, Leroy

2004-01-01

Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genomes of H. lacusprofundi and H. marismortui suggests that genome structure and dynamic genome reorganization might be similar to that previously observed in the IS-element rich genome of H. sp. NRC-1. Identification of multiple TBP and TFB homologs in these four halophiles is consistent with the hypothesis that different types of complex transcriptional regulation may occur through multiple TBP-TFB combinations in response to rapidly changing environmental conditions. Low-pass shotgun sequence analyses of genomes permit extensive and diverse analyses, and should be generally useful for comparative microbial genomics. PMID:14718067
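The GC-content comparison reported in this record can be illustrated with a minimal Python sketch. It assumes the single-pass reads are available as plain strings; the function names `gc_content` and `pooled_gc` are hypothetical and are not taken from the study. Ambiguous bases such as N are ignored, which is one reasonable convention among several.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous A/C/G/T bases of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


def pooled_gc(reads) -> float:
    """GC fraction over all reads pooled together (hypothetical helper).

    Pooling weights each read by its length, which suits a set of
    single-pass reads of unequal length.
    """
    return gc_content("".join(reads))


if __name__ == "__main__":
    # Toy reads, not data from the study.
    reads = ["GCGCGGCCAT", "CCGGNNGCTA"]
    print(f"pooled GC fraction: {pooled_gc(reads):.3f}")
```

A pooled value above 0.60 would correspond to the "greater than sixty percent GC content" described above, although a low-pass sample only estimates the genome-wide figure.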
Characterization of hemizygous deletions in Citrus using array-Comparative Genomic Hybridization and microsynteny comparisons with the poplar genome

PubMed Central

Ríos, Gabino; Naranjo, Miguel A; Iglesias, Domingo J; Ruiz-Rivero, Omar; Geraud, Marion; Usach, Antonio; Talón, Manuel

2008-01-01

Background Many fruit-tree species, including relevant Citrus spp varieties, exhibit a reproductive biology that impairs breeding and strongly constrains genetic improvements. In citrus, juvenility increases the generation time while sexual sterility, inbreeding depression and self-incompatibility prevent the production of homozygous cultivars. Genomic technology may provide citrus researchers with a new set of tools to address these various restrictions. In this work, we report a valuable genomics-based protocol for the structural analysis of deletion mutations on a heterozygous background. Results Two independent fast neutron mutants of self-incompatible clementine (Citrus clementina Hort. Ex Tan. cv. Clemenules) were the subject of the study. Both mutants, named 39B3 and 39E7, were expected to carry DNA deletions in hemizygous dosage. Array-based Comparative Genomic Hybridization (array-CGH) using a Citrus cDNA microarray allowed the identification of underrepresented genes in these two mutants. Subsequent comparison of citrus deleted genes with annotated plant genomes, especially poplar, made it possible to predict the presence of a large deletion in 39B3 of about 700 kb and at least two deletions of approximately 100 and 500 kb in 39E7. The deletion in 39B3 was further characterized by PCR on available Citrus BACs, which helped us to build a partial physical map of the deletion. Among the deleted genes, a ClpC-like gene coding for a putative subunit of a multifunctional chloroplastic protease involved in the regulation of chlorophyll b synthesis was directly related to the mutated phenotype since the mutant showed a reduced chlorophyll a/b ratio in green tissues. Conclusion In this work, we report the use of array-CGH for the successful identification of genes included in a hemizygous deletion induced by fast neutron irradiation on Citrus clementina. The study of gene content and order within the 39B3 deletion also led to the unexpected conclusion that microsynteny and local gene colinearity in this species were higher with Populus trichocarpa than with the phylogenetically closer Arabidopsis thaliana. This work corroborates the potential of Citrus genomic resources to assist mutagenesis-based approaches for functional genetics, structural studies and comparative genomics, and hence to facilitate citrus variety improvement. PMID:18691431
Genomic adaptation of admixed dairy cattle in East Africa

PubMed Central

Kim, Eui-Soo; Rothschild, Max F.

2014-01-01

Dairy cattle in East Africa imported from the U.S. and Europe have been adapted to new environments. In small local farms, cattle have generally been maintained by crossbreeding that could increase survivability under a severe environment. Eventually, genomic ancestry of a specific breed will be nearly fixed in genomic regions of local breeds or crossbreds when it is advantageous for survival or production in harsh environments. To examine this situation, 25 Friesians and 162 local cattle produced by crossbreeding of dairy breeds in Kenya were sampled and genotyped using 50K SNPs. Using principal component analysis (PCA), the admixed local cattle were found to consist of several imported breeds, including Guernsey, Norwegian Red, and Holstein. To infer the influence of parental breeds on genomic regions, local ancestry mapping was performed based on the similarity of haplotypes. As a consequence, it appears that no genomic region has been under the complete influence of a specific parental breed. Nonetheless, the ancestry of Holstein-Friesians was substantial in most genomic regions (>80%). Furthermore, we examined the frequency of the most common haplotypes from parental breeds that have changed substantially in Kenyan crossbreds during admixture. The frequency of these haplotypes from parental breeds, which were likely to be selected in temperate regions, has deviated considerably from expected frequency in 11 genomic regions. Additionally, extended haplotype homozygosity (EHH) based methods were applied to identify the regions responding to recent selection in crossbreds, called candidate regions, resulting in seven regions that appeared to be affected by Holstein-Friesians. However, some signatures of selection were less dependent on Holstein-Friesians, suggesting evidence of adaptation in East Africa. The analysis of local ancestry is a useful approach to understand the detailed genomic structure and may reveal regions of the genome required for specialized adaptation when combined with methods for searching for the recent changes of haplotype frequency in an admixed population. PMID:25566325
Primer on Molecular Genetics; DOE Human Genome Program

DOE R&D Accomplishments Database

1992-04-01

This report is taken from the April 1992 draft of the DOE Human Genome 1991--1992 Program Report, which is expected to be published in May 1992. The primer is intended to be an introduction to basic principles of molecular genetics pertaining to the genome project. The material contained herein is not final and may be incomplete. Techniques of genetic mapping and DNA sequencing are described.
A survey of genes encoding H2O2-producing GMC oxidoreductases in 10 Polyporales genomes.

PubMed

Ferreira, Patricia; Carro, Juan; Serrano, Ana; Martínez, Angel T

2015-01-01

The genomes of three representative Polyporales (Bjerkandera adusta, Phlebia brevispora and a member of the Ganoderma lucidum complex) recently were sequenced to expand our knowledge on the diversity and distribution of genes involved in degradation of plant polymers in this Basidiomycota order, which includes most wood-rotting fungi. Oxidases, including members of the glucose-methanol-choline (GMC) oxidoreductase superfamily, play a central role in the above degradative process because they generate extracellular H2O2 acting as the ultimate oxidizer in both white-rot and brown-rot decay. The survey was completed by analyzing the GMC genes in the available genomes of seven more species to cover the four Polyporales clades. First, an in silico search for sequences encoding members of the aryl-alcohol oxidase, glucose oxidase, methanol oxidase, pyranose oxidase, cellobiose dehydrogenase and pyranose dehydrogenase families was performed. The curated sequences were subjected to an analysis of their evolutionary relationships, followed by estimation of gene duplication/reduction history during fungal evolution. Second, the molecular structures of the near one hundred GMC oxidoreductases identified were modeled to gain insight into their structural variation and expected catalytic properties. In contrast to ligninolytic peroxidases, whose genes are present in all white-rot Polyporales genomes and absent from those of brown-rot species, the H2O2-generating oxidases are widely distributed in both fungal types. This indicates that the GMC oxidases provide H2O2 for both ligninolytic peroxidase activity (in white-rot decay) and Fenton attack on cellulose (in brown-rot decay), after the transition between both decay patterns in Polyporales occurred. © 2015 by The Mycological Society of America.

Lineage-specific expansions of retroviral insertions within the genomes of African great apes but not humans and orangutans.

PubMed

Yohn, Chris T; Jiang, Zhaoshi; McGrath, Sean D; Hayden, Karen E; Khaitovich, Philipp; Johnson, Matthew E; Eichler, Marla Y; McPherson, John D; Zhao, Shaying; Pääbo, Svante; Eichler, Evan E

2005-04-01

Retroviral infections of the germline have the potential to episodically alter gene function and genome structure during the course of evolution. Horizontal transmissions between species have been proposed, but little evidence exists for such events in the human/great ape lineage of evolution. Based on analysis of finished BAC chimpanzee genome sequence, we characterize a retroviral element (Pan troglodytes endogenous retrovirus 1 [PTERV1]) that has become integrated in the germline of African great ape and Old World monkey species but is absent from humans and Asian ape genomes. We unambiguously map 287 retroviral integration sites and determine that approximately 95.8% of the insertions occur at non-orthologous regions between closely related species. Phylogenetic analysis of the endogenous retrovirus reveals that the gorilla and chimpanzee elements share a monophyletic origin with a subset of the Old World monkey retroviral elements, but that the average sequence divergence exceeds neutral expectation for a strictly nuclear inherited DNA molecule. Within the chimpanzee, there is a significant integration bias against genes, with only 14 of these insertions mapping within intronic regions. Six out of ten of these genes, for which there are expression data, show significant differences in transcript expression between human and chimpanzee. Our data are consistent with a retroviral infection that bombarded the genomes of chimpanzees and gorillas independently and concurrently, 3-4 million years ago. We speculate on the potential impact of such recent events on the evolution of humans and great apes.
Whole-genome sequencing of Atacama skeleton shows novel mutations linked with dysplasia

PubMed Central

Bhattacharya, Sanchita; Li, Jian; Sockell, Alexandra; Kan, Matthew J.; Bava, Felice A.; Chen, Shann-Ching; Ávila-Arcos, María C.; Ji, Xuhuai; Smith, Emery; Asadi, Narges B.; Lachman, Ralph S.; Lam, Hugo Y.K.; Bustamante, Carlos D.; Butte, Atul J.; Nolan, Garry P.

2018-01-01

Over a decade ago, the Atacama humanoid skeleton (Ata) was discovered in the Atacama region of Chile. The Ata specimen carried a strange phenotype—6-in stature, fewer than expected ribs, elongated cranium, and accelerated bone age—leading to speculation that this was a preserved nonhuman primate, human fetus harboring genetic mutations, or even an extraterrestrial. We previously reported that it was human by DNA analysis with an estimated bone age of about 6–8 yr at the time of demise. To determine the possible genetic drivers of the observed morphology, DNA from the specimen was subjected to whole-genome sequencing using the Illumina HiSeq platform with an average 11.5× coverage of 101-bp, paired-end reads. In total, 3,356,569 single nucleotide variations (SNVs) were found as compared to the human reference genome, 518,365 insertions and deletions (indels), and 1047 structural variations (SVs) were detected. Here, we present the detailed whole-genome analysis showing that Ata is a female of human origin, likely of Chilean descent, and its genome harbors mutations in genes (COL1A1, COL2A1, KMT2D, FLNB, ATR, TRIP11, PCNT) previously linked with diseases of small stature, rib anomalies, cranial malformations, premature joint fusion, and osteochondrodysplasia (also known as skeletal dysplasia). Together, these findings provide a molecular characterization of Ata's peculiar phenotype, which likely results from multiple known and novel putative gene mutations affecting bone development and ossification. PMID:29567674
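The sequencing depth quoted in this record (11.5× with 101-bp reads) follows the standard relation between mean coverage, read count, read length, and genome size, c = N·L/G. The sketch below works through that arithmetic in Python. The genome size of 3.1 Gb is an assumption used for illustration and is not stated in the abstract; the function names are hypothetical.

```python
import math


def mean_coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Mean depth c = N * L / G, ignoring unmapped or duplicate reads."""
    return n_reads * read_len / genome_size


def reads_needed(coverage: float, read_len: int, genome_size: int) -> int:
    """Smallest whole number of reads that reaches the requested mean depth."""
    return math.ceil(coverage * genome_size / read_len)


if __name__ == "__main__":
    GENOME_SIZE = 3_100_000_000  # assumed human genome size, not from the abstract
    n = reads_needed(11.5, 101, GENOME_SIZE)
    print(f"reads for 11.5x at 101 bp: {n:,}")
    print(f"check: {mean_coverage(n, 101, GENOME_SIZE):.2f}x")
```

Under that assumption the depth corresponds to a few hundred million reads; the actual number in the study would depend on mapping rate and filtering, which the abstract does not report.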
Characterization of polymorphic microsatellites for Tripterygium (Celastraceae), a monospecific genus of medicinal importance.

PubMed

Novy, Ari; Jones, Kenneth C

2011-10-01

Microsatellite markers were developed for the medicinal plant Tripterygium (Celastraceae) to assess its population structure and to facilitate source tracking of plant materials used for medicinal extracts. Ten microsatellite markers were isolated and characterized in T. wilfordii using an enriched genomic library. The number of alleles per locus ranged from five to 12. Observed and expected heterozygosity ranged from 0.166 to 0.630 and 0.392 to 0.562, respectively. These markers will be useful for a variety of applications including source tracking of plant materials, resolution of taxonomic issues, and population genetics studies.
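Observed and expected heterozygosity, as reported per locus in this record, can be computed from diploid genotypes as sketched below. The genotype encoding (a tuple of two allele sizes per individual) and the function names are assumptions for illustration; the abstract does not say which estimator the authors used, so both the plain and the sample-size-corrected form of expected heterozygosity are shown.

```python
from collections import Counter


def observed_heterozygosity(genotypes) -> float:
    """Fraction of individuals carrying two different alleles at the locus."""
    if not genotypes:
        raise ValueError("no genotypes given")
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(genotypes, unbiased: bool = False) -> float:
    """Expected heterozygosity 1 - sum(p_i^2) from allele frequencies.

    With unbiased=True the value is scaled by 2n / (2n - 1), the usual
    small-sample correction, where 2n is the number of sampled alleles.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    if total < 2:
        raise ValueError("need at least one diploid genotype")
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    if unbiased:
        he *= total / (total - 1)
    return he


if __name__ == "__main__":
    # Toy microsatellite genotypes (allele sizes in bp), not data from the study.
    locus = [(152, 152), (152, 156), (156, 156), (152, 156)]
    print(observed_heterozygosity(locus), expected_heterozygosity(locus))
```

A large gap between the two values at a locus can point to inbreeding, null alleles, or population substructure, which is why both are usually reported together.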
Structural Genomics: Correlation Blocks, Population Structure, and Genome Architecture

PubMed Central

Hu, Xin-Sheng; Yeh, Francis C.; Wang, Zhiquan

2011-01-01

An integration of the pattern of genome-wide inter-site associations with evolutionary forces is important for gaining insights into the genomic evolution in natural or artificial populations. Here, we assess the inter-site correlation blocks and their distributions along chromosomes. A correlation block is broadly termed as the DNA segment within which strong correlations exist between genetic diversities at any two sites. We bring together the population genetic structure and the genomic diversity structure that have been independently built on different scales and synthesize the existing theories and methods for characterizing genomic structure at the population level. We discuss how population structure could shape correlation blocks and their patterns within and between populations. Effects of evolutionary forces (selection, migration, genetic drift, and mutation) on the pattern of genome-wide correlation blocks are discussed. We briefly discuss the associations between the pattern of correlation blocks and genome assembly features in eukaryotic organisms, including the impacts of multigene families, the perturbation of transposable elements, and the repetitive nongenic sequences and GC-rich isochores. Our review suggests that the observable pattern of correlation blocks can refine our understanding of the ecological and evolutionary processes underlying the genomic evolution at the population level. PMID:21886455
Stepwise identification of HLA-A*0201-restricted CD8+ T-cell epitope peptides from herpes simplex virus type 1 genome boosted by a StepRank scheme.

PubMed

Bi, Jianjun; Song, Rengang; Yang, Huilan; Li, Bingling; Fan, Jianyong; Liu, Zhongrong; Long, Chaoqin

2011-01-01

Identification of immunodominant epitopes is the first step in the rational design of peptide vaccines aimed at T-cell immunity. To date, however, accurately and efficiently predicting potent epitope peptides from a large pool of candidates remains a great challenge. In this study, a method that we named StepRank has been developed for the reliable and rapid prediction of binding capabilities/affinities between proteins and genome-wide peptides. In this procedure, instead of the single strategy used in most traditional epitope identification algorithms, four steps with different purposes and thus different computational demands are employed in turn to screen the large-scale peptide candidates that are normally generated from, for example, a pathogen genome. Steps 1 and 2 aim at qualitative exclusion of typical nonbinders using an empirical rule and a linear statistical approach, while steps 3 and 4 focus on quantitative examination and prediction of the interaction energy profile and binding affinity of peptide to target protein via quantitative structure-activity relationship (QSAR) and structure-based free energy analysis. We exemplify this method through its application to binding predictions of the peptide segments derived from the 76 known open-reading frames (ORFs) of the herpes simplex virus type 1 (HSV-1) genome with or without affinity to human major histocompatibility complex class I (MHC I) molecule HLA-A*0201, and find that the predictive results are compatible with the classical anchor residue theory and perfectly match the extended motif pattern of MHC I-binding peptides. The putative epitopes are further confirmed by comparisons with 11 experimentally measured HLA-A*0201-restricted peptides from the HSV-1 glycoproteins D and K. We expect that this well-designed scheme can be applied in the computational screening of other viral genomes as well.
Signatures of Long-Term Balancing Selection in Human Genomes

PubMed Central

de Filippo, Cesare; Teixeira, João C; Schmidt, Joshua M; Kleinert, Philip; Meyer, Diogo; Andrés, Aida M

2018-01-01

Balancing selection maintains advantageous diversity in populations through various mechanisms. Although extensively explored from a theoretical perspective, an empirical understanding of its prevalence and targets lags behind our knowledge of positive selection. Here, we describe the Non-central Deviation (NCD), a simple yet powerful statistic to detect long-term balancing selection (LTBS) that quantifies how close frequencies are to expectations under LTBS, and provides the basis for a neutrality test. NCD can be applied to a single locus or genomic data, and can be implemented considering only polymorphisms (NCD1) or also considering fixed differences with respect to an outgroup (NCD2) species. Incorporating fixed differences improves power, and NCD2 has higher power to detect LTBS in humans under different frequencies of the balanced allele(s) than other available methods. Applied to genome-wide data from African and European human populations, in both cases using chimpanzee as an outgroup, NCD2 shows that, albeit not prevalent, LTBS affects a sizable portion of the genome: ∼0.6% of analyzed genomic windows and 0.8% of analyzed positions. Significant windows (P < 0.0001) contain 1.6% of SNPs in the genome, which disproportionately fall within exons and change protein sequence, but are not enriched in putatively regulatory sites. These windows overlap ∼8% of the protein-coding genes, and these have a larger number of transcripts than expected by chance even after controlling for gene length. Our catalog includes known targets of LTBS but a majority of them (90%) are novel. As expected, immune-related genes are among those with the strongest signatures, although most candidates are involved in other biological functions, suggesting that LTBS potentially influences diverse human phenotypes. PMID:29608730
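The abstract describes NCD only in words, as a measure of how close allele frequencies in a window are to the frequency expected under balancing selection. One way to read that description is a root-mean-square deviation of minor allele frequencies from a target frequency, sketched below in Python. The exact formula, the default target of 0.5, and the treatment of fixed differences as sites with frequency 0 are assumptions made for this sketch and should be checked against the paper itself; the function name `ncd` is hypothetical.

```python
import math


def ncd(minor_allele_freqs, target_freq: float = 0.5,
        n_fixed_differences: int = 0) -> float:
    """Root-mean-square deviation of site frequencies from target_freq.

    minor_allele_freqs: minor allele frequency (0..0.5) of each polymorphic
    site in the window. With n_fixed_differences == 0 this corresponds to
    the polymorphism-only variant (NCD1 in the abstract's terms); a positive
    count adds that many sites with frequency 0 (the NCD2 idea).
    Lower values mean frequencies sit closer to the target.
    """
    if any(not 0.0 <= p <= 0.5 for p in minor_allele_freqs):
        raise ValueError("minor allele frequencies must lie in [0, 0.5]")
    n_sites = len(minor_allele_freqs) + n_fixed_differences
    if n_sites == 0:
        raise ValueError("window has no informative sites")
    squared = sum((p - target_freq) ** 2 for p in minor_allele_freqs)
    squared += n_fixed_differences * target_freq ** 2
    return math.sqrt(squared / n_sites)


if __name__ == "__main__":
    # Toy windows, not data from the study.
    balanced_like = [0.48, 0.50, 0.45, 0.49]
    neutral_like = [0.02, 0.05, 0.10, 0.01]
    print(ncd(balanced_like), ncd(neutral_like))
```

In this reading, a window whose value is unusually low compared with neutral simulations would be a candidate for long-term balancing selection, which matches the neutrality-test framing in the abstract.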
Charting the map of life.

PubMed Central

Schmidt, C W

2001-01-01

Scientists expect that mapping the human genome will lead to a host of innovations in biology and research. For example, it may become possible to use DNA microarrays to accurately diagnose cancer and infectious disease subtypes and to predict clinical outcomes. Scientists might also use the genome to look at the interactions of the environment, genetic makeup, and toxic exposures, including the ability of certain beneficial genes to detoxify the body and resist disease. But despite the great potential of the field of genomics, scientists caution that public expectations need to be tempered by reality. People are as much a product of their environment as they are of their genes, say experts, and to suggest that genetics is the sole determinant that defines humans as individuals stretches the science beyond the current data. PMID:11171541
Genomic resources and genetic diversity of captive lesser kudu (Tragelaphus imberbis).

PubMed

Bock, Friederike; Gallus, Susanne; Janke, Axel; Hailer, Frank; Steck, Beatrice L; Kumar, Vikas; Nilsson, Maria A

2014-01-01

The lesser kudu (Tragelaphus imberbis) is a spiral-horned antelope native to northeastern Africa. Individuals kept in zoological gardens are suspected to be highly inbred due to few founder individuals and a small breeding stock. A morphological study suggested two distinct subspecies of the lesser kudu. However, subspecies designation and population structure in zoological gardens has not been analyzed using molecular markers. We analyzed one mitochondrial marker and two nuclear intron loci (total: 2,239 nucleotides) in 52 lesser kudu individuals. Of these, 48 individuals were bred in captivity and sampled from seven different zoos. The four remaining individuals were recently captured in Somalia and are currently held in the Maktoum zoo. Maternally inherited mitochondrial sequences indicate substantial amounts of genetic variation in the zoo populations, while the biparentally inherited intron sequences are, as expected, less variable. The analyzed individuals show 10 mitochondrial haplotypes with a maximal distance of 10 mutational steps. No prominent subspecies structure is detectable in this study. For further studies of the lesser kudu population genetics, we present microsatellite markers from a low-coverage genome survey using 454 sequencing technology. © 2014 Wiley Periodicals, Inc.
RAD SNP markers as a tool for conservation of dolphinfish Coryphaena hippurus in the Mediterranean Sea: Identification of subtle genetic structure and assessment of populations sex-ratios.

PubMed

Maroso, Francesco; Franch, Rafaella; Dalla Rovere, Giulia; Arculeo, Marco; Bargelloni, Luca

2016-08-01

Dolphinfish is an important fish species for both commercial and sport fishing, but so far limited information is available on genetic variability and pattern of differentiation of dolphinfish populations in the Mediterranean basin. Recently developed techniques allow genome-wide identification of genetic markers for better understanding of population structure in species with limited genome information. Using restriction-site associated DNA analysis we successfully genotyped 140 individuals of dolphinfish from eight locations in the Mediterranean Sea at 3324 SNP loci. We identified 311 sex-related loci that were used to assess sex-ratio in dolphinfish populations. In addition, we identified a weak signature of genetic differentiation of the population closer to the Gibraltar Strait in comparison to other Mediterranean populations, which might be related to introgression of individuals from the Atlantic. No further genetic differentiation could be detected in the other populations sampled, as expected considering the known high mobility of the species. The results obtained improve our knowledge of the species and can help manage dolphinfish stock in the future. Copyright © 2016 Elsevier B.V. All rights reserved.
Sequence co-evolution gives 3D contacts and structures of protein complexes

PubMed Central

Hopf, Thomas A; Schärfe, Charlotta P I; Rodrigues, João P G L M; Green, Anna G; Kohlbacher, Oliver; Sander, Chris; Bonvin, Alexandre M J J; Marks, Debora S

2014-01-01

Protein–protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. Building on earlier work, we show that analysis of correlated evolutionary sequence changes across proteins identifies residues that are close in space with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We evaluate prediction performance in blinded tests on 76 complexes of known 3D structure, predict protein–protein contacts in 32 complexes of unknown structure, and demonstrate how evolutionary couplings can be used to distinguish between interacting and non-interacting protein pairs in a large complex. With the current growth of sequences, we expect that the method can be generalized to genome-wide elucidation of protein–protein interaction networks and used for interaction predictions at residue resolution. DOI: http://dx.doi.org/10.7554/eLife.03430.001 PMID:25255213
Challenges in NMR-based structural genomics

NASA Astrophysics Data System (ADS)

Sue, Shih-Che; Chang, Chi-Fon; Huang, Yao-Te; Chou, Ching-Yu; Huang, Tai-huang

2005-05-01

Understanding the functions of the vast number of proteins encoded in many genomes that have been completely sequenced recently is the main challenge for biologists in the post-genomics era. Since the function of a protein is determined by its exact three-dimensional structure it is paramount to determine the 3D structures of all proteins. This need has driven structural biologists to undertake the structural genomics project aimed at determining the structures of all known proteins. Several centers for structural genomics studies have been established throughout the world. Nuclear magnetic resonance (NMR) spectroscopy has played a major role in determining protein structures in atomic details and in a physiologically relevant solution state. Since the number of new genes being discovered daily far exceeds the number of structures determined by both NMR and X-ray crystallography, a high-throughput method for speeding up the process of protein structure determination is essential for the success of the structural genomics effort. In this article we will describe NMR methods currently being employed for protein structure determination. We will also describe methods under development which may drastically increase the throughput, as well as point out areas where opportunities exist for biophysicists to make significant contribution in this important field.
Genome-wide distribution of genetic diversity and linkage disequilibrium in a mass-selected population of maritime pine

PubMed Central

2014-01-01

Background The accessibility of high-throughput genotyping technologies has contributed greatly to the development of genomic resources in non-model organisms. High-density genotyping arrays have only recently been developed for some economically important species such as conifers. The potential for using genomic technologies in association mapping and breeding depends largely on the genome-wide patterns of diversity and linkage disequilibrium in current breeding populations. This study aims to deepen our knowledge regarding these issues in maritime pine, the first species used for reforestation in southwestern Europe. Results Using a new map merging algorithm, we first established a 1,712 cM composite linkage map (comprising 1,838 SNP markers in 12 linkage groups) by bringing together three already available genetic maps. Using rigorous statistical testing based on kernel density estimation and resampling, we identified cold and hot spots of recombination. In parallel, 186 unrelated trees of a mass-selected population were genotyped using a 12k-SNP array. A total of 2,600 informative SNPs allowed us to describe historical recombination, genetic diversity and genetic structure of this recently domesticated breeding pool that forms the basis of much of the current and future breeding of this species. We observe very low levels of population genetic structure and find no evidence that artificial selection has caused a reduction in genetic diversity. By combining these two pieces of information, we provided the map position of 1,671 SNPs corresponding to 1,192 different loci. This made it possible to analyze the spatial pattern of genetic diversity (He) and long distance linkage disequilibrium (LD) along the chromosomes. We found no particular pattern in the empirical variogram of He across the 12 linkage groups and, as expected for an outcrossing species with large effective population size, we observed an almost complete lack of long distance LD. Conclusions These results are a stepping stone for the development of strategies for studies in population genomics, association mapping and genomic prediction in this economically and ecologically important forest tree species. PMID:24581176
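The long-distance LD mentioned in this abstract is commonly summarised by the squared allelic correlation r² between two loci. The following is a minimal sketch, assuming phased haplotypes coded as 0/1; the function name and input format are illustrative and are not taken from the study, which may have used a different estimator.

```python
def ld_r2(hap_a, hap_b):
    """Squared allelic correlation r^2 between two biallelic loci.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    phased haplotype (hypothetical input format, for illustration only).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need two non-empty, equal-length haplotype lists")
    p_a = sum(hap_a) / n                                   # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                                   # frequency of allele 1 at locus B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n    # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                   # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom
```

Values near 0 for marker pairs far apart on the same linkage group would correspond to the "almost complete lack of long distance LD" described above.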
Optimized guide RNA structure for genome editing via Cas9

PubMed Central

Xu, Jianyong; Lian, Wei; Jia, Yuning; Li, Lingyun; Huang, Zhong

2017-01-01

The genome editing tool Cas9-gRNA (guide RNA) has been successfully applied in different cell types and organisms with high efficiency. However, more efforts need to be made to enhance both efficiency and specificity. In the current study, we optimized the guide RNA structure of the Streptococcus pyogenes CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)/Cas (CRISPR-associated) system to improve its genome editing efficiency. Compared with the original functional structure of guide RNA, which is composed of crRNA and tracrRNA, the widely used chimeric gRNA has shorter crRNA and tracrRNA sequences. The deleted RNA sequence could form an extra loop structure, which might enhance the stability of the guide RNA structure and subsequently the genome editing efficiency. We therefore tested the genome editing efficiency of different forms of guide RNA and found that the chimeric structure of gRNA with the original full length of crRNA and tracrRNA showed higher genome editing efficiency than the conventional chimeric structure or other types of gRNA we tested. Our data therefore uncovered a new type of gRNA structure with higher genome editing efficiency. PMID:29212218
The ‘thousand-dollar genome’: an ethical exploration

PubMed Central

Dondorp, Wybo J; de Wert, Guido M W R

2013-01-01

Sequencing an individual's complete genome is expected to be possible for a relatively low sum, ‘one thousand dollars’, within a few years. Sequencing refers to determining the order of base pairs that make up the genome. The result is a library of three billion letter combinations. Cheap whole-genome sequencing is of greatest importance to medical scientific research. Comparing individual complete genomes will lead to a better understanding of the contribution genetic variation makes to health and disease. As knowledge increases, the ‘thousand-dollar genome’ will also become increasingly important to healthcare. The applications that come within reach raise a number of ethical questions. This monitoring report addresses the issue. PMID:23677179
Producing genome structure populations with the dynamic and automated PGS software.

PubMed

Hua, Nan; Tjong, Harianto; Shin, Hanjun; Gong, Ke; Zhou, Xianghong Jasmine; Alber, Frank

2018-05-01

Chromosome conformation capture technologies such as Hi-C are widely used to investigate the spatial organization of genomes. Because genome structures can vary considerably between individual cells of a population, interpreting ensemble-averaged Hi-C data can be challenging, in particular for long-range and interchromosomal interactions. We pioneered a probabilistic approach for the generation of a population of distinct diploid 3D genome structures consistent with all the chromatin-chromatin interaction probabilities from Hi-C experiments. Each structure in the population is a physical model of the genome in 3D. Analysis of these models yields new insights into the causes and the functional properties of the genome's organization in space and time. We provide a user-friendly software package, called PGS, which runs on local machines (for practice runs) and high-performance computing platforms. PGS takes a genome-wide Hi-C contact frequency matrix, along with information about genome segmentation, and produces an ensemble of 3D genome structures entirely consistent with the input. The software automatically generates an analysis report, and provides tools to extract and analyze the 3D coordinates of specific domains. Basic Linux command-line knowledge is sufficient for using this software. A typical running time of the pipeline is ∼3 d with 300 cores on a computer cluster to generate a population of 1,000 diploid genome structures at topological-associated domain (TAD)-level resolution.
The impact of the Tekay chromoviral elements on genome organisation and evolution of Anemone s.l. (Ranunculaceae).

PubMed

Mlinarec, J; Franjević, D; Harapin, J; Besendorfer, V

2016-03-01

We studied the highly abundant chromoviral Tekay clade in species from three sister genera - Anemone, Pulsatilla and Hepatica (Ranunculaceae). With this clade, we performed a concomitant survey of its phylogenetic diversity, chromosomal organisation and transcriptional activity in Anemone s.l. in order to investigate dynamics of the Tekay elements at a finer scale than previously achieved in this or any other flowering clade. The phylogenetic tree built from Tekay sequences conformed to expected evolutionary relationships of the species; exceptions being A. nemorosa and A. sylvestris, which appeared more closely related than expected, and we invoke hybridisation events to explain the observed topology. The separation of elements into six clusters could be explained by episodic bursts of activity since divergence from a common ancestor at different points in their respective evolutionary histories. In Anemone s.l. the Tekay elements do not have a preferential position on chromosomes, i.e. they can have a: (i) centromeric/pericentromeric position; (ii) interstitial position in DAPI-positive AT-rich heterochromatic regions; can be (iii) dispersed throughout chromosomes; or even (iv) be absent from large heterochromatic blocks. Widespread transcriptional activity of the Tekay elements in Anemone s.l. taxa indicates that some copies of Tekay elements could still be active in this plant group, contributing to genome evolution and speciation within Anemone s.l. Identification of Tekay elements in Anemone s.l. provides valuable information for understanding how different localisation patterns might help to facilitate plant genome organisation in a structural and functional manner. © 2015 German Botanical Society and The Royal Botanical Society of the Netherlands.
Selective intra-dinucleotide interactions and periodicities of bases separated by K sites: a new vision and tool for phylogeny analyses.

PubMed

Valenzuela, Carlos Y

2017-02-13

Direct tests of the random or non-random distribution of nucleotides on genomes have been devised to test the hypothesis of neutral, nearly-neutral or selective evolution. These tests are based on the direct base distribution and are independent of the functional (coding or non-coding) or structural (repeated or unique sequences) properties of the DNA. The first approach described the longitudinal distribution of bases in tandem repeats under the Bose-Einstein statistics. A huge deviation from randomness was found. A second approach was the study of the base distribution within dinucleotides whose bases were separated by 0, 1, 2… K nucleotides. Again an enormous difference from the random distribution was found, with significance levels beyond the range of standard tables and programs. These test values were periodical and included the 16 dinucleotides. For example, a high "positive" (more observed than expected dinucleotides) value, found in dinucleotides whose bases were separated by (3K + 2) sites, was preceded by two smaller "negative" (fewer observed than expected dinucleotides) values, whose bases were separated by (3K) or (3K + 1) sites. We examined mtDNAs, prokaryote genomes and some eukaryote chromosomes and found that the significant non-random interactions and periodicities were present up to 1000 or more sites of base separation and in human chromosome 21 until separations of more than 10 million sites. Each nucleotide has its own significant value of its distance to neutrality; this yields 16 hierarchical significances. A three dimensional table with the number of sites of separation between the bases and the 16 significances (the third dimension is the dinucleotide, individual or taxon involved) gives directly an evolutionary state of the analyzed genome that can be used to obtain phylogenies. An example is provided.
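The comparison of observed and expected counts for dinucleotides whose bases are separated by K sites can be sketched as follows. This is a generic illustration, assuming the expectation under independence is taken from the marginal base counts at the first and second positions; the abstract does not specify the exact statistic, so the function and its return format are hypothetical.

```python
from collections import Counter

def separated_dinucleotides(seq, k):
    """Observed and expected counts of base pairs separated by k sites.

    A pair is (seq[i], seq[i + k + 1]), so k = 0 gives adjacent bases.
    Returns {"AA": (observed, expected), ...} for all 16 dinucleotides.
    The expectation assumes the two positions are independent.
    """
    seq = seq.upper()
    step = k + 1
    pairs = [
        (seq[i], seq[i + step])
        for i in range(len(seq) - step)
        if seq[i] in "ACGT" and seq[i + step] in "ACGT"   # skip ambiguous bases
    ]
    n = len(pairs)
    if n == 0:
        raise ValueError("sequence too short for this separation")
    observed = Counter(pairs)
    first = Counter(x for x, _ in pairs)     # marginal counts, first base
    second = Counter(y for _, y in pairs)    # marginal counts, second base
    return {
        x + y: (observed[(x, y)], first[x] * second[y] / n)
        for x in "ACGT"
        for y in "ACGT"
    }
```

Running this for K = 0, 1, 2, … and tabulating observed minus expected per dinucleotide gives the kind of separation-by-dinucleotide table the abstract describes.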
A map of human microRNA variation uncovers unexpectedly high levels of variability

PubMed Central

2012-01-01

Background MicroRNAs (miRNAs) are key components of the gene regulatory network in many species. During the past few years, these regulatory elements have been shown to be involved in an increasing number and range of diseases. Consequently, the compilation of a comprehensive map of natural variability in a healthy population seems an obvious requirement for future research on miRNA-related pathologies. Methods Data on 14 populations from the 1000 Genomes Project were analyzed, along with new data extracted from 60 exomes of healthy individuals from a population from southern Spain, sequenced in the context of the Medical Genome Project, to derive an accurate map of miRNA variability. Results Despite the common belief that miRNAs are highly conserved elements, analysis of the sequences of the 1,152 individuals indicated that the observed level of variability is double what was expected. A total of 527 variants were found. Among these, 45 variants affected the recognition region of the corresponding miRNA and were found in 43 different miRNAs, 26 of which are known to be involved in 57 diseases. Different parts of the mature structure of the miRNA were affected to different degrees by variants, which suggests the existence of a selective pressure related to the relative functional impact of the change. Moreover, 41 variants showed a significant deviation from the Hardy-Weinberg equilibrium, which supports the existence of a selective process against some alleles. The average number of variants per individual in miRNAs was 28. Conclusions Despite an expectation that miRNAs would be highly conserved genomic elements, our study reports a level of variability comparable to that observed for coding genes. PMID:22906193
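A deviation from Hardy-Weinberg equilibrium at a biallelic variant is often assessed with a chi-square goodness-of-fit test on genotype counts. The sketch below shows that textbook test; it is an assumption that a test of this kind is relevant here, since the abstract does not state which procedure the authors used.

```python
def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) for Hardy-Weinberg
    equilibrium, given counts of the three genotypes at a biallelic site."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1 - p                                # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic sites).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

A statistic above roughly 3.84 corresponds to p < 0.05 at one degree of freedom; with hundreds of variants tested, a multiple-testing correction would normally be applied as well.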
Genomic selection in forage breeding: designing an estimation population

USDA-ARS's Scientific Manuscript database

The benefits of genomic selection to livestock, crops and forest tree breeding can be extended to forage grasses and legumes. The main benefits expected are increased selection accuracy and reduced costs per unit of genotype evaluated and breeding cycle length. Aiming at designing a training populat...
Isolation and characterization of polymorphic microsatellite loci in Spondias radlkoferi (Anacardiaceae)

PubMed Central

Aguilar-Barajas, Esther; Sork, Victoria L.; González-Zamora, Arturo; Rocha-Ramírez, Víctor; Arroyo-Rodríguez, Víctor; Oyama, Ken

2014-01-01

• Premise of the study: Microsatellite markers were developed for Spondias radlkoferi to assess the impact of primate seed dispersal on the genetic diversity and structure of this important tree species of Anacardiaceae. • Methods and Results: Fourteen polymorphic loci were isolated from S. radlkoferi through 454 GS-FLX Titanium pyrosequencing of genomic DNA. The number of alleles ranged from three to 12. The observed and expected heterozygosities ranged from 0.382 to 1.00 and from 0.353 to 0.733, respectively. The amplification was also successful in S. mombin and two genera of Anacardiaceae: Rhus aromatica and Toxicodendron radicans. • Conclusions: These microsatellite loci will be useful to assess the genetic diversity and population structure of S. radlkoferi and related species, and will allow us to investigate the effects of seed dispersal by spider monkeys (Ateles geoffroyi) on the genetic structure and diversity of S. radlkoferi populations in a fragmented rainforest. PMID:25383270
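Observed and expected heterozygosity for a single microsatellite locus can be computed directly from diploid genotypes. A minimal sketch, assuming genotypes are given as pairs of allele sizes; the function name and input format are illustrative, and no small-sample correction is applied.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (Ho, He) for one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    Ho is the fraction of heterozygous individuals; He is 1 minus the sum of
    squared allele frequencies (expected under random mating).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    ho = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n                              # number of allele copies
    he = 1 - sum((c / total) ** 2 for c in counts.values())
    return ho, he
```

Because Ho counts individuals while He depends only on allele frequencies, Ho can reach 1.00 (every sampled individual heterozygous) even when He is lower, as in the ranges reported above.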

G2S: a web-service for annotating genomic variants on 3D protein structures.

PubMed

Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong

2018-06-01

Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that supports programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. The G2S API uses a REST-inspired design and can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source codes are freely available at https://g2s.genomenexus.org. Contact: g2s@genomenexus.org. Supplementary data are available at Bioinformatics online.
Individualized pain medicine

PubMed Central

Kim, Hyungsuk; Dionne, Raymond A.

2010-01-01

Since the first draft of the human genome was published 10 years ago, scientists have tried to develop new treatment strategies for various types of diseases based on individual genomes. It is called personalized (or individualized) medicine and is expected to increase efficacy and reduce adverse reactions of drugs. Much progress has been made with newly developed technologies, though individualized pain medicine is still far from realization. Efforts on the integrative genomic analyses along with understandings of interactions between other related factors such as environment will eventually translate complex genomic information into individualized pain medicine. PMID:21399745
Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale.

PubMed

Liu, Siyang; Huang, Shujia; Rao, Junhua; Ye, Weijian; Krogh, Anders; Wang, Jun

2015-01-01

Comprehensive recognition of genomic variation in one individual is important for understanding disease and developing personalized medication and treatment. Many tools based on DNA re-sequencing exist for identification of single nucleotide polymorphisms, small insertions and deletions (indels) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome assemblies captures a wide spectrum of structural variants and novel sequences present in the human population with high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction of population-scale pan-genomes. Our study also highlights the usefulness of the de novo assembly strategy for definition of genome structure.
Whole Genome Amplification and Reduced-Representation Genome Sequencing of Schistosoma japonicum Miracidia.

PubMed

Shortt, Jonathan A; Card, Daren C; Schield, Drew R; Liu, Yang; Zhong, Bo; Castoe, Todd A; Carlton, Elizabeth J; Pollock, David D

2017-01-01

In areas where schistosomiasis control programs have been implemented, morbidity and prevalence have been greatly reduced. However, to sustain these reductions and move towards interruption of transmission, new tools for disease surveillance are needed. Genomic methods have the potential to help trace the sources of new infections, and allow us to monitor drug resistance. Large-scale genotyping efforts for schistosome species have been hindered by cost, limited numbers of established target loci, and the small amount of DNA obtained from miracidia, the life stage most readily acquired from humans. Here, we present a method using next generation sequencing to provide high-resolution genomic data from S. japonicum for population-based studies. We applied whole genome amplification followed by double digest restriction site associated DNA sequencing (ddRADseq) to individual S. japonicum miracidia preserved on Whatman FTA cards. We found that we could effectively and consistently survey hundreds of thousands of variants from 10,000 to 30,000 loci from archived miracidia as old as six years. An analysis of variation from eight miracidia obtained from three hosts in two villages in Sichuan showed clear population structuring by village and host even within this limited sample. This high-resolution sequencing approach yields three orders of magnitude more information than microsatellite genotyping methods that have been employed over the last decade, creating the potential to answer detailed questions about the sources of human infections and to monitor drug resistance. Costs per sample range from $50-$200, depending on the amount of sequence information desired, and we expect these costs can be reduced further given continued reductions in sequencing costs, improvement of protocols, and parallelization. This approach provides new promise for applying modern genome-scale sampling to S. japonicum surveillance, and could be applied to other schistosome species and other parasitic helminths.
Complete Genome Sequence of the Alfalfa Symbiont Sinorhizobium/Ensifer meliloti Strain GR4.

PubMed

Martínez-Abarca, Francisco; Martínez-Rodríguez, Laura; López-Contreras, José Antonio; Jiménez-Zurdo, José Ignacio; Toro, Nicolás

2013-01-01

We present the complete nucleotide sequence of the multipartite genome of Sinorhizobium/Ensifer meliloti GR4, a predominant rhizobial strain in an agricultural field site. The genome (total size, 7.14 Mb) consists of five replicons: one chromosome, two expected symbiotic megaplasmids (pRmeGR4c and pRmeGR4d), and two accessory plasmids (pRmeGR4a and pRmeGR4b).
Population Genetic Structure of the Cayo Santiago Colony of Rhesus Macaques (Macaca mulatta).

PubMed

Kanthaswamy, Sreetharan; Oldt, Robert F; Ng, Jillian; Ruiz-Lambides, Angelina V; Maldonado, Elizabeth; Martínez, Melween I; Sariol, Carlos A

2017-07-01

The rhesus macaque population at Cayo Santiago increases annually and is in urgent need of control. In-depth assessments of the colony's population genetic and pedigree structures provide a starting point for improving the colony's long-term management program. We evaluated the degree of genetic variation and coefficients of inbreeding and kinship of the Cayo Santiago colony by using pedigree and short tandem repeat (STR) data from 4738 rhesus macaques, which represent 7 extant social groups and a group of migrant males. Information on each animal's parentage, sex, birth date, and date of death or removal from the island was used to generate estimates of mean kinship, kinship value, gene value, genome uniqueness (GU), founder equivalents (fe), and founder genome equivalents (fg). Pedigree and STR analyses revealed that the social groups have not differentiated genetically from each other due to male-mediated gene flow (that is, FST estimates were in the negative range) and exhibit sufficient genetic variation, with mean estimates of allele numbers and observed and expected heterozygosity of 6.57, 0.72, and 0.70, respectively. Estimates of GU, fe, and fg show that a high effective number of founders has affected the colony's current genetic structure in a positive manner. As demographic changes occur, genetic and pedigree matrices need to be monitored consistently to ensure the health and wellbeing of the Cayo Santiago colony.
MycoCosm, an Integrated Fungal Genomics Resource

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shabalov, Igor; Grigoriev, Igor

2012-03-16

MycoCosm is a web-based interactive fungal genomics resource, which was first released in March 2010, in response to an urgent call from the fungal community for integration of all fungal genomes and analytical tools in one place (Pan-fungal data resources meeting, Feb 21-22, 2010, Alexandria, VA). MycoCosm integrates genomics data and analysis tools to navigate through over 100 fungal genomes sequenced at JGI and elsewhere. This resource allows users to explore fungal genomes in the context of both genome-centric analysis and comparative genomics, and promotes user community participation in data submission, annotation and analysis. MycoCosm has over 4500 unique visitors/month or 35000+ visitors/year as well as hundreds of registered users contributing their data and expertise to this resource. Its scalable architecture allows significant expansion of the data expected from the JGI Fungal Genomics Program, its users, and integration with external resources used by the fungal community.
Teaching strategies to incorporate genomics education into academic nursing curricula.

PubMed

Quevedo Garcia, Sylvia P; Greco, Karen E; Loescher, Lois J

2011-11-01

The translation of genomic science into health care has expanded our ability to understand the effects of genomics on human health and disease. As genomic advances continue, nurses are expected to have the knowledge and skills to translate genomic information into improved patient care. This integrative review describes strategies used to teach genomics in academic nursing programs and their facilitators and barriers to inclusion in nursing curricula. The Learning Engagement Model and the Diffusion of Innovations Theory guided the interpretation of findings. CINAHL, Medline, and Web of Science were resources for articles published during the past decade that included strategies for teaching genomics in academic nursing programs. Of 135 articles, 13 met criteria for review. Examples of effective genomics teaching strategies included clinical application through case studies, storytelling, online genomics resources, student self-assessment, guest lecturers, and a genetics focus group. Most strategies were not evaluated for effectiveness. Copyright 2011, SLACK Incorporated.
The Paris-Sud yeast structural genomics pilot-project: from structure to function.

PubMed

Quevillon-Cheruel, Sophie; Liger, Dominique; Leulliot, Nicolas; Graille, Marc; Poupon, Anne; Li de La Sierra-Gallay, Inès; Zhou, Cong-Zhao; Collinet, Bruno; Janin, Joël; Van Tilbeurgh, Herman

2004-01-01

We present here the outlines and results from our yeast structural genomics (YSG) pilot-project. A lab-scale platform for the systematic production and structure determination is presented. In order to validate this approach, 250 non-membrane proteins of unknown structure were targeted. Strategies and final statistics are evaluated. We finally discuss the opportunity of structural genomics programs to contribute to functional biochemical annotation.
Genetic diversity and population structure among six cattle breeds in South Africa using a whole genome SNP panel

PubMed Central

Makina, Sithembile O.; Muchadeyi, Farai C.; van Marle-Köster, Este; MacNeil, Michael D.; Maiwashe, Azwihangwisi

2014-01-01

Information about genetic diversity and population structure among cattle breeds is essential for genetic improvement, understanding of environmental adaptation as well as utilization and conservation of cattle breeds. This study investigated genetic diversity and the population structure among six cattle breeds in South Africa (SA) including Afrikaner (n = 44), Nguni (n = 54), Drakensberger (n = 47), Bonsmara (n = 44), Angus (n = 31), and Holstein (n = 29). Genetic diversity within cattle breeds was analyzed using three measures of genetic diversity, namely allelic richness (AR), expected heterozygosity (He) and inbreeding coefficient (f). Genetic distances between breed pairs were evaluated using Nei's genetic distance. Population structure was assessed using model-based clustering (ADMIXTURE). Results of this study revealed that the allelic richness ranged from 1.88 (Afrikaner) to 1.73 (Nguni). Afrikaner cattle had the lowest level of genetic diversity (He = 0.24) and the Drakensberger cattle (He = 0.30) had the highest level of genetic variation among indigenous and locally-developed cattle breeds. The level of inbreeding was low across the studied cattle breeds. As expected, the average genetic distance was the greatest between indigenous cattle breeds and Bos taurus cattle breeds but the lowest among indigenous and locally-developed breeds. Model-based clustering revealed some level of admixture among indigenous and locally-developed breeds and supported the clustering of the breeds according to their history of origin. The results of this study provided useful insight regarding genetic structure of SA cattle breeds. PMID:25295053
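Nei's standard genetic distance between two populations can be computed from per-locus allele frequencies. The sketch below assumes biallelic SNPs and takes, for each population, the frequency of one designated allele per SNP; the function name and input layout are illustrative, and the study may have used a different variant of Nei's distance.

```python
import math

def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    freqs_x, freqs_y: equal-length lists holding, for each biallelic SNP,
    the frequency of the same reference allele in populations X and Y.
    """
    if not freqs_x or len(freqs_x) != len(freqs_y):
        raise ValueError("need two non-empty, equal-length frequency lists")
    # Probabilities of identity within and between populations, summed over
    # loci (the per-locus averaging factor cancels in the ratio).
    jx = sum(p * p + (1 - p) * (1 - p) for p in freqs_x)
    jy = sum(q * q + (1 - q) * (1 - q) for q in freqs_y)
    jxy = sum(p * q + (1 - p) * (1 - q) for p, q in zip(freqs_x, freqs_y))
    if jxy == 0:
        return math.inf                      # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))
```

Identical frequency vectors give a distance of 0, and the value grows as the two populations share fewer alleles.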
GeoChip 3.0: A High Throughput Tool for Analyzing Microbial Community, Composition, Structure, and Functional Activity

DOE Office of Scientific and Technical Information (OSTI.GOV)

He, Zhili; Deng, Ye; Nostrand, Joy Van

2010-05-17

Microarray-based genomic technology has been widely used for microbial community analysis, and it is expected that microarray-based genomic technologies will revolutionize the analysis of microbial community structure, function and dynamics. A new generation of functional gene arrays (GeoChip 3.0) has been developed, with 27,812 probes covering 56,990 gene variants from 292 functional gene families involved in carbon, nitrogen, phosphorus and sulfur cycles, energy metabolism, antibiotic resistance, metal resistance, and organic contaminant degradation. Those probes were derived from 2,744, 140, and 262 species for bacteria, archaea, and fungi, respectively. GeoChip 3.0 has several other distinct features, such as a common oligo reference standard (CORS) for data normalization and comparison, a software package for data management and future updating, and the gyrB gene for phylogenetic analysis. Our computational evaluation of probe specificity indicated that all designed probes had a high specificity to their corresponding targets. Also, experimental analysis with synthesized oligonucleotides and genomic DNAs showed that only 0.0036%-0.025% false positive rates were observed, suggesting that the designed probes are highly specific under the experimental conditions examined. In addition, GeoChip 3.0 was applied to analyze soil microbial communities in a multifactor grassland ecosystem in Minnesota, USA, which demonstrated that the structure, composition, and potential activity of soil microbial communities significantly changed with the plant species diversity. All results indicate that GeoChip 3.0 is a high throughput powerful tool for studying microbial community functional structure, and linking microbial communities to ecosystem processes and functioning. To our knowledge, GeoChip 3.0 is the most comprehensive microarray currently available for studying microbial communities associated with geobiochemical cycling, global climate change, bioenergy, agriculture, land use, ecosystem management, environmental cleanup and restoration, bioreactor systems, and human health.
Multiscale modeling of three-dimensional genome

NASA Astrophysics Data System (ADS)

Zhang, Bin; Wolynes, Peter

The genome, the blueprint of life, contains nearly all the information needed to build and maintain an entire organism. A comprehensive understanding of the genome is of paramount interest to human health and will advance progress in many areas, including life sciences, medicine, and biotechnology. The overarching goal of my research is to understand the structure-dynamics-function relationships of the human genome. In this talk, I will be presenting our efforts in moving towards that goal, with a particular emphasis on studying the three-dimensional organization of the genome with multi-scale approaches. Specifically, I will discuss the reconstruction of genome structures at both interphase and metaphase by making use of data from chromosome conformation capture experiments. Computational modeling of the chromatin fiber at the atomistic level from first principles will also be presented as our effort for studying the genome structure from the bottom up.
Recombination rate and the distribution of transposable elements in the Drosophila melanogaster genome.

PubMed

Rizzon, Carène; Marais, Gabriel; Gouy, Manolo; Biémont, Christian

2002-03-01

We analyzed the distribution of 54 families of transposable elements (TEs; transposons, LTR retrotransposons, and non-LTR retrotransposons) in the chromosomes of Drosophila melanogaster, using data from the sequenced genome. The density of LTR and non-LTR retrotransposons (RNA-based elements) was high in regions with low recombination rates, but there was no clear tendency to parallel the recombination rate. However, the density of transposons (DNA-based elements) was significantly negatively correlated with recombination rate. The accumulation of TEs in regions of reduced recombination rate is compatible with selection acting against TEs, as selection is expected to be weaker in regions with lower recombination. The differences in the relationship between recombination rate and TE density that exist between chromosome arms suggest that TE distribution depends on specific characteristics of the chromosomes (chromatin structure, distribution of other sequences), the TEs themselves (transposition mechanism), and the species (reproductive system, effective population size, etc.), that have differing influences on the effect of natural selection acting against the TE insertions.
Finding similar nucleotide sequences using network BLAST searches.

PubMed

Ladunga, Istvan

2009-06-01

The Basic Local Alignment Search Tool (BLAST) is a keystone of bioinformatics due to its performance and user-friendliness. Beginner and intermediate users will learn how to design and submit blastn and Megablast searches on the Web pages at the National Center for Biotechnology Information. We map nucleic acid sequences to genomes, find identical or similar mRNA, expressed sequence tag, and noncoding RNA sequences, and run Megablast searches, which are much faster than blastn. Understanding results is assisted by taxonomy reports, genomic views, and multiple alignments. We interpret expected frequency thresholds, biological significance, and statistical significance. Weak hits provide no evidence, but hints for further analyses. We find genes that may code for homologous proteins by translated BLAST. We reduce false positives by filtering out low-complexity regions. Parsed BLAST results can be integrated into analysis pipelines. Links in the output connect to Entrez, PUBMED, structural, sequence, interaction, and expression databases. This facilitates integration with a wide spectrum of biological knowledge.
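The expected-frequency (E-value) threshold mentioned above follows from the bit score and the size of the search space: E = m · n · 2^(−S′), where S′ is the bit score, m the query length and n the database length. A minimal sketch of that relationship follows; the function name is illustrative, and BLAST itself uses effective (edge-corrected) lengths, so reported E-values will differ somewhat from this raw calculation.

```python
def blast_evalue(bit_score, query_len, db_len):
    """Approximate E-value from a bit score and the raw search-space size.

    Implements E = m * n * 2**(-S'), ignoring the effective-length
    corrections that BLAST applies internally.
    """
    if query_len <= 0 or db_len <= 0:
        raise ValueError("lengths must be positive")
    return query_len * db_len * 2.0 ** (-bit_score)


# Example: the same alignment becomes less significant in a larger database.
small_db = blast_evalue(40, query_len=1000, db_len=1_000_000)
large_db = blast_evalue(40, query_len=1000, db_len=2_000_000)
```

The example illustrates why an E-value threshold has to be interpreted relative to database size: doubling the database doubles the number of chance hits expected at a given score.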
Animal breeding strategies can improve meat quality attributes within entire populations.

PubMed

Berry, D P; Conroy, S; Pabiou, T; Cromie, A R

2017-10-01

The contribution of animal breeding to changes in animal performance is well documented across a range of species. Once genetic variation in a trait exists, then breeding to improve the characteristics of that trait is possible, if so desired. Considerable genetic variation exists in a range of meat quality attributes across a range of species. The genetic variation that exists for meat quality is as large as observed for most performance traits; thus, within a well-structured breeding program, rapid genetic gain for meat quality could be possible. The rate of genetic gain can be augmented through the integration of DNA-based technologies into the breeding program; such DNA-based technologies should, however, be based on thousands of DNA markers dispersed across the entire genome. Genetic and genomic technologies can also have beneficial impact outside the farm gate as a tool to segregate carcasses or meat cuts based on expected meat quality features. Copyright © 2017 Elsevier Ltd. All rights reserved.
G23D: Online tool for mapping and visualization of genomic variants on 3D protein structures.

PubMed

Solomon, Oz; Kunik, Vered; Simon, Amos; Kol, Nitzan; Barel, Ortal; Lev, Atar; Amariglio, Ninette; Somech, Raz; Rechavi, Gidi; Eyal, Eran

2016-08-26

Evaluation of the possible implications of genomic variants is an increasingly important task in the current high throughput sequencing era. Structural information however is still not routinely exploited during this evaluation process. The main reasons can be attributed to the partial structural coverage of the human proteome and the lack of tools which conveniently convert genomic positions, which are the frequent output of genomic pipelines, to proteins and structure coordinates. We present G23D, a tool for conversion of human genomic coordinates to protein coordinates and protein structures. G23D allows mapping of genomic positions/variants on evolutionary related (and not only identical) protein three dimensional (3D) structures as well as on theoretical models. By doing so it significantly extends the space of variants for which structural insight is feasible. To facilitate interpretation of the variant consequence, pathogenic variants, functional sites and polymorphism sites are displayed on protein sequence and structure diagrams alongside the input variants. G23D also provides modeling of the mutant structure, analysis of intra-protein contacts and instant access to functional predictions and predictions of thermo-stability changes. G23D is available at http://www.sheba-cancer.org.il/G23D . G23D extends the fraction of variants for which structural analysis is applicable and provides better and faster accessibility for structural data to biologists and geneticists who routinely work with genomic information.
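The core conversion described above, from a genomic position to a protein coordinate, can be sketched in a few lines of Python. This is an illustrative simplification and not G23D's implementation: it assumes 1-based inclusive coding-exon intervals for a single transcript, and the function name `genomic_to_protein` and the example coordinates are hypothetical.

```python
def genomic_to_protein(pos, cds_exons, strand="+"):
    """Map a 1-based genomic position to (residue_number, codon_position).

    cds_exons: list of (start, end) coding intervals, 1-based inclusive.
    Returns None when the position is not in the coding sequence.
    """
    # Walk the exons in transcript order (reversed on the minus strand).
    exons = sorted(cds_exons, reverse=(strand == "-"))
    offset = 0  # coding bases that precede the current exon
    for start, end in exons:
        if start <= pos <= end:
            within = pos - start if strand == "+" else end - pos
            cds_index = offset + within  # 0-based offset into the CDS
            return cds_index // 3 + 1, cds_index % 3 + 1
        offset += end - start + 1
    return None


# Hypothetical two-exon gene: 6 coding bases, then 9 coding bases.
exons = [(100, 105), (200, 208)]
first_of_exon2 = genomic_to_protein(200, exons)        # plus strand
intronic = genomic_to_protein(150, exons)              # not coding
minus_start = genomic_to_protein(208, exons, "-")      # minus strand
```

A production tool would also need to handle multiple transcripts per gene and the further mapping from protein sequence to structure residue numbering, which this sketch leaves out.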
A Genome Wide Survey of SNP Variation Reveals the Genetic Structure of Sheep Breeds

USDA-ARS's Scientific Manuscript database

The genetic structure of sheep reflects their domestication and subsequent formation into discrete breeds. Understanding genetic structure is essential for achieving genetic improvement through genome-wide association studies, genomic selection and the dissection of quantitative traits. After identi...
Evolved Populations of Shigella flexneri Phage Sf6 Acquire Large Deletions, Altered Genomic Architecture, and Faster Life Cycles.

PubMed

Dover, John A; Burmeister, Alita R; Molineux, Ian J; Parent, Kristin N

2016-09-19

Genomic architecture is the framework within which genes and regulatory elements evolve and where specific constructs may constrain or potentiate particular adaptations. One such construct is evident in phages that use a headful packaging strategy that results in progeny phage heads packaged with DNA until full rather than encapsidating a simple unit-length genome. Here, we investigate the evolution of the headful packaging phage Sf6 in response to barriers that impede efficient phage adsorption to the host cell. Ten replicate populations evolved faster Sf6 life cycles by parallel mutations found in a phage lysis gene and/or by large, 1.2- to 4.0-kb deletions that remove a mobile genetic IS911 element present in the ancestral phage genome. The fastest life cycles were found in phages that acquired both mutations. No mutations were found in genes encoding phage structural proteins, which were a priori expected from the experimental design that imposed a challenge for phage adsorption by using a Shigella flexneri host lacking receptors preferred by Sf6. We used DNA sequencing, molecular approaches, and physiological experiments on 82 clonal isolates taken from all 10 populations to reveal the genetic basis of the faster Sf6 life cycle. The majority of our isolates acquired deletions in the phage genome. Our results suggest that deletions are adaptive and can influence the duration of the phage life cycle while acting in conjunction with other lysis time-determining point mutations. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Genome-wide increase in histone H2A ubiquitylation in a mouse model of Huntington's disease.

PubMed

McFarland, Karen N; Das, Sudeshna; Sun, Ting Ting; Leyfer, Dmitri; Kim, Mee-Ohk; Xia, Eva; Sangrey, Gavin R; Kuhn, Alexandre; Luthi-Carter, Ruth; Clark, Timothy W; Sadri-Vakili, Ghazaleh; Cha, Jang-Ho J

2013-01-01

Huntington's disease (HD) is a neurodegenerative disorder with selective vulnerability of striatal neurons and involves extensive transcriptional dysregulation early in the disease process. Previous work in cell and mouse models has shown that histone modifications are altered in HD. Specifically, monoubiquitylated histone H2A (uH2A) is present at the promoters of downregulated genes which led to the hypothesis that uH2A plays a role in transcriptional silencing in HD. To broaden our view of uH2A function in transcription in HD, we examined genome-wide binding sites of uH2A in 12-week old striatal tissue from R6/2 transgenic HD mouse model. We used chromatin immunoprecipitation followed by genomic promoter microarray hybridization (ChIP-chip) and then interrogated how these binding sites correlate with transcribed genes. Our analysis reveals that, while uH2A levels are globally increased at the genome in the transgenic (TG) striatum, uH2A localization at a gene did not strongly correlate with the absence of its transcript. Furthermore, analysis of differential ubiquitylation in wild-type (WT) and TG striata did not reveal the expected enrichment of uH2A at genes with decreased expression in the TG striatum. This first description of genome-wide localization of uH2A in an HD model reveals that monoubiquitylation of histone H2A may not function at the level of the individual gene but may rather influence transcription through global chromatin structure.

Whole-genome sequencing of Atacama skeleton shows novel mutations linked with dysplasia.

PubMed

Bhattacharya, Sanchita; Li, Jian; Sockell, Alexandra; Kan, Matthew J; Bava, Felice A; Chen, Shann-Ching; Ávila-Arcos, María C; Ji, Xuhuai; Smith, Emery; Asadi, Narges B; Lachman, Ralph S; Lam, Hugo Y K; Bustamante, Carlos D; Butte, Atul J; Nolan, Garry P

2018-04-01

Over a decade ago, the Atacama humanoid skeleton (Ata) was discovered in the Atacama region of Chile. The Ata specimen carried a strange phenotype (6-in stature, fewer than expected ribs, elongated cranium, and accelerated bone age), leading to speculation that this was a preserved nonhuman primate, human fetus harboring genetic mutations, or even an extraterrestrial. We previously reported that it was human by DNA analysis with an estimated bone age of about 6-8 yr at the time of demise. To determine the possible genetic drivers of the observed morphology, DNA from the specimen was subjected to whole-genome sequencing using the Illumina HiSeq platform with an average 11.5× coverage of 101-bp, paired-end reads. In total, 3,356,569 single nucleotide variations (SNVs), 518,365 insertions and deletions (indels), and 1047 structural variations (SVs) were detected relative to the human reference genome. Here, we present the detailed whole-genome analysis showing that Ata is a female of human origin, likely of Chilean descent, and its genome harbors mutations in genes (COL1A1, COL2A1, KMT2D, FLNB, ATR, TRIP11, PCNT) previously linked with diseases of small stature, rib anomalies, cranial malformations, premature joint fusion, and osteochondrodysplasia (also known as skeletal dysplasia). Together, these findings provide a molecular characterization of Ata's peculiar phenotype, which likely results from multiple known and novel putative gene mutations affecting bone development and ossification. © 2018 Bhattacharya et al.; Published by Cold Spring Harbor Laboratory Press.
Pedigrees or markers: Which are better in estimating relatedness and inbreeding coefficient?

PubMed

Wang, Jinliang

2016-02-01

Individual inbreeding coefficient (F) and pairwise relatedness (r) are fundamental parameters in population genetics and have important applications in diverse fields such as human medicine, forensics, plant and animal breeding, conservation and evolutionary biology. Traditionally, both parameters are calculated from pedigrees, but are now increasingly estimated from genetic marker data. Conceptually, a pedigree gives the expected F and r values, FP and rP, with the expectations being taken (hypothetically) over an infinite number of individuals with the same pedigree. In contrast, markers give the realised (actual) F and r values at the particular marker loci of the particular individuals, FM and rM. Both pedigree (FP, rP) and marker (FM, rM) estimates can be used as inferences of genomic inbreeding coefficients FG and genomic relatedness rG, which are the underlying quantities relevant to most applications (such as estimating inbreeding depression and heritability) of F and r. In the pre-genomic era, it was widely accepted that pedigrees are much better than markers in delineating FG and rG, and markers should better be used to validate, amend and construct pedigrees rather than to replace them. Is this still true in the genomic era when genome-wide dense SNPs are available? In this simulation study, I showed that genomic markers can yield much better estimates of FG and rG than pedigrees when they are numerous (say, 10^4 SNPs) under realistic situations (e.g. genome and population sizes). Pedigree estimates are especially poor for species with a small genome, where FG and rG are determined to a large extent by Mendelian segregations and may thus deviate substantially from their expectations (FP and rP). Simulations also confirmed that FM, when estimated from many SNPs, can be much more powerful than FP for detecting inbreeding depression in viability.
However, I argue that pedigrees cannot be replaced completely by genomic SNPs, because the former allows for the calculation of more complicated IBD coefficients (involving more than 2 individuals, more than one locus, and more than 2 genes at a locus) for which the latter may have reduced capacity or limited power, and because the former has social and other significance for remote relationships which have little genetic significance and cannot be inferred reliably from markers. Copyright © 2015 Elsevier Inc. All rights reserved.
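To make the idea of a marker-based inbreeding estimate (FM) concrete, the Python sketch below uses one common heterozygosity-based estimator, F = 1 - H_obs / H_exp, where H_exp is the heterozygosity expected under Hardy-Weinberg proportions. This is an assumption chosen for illustration and is not necessarily the estimator used in the simulation study. The function name `marker_inbreeding` and the toy genotypes are hypothetical.

```python
def marker_inbreeding(genotypes, allele_freqs):
    """Estimate an individual's inbreeding coefficient from biallelic SNPs.

    genotypes:    alternate-allele counts per SNP (0, 1, or 2; None = missing)
    allele_freqs: population frequency of the alternate allele at each SNP
    Returns 1 - (observed heterozygous loci) / (expected heterozygous loci).
    """
    observed_het = 0
    expected_het = 0.0
    for g, p in zip(genotypes, allele_freqs):
        if g is None:
            continue  # skip missing calls
        observed_het += (g == 1)
        expected_het += 2.0 * p * (1.0 - p)  # Hardy-Weinberg expectation
    if expected_het == 0:
        raise ValueError("no informative loci")
    return 1.0 - observed_het / expected_het


# Toy data: four SNPs, each with allele frequency 0.5.
freqs = [0.5, 0.5, 0.5, 0.5]
f_outbred = marker_inbreeding([1, 1, 0, 2], freqs)   # 2 hets observed, 2 expected
f_partial = marker_inbreeding([1, 0, 0, 2], freqs)   # 1 het observed, 2 expected
f_inbred = marker_inbreeding([0, 2, 0, 2], freqs)    # no hets observed
```

With only four loci the estimate is extremely noisy. The abstract's point is that such estimates become reliable when the number of SNPs is large (on the order of 10^4).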
Complete nucleotide sequence of the Cryptomeria japonica D. Don. chloroplast genome and comparative chloroplast genomics: diversified genomic structure of coniferous species.

PubMed

Hirao, Tomonori; Watanabe, Atsushi; Kurita, Manabu; Kondo, Teiji; Takata, Katsuhiko

2008-06-23

The recent determination of complete chloroplast (cp) genomic sequences of various plant species has enabled numerous comparative analyses as well as advances in plant and genome evolutionary studies. In angiosperms, the complete cp genome sequences of about 70 species have been determined, whereas those of only three gymnosperm species, Cycas taitungensis, Pinus thunbergii, and Pinus koraiensis have been established. The lack of information regarding the gene content and genomic structure of gymnosperm cp genomes may severely hamper further progress of plant and cp genome evolutionary studies. To address this need, we report here the complete nucleotide sequence of the cp genome of Cryptomeria japonica, the first in the Cupressaceae sensu lato of gymnosperms, and provide a comparative analysis of their gene content and genomic structure that illustrates the unique genomic features of gymnosperms. The C. japonica cp genome is 131,810 bp in length, with 112 single copy genes and two duplicated (trnI-CAU, trnQ-UUG) genes that give a total of 116 genes. Compared to other land plant cp genomes, the C. japonica cp has lost one of the relevant large inverted repeats (IRs) found in angiosperms, fern, liverwort, and gymnosperms, such as Cycas and Ginkgo, and additionally has completely lost its trnR-CCG, partially lost its trnT-GGU, and shows diversification of accD. The genomic structure of the C. japonica cp genome also differs significantly from those of other plant species. For example, we estimate that a minimum of 15 inversions would be required to transform the gene organization of the Pinus thunbergii cp genome into that of C. japonica. In the C. japonica cp genome, direct repeat and inverted repeat sequences are observed at the inversion and translocation endpoints, and these sequences may be associated with the genomic rearrangements. The observed differences in genomic structure between C. japonica and other land plants, including pines, strongly support the theory that the large IRs stabilize the cp genome. Furthermore, the deleted large IR and the numerous genomic rearrangements that have occurred in the C. japonica cp genome provide new insights into both the evolutionary lineage of coniferous species in gymnosperm and the evolution of the cp genome.
A low-latency, big database system and browser for storage, querying and visualization of 3D genomic data.

PubMed

Butyaev, Alexander; Mavlyutov, Ruslan; Blanchette, Mathieu; Cudré-Mauroux, Philippe; Waldispühl, Jérôme

2015-09-18

Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, the storage technology and visualization tools need to evolve to offer to the scientific community fast and convenient access to these data. We introduce simultaneously a database system to store and query 3D genomic data (3DBG), and a 3D genome browser to visualize and explore 3D genome structures (3DGB). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions, and importantly gracefully scales with the size of data. We also illustrate the usefulness of our 3D genome Web browser to explore human genome structures. The 3D genome browser is available at http://3dgb.cs.mcgill.ca/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
A low-latency, big database system and browser for storage, querying and visualization of 3D genomic data

PubMed Central

Butyaev, Alexander; Mavlyutov, Ruslan; Blanchette, Mathieu; Cudré-Mauroux, Philippe; Waldispühl, Jérôme

2015-01-01

Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, the storage technology and visualization tools need to evolve to offer to the scientific community fast and convenient access to these data. We introduce simultaneously a database system to store and query 3D genomic data (3DBG), and a 3D genome browser to visualize and explore 3D genome structures (3DGB). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions, and importantly gracefully scales with the size of data. We also illustrate the usefulness of our 3D genome Web browser to explore human genome structures. The 3D genome browser is available at http://3dgb.cs.mcgill.ca/. PMID:25990738
Bridging the Resolution Gap in Structural Modeling of 3D Genome Organization

PubMed Central

Marti-Renom, Marc A.; Mirny, Leonid A.

2011-01-01

Over the last decade, and especially after the advent of fluorescent in situ hybridization imaging and chromosome conformation capture methods, the availability of experimental data on genome three-dimensional organization has dramatically increased. We now have access to unprecedented details of how genomes organize within the interphase nucleus. Development of new computational approaches to leverage this data has already resulted in the first three-dimensional structures of genomic domains and genomes. Such approaches expand our knowledge of the chromatin folding principles, which has been classically studied using polymer physics and molecular simulations. Our outlook describes computational approaches for integrating experimental data with polymer physics, thereby bridging the resolution gap for structural determination of genomes and genomic domains. PMID:21779160
Scanning the human genome at kilobase resolution.

PubMed

Chen, Jun; Kim, Yeong C; Jung, Yong-Chul; Xuan, Zhenyu; Dworkin, Geoff; Zhang, Yanming; Zhang, Michael Q; Wang, San Ming

2008-05-01

Normal genome variation and pathogenic genome alteration frequently affect small regions in the genome. Identifying those genomic changes remains a technical challenge. We report here the development of the DGS (Ditag Genome Scanning) technique for high-resolution analysis of genome structure. The basic features of DGS include (1) use of high-frequent restriction enzymes to fractionate the genome into small fragments; (2) collection of two tags from two ends of a given DNA fragment to form a ditag to represent the fragment; (3) application of the 454 sequencing system to reach a comprehensive ditag sequence collection; (4) determination of the genome origin of ditags by mapping to reference ditags from known genome sequences; (5) use of ditag sequences directly as the sense and antisense PCR primers to amplify the original DNA fragment. To study the relationship between ditags and genome structure, we performed a computational study by using the human genome reference sequences as a model, and analyzed the ditags experimentally collected from the well-characterized normal human DNA GM15510 and the leukemic human DNA of Kasumi-1 cells. Our studies show that DGS provides a kilobase resolution for studying genome structure with high specificity and high genome coverage. DGS can be applied to validate genome assembly, to compare genome similarity and variation in normal populations, and to identify genomic abnormality including insertion, inversion, deletion, translocation, and amplification in pathological genomes such as cancer genomes.
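Steps (1), (2), and (4) of the DGS procedure can be mimicked in silico to build a reference ditag index. The Python sketch below is a simplified illustration and not the authors' pipeline. It places each cut immediately before the recognition site (real enzymes cut at enzyme-specific offsets), and the names `digest`, `ditag`, and `reference_ditags`, the tag length, and the toy sequence are all assumptions.

```python
def digest(sequence, site):
    """Split a sequence at every occurrence of a recognition site.

    Simplification: the cut is placed immediately before the site.
    """
    cuts = []
    i = sequence.find(site)
    while i != -1:
        cuts.append(i)
        i = sequence.find(site, i + 1)
    bounds = [0] + [c for c in cuts if c > 0] + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def ditag(fragment, tag_len):
    """Join the first and last tag_len bases; None if the fragment is too short."""
    if len(fragment) < 2 * tag_len:
        return None
    return fragment[:tag_len] + fragment[-tag_len:]


def reference_ditags(sequence, site, tag_len=4):
    """Index reference ditags by sequence -> list of 0-based fragment starts."""
    index = {}
    pos = 0
    for fragment in digest(sequence, site):
        tag = ditag(fragment, tag_len)
        if tag is not None:
            index.setdefault(tag, []).append(pos)
        pos += len(fragment)
    return index


# Toy reference with two CCGG sites (hypothetical sequence and tag length).
reference = "AAAACCGGTTTTGGGGCCGGAAAATTTT"
index = reference_ditags(reference, "CCGG", tag_len=4)
```

An experimentally collected ditag can then be looked up in `index` to find its genomic origin. A ditag that maps to several positions, or to none, flags a repeat or a candidate structural difference from the reference.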
Complete Genome Sequence of the Alfalfa Symbiont Sinorhizobium/Ensifer meliloti Strain GR4

PubMed Central

Martínez-Abarca, Francisco; Martínez-Rodríguez, Laura; López-Contreras, José Antonio; Jiménez-Zurdo, José Ignacio

2013-01-01

We present the complete nucleotide sequence of the multipartite genome of Sinorhizobium/Ensifer meliloti GR4, a predominant rhizobial strain in an agricultural field site. The genome (total size, 7.14 Mb) consists of five replicons: one chromosome, two expected symbiotic megaplasmids (pRmeGR4c and pRmeGR4d), and two accessory plasmids (pRmeGR4a and pRmeGR4b). PMID:23409262
Retention of the Native Epigenome in Purified Mammalian Chromatin

PubMed Central

Ehrensberger, Andreas H.; Franchini, Don-Marc; East, Philip; George, Roger; Matthews, Nik; Maslen, Sarah L.; Svejstrup, Jesper Q.

2015-01-01

A protocol is presented for the isolation of native mammalian chromatin as fibers of 25–250 nucleosomes under conditions that preserve the natural epigenetic signature. The material is composed almost exclusively of histones and DNA and conforms to the structure expected by electron microscopy. All sequences probed for were retained, indicating that the material is representative of the majority of the genome. DNA methylation marks and histone marks resembled the patterns observed in vivo. Importantly, nucleosome positions also remained largely unchanged, except on CpG islands, where nucleosomes were found to be unstable. The technical challenges of reconstituting biochemical reactions with native mammalian chromatin are discussed. PMID:26248330
Complete mitochondrial genome sequence of black mustard (Brassica nigra; BB) and comparison with Brassica oleracea (CC) and Brassica carinata (BBCC).

PubMed

Yamagishi, Hiroshi; Tanaka, Yoshiyuki; Terachi, Toru

2014-11-01

Crop species of Brassica (Brassicaceae) consist of three monogenomic species and three amphidiploid species resulting from interspecific hybridizations among them. Until now, mitochondrial genome sequences were available for only five of these species. We sequenced the mitochondrial genome of the sixth species, Brassica nigra (nuclear genome constitution BB), and compared it with those of Brassica oleracea (CC) and Brassica carinata (BBCC). The genome was assembled into a 232 145 bp circular sequence that is slightly larger than that of B. oleracea (219 952 bp). The genome of B. nigra contained 33 protein-coding genes, 3 rRNA genes, and 17 tRNA genes. The cox2-2 gene present in B. oleracea was absent in B. nigra. Although the nucleotide sequences of 52 genes were identical between B. nigra and B. carinata, the second exon of rps3 showed differences including an insertion/deletion (indel) and nucleotide substitutions. A PCR test to detect the indel revealed intraspecific variation in rps3, and in one line of B. nigra it amplified a DNA fragment of the size expected for B. carinata. In addition, the B. carinata lines tested here produced DNA fragments of the size expected for B. nigra. The results indicate that at least two mitotypes of B. nigra were present in the maternal parents of B. carinata.
GenomeD3Plot: a library for rich, interactive visualizations of genomic data in web applications.

PubMed

Laird, Matthew R; Langille, Morgan G I; Brinkman, Fiona S L

2015-10-15

A simple static image of genomes and associated metadata is very limiting, as researchers expect rich, interactive tools similar to the web applications found in the post-Web 2.0 world. GenomeD3Plot is a lightweight visualization library written in JavaScript using the D3 library. GenomeD3Plot provides a rich API to allow the rapid visualization of complex genomic data using a convenient standards-based JSON configuration file. When integrated into existing web services, GenomeD3Plot allows researchers to interact with data, dynamically alter the view, or even resize or reposition the visualization in their browser window. In addition, GenomeD3Plot has built-in functionality to export any resulting genome visualization in PNG or SVG format for easy inclusion in manuscripts or presentations. GenomeD3Plot is being utilized in the recently released Islandviewer 3 (www.pathogenomics.sfu.ca/islandviewer/) to visualize predicted genomic islands with other genome annotation data. However, its features enable it to be more widely applicable for dynamic visualization of genomic data in general. GenomeD3Plot is licensed under the GNU-GPL v3 at https://github.com/brinkmanlab/GenomeD3Plot/. Contact: brinkman@sfu.ca. © The Author 2015. Published by Oxford University Press.
Transcription arrest by a G quadruplex forming-trinucleotide repeat sequence from the human c-myb gene.

PubMed

Broxson, Christopher; Beckett, Joshua; Tornaletti, Silvia

2011-05-17

Noncanonical DNA structures correspond to genomic regions particularly susceptible to genetic instability. The transcription process facilitates formation of these structures and plays a major role in generating the instability associated with these genomic sites. However, little is known about how noncanonical structures are processed when encountered by an elongating RNA polymerase. Here we have studied the behavior of T7 RNA polymerase (T7 RNAP) when encountering a G quadruplex-forming (GGA)4 repeat located in the human c-myb proto-oncogene. To make direct correlations between formation of the structure and effects on transcription, we have taken advantage of the ability of the T7 polymerase to transcribe single-stranded substrates and of G4 DNA to form in single-stranded G-rich sequences in the presence of potassium ions. Under physiological KCl concentrations, we found that T7 RNAP transcription was arrested at two sites that mapped to the c-myb (GGA)4 repeat sequence. The extent of arrest did not change with time, indicating that the c-myb repeat represented an absolute block and not a transient pause to T7 RNAP. Consistent with G4 DNA formation, arrest was not observed in the absence of KCl or in the presence of LiCl. Furthermore, mutations in the c-myb (GGA)4 repeat, expected to prevent transition to G4, also eliminated the transcription block. We show T7 RNAP arrest at the c-myb repeat in double-stranded DNA under conditions mimicking the cellular concentration of biomolecules and potassium ions, suggesting that the G4 structure formed in the c-myb repeat may represent a transcription roadblock in vivo. Our results support a mechanism of transcription-coupled DNA repair initiated by arrest of transcription at G4 structures.
Self-organisation of an oligodeoxynucleotide containing the G- and C-rich stretches of the direct repeats of the human mitochondrial DNA.

PubMed

Nonin-Lecomte, Sylvie; Dardel, Frédéric; Lestienne, Patrick

2005-08-01

Stretches of cytosines and guanosines have been shown in vitro to adopt non-canonical structures known as i-motifs and G-quartets, respectively. When combined, such sequences are expected to either retain their structure or form duplexes or triple helices. All these structures may occur in vivo whenever the sequence criteria are met. Such stretches are present in the circular genome of human mitochondria, as two 10 nucleotide-long perfect tandem direct repeats (DR1 and DR2). The DR1 and DR2 repeats are G-rich on the heavy strand and C-rich on the light strand. Previous results suggested that during replication, transient formation of a parallel GGC triple helix between the neo-synthesised G-rich DR1 and the double-stranded homologous DR2 could be involved in a rearrangement process leading to genome instability. In order to get structural insights into the interaction between the two repeats, we have studied by nuclear magnetic resonance (NMR) the assembly properties of a 24-mer oligodeoxyribonucleotide in which the C- and G-rich segments of the DRs are covalently tethered by a TTTT linker. We show here that this 24-mer self-associates into a triplex-containing symmetrical tetramer. The core of the structure is composed of anti-parallel Watson-Crick (WC) base pairs. Two additional strands are hydrogen-bonded to the Hoogsteen side of the Gs, thus forming CGC+ triple helices, with G-rich ends folding into G-quartets. These results suggest that such structures could occur when the two DRs are brought into close proximity in a biological context.
Visualization of RNA structure models within the Integrative Genomics Viewer.

PubMed

Busan, Steven; Weeks, Kevin M

2017-07-01

Analyses of the interrelationships between RNA structure and function are increasingly important components of genomic studies. The SHAPE-MaP strategy enables accurate RNA structure probing and realistic structure modeling of kilobase-length noncoding RNAs and mRNAs. Existing tools for visualizing RNA structure models are not suitable for efficient analysis of long, structurally heterogeneous RNAs. In addition, structure models are often advantageously interpreted in the context of other experimental data and gene annotation information, for which few tools currently exist. We have developed a module within the widely used and well supported open-source Integrative Genomics Viewer (IGV) that allows visualization of SHAPE and other chemical probing data, including raw reactivities, data-driven structural entropies, and data-constrained base-pair secondary structure models, in context with linear genomic data tracks. We illustrate the usefulness of visualizing RNA structure in the IGV by exploring structure models for a large viral RNA genome, comparing bacterial mRNA structure in cells with its structure under cell- and protein-free conditions, and comparing a noncoding RNA structure modeled using SHAPE data with a base-pairing model inferred through sequence covariation analysis. © 2017 Busan and Weeks; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Studies on cattle genomic structural variation provide insights into ruminant speciation and adaptation

USDA-ARS's Scientific Manuscript database

Genomic structural variations, including segmental duplications (SD) and copy number variations (CNV), contribute significantly to individual health and disease in primates and rodents. As a part of the bovine genome annotation effort, we performed the first genome-wide analysis of SD in cattle usin...
Fast neutron mutants database and web displays at SoyBase

USDA-ARS's Scientific Manuscript database

SoyBase, the USDA-ARS soybean genetics and genomics database, has been expanded to include data for the fast neutron mutants produced by Bolon, Vance, et al. In addition to the expected text and sequence homology searches and visualization of the indels in the context of the genome sequence viewer, ...
Genome-derived vaccines.

PubMed

De Groot, Anne S; Rappuoli, Rino

2004-02-01

Vaccine research entered a new era when the complete genome of a pathogenic bacterium was published in 1995. Since then, more than 97 bacterial pathogens have been sequenced and at least 110 additional projects are now in progress. Genome sequencing has also dramatically accelerated: high-throughput facilities can draft the sequence of an entire microbe (two to four megabases) in 1 to 2 days. Vaccine developers are using microarrays, immunoinformatics, proteomics and high-throughput immunology assays to reduce the truly unmanageable volume of information available in genome databases to a manageable size. Vaccines composed of novel antigens discovered from genome mining are already in clinical trials. Within 5 years we can expect to see a novel class of vaccines composed of genome-predicted, assembled and engineered T- and B-cell epitopes. This article addresses the convergence of three forces--microbial genome sequencing, computational immunology and new vaccine technologies--that are shifting genome mining for vaccines onto the forefront of immunology research.
The structure and evolution of angiosperm nuclear genomes.

PubMed

Bennetzen, J L

1998-04-01

Despite several decades of investigation, the organization of angiosperm genomes remained largely unknown until very recently. Data describing the sequence composition of large segments of genomes, covering hundreds of kilobases of contiguous sequence, have only become available in the past two years. Recent results indicate commonalities in the characteristics of many plant genomes, including in the structure of chromosomal components like telomeres and centromeres, and in the order and content of genes. Major differences between angiosperms have been associated mainly with repetitive DNAs, both gene families and mobile elements. Intriguing new studies have begun to characterize the dynamic three-dimensional structures of chromosomes and chromatin, and the relationship between genome structure and co-ordinated gene function.
Reconstitution of wild type viral DNA in simian cells transfected with early and late SV40 defective genomes.

PubMed

O'Neill, F J; Gao, Y; Xu, X

1993-11-01

The DNAs of polyomaviruses ordinarily exist as a single circular molecule of approximately 5000 base pairs. Variants of SV40, BKV and JCV have been described which contain two complementing defective DNA molecules. These defectives, which form a bipartite genome structure, contain either the viral early region or the late region. The defectives have the unique property of being able to tolerate variable sized reiterations of regulatory and terminus region sequences, and portions of the coding region. They can also exchange coding region sequences with other polyomaviruses. It has been suggested that the bipartite genome structure might be a stage in the evolution of polyomaviruses which can uniquely sustain genome and sequence diversity. However, it is not known if the regulatory and terminus region sequences are highly mutable. Also, it is not known if the bipartite genome structure is reversible and what the conditions might be which would favor restoration of the monomolecular genome structure. We addressed the first question by sequencing the reiterated regulatory and terminus regions of E- and L-SV40 DNAs. This revealed a large number of mutations in the regulatory regions of the defective genomes, including deletions, insertions, rearrangements and base substitutions. We also detected insertions and base substitutions in the T-antigen gene. We addressed the second question by introducing into permissive simian cells, E- and L-SV40 genomes which had been engineered to contain only a single regulatory region. Analysis of viral DNA from transfected cells demonstrated recombined genomes containing a wild type monomolecular DNA structure. However, the complete defectives, containing reiterated regulatory regions, could often compete away the wild type genomes. The recombinant monomolecular genomes were isolated, cloned and found to be infectious. 
All of the DNA alterations identified in one of the regulatory regions of E-SV40 DNA were present in the recombinant monomolecular genomes. These and other findings indicate that the bipartite genome state can sustain many mutations which wtSV40 cannot directly sustain. However, the mutations can later be introduced into the wild type genomes when the E- and L-SV40 DNAs recombine to generate a new monomolecular genome structure.
RNA structural constraints in the evolution of the influenza A virus genome NP segment

PubMed Central

Gultyaev, Alexander P; Tsyganov-Bodounov, Anton; Spronken, Monique IJ; van der Kooij, Sander; Fouchier, Ron AM; Olsthoorn, René CL

2014-01-01

Conserved RNA secondary structures were predicted in the nucleoprotein (NP) segment of the influenza A virus genome using comparative sequence and structure analysis. A number of structural elements exhibiting nucleotide covariations were identified over the whole segment length, including protein-coding regions. Calculations of mutual information values at the paired nucleotide positions demonstrate that these structures impose considerable constraints on the virus genome evolution. Functional importance of a pseudoknot structure, predicted in the NP packaging signal region, was confirmed by plaque assays of the mutant viruses with disrupted structure and those with restored folding using compensatory substitutions. Possible functions of the conserved RNA folding patterns in the influenza A virus genome are discussed. PMID:25180940
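
The mutual information calculation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard definition of mutual information between two alignment columns; the function name and the toy columns are hypothetical and are not taken from the paper, and gap handling is deliberately left out.

```python
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    col_i and col_j are strings holding the nucleotide of each sequence
    at two alignment positions. Gaps are treated like any other symbol,
    which is a simplifying assumption of this sketch.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Hypothetical toy columns: perfectly covarying Watson-Crick pairs
# versus two invariant positions.
covarying = mutual_information("GGCCAAUU", "CCGGUUAA")
invariant = mutual_information("GGGGGGGG", "CCCCCCCC")
print(covarying, invariant)
```

Covarying positions score high while fully conserved positions score zero, which is why mutual information is informative about compensatory substitutions but not about invariant stems.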

Robust prediction of consensus secondary structures using averaged base pairing probability matrices.

PubMed

Kiryu, Hisanori; Kin, Taishin; Asai, Kiyoshi

2007-02-15

Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Since multiple alignments are often created using alignment programs that neglect the special conservation patterns of RNA secondary structures for computational efficiency, alignment failures can cause potential risks of overlooking conserved stem structures. We investigated the dependence of the accuracy of secondary structure prediction on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures as well as other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discussed the uses of that parameter for multi-step screening procedures to search for conserved secondary structures and for assigning confidence values to the predicted base pairs. The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/. Supplementary data are available at Bioinformatics online.
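
The averaging step described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' C++ implementation: the per-sequence base pairing probability matrices are assumed to be precomputed and already mapped onto alignment columns, and the structure is chosen with a simplified Nussinov-style recursion that maximizes the sum of (averaged probability minus `threshold`) over nested pairs. The names `average_matrices`, `consensus_pairs`, `threshold` and `min_loop` are hypothetical, and the scoring is a stand-in for the paper's expected-accuracy objective.

```python
def average_matrices(matrices):
    """Element-wise average of equally sized square matrices (lists of lists)."""
    n = len(matrices[0])
    k = len(matrices)
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)]
            for i in range(n)]


def consensus_pairs(p, threshold=0.5, min_loop=3):
    """Pick nested pairs maximizing sum(p[i][k] - threshold).

    Simplified stand-in for a maximum-expected-accuracy decoder;
    `threshold` plays the role of a sensitivity/specificity knob.
    """
    n = len(p)
    best = [[0.0] * n for _ in range(n)]
    choice = [[None] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score, how = best[i + 1][j], ("skip",)  # column i unpaired
            for k in range(i + min_loop + 1, j + 1):  # column i paired with k
                gain = p[i][k] - threshold
                if gain <= 0:
                    continue
                inner = best[i + 1][k - 1] if k - 1 >= i + 1 else 0.0
                right = best[k + 1][j] if k + 1 <= j else 0.0
                if gain + inner + right > score:
                    score, how = gain + inner + right, ("pair", k)
            best[i][j], choice[i][j] = score, how
    pairs, stack = [], [(0, n - 1)]
    while stack:  # traceback
        i, j = stack.pop()
        if i >= j or choice[i][j] is None:
            continue
        how = choice[i][j]
        if how[0] == "skip":
            stack.append((i + 1, j))
        else:
            k = how[1]
            pairs.append((i, k))
            stack.append((i + 1, k - 1))
            stack.append((k + 1, j))
    return sorted(pairs)


def toy_matrix(n, entries):
    """Build a symmetric n x n matrix from {(i, j): probability} (toy data)."""
    m = [[0.0] * n for _ in range(n)]
    for (i, j), value in entries.items():
        m[i][j] = m[j][i] = value
    return m


# Two hypothetical per-sequence matrices over an 8-column alignment.
seq_a = toy_matrix(8, {(0, 7): 0.9, (1, 6): 0.8})
seq_b = toy_matrix(8, {(0, 7): 0.7, (1, 6): 0.6, (2, 5): 0.2})
averaged = average_matrices([seq_a, seq_b])
print(consensus_pairs(averaged))
```

Raising `threshold` keeps only pairs that are well supported across the alignment, which loosely mirrors how the abstract describes using a single parameter to trade sensitivity against specificity.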
Is Whole-Exome Sequencing an Ethically Disruptive Technology? Perspectives of Pediatric Oncologists and Parents of Pediatric Patients With Solid Tumors.

PubMed

McCullough, Laurence B; Slashinski, Melody J; McGuire, Amy L; Street, Richard L; Eng, Christine M; Gibbs, Richard A; Parsons, D William; Plon, Sharon E

2016-03-01

It has been anticipated that physicians and parents will be ill prepared or unprepared for the clinical introduction of genome sequencing, making it ethically disruptive. As a part of the Baylor Advancing Sequencing in Childhood Cancer Care study, we conducted semistructured interviews with 16 pediatric oncologists and 40 parents of pediatric patients with cancer prior to the return of sequencing results. We elicited expectations and attitudes concerning the impact of sequencing on clinical decision making, clinical utility, and treatment expectations from both groups. Using accepted methods of qualitative research to analyze interview transcripts, we completed a thematic analysis to provide inductive insights into their views of sequencing. Our major findings reveal that neither pediatric oncologists nor parents anticipate sequencing to be an ethically disruptive technology, because they expect to be prepared to integrate sequencing results into their existing approaches to learning and using new clinical information for care. Pediatric oncologists do not expect sequencing results to be more complex than other diagnostic information and plan simply to incorporate these data into their evidence-based approach to clinical practice, although they were concerned about impact on parents. For parents, there is an urgency to protect their child's health and in this context they expect genomic information to better prepare them to participate in decisions about their child's care. Our data do not support the concern that introducing genome sequencing into childhood cancer care will be ethically disruptive, that is, leave physicians or parents ill prepared or unprepared to make responsible decisions about patient care. © 2015 Wiley Periodicals, Inc.
Energy Landscapes of Folding Chromosomes

NASA Astrophysics Data System (ADS)

Zhang, Bin

The genome, the blueprint of life, contains nearly all the information needed to build and maintain an entire organism. A comprehensive understanding of the genome is of paramount interest to human health and will advance progress in many areas, including life sciences, medicine, and biotechnology. The overarching goal of my research is to understand the structure-dynamics-function relationships of the human genome. In this talk, I will present our efforts toward that goal, with a particular emphasis on studying the three-dimensional organization of the genome with multi-scale approaches. Specifically, I will discuss the reconstruction of genome structures at both interphase and metaphase using data from chromosome conformation capture experiments. Computational modeling of the chromatin fiber at the atomistic level from first principles will also be presented as part of our effort to study genome structure from the bottom up.
Assessing the expected response to genomic selection of individuals and families in Eucalyptus breeding with an additive-dominant model.

PubMed

Resende, R T; Resende, M D V; Silva, F F; Azevedo, C F; Takahashi, E K; Silva-Junior, O B; Grattapaglia, D

2017-10-01

We report a genomic selection (GS) study of growth and wood quality traits in an outbred F2 hybrid Eucalyptus population (n=768) using high-density single-nucleotide polymorphism (SNP) genotyping. Going beyond previous reports in forest trees, models were developed for different selection targets, namely, families, individuals within families and individuals across the entire population using a genomic model including dominance. To provide a more breeder-intelligible assessment of the performance of GS we calculated the expected response as the percentage gain over the population average expected genetic value (EGV) for different proportions of genomically selected individuals, using a rigorous cross-validation (CV) scheme that removed relatedness between training and validation sets. Predictive abilities (PAs) were 0.40-0.57 for individual selection and 0.56-0.75 for family selection. PAs under an additive+dominance model improved predictions by 5 to 14% for growth depending on the selection target, but no improvement was seen for wood traits. The good performance of GS with no relatedness in CV suggested that our average SNP density (~25 kb) captured some short-range linkage disequilibrium. Truncation GS successfully selected individuals with an average EGV significantly higher than the population average. Response to GS on a per year basis was ~100% more efficient than by phenotypic selection and more so with higher selection intensities. These results contribute further experimental data supporting the positive prospects of GS in forest trees. Because generation times are long, traits are complex and costs of DNA genotyping are plummeting, genomic prediction has good perspectives of adoption in tree breeding practice.
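
The expected response described in this abstract, the percentage gain of the selected individuals over the population average EGV, can be sketched with a simple truncation-selection calculation. The function name and the EGV values are hypothetical, and the sketch assumes a positive population mean; it does not reproduce the paper's cross-validation scheme.

```python
def expected_gain_percent(egvs, proportion):
    """Percentage gain of the top `proportion` of individuals over the
    population mean expected genetic value (EGV).

    Assumes the population mean is positive so the percentage is meaningful.
    """
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    n_selected = max(1, round(len(egvs) * proportion))
    ranked = sorted(egvs, reverse=True)
    population_mean = sum(egvs) / len(egvs)
    selected_mean = sum(ranked[:n_selected]) / n_selected
    return 100.0 * (selected_mean - population_mean) / population_mean


# Hypothetical EGVs for eight individuals (arbitrary trait units).
toy_egvs = [10, 12, 8, 14, 6, 10, 11, 9]
gain_top_quarter = expected_gain_percent(toy_egvs, 0.25)
print(gain_top_quarter)
```

Selecting a smaller proportion raises the gain, which is consistent with the abstract's remark that the advantage grows with higher selection intensity.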
Significant Natural Product Biosynthetic Potential of Actinorhizal Symbionts of the Genus Frankia, as Revealed by Comparative Genomic and Proteomic Analyses

PubMed Central

Udwary, Daniel W.; Gontang, Erin A.; Jones, Adam C.; Jones, Carla S.; Schultz, Andrew W.; Winter, Jaclyn M.; Yang, Jane Y.; Beauchemin, Nicholas; Capson, Todd L.; Clark, Benjamin R.; Esquenazi, Eduardo; Eustáquio, Alessandra S.; Freel, Kelle; Gerwick, Lena; Gerwick, William H.; Gonzalez, David; Liu, Wei-Ting; Malloy, Karla L.; Maloney, Katherine N.; Nett, Markus; Nunnery, Joshawna K.; Penn, Kevin; Prieto-Davo, Alejandra; Simmons, Thomas L.; Weitz, Sara; Wilson, Micheal C.; Tisa, Louis S.; Dorrestein, Pieter C.; Moore, Bradley S.

2011-01-01

Bacteria of the genus Frankia are mycelium-forming actinomycetes that are found as nitrogen-fixing facultative symbionts of actinorhizal plants. Although soil-dwelling actinomycetes are well-known producers of bioactive compounds, the genus Frankia has largely gone uninvestigated for this potential. Bioinformatic analysis of the genome sequences of Frankia strains ACN14a, CcI3, and EAN1pec revealed an unexpected number of secondary metabolic biosynthesis gene clusters. Our analysis led to the identification of at least 65 biosynthetic gene clusters, the vast majority of which appear to be unique and for which products have not been observed or characterized. More than 25 secondary metabolite structures or structure fragments were predicted, and these are expected to include cyclic peptides, siderophores, pigments, signaling molecules, and specialized lipids. Outside the hopanoid gene locus, no cluster could be convincingly demonstrated to be responsible for the few secondary metabolites previously isolated from other Frankia strains. Few clusters were shared among the three species, demonstrating species-specific biosynthetic diversity. Proteomic analysis of Frankia sp. strains CcI3 and EAN1pec showed that significant and diverse secondary metabolic activity was expressed in laboratory cultures. In addition, several prominent signals in the mass range of peptide natural products were observed in Frankia sp. CcI3 by intact-cell matrix-assisted laser desorption-ionization mass spectrometry (MALDI-MS). This work supports the value of bioinformatic investigation in natural products biosynthesis using genomic information and presents a clear roadmap for natural products discovery in the Frankia genus. PMID:21498757
Genomic organization of the 260 kb surrounding the waxy locus in a Japonica rice

PubMed

Nagano; Wu; Kawasaki; Kishima; Sano

1999-12-01

The present study was carried out to characterize the molecular organization in the vicinity of the waxy locus in rice. To determine the structural organization of the region surrounding waxy, contiguous clones covering a total of 260 kb were constructed using a bacterial artificial chromosome (BAC) library from the Shimokita variety of Japonica rice. This map also contains 200 overlapping subclones, which allowed construction of a fine physical map with a total of 64 HindIII sites. During the course of constructing the map, we noticed the presence of some repeated regions which might be related to transposable elements. We divided the 260-kb region into 60 segments (average size of 5.7 kb) to use as probes to determine their genomic organization. Hybridization patterns obtained by probing with these segments were classified into four types: class 1, a single or a few bands without a smeared background; class 2, a single or a few bands with a smeared background; class 3, multiple discrete bands without a smeared background; and class 4, only a smeared background. These classes constituted 6.5%, 20.9%, 3.7%, and 68.9% of the 260-kb region, respectively. The distribution of each class revealed that repetitive sequences are a major component in this region, as expected, and that unique sequence regions were mostly no longer than 6 kb due to interruption by repetitive sequences. We discuss how the map constructed here might be a powerful tool for characterization and comparison of the genome structures and the genes around the waxy locus in the Oryza species.
Draft sequencing and assembly of the genome of the world's largest fish, the whale shark: Rhincodon typus Smith 1828.

PubMed

Read, Timothy D; Petit, Robert A; Joseph, Sandeep J; Alam, Md Tauqeer; Weil, M Ryan; Ahmad, Maida; Bhimani, Ravila; Vuong, Jocelyn S; Haase, Chad P; Webb, D Harry; Tan, Milton; Dove, Alistair D M

2017-07-14

The whale shark (Rhincodon typus) has by far the largest body size of any elasmobranch (shark or ray) species. Therefore, it is also the largest extant species of the paraphyletic assemblage commonly referred to as fishes. As both a phenotypic extreme and a member of the group Chondrichthyes - the sister group to the remaining gnathostomes, which includes all tetrapods and therefore also humans - its genome is of substantial comparative interest. Whale sharks are also listed as an endangered species on the International Union for Conservation of Nature's Red List of threatened species and are of growing popularity as both a target of ecotourism and as a charismatic conservation ambassador for the pelagic ecosystem. A genome map for this species would aid in defining effective conservation units and understanding global population structure. We characterised the nuclear genome of the whale shark using next generation sequencing (454, Illumina) and de novo assembly and annotation methods, based on material collected from the Georgia Aquarium. The data set consisted of 878,654,233 reads, which yielded a draft assembly of 1,213,200 contigs and 997,976 scaffolds. The estimated genome size was 3.44Gb. As expected, the proteome of the whale shark was most closely related to the only other complete genome of a cartilaginous fish, the holocephalan elephant shark. The whale shark contained a novel Toll-like-receptor (TLR) protein with sequence similarity to both the TLR4 and TLR13 proteins of mammals and TLR21 of teleosts. The data are publicly available on GenBank, FigShare, and from the NCBI Short Read Archive under accession number SRP044374. This represents the first shotgun elasmobranch genome and will aid studies of molecular systematics, biogeography, genetic differentiation, and conservation genetics in this and other shark species, as well as providing comparative data for studies of evolutionary biology and immunology across the jawed vertebrate lineages.
Predicted stem-loop structures and variation in nucleotide sequence of 3' noncoding regions among animal calicivirus genomes.

PubMed

Seal, B S; Neill, J D; Ridpath, J F

1994-07-01

Caliciviruses are nonenveloped with a polyadenylated genome of approximately 7.6 kb and a single capsid protein. The "RNA Fold" computer program was used to analyze 3'-terminal noncoding sequences of five feline calicivirus (FCV), rabbit hemorrhagic disease virus (RHDV), and two San Miguel sea lion virus (SMSV) isolates. The FCV 3'-terminal sequences are 40-46 nucleotides in length and 72-91% similar. The FCV sequences were predicted to contain two possible duplex structures and one stem-loop structure with free energies of -2.1 to -18.2 kcal/mole. The RHDV genomic 3'-terminal RNA sequences are 54 nucleotides in length and share 49% sequence similarity to homologous regions of the FCV genome. The RHDV sequence was predicted to form two duplex structures in the 3'-terminal noncoding region with a single stem-loop structure, resembling that of FCV. In contrast, the SMSV 1 and 4 genomic 3'-terminal noncoding sequences were 185 and 182 nucleotides in length, respectively. Ten possible duplex structures were predicted with an average structural free energy of -35 kcal/mole. Sequence similarity between the two SMSV isolates was 75%. Furthermore, extensive cloverleaflike structures are predicted in the 3' noncoding region of the SMSV genome, in contrast to the predicted single stem-loop structures of FCV or RHDV.
Linear Lepidopteran ambidensovirus 1 sequences drive random integration of a reporter gene in transfected Spodoptera frugiperda cells.

PubMed

Rizk, Francine; Laverdure, Sylvain; d'Alençon, Emmanuelle; Bossin, Hervé; Dupressoir, Thierry

2018-01-01

The Lepidopteran ambidensovirus 1 isolated from Junonia coenia (hereafter JcDV) is an invertebrate parvovirus considered as a viral transduction vector as well as a potential tool for the biological control of insect pests. Previous work showed that JcDV-based circular plasmids experimentally integrate into insect cell genomic DNA. In order to approach the natural conditions of infection and possible integration, we generated linear JcDV-gfp-based molecules which were transfected into non-permissive Spodoptera frugiperda (Sf9) cultured cells. Cells were monitored for the expression of green fluorescent protein (GFP) and DNA was analyzed for integration of transduced viral sequences. Non-structural protein modulation of the VP-gene cassette promoter activity was additionally assayed. We show that linear JcDV-derived molecules are capable of long-term genomic integration and sustained transgene expression in Sf9 cells. As expected, only the deletion of both inverted terminal repeats (ITR) or the polyadenylation signals of NS and VP genes dramatically impairs the global transduction/expression efficiency. However, all the integrated viral sequences we characterized appear "scrambled" whatever the viral content of the transfected vector. Despite strong GFP expression, we were unable to recover any full sequence of the original constructs and found rearranged viral and non-viral sequences as well. Cellular flanking sequences were identified as non-coding ones. On the other hand, the kinetics of GFP expression over time led us to investigate the apparent down-regulation by non-structural proteins of the VP-gene cassette promoter. Altogether, our results show that JcDV-derived sequences included in linear DNA molecules are able to drive efficiently the integration and expression of a foreign gene into the genome of insect cells, whatever their composition, provided that at least one ITR is present.
However, the transfected sequences were extensively rearranged with cellular DNA during or after random integration in the host cell genome. Lastly, the non-structural proteins seem to participate in the regulation of p9 promoter activity rather than in the integration of viral sequences.
Codon Usage Bias and Determining Forces in Taenia solium Genome.

PubMed

Yang, Xing; Ma, Xusheng; Luo, Xuenong; Ling, Houjun; Zhang, Xichen; Cai, Xuepeng

2015-12-01

The tapeworm Taenia solium is an important human zoonotic parasite that causes great economic loss and also endangers public health. At present, an effective vaccine that will prevent infection and chemotherapy without any side effect remains to be developed. In this study, codon usage patterns in the T. solium genome were examined through 8,484 protein-coding genes. Neutrality analysis showed that T. solium had a narrow GC distribution, and a significant correlation was observed between GC12 and GC3. Examination of an NC (ENC vs GC3s)-plot showed a few genes on or close to the expected curve, but the majority of points with low-ENC (the effective number of codons) values were detected below the expected curve, suggesting that mutational bias plays a major role in shaping codon usage. The Parity Rule 2 plot (PR2) analysis showed that GC and AT were not used proportionally. We also identified 26 optimal codons in the T. solium genome, all of which ended with either a G or C residue. These optimal codons in the T. solium genome are likely consistent with tRNAs that are highly expressed in the cell, suggesting that mutational and translational selection forces are probably driving factors of codon usage bias in the T. solium genome.
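
The "expected curve" in the ENC versus GC3s plot mentioned in this abstract is usually taken to be Wright's (1990) formula, ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3s. The sketch below assumes that formula and the common convention of excluding Met, Trp and stop codons from GC3s; the function names and the toy coding sequence are hypothetical, and the paper's own software is not reproduced here.

```python
# Codons with no synonymous alternative (Met, Trp) and stop codons are
# excluded when computing GC3s; this convention is an assumption here.
NON_DEGENERATE = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3s(cds):
    """GC fraction at third positions of synonymous codons in a coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    thirds = [c[2] for c in codons
              if c not in NON_DEGENERATE and set(c) <= set("ACGT")]
    if not thirds:
        raise ValueError("no synonymous codons found")
    return sum(base in "GC" for base in thirds) / len(thirds)


def expected_enc(s):
    """Expected effective number of codons under mutational bias alone,
    given GC3s = s (Wright's formula)."""
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


# Hypothetical toy coding sequence: ATG GCT GCC GAA GAG TAA
toy_gc3s = gc3s("ATGGCTGCCGAAGAGTAA")
toy_expected = expected_enc(toy_gc3s)
print(toy_gc3s, toy_expected)
```

A gene plotted well below `expected_enc(gc3s(...))` uses fewer codons than its GC3s alone would predict, which is the pattern the abstract reports for the low-ENC genes.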
After the Genome IV: Envisioning Biology in the Year 2010

NASA Technical Reports Server (NTRS)

Brent, Roger

1999-01-01

The After the Genome meetings were started in 1995 to help the biological community think about and prepare for the changes in biological research in the face of genomic information. This workshop brings together intellectuals from subject fields far outside of conventional biology with the expectation that this will help focus thinking beyond the immediate future. Hence the subtitle for this year's meeting: "Envisioning Biology in the Year 2010". Accordingly, the organizers brought together a broadly multi-disciplinary group of thinkers and working scientists.
New Implications on Genomic Adaptation Derived from the Helicobacter pylori Genome Comparison

PubMed Central

Lara-Ramírez, Edgar Eduardo; Segura-Cabrera, Aldo; Guo, Xianwu; Yu, Gongxin; García-Pérez, Carlos Armando; Rodríguez-Pérez, Mario A.

2011-01-01

Background: Helicobacter pylori has a reduced genome and lives in a tough environment for long-term persistence. It evolved with its particular characteristics for biological adaptation. Because several H. pylori genome sequences are available, comparative analysis could help to better understand genomic adaptation of this particular bacterium. Principal Findings: We analyzed nine H. pylori genomes with emphasis on microevolution from a different perspective. Inversion was an important factor in shaping the genome structure. Illegitimate recombination led not only to genomic inversion but also to inverted fragment duplication, both of which contributed to the creation of new genes and gene families; in addition, homologous recombination contributed to inversion events. Based on the information on genomic rearrangement, the first genome scaffold structure of the H. pylori last common ancestor was produced. The core genome consists of 1186 genes, of which 22 genes could particularly adapt to the human stomach niche. H. pylori contains a high proportion of pseudogenes whose genesis was principally caused by homopolynucleotide (HPN) mutations. Such mutations are reversible and facilitate the control of gene expression through the change of DNA structure. The reversible mutations and a quasi-panmictic feature could allow such genes or gene fragments to be frequently transferred within or between populations. Hence, pseudogenes could be a reservoir of adaptation materials and the HPN mutations could be favorable to H. pylori adaptation, leading to HPN accumulation in the genomes, which corresponds to a special feature of Helicobacter species: an extremely high HPN composition of the genome. Conclusion: Our research demonstrated that both genome content and structure of H. pylori have been highly adapted to its particular lifestyle. PMID:21387011
Mms1 is an assistant for regulating G-quadruplex DNA structures.

PubMed

Schwindt, Eike; Paeschke, Katrin

2018-06-01

The preservation of genome stability is fundamental for every cell. Genomic integrity is constantly challenged, and non-canonical nucleic acid structures are among those challenges. In recent years, scientists became aware of the impact of G-quadruplex (G4) structures on genome stability. It has been shown that folded G4-DNA structures cause changes in the cell, such as transcriptional up/down-regulation, replication stalling, or enhanced genome instability. Multiple helicases have been identified that regulate G4 structures and thereby preserve genome stability. Interestingly, although these helicases are mostly ubiquitously expressed, they show specificity for G4 regulation in certain cellular processes (e.g., DNA replication). To date, it is not clear how this process and target specificity of helicases are achieved. Recently, Mms1, a ubiquitin ligase complex protein, was identified as a novel G4-DNA-binding protein that supports genome stability by aiding Pif1 helicase binding to these regions. In this perspective review, we discuss whether G4-DNA interacting proteins are fundamental for helicase function and specificity at G4-DNA structures.
Using Single-Nucleotide Polymorphisms To Discriminate Disease-Associated from Carried Genomes of Neisseria meningitidis

PubMed Central

Katz, Lee S.; Sharma, Nitya V.; Harcourt, Brian H.; Thomas, Jennifer Dolan; Wang, Xin; Mayer, Leonard W.; Jordan, I. King

2011-01-01

Neisseria meningitidis is one of the main agents of bacterial meningitis, causing substantial morbidity and mortality worldwide. However, most of the time N. meningitidis is carried as a commensal not associated with invasive disease. The genomic basis of the difference between disease-associated and carried isolates of N. meningitidis may provide critical insight into mechanisms of virulence, yet it has remained elusive. Here, we have taken a comparative genomics approach to interrogate the difference between disease-associated and carried isolates of N. meningitidis at the level of individual nucleotide variations (i.e., single nucleotide polymorphisms [SNPs]). We aligned complete genome sequences of 8 disease-associated and 4 carried isolates of N. meningitidis to search for SNPs that show mutually exclusive patterns of variation between the two groups. We found 63 SNPs that distinguish the 8 disease-associated genomes from the 4 carried genomes of N. meningitidis, which is far more than can be expected by chance alone given the level of nucleotide variation among the genomes. The putative list of SNPs that discriminate between disease-associated and carriage genomes may be expected to change with increased sampling or changes in the identities of the isolates being compared. Nevertheless, we show that these discriminating SNPs are more likely to reflect phenotypic differences than shared evolutionary history. Discriminating SNPs were mapped to genes, and the functions of the genes were evaluated for possible connections to virulence mechanisms. A number of overrepresented functional categories related to virulence were uncovered among SNP-associated genes, including genes related to the category “symbiosis, encompassing mutualism through parasitism.” PMID:21622743
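
The search for SNPs with "mutually exclusive patterns of variation between the two groups" can be sketched as a column-wise scan of a multiple alignment. This is a minimal illustration assuming aligned, equal-length sequences; the function name and toy sequences are hypothetical, and the paper's statistical comparison against chance expectation is not reproduced.

```python
def discriminating_sites(group_a, group_b):
    """Return 0-based alignment positions where the alleles seen in
    group_a and group_b do not overlap.

    Columns containing gaps ('-') or ambiguous bases ('N') are skipped,
    which is a simplifying assumption of this sketch.
    """
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all sequences must be aligned to the same length")
    sites = []
    for pos in range(length):
        alleles_a = {seq[pos] for seq in group_a}
        alleles_b = {seq[pos] for seq in group_b}
        if {"-", "N"} & (alleles_a | alleles_b):
            continue
        if alleles_a.isdisjoint(alleles_b):
            sites.append(pos)
    return sites


# Hypothetical toy alignment: three "disease-associated" and two
# "carried" sequences.
disease = ["ACGTA", "ACGTA", "ACGTT"]
carried = ["ATGTC", "ATGAC"]
toy_sites = discriminating_sites(disease, carried)
print(toy_sites)
```

With only a handful of genomes per group, some discriminating columns are expected by chance alone, which is why the abstract compares the observed count against a chance expectation and notes that the list may change with further sampling.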
Multi-scale structural community organisation of the human genome.

PubMed

Boulos, Rasha E; Tremblay, Nicolas; Arneodo, Alain; Borgnat, Pierre; Audit, Benjamin

2017-04-11

Structural interaction frequency matrices between all genome loci are now experimentally achievable thanks to high-throughput chromosome conformation capture technologies. This raises a new methodological challenge for computational biology: objectively extracting from these data the structural motifs characteristic of genome organisation. We deployed the fast multi-scale community mining algorithm based on spectral graph wavelets to characterise the networks of intra-chromosomal interactions in human cell lines. We observed that there exist structural domains of all sizes up to chromosome length and demonstrated that the set of structural communities forms a hierarchy of chromosome segments. Hence, at all scales, chromosome folding predominantly involves interactions between neighbouring sites rather than the formation of links between distant loci. Multi-scale structural decomposition of human chromosomes provides an original framework to question structural organisation and its relationship to functional regulation across the scales. By construction the proposed methodology is independent of the precise assembly of the reference genome and is thus directly applicable to genomes whose assembly is not fully determined.
Genome expansion via lineage splitting and genome reduction in the cicada endosymbiont Hodgkinia.

PubMed

Campbell, Matthew A; Van Leuven, James T; Meister, Russell C; Carey, Kaitlin M; Simon, Chris; McCutcheon, John P

2015-08-18

Comparative genomics from mitochondria, plastids, and mutualistic endosymbiotic bacteria has shown that the stable establishment of a bacterium in a host cell results in genome reduction. Although many highly reduced genomes from endosymbiotic bacteria are stable in gene content and genome structure, organelle genomes are sometimes characterized by dramatic structural diversity. Previous results from Candidatus Hodgkinia cicadicola, an endosymbiont of cicadas, revealed that some lineages of this bacterium had split into two new cytologically distinct yet genetically interdependent species. It was hypothesized that the long life cycle of cicadas in part enabled this unusual lineage-splitting event. Here we test this hypothesis by investigating the structure of the Ca. Hodgkinia genome in one of the longest-lived cicadas, Magicicada tredecim. We show that the Ca. Hodgkinia genome from M. tredecim has fragmented into multiple new chromosomes or genomes, with at least some remaining partitioned into discrete cells. We also show that this lineage-splitting process has resulted in a complex of Ca. Hodgkinia genomes that are 1.1-Mb pairs in length when considered together, an almost 10-fold increase in size from the hypothetical single-genome ancestor. These results parallel some examples of genome fragmentation and expansion in organelles, although the mechanisms that give rise to these extreme genome instabilities are likely different.
A novel tree-based procedure for deciphering the genomic spectrum of clinical disease entities.

PubMed

Mbogning, Cyprien; Perdry, Hervé; Toussile, Wilson; Broët, Philippe

2014-01-01

Dissecting the genomic spectrum of clinical disease entities is a challenging task. Recursive partitioning (or classification trees) methods provide powerful tools for exploring complex interplay among genomic factors, with respect to a main factor, that can reveal hidden genomic patterns. To take confounding variables into account, the partially linear tree-based regression (PLTR) model has been recently published. It combines regression models and tree-based methodology. It is however computationally burdensome and not well suited for situations for which a large number of exploratory variables is expected. We developed a novel procedure that represents an alternative to the original PLTR procedure, and considered different selection criteria. A simulation study with different scenarios has been performed to compare the performances of the proposed procedure to the original PLTR strategy. The proposed procedure with a Bayesian Information Criterion (BIC) achieved good performances to detect the hidden structure as compared to the original procedure. The novel procedure was used for analyzing patterns of copy-number alterations in lung adenocarcinomas, with respect to Kirsten Rat Sarcoma Viral Oncogene Homolog gene (KRAS) mutation status, while controlling for a cohort effect. Results highlight two subgroups of pure or nearly pure wild-type KRAS tumors with particular copy-number alteration patterns. The proposed procedure with a BIC criterion represents a powerful and practical alternative to the original procedure. Our procedure performs well in a general framework and is simple to implement.
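As an illustration of the selection criterion named in the abstract above, the following minimal sketch computes the standard Bayesian Information Criterion, BIC = k ln(n) - 2 ln(L), and picks the candidate with the lowest value. The candidate names and their log-likelihoods are hypothetical placeholders, and this is not the authors' PLTR implementation.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Standard BIC: k * ln(n) - 2 * ln(L). Lower values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical candidate trees: (label, log-likelihood, number of parameters).
candidates = [
    ("tree_2_leaves", -120.0, 3),
    ("tree_3_leaves", -112.0, 4),
    ("tree_5_leaves", -110.5, 6),
]
n_obs = 100  # assumed sample size, for illustration only

scores = {name: bic(ll, k, n_obs) for name, ll, k in candidates}
best = min(scores, key=scores.get)
```

The ln(n) penalty grows with sample size, so a larger tree is retained only when its gain in likelihood outweighs the extra parameters.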
Uneven distribution of expressed sequence tag loci on maize pachytene chromosomes

PubMed Central

Anderson, Lorinda K.; Lai, Ann; Stack, Stephen M.; Rizzon, Carene; Gaut, Brandon S.

2006-01-01

Examining the relationships among DNA sequence, meiotic recombination, and chromosome structure at a genome-wide scale has been difficult because only a few markers connect genetic linkage maps with physical maps. Here, we have positioned 1195 genetically mapped expressed sequence tag (EST) markers onto the 10 pachytene chromosomes of maize by using a newly developed resource, the RN-cM map. The RN-cM map charts the distribution of crossing over in the form of recombination nodules (RNs) along synaptonemal complexes (SCs, pachytene chromosomes) and allows genetic cM distances to be converted into physical micrometer distances on chromosomes. When this conversion is made, most of the EST markers used in the study are located distally on the chromosomes in euchromatin. ESTs are significantly clustered on chromosomes, even when only euchromatic chromosomal segments are considered. Gene density and recombination rate (as measured by EST and RN frequencies, respectively) are strongly correlated. However, crossover frequencies for telomeric intervals are much higher than was expected from their EST frequencies. For pachytene chromosomes, EST density is about fourfold higher in euchromatin compared with heterochromatin, while DNA density is 1.4 times higher in heterochromatin than in euchromatin. Based on DNA density values and the fraction of pachytene chromosome length that is euchromatic, we estimate that ∼1500 Mbp of the maize genome is in euchromatin. This overview of the organization of the maize genome will be useful in examining genome and chromosome evolution in plants. PMID:16339046
Effectiveness of genomic prediction of maize hybrid performance in different breeding populations and environments.

PubMed

Windhausen, Vanessa S; Atlin, Gary N; Hickey, John M; Crossa, Jose; Jannink, Jean-Luc; Sorrells, Mark E; Raman, Babu; Cairns, Jill E; Tarekegne, Amsal; Semagn, Kassa; Beyene, Yoseph; Grudloyma, Pichet; Technow, Frank; Riedelsheimer, Christian; Melchinger, Albrecht E

2012-11-01

Genomic prediction is expected to considerably increase genetic gains by increasing selection intensity and accelerating the breeding cycle. In this study, marker effects estimated in 255 diverse maize (Zea mays L.) hybrids were used to predict grain yield, anthesis date, and anthesis-silking interval within the diversity panel and testcross progenies of 30 F(2)-derived lines from each of five populations. Although up to 25% of the genetic variance could be explained by cross validation within the diversity panel, the prediction of testcross performance of F(2)-derived lines using marker effects estimated in the diversity panel was on average zero. Hybrids in the diversity panel could be grouped into eight breeding populations differing in mean performance. When performance was predicted separately for each breeding population on the basis of marker effects estimated in the other populations, predictive ability was low (i.e., 0.12 for grain yield). These results suggest that prediction resulted mostly from differences in mean performance of the breeding populations and less from the relationship between the training and validation sets or linkage disequilibrium with causal variants underlying the predicted traits. Potential uses for genomic prediction in maize hybrid breeding are discussed emphasizing the need of (1) a clear definition of the breeding scenario in which genomic prediction should be applied (i.e., prediction among or within populations), (2) a detailed analysis of the population structure before performing cross validation, and (3) larger training sets with strong genetic relationship to the validation set.
Successful development of microsatellite markers in a challenging species: the horizontal borer Austroplatypus incompertus (Coleoptera: Curculionidae).

PubMed

Smith, S; Joss, T; Stow, A

2011-10-01

The analysis of microsatellite loci has allowed significant advances in evolutionary biology and pest management. However, until very recently, the potential benefits have been compromised by the high costs of developing these neutral markers. High-throughput sequencing provides a solution to this problem. We describe the development of 13 microsatellite markers for the eusocial ambrosia beetle, Austroplatypus incompertus, a significant pest of forests in southeast Australia. The frequency of microsatellite repeats in the genome of A. incompertus was determined to be low, and previous attempts at microsatellite isolation using a traditional genomic library were problematic. Here, we utilised two protocols, microsatellite-enriched genomic library construction and high-throughput 454 sequencing and characterised 13 loci which were polymorphic among 32 individuals. Numbers of alleles per locus ranged from 2 to 17, and observed and expected heterozygosities from 0.344 to 0.767 and from 0.507 to 0.860, respectively. These microsatellites have the resolution required to analyse fine-scale colony and population genetic structure. Our work demonstrates the utility of next-generation 454 sequencing as a method for rapid and cost-effective acquisition of microsatellites where other techniques have failed, or for taxa where marker development has historically been both complicated and expensive.
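For readers unfamiliar with the heterozygosity figures quoted above, here is a minimal sketch of how observed and expected heterozygosity are commonly computed for one locus. It assumes diploid genotypes and the uncorrected estimator 1 - sum(p_i^2); the abstract does not say which estimator the authors used, and the sample genotypes are made up.

```python
from collections import Counter

def heterozygosity(genotypes):
    """genotypes: list of (allele_a, allele_b) tuples for one diploid locus."""
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals with two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n
    # Allele counts pooled over both copies carried by each individual.
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    # Expected heterozygosity under Hardy-Weinberg proportions: 1 - sum(p_i^2).
    h_exp = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return h_obs, h_exp

# Hypothetical genotypes at a two-allele locus (alleles coded 1 and 2).
sample = [(1, 1), (1, 2), (2, 2), (1, 2)]
h_obs, h_exp = heterozygosity(sample)
```

An observed value well below the expected one at a locus can indicate null alleles, inbreeding, or population substructure.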

Impacts of Population Structure and Analytical Models in Genome-Wide Association Studies of Complex Traits in Forest Trees: A Case Study in Eucalyptus globulus

PubMed Central

Garcia, Martín N.; Acuña, Cintia; Borralho, Nuno M. G.; Grattapaglia, Dario; Marcucci Poltri, Susana N.

2013-01-01

The promise of association genetics to identify genes or genomic regions controlling complex traits has generated a flurry of interest. Such phenotype-genotype associations could be useful to accelerate tree breeding cycles, increase precision and selection intensity for late expressing, low heritability traits. However, the prospects of association genetics in highly heterozygous undomesticated forest trees can be severely impacted by the presence of cryptic population and pedigree structure. To investigate how to better account for this, we compared the GLM and five combinations of the Unified Mixed Model (UMM) on data of a low-density genome-wide association study for growth and wood property traits carried out in a Eucalyptus globulus population (n = 303) with 7,680 Diversity Array Technology (DArT) markers. Model comparisons were based on the degree of deviation from the uniform distribution and estimates of the mean square differences between the observed and expected p-values of all significant marker-trait associations detected. Our analysis revealed the presence of population and family structure. There was not a single best model for all traits. Striking differences in detection power and accuracy were observed among the different models especially when population structure was not accounted for. The UMM method was the best and produced superior results when compared to GLM for all traits. Following stringent correction for false discoveries, 18 marker-trait associations were detected, 16 for tree diameter growth and two for lignin monomer composition (S∶G ratio), a key wood property trait. The two DArT markers associated with S∶G ratio on chromosome 10, physically map within 1 Mbp of the ferulate 5-hydroxylase (F5H) gene, providing a putative independent validation of this marker-trait association. 
This study details the merit of collectively integrating population structure and relatedness in association analyses in undomesticated, highly heterozygous forest trees, and provides additional insights into the nature of complex quantitative traits in Eucalyptus. PMID:24282578
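The model comparison described above relies on how far observed p-values deviate from the uniform distribution expected under the null. A minimal sketch of one such measure follows, assuming the expected value of the i-th smallest of m p-values is i/(m+1); the exact formula used in the study is not given in the abstract, so this is an illustration only.

```python
def mean_square_difference(p_values):
    """Mean squared gap between sorted p-values and their uniform expectations."""
    m = len(p_values)
    observed = sorted(p_values)
    # Expected order statistics of m draws from Uniform(0, 1): i / (m + 1).
    expected = [(i + 1) / (m + 1) for i in range(m)]
    return sum((o - e) ** 2 for o, e in zip(observed, expected)) / m

# Hypothetical p-values: one well-calibrated set and one inflated set.
calibrated = [0.25, 0.5, 0.75]
inflated = [0.01, 0.02, 0.03]
msd_calibrated = mean_square_difference(calibrated)
msd_inflated = mean_square_difference(inflated)
```

A model that controls population and family structure well should give a value close to zero, whereas unaccounted structure typically inflates small p-values and raises it.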
Widespread of horizontal gene transfer in the human genome.

PubMed

Huang, Wenze; Tsai, Lillian; Li, Yulong; Hua, Nan; Sun, Chen; Wei, Chaochun

2017-04-04

A fundamental concept in biology is that heritable material is passed from parents to offspring, a process called vertical gene transfer. An alternative mechanism of gene acquisition is through horizontal gene transfer (HGT), which involves movement of genetic materials between different species. Horizontal gene transfer has been found to be prevalent in prokaryotes but very rare in eukaryotes. In this paper, we investigate horizontal gene transfer in the human genome. From the pair-wise alignments between human genome and 53 vertebrate genomes, 1,467 human genome regions (2.6 M bases) from all chromosomes were found to be more conserved with non-mammals than with most mammals. These human genome regions involve 642 known genes, which are enriched with ion binding. Compared to known horizontal gene transfer regions in the human genome, there were few overlapping regions, which indicated horizontal gene transfer is more common than we expected in the human genome. Horizontal gene transfer impacts hundreds of human genes and this study provided insight into potential mechanisms of HGT in the human genome.
A survey of single nucleotide polymorphisms identified from whole-genome sequencing and their functional effect in the porcine genome

USDA-ARS's Scientific Manuscript database

Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a mor...
Len Gen: The international lentil genome sequencing project

USDA-ARS's Scientific Manuscript database

We have been sequencing CDC Redberry using NGS of paired-end and mate-pair libraries over a wide range of sizes and technologies. The most recent draft (v0.7) of approximately 150x coverage produced scaffolds covering over half the genome (2.7 Gb of the expected 4.3 Gb). Long reads from PacBio sequ...
The COG database: an updated version includes eukaryotes

PubMed Central

Tatusov, Roman L; Fedorova, Natalie D; Jackson, John D; Jacobs, Aviva R; Kiryutin, Boris; Koonin, Eugene V; Krylov, Dmitri M; Mazumder, Raja; Mekhedov, Sergei L; Nikolskaya, Anastasia N; Rao, B Sridhar; Smirnov, Sergei; Sverdlov, Alexander V; Vasudevan, Sona; Wolf, Yuri I; Yin, Jodie J; Natale, Darren A

2003-01-01

Background The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. Results We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or ~54% of the analyzed eukaryotic 110,655 gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included into the KOGs; addition of new eukaryotic genomes is expected to result in substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of ~20% of the KOG set. 
This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (~1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes. Conclusion The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies. PMID:12969510
Trends in genome-wide and region-specific genetic diversity in the Dutch-Flemish Holstein-Friesian breeding program from 1986 to 2015.

PubMed

Doekes, Harmen P; Veerkamp, Roel F; Bijma, Piter; Hiemstra, Sipke J; Windig, Jack J

2018-04-11

In recent decades, Holstein-Friesian (HF) selection schemes have undergone profound changes, including the introduction of optimal contribution selection (OCS; around 2000), a major shift in breeding goal composition (around 2000) and the implementation of genomic selection (GS; around 2010). These changes are expected to have influenced genetic diversity trends. Our aim was to evaluate genome-wide and region-specific diversity in HF artificial insemination (AI) bulls in the Dutch-Flemish breeding program from 1986 to 2015. Pedigree and genotype data (~ 75.5 k) of 6280 AI-bulls were used to estimate rates of genome-wide inbreeding and kinship and corresponding effective population sizes. Region-specific inbreeding trends were evaluated using regions of homozygosity (ROH). Changes in observed allele frequencies were compared to those expected under pure drift to identify putative regions under selection. We also investigated the direction of changes in allele frequency over time. Effective population size estimates for the 1986-2015 period ranged from 69 to 102. Two major breakpoints were observed in genome-wide inbreeding and kinship trends. Around 2000, inbreeding and kinship levels temporarily dropped. From 2010 onwards, they steeply increased, with pedigree-based, ROH-based and marker-based inbreeding rates as high as 1.8, 2.1 and 2.8% per generation, respectively. Accumulation of inbreeding varied substantially across the genome. A considerable fraction of markers showed changes in allele frequency that were greater than expected under pure drift. Putative selected regions harboured many quantitative trait loci (QTL) associated to a wide range of traits. In consecutive 5-year periods, allele frequencies changed more often in the same direction than in opposite directions, except when comparing the 1996-2000 and 2001-2005 periods. Genome-wide and region-specific diversity trends reflect major changes in the Dutch-Flemish HF breeding program. 
Introduction of OCS and the shift in breeding goal were followed by a drop in inbreeding and kinship and a shift in the direction of changes in allele frequency. After introduction of GS, rates of inbreeding and kinship increased substantially while allele frequencies continued to change in the same direction as before GS. These results provide insight in the effect of breeding practices on genomic diversity and emphasize the need for efficient management of genetic diversity in GS schemes.
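The link between the inbreeding rates and effective population sizes reported above can be sketched with the classical relation Ne = 1 / (2 ΔF), where ΔF is the rate of inbreeding per generation. This is the textbook formula; the abstract does not spell out the authors' exact estimation procedure, so treat the snippet as an illustration.

```python
def effective_population_size(delta_f):
    """Classical relation Ne = 1 / (2 * delta_F), delta_F per generation."""
    if delta_f <= 0:
        raise ValueError("delta_f must be positive")
    return 1.0 / (2.0 * delta_f)

# Per-generation inbreeding rates quoted in the abstract for the period after 2010.
rates = {"pedigree": 0.018, "ROH": 0.021, "marker": 0.028}
ne_by_method = {name: effective_population_size(r) for name, r in rates.items()}
```

Under this relation, a rate of 1.8% per generation corresponds to an effective size of roughly 28, well below the 69 to 102 range reported for the whole 1986-2015 period, which is consistent with the steep post-2010 increase the abstract describes.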
Life in the fast lane for protein crystallization and X-ray crystallography

NASA Technical Reports Server (NTRS)

Pusey, Marc L.; Liu, Zhi-Jie; Tempel, Wolfram; Praissman, Jeremy; Lin, Dawei; Wang, Bi-Cheng; Gavira, Jose A.; Ng, Joseph D.

2005-01-01

The common goal for structural genomic centers and consortiums is to decipher as quickly as possible the three-dimensional structures for a multitude of recombinant proteins derived from known genomic sequences. Since X-ray crystallography is the foremost method to acquire atomic resolution for macromolecules, the limiting step is obtaining protein crystals that can be useful for structure determination. High-throughput methods have been developed in recent years to clone, express, purify, crystallize and determine the three-dimensional structure of a protein gene product rapidly using automated devices, commercialized kits and consolidated protocols. However, the average number of protein structures obtained for most structural genomic groups has been very low compared to the total number of proteins purified. As more entire genomic sequences are obtained for different organisms from the three kingdoms of life, only the proteins that can be crystallized and whose structures can be obtained easily are studied. Consequently, an astonishing number of genomic proteins remain unexamined. In the era of high-throughput processes, traditional methods in molecular biology, protein chemistry and crystallization are eclipsed by automation and pipeline practices. The necessity for high-rate production of protein crystals and structures has prevented the usage of more intellectual strategies and creative approaches in experimental executions. Fundamental principles and personal experiences in protein chemistry and crystallization are minimally exploited only to obtain "low-hanging fruit" protein structures. We review the practical aspects of today's high-throughput manipulations and discuss the challenges in fast-paced protein crystallization and tools for crystallography. Structural genomic pipelines can be improved with information gained from low-throughput tactics that may help us reach the higher-bearing fruits. 
Examples of recent developments in this area are reported from the efforts of the Southeast Collaboratory for Structural Genomics (SECSG).
Life in the Fast Lane for Protein Crystallization and X-Ray Crystallography

NASA Technical Reports Server (NTRS)

Pusey, Marc L.; Liu, Zhi-Jie; Tempel, Wolfram; Praissman, Jeremy; Lin, Dawei; Wang, Bi-Cheng; Gavira, Jose A.; Ng, Joseph D.

2004-01-01

The common goal for structural genomic centers and consortiums is to decipher as quickly as possible the three-dimensional structures for a multitude of recombinant proteins derived from known genomic sequences. Since X-ray crystallography is the foremost method to acquire atomic resolution for macromolecules, the limiting step is obtaining protein crystals that can be useful for structure determination. High-throughput methods have been developed in recent years to clone, express, purify, crystallize and determine the three-dimensional structure of a protein gene product rapidly using automated devices, commercialized kits and consolidated protocols. However, the average number of protein structures obtained for most structural genomic groups has been very low compared to the total number of proteins purified. As more entire genomic sequences are obtained for different organisms from the three kingdoms of life, only the proteins that can be crystallized and whose structures can be obtained easily are studied. Consequently, an astonishing number of genomic proteins remain unexamined. In the era of high-throughput processes, traditional methods in molecular biology, protein chemistry and crystallization are eclipsed by automation and pipeline practices. The necessity for high-rate production of protein crystals and structures has prevented the usage of more intellectual strategies and creative approaches in experimental executions. Fundamental principles and personal experiences in protein chemistry and crystallization are minimally exploited only to obtain "low-hanging fruit" protein structures. We review the practical aspects of today's high-throughput manipulations and discuss the challenges in fast-paced protein crystallization and tools for crystallography. Structural genomic pipelines can be improved with information gained from low-throughput tactics that may help us reach the higher-bearing fruits. 
Examples of recent developments in this area are reported from the efforts of the Southeast Collaboratory for Structural Genomics (SECSG).
A systems approach defining constraints of the genome architecture on lineage selection and evolvability during somatic cancer evolution

PubMed Central

Rübben, Albert; Nordhoff, Ole

2013-01-01

Summary Most clinically distinguishable malignant tumors are characterized by specific mutations, specific patterns of chromosomal rearrangements and a predominant mechanism of genetic instability but it remains unsolved whether modifications of cancer genomes can be explained solely by mutations and selection through the cancer microenvironment. It has been suggested that internal dynamics of genomic modifications as opposed to the external evolutionary forces have a significant and complex impact on Darwinian species evolution. A similar situation can be expected for somatic cancer evolution as molecular key mechanisms encountered in species evolution also constitute prevalent mutation mechanisms in human cancers. This assumption is developed into a systems approach of carcinogenesis which focuses on possible inner constraints of the genome architecture on lineage selection during somatic cancer evolution. The proposed systems approach can be considered an analogy to the concept of evolvability in species evolution. The principal hypothesis is that permissive or restrictive effects of the genome architecture on lineage selection during somatic cancer evolution exist and have a measurable impact. The systems approach postulates three classes of lineage selection effects of the genome architecture on somatic cancer evolution: i) effects mediated by changes of fitness of cells of cancer lineage, ii) effects mediated by changes of mutation probabilities and iii) effects mediated by changes of gene designation and physical and functional genome redundancy. Physical genome redundancy is the copy number of identical genetic sequences. Functional genome redundancy of a gene or a regulatory element is defined as the number of different genetic elements, regardless of copy number, coding for the same specific biological function within a cancer cell. 
Complex interactions of the genome architecture on lineage selection may be expected when modifications of the genome architecture have multiple and possibly opposed effects which manifest themselves at disparate times and progression stages. Dissection of putative mechanisms mediating constraints exerted by the genome architecture on somatic cancer evolution may provide an algorithm for understanding and predicting as well as modifying somatic cancer evolution in individual patients. PMID:23336076
Child Development and Structural Variation in the Human Genome

ERIC Educational Resources Information Center

Zhang, Ying; Haraksingh, Rajini; Grubert, Fabian; Abyzov, Alexej; Gerstein, Mark; Weissman, Sherman; Urban, Alexander E.

2013-01-01

Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects…
Ab Initio structure prediction for Escherichia coli: towards genome-wide protein structure modeling and fold assignment

PubMed Central

Xu, Dong; Zhang, Yang

2013-01-01

Genome-wide protein structure prediction and structure-based function annotation have been a long-term goal in molecular biology but have not yet become possible due to difficulties in modeling distant-homology targets. We developed a hybrid pipeline combining ab initio folding and template-based modeling for genome-wide structure prediction applied to the Escherichia coli genome. The pipeline was tested on 43 known sequences, where QUARK-based ab initio folding simulation generated models with TM-score 17% higher than that by traditional comparative modeling methods. For 495 unknown hard sequences, 72 are predicted to have a correct fold (TM-score > 0.5) and 321 have a substantial portion of structure correctly modeled (TM-score > 0.35). 317 sequences can be reliably assigned to a SCOP fold family based on structural analogy to existing proteins in PDB. The presented results, as a case study of E. coli, represent promising progress towards genome-wide structure modeling and fold family assignment using state-of-the-art ab initio folding algorithms. PMID:23719418
Single-Cell Genomic Analysis in Plants

PubMed Central

Hu, Haifei; Scheben, Armin; Edwards, David

2018-01-01

Individual cells in an organism are variable, which strongly impacts cellular processes. Advances in sequencing technologies have enabled single-cell genomic analysis to become widespread, addressing shortcomings of analyses conducted on populations of bulk cells. While the field of single-cell plant genomics is in its infancy, there is great potential to gain insights into cell lineage and functional cell types to help understand complex cellular interactions in plants. In this review, we discuss current approaches for single-cell plant genomic analysis, with a focus on single-cell isolation, DNA amplification, next-generation sequencing, and bioinformatics analysis. We outline the technical challenges of analysing material from a single plant cell, and then examine applications of single-cell genomics and the integration of this approach with genome editing. Finally, we indicate future directions we expect in the rapidly developing field of plant single-cell genomic analysis. PMID:29361790
[Ethical considerations in genomic cohort study].

PubMed

Choi, Eun Kyung; Kim, Ock-Joo

2007-03-01

During the last decade, genomic cohort studies have been developed in many countries by linking health data and genetic data in stored samples. Genomic cohort studies are expected to find key genetic components that contribute to common diseases, thereby promising great advances in genome medicine. While many countries endeavor to build biobank systems, biobank-based genome research has raised important ethical concerns including genetic privacy, confidentiality, discrimination, and informed consent. Informed consent for biobanks poses an important question: whether true informed consent is possible in population-based genomic cohort research where the nature of future studies is unforeseeable when consent is obtained. Due to the sensitive character of genetic information, protecting privacy and keeping confidentiality become important topics. To minimize ethical problems and achieve scientific goals to their maximum degree, each country strives to build population-based genomic cohort research projects by organizing public consultation, seeking public and expert consensus in research, and providing safeguards to protect privacy and confidentiality.
The Divided Bacterial Genome: Structure, Function, and Evolution.

PubMed

diCenzo, George C; Finan, Turlough M

2017-09-01

Approximately 10% of bacterial genomes are split between two or more large DNA fragments, a genome architecture referred to as a multipartite genome. This multipartite organization is found in many important organisms, including plant symbionts, such as the nitrogen-fixing rhizobia, and plant, animal, and human pathogens, including the genera Brucella, Vibrio, and Burkholderia. The availability of many complete bacterial genome sequences means that we can now examine on a broad scale the characteristics of the different types of DNA molecules in a genome. Recent work has begun to shed light on the unique properties of each class of replicon, the unique functional role of chromosomal and nonchromosomal DNA molecules, and how the exploitation of novel niches may have driven the evolution of the multipartite genome. The aims of this review are to (i) outline the literature regarding bacterial genomes that are divided into multiple fragments, (ii) provide a meta-analysis of completed bacterial genomes from 1,708 species as a way of reviewing the abundant information present in these genome sequences, and (iii) provide an encompassing model to explain the evolution and function of the multipartite genome structure. This review covers, among other topics, salient genome terminology; mechanisms of multipartite genome formation; the phylogenetic distribution of multipartite genomes; how each part of a genome differs with respect to genomic signatures, genetic variability, and gene functional annotation; how each DNA molecule may interact; as well as the costs and benefits of this genome structure. Copyright © 2017 American Society for Microbiology.
Population-based 3D genome structure analysis reveals driving forces in spatial genome organization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tjong, Harianto; Li, Wenyuan; Kalhor, Reza

Conformation capture technologies (e.g., Hi-C) chart physical interactions between chromatin regions on a genome-wide scale. However, the structural variability of the genome between cells poses a great challenge to interpreting ensemble-averaged Hi-C data, particularly for long-range and interchromosomal interactions. Here, we present a probabilistic approach for deconvoluting Hi-C data into a model population of distinct diploid 3D genome structures, which facilitates the detection of chromatin interactions likely to co-occur in individual cells. Here, our approach incorporates the stochastic nature of chromosome conformations and allows a detailed analysis of alternative chromatin structure states. For example, we predict and experimentally confirm the presence of large centromere clusters with distinct chromosome compositions varying between individual cells. The stability of these clusters varies greatly with their chromosome identities. We show that these chromosome-specific clusters can play a key role in the overall chromosome positioning in the nucleus and stabilizing specific chromatin interactions. By explicitly considering genome structural variability, our population-based method provides an important tool for revealing novel insights into the key factors shaping the spatial genome organization.
Population-based 3D genome structure analysis reveals driving forces in spatial genome organization

DOE PAGES

Tjong, Harianto; Li, Wenyuan; Kalhor, Reza; ...

2016-03-07

Conformation capture technologies (e.g., Hi-C) chart physical interactions between chromatin regions on a genome-wide scale. However, the structural variability of the genome between cells poses a great challenge to interpreting ensemble-averaged Hi-C data, particularly for long-range and interchromosomal interactions. Here, we present a probabilistic approach for deconvoluting Hi-C data into a model population of distinct diploid 3D genome structures, which facilitates the detection of chromatin interactions likely to co-occur in individual cells. Our approach incorporates the stochastic nature of chromosome conformations and allows a detailed analysis of alternative chromatin structure states. For example, we predict and experimentally confirm the presence of large centromere clusters with distinct chromosome compositions varying between individual cells. The stability of these clusters varies greatly with their chromosome identities. We show that these chromosome-specific clusters can play a key role in the overall chromosome positioning in the nucleus and stabilizing specific chromatin interactions. By explicitly considering genome structural variability, our population-based method provides an important tool for revealing novel insights into the key factors shaping the spatial genome organization.
Identification of both copy number variation-type and constant-type core elements in a large segmental duplication region of the mouse genome

PubMed Central

2013-01-01

Background Copy number variation (CNV), an important source of diversity in genomic structure, is frequently found in clusters called CNV regions (CNVRs). CNVRs are strongly associated with segmental duplications (SDs), but the composition of these complex repetitive structures remains unclear. Results We conducted self-comparative-plot analysis of all mouse chromosomes using the high-speed and large-scale-homology search algorithm SHEAP. For eight chromosomes, we identified various types of large SD as tartan-checked patterns within the self-comparative plots. A complex arrangement of diagonal split lines in the self-comparative-plots indicated the presence of large homologous repetitive sequences. We focused on one SD on chromosome 13 (SD13M), and developed SHEPHERD, a stepwise ab initio method, to extract longer repetitive elements and to characterize repetitive structures in this region. Analysis using SHEPHERD showed the existence of 60 core elements, which were expected to be the basic units that form SDs within the repetitive structure of SD13M. The demonstration that sequences homologous to the core elements (>70% homology) covered approximately 90% of the SD13M region indicated that our method can characterize the repetitive structure of SD13M effectively. Core elements were composed largely of fragmented repeats of a previously identified type, such as long interspersed nuclear elements (LINEs), together with partial genic regions. Comparative genome hybridization array analysis showed that whereas 42 core elements were components of CNVR that varied among mouse strains, 8 did not vary among strains (constant type), and the status of the others could not be determined. The CNV-type core elements contained significantly larger proportions of long terminal repeat (LTR) types of retrotransposon than the constant-type core elements, which had no CNV. 
The higher divergence rates observed in the CNV-type core elements than in the constant type indicate that the CNV-type core elements have a longer evolutionary history than constant-type core elements in SD13M. Conclusions Our methodology for the identification of repetitive core sequences simplifies characterization of the structures of large SDs and detailed analysis of CNV. The results of detailed structural and quantitative analyses in this study might help to elucidate the biological role of one of the SDs on chromosome 13. PMID:23834397
Plant genome and transcriptome annotations: from misconceptions to simple solutions

PubMed Central

Bolger, Marie E; Arsova, Borjana; Usadel, Björn

2018-01-01

Abstract Next-generation sequencing has triggered an explosion of available genomic and transcriptomic resources in the plant sciences. Although genome and transcriptome sequencing has become orders of magnitude cheaper and more efficient, the functional annotation process often lags behind. This might be hampered by the lack of a comprehensive enumeration of simple-to-use tools available to the plant researcher. In this comprehensive review, we present (i) typical ontologies to be used in the plant sciences, (ii) useful databases and resources used for functional annotation, (iii) what to expect from an annotated plant genome, (iv) an automated annotation pipeline and (v) a recipe and reference chart outlining typical steps used to annotate plant genomes/transcriptomes using publicly available resources. PMID:28062412
CRISPR-mediated Ophthalmic Genome Surgery.

PubMed

Cho, Galaxy Y; Abdulla, Yazeed; Sengillo, Jesse D; Justus, Sally; Schaefer, Kellie A; Bassuk, Alexander G; Tsang, Stephen H; Mahajan, Vinit B

2017-09-01

Clustered regularly interspaced short palindromic repeats (CRISPR) is a genome engineering system with great potential for clinical applications due to its versatility and programmability. This review highlights the development and use of CRISPR-mediated ophthalmic genome surgery in recent years. Diverse CRISPR techniques are in development to target a wide array of ophthalmic conditions, including inherited and acquired conditions. Preclinical disease modeling and recent successes in gene editing suggest potential efficacy of CRISPR as a therapeutic for inherited conditions. In particular, the treatment of Leber congenital amaurosis with CRISPR-mediated genome surgery is expected to reach clinical trials in the near future. Treatment options for inherited retinal dystrophies are currently limited. CRISPR-mediated genome surgery methods may be able to address this unmet need in the future.
The Fragmented Mitochondrial Ribosomal RNAs of Plasmodium falciparum

PubMed Central

Feagin, Jean E.; Harrell, Maria Isabel; Lee, Jung C.; Coe, Kevin J.; Sands, Bryan H.; Cannone, Jamie J.; Tami, Germaine; Schnare, Murray N.; Gutell, Robin R.

2012-01-01

Background The mitochondrial genome in the human malaria parasite Plasmodium falciparum is most unusual. Over half the genome is composed of the genes for three classic mitochondrial proteins: cytochrome oxidase subunits I and III and apocytochrome b. The remainder encodes numerous small RNAs, ranging in size from 23 to 190 nt. Previous analysis revealed that some of these transcripts have significant sequence identity with highly conserved regions of large and small subunit rRNAs, and can form the expected secondary structures. However, these rRNA fragments are not encoded in linear order; instead, they are intermixed with one another and the protein coding genes, and are coded on both strands of the genome. This unorthodox arrangement hindered the identification of transcripts corresponding to other regions of rRNA that are highly conserved and/or are known to participate directly in protein synthesis. Principal Findings The identification of 14 additional small mitochondrial transcripts from P. falciparum and the assignment of 27 small RNAs (12 SSU RNAs totaling 804 nt, 15 LSU RNAs totaling 1233 nt) to specific regions of rRNA are supported by multiple lines of evidence. The regions now represented are highly similar to those of the small but contiguous mitochondrial rRNAs of Caenorhabditis elegans. The P. falciparum rRNA fragments cluster on the interfaces of the two ribosomal subunits in the three-dimensional structure of the ribosome. Significance All of the rRNA fragments are now presumed to have been identified with experimental methods, and nearly all of these have been mapped onto the SSU and LSU rRNAs. Conversely, all regions of the rRNAs that are known to be directly associated with protein synthesis have been identified in the P. falciparum mitochondrial genome and RNA transcripts. The fragmentation of the rRNA in the P. falciparum mitochondrion is the most extreme example of any rRNA fragmentation discovered. PMID:22761677

Less Pollen-Mediated Gene Flow for More Signatures of Glacial Lineages: Congruent Evidence from Balsam Fir cpDNA and mtDNA for Multiple Refugia in Eastern and Central North America

PubMed Central

Cinget, Benjamin; Gérardi, Sébastien; Beaulieu, Jean; Bousquet, Jean

2015-01-01

The phylogeographic structure and postglacial history of balsam fir (Abies balsamea), a transcontinental North American boreal conifer, was inferred using mitochondrial DNA (mtDNA) and chloroplast DNA (cpDNA) markers. Genetic structure among 107 populations (mtDNA data) and 75 populations (cpDNA data) was analyzed using Bayesian and genetic distance approaches. Population differentiation was high for mtDNA (dispersed by seeds only), but also for cpDNA (dispersed by seeds and pollen), indicating that pollen gene flow is more restricted in balsam fir than in other boreal conifers. Low cpDNA gene flow in balsam fir may relate to low pollen production due to the inherent biology of the species and populations being decimated by recurrent spruce budworm epidemics, and/or to low dispersal of pollen grains due to their peculiar structural properties. Accordingly, a phylogeographic structure was detected using both mtDNA and cpDNA markers and population structure analyses supported the existence of at least five genetically distinct glacial lineages in central and eastern North America. Four of these would originate from glacial refugia located south of the Laurentide ice sheet, while the last one would have persisted in the northern Labrador region. As expected due to reduced pollen-mediated gene flow, congruence between the geographic distribution of mtDNA and cpDNA lineages was higher than in other North American conifers. However, concordance was not complete, reflecting that restricted but nonetheless detectable cpDNA gene flow among glacial lineages occurred during the Holocene. As a result, new cpDNA and mtDNA genome combinations indicative of cytoplasmic genome capture were observed. PMID:25849816
Functional assignment to JEV proteins using SVM

PubMed Central

Sahoo, Ganesh Chandra; Dikhit, Manas Ranjan; Das, Pradeep

2008-01-01

Identification of different protein functions facilitates a mechanistic understanding of Japanese encephalitis virus (JEV) infection and opens novel means for drug development. A support vector machine (SVM) approach, useful for predicting the functional class of distantly related proteins, is employed to ascribe a possible functional class to each Japanese encephalitis virus protein. Our study, based on SVMProt and available JE virus sequences, suggests that structural and nonstructural proteins of the JEV genome possibly belong to diverse protein functional classes that are expected to occur in the life cycle of JE virus. Protein functions common to both structural and non-structural proteins are iron-binding, metal-binding, lipid-binding, copper-binding, transmembrane, outer membrane, and the channels/pores group of proteins (pore-forming toxins, proteins and peptides). Non-structural proteins perform functions such as actin binding, zinc-binding, calcium-binding, hydrolases, carbon-oxygen lyases, P-type ATPase, proteins belonging to the major facilitator family (MFS), the secreting main terminal branch (MTB) family, phosphotransfer-driven group translocators and the ATP-binding cassette (ABC) family group of proteins. Structural proteins, besides belonging to the same structural group of proteins (capsid, structural, envelope), also perform functions such as nuclear receptor, antibiotic resistance, RNA-binding, DNA-binding, magnesium-binding, isomerase (intra-molecular) and oxidoreductase, and participate in the type II (general) secretory pathway (IISP). PMID:19052658
Contribution of transposable elements in the plant's genome.

PubMed

Sahebi, Mahbod; Hanafi, Mohamed M; van Wijnen, Andre J; Rice, David; Rafii, M Y; Azizi, Parisa; Osman, Mohamad; Taheri, Sima; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat; Noor, Yusuf Muhammad

2018-07-30

Plants maintain extensive growth flexibility under different environmental conditions, allowing them to continuously and rapidly adapt to alterations in their environment. A large portion of many plant genomes consists of transposable elements (TEs) that create new genetic variations within plant species. Different types of mutations may be created by TEs in plants. Many TEs can avoid the host's defense mechanisms and survive alterations in transposition activity, internal sequence and target site. Thus, plant genomes are expected to utilize a variety of mechanisms to tolerate TEs that are near or within genes. TEs affect the expression of not only nearby genes but also unlinked inserted genes. TEs can create new promoters, leading to novel expression patterns or alternative coding regions to generate alternate transcripts in plant species. TEs can also provide novel cis-acting regulatory elements that act as enhancers or inserts within original enhancers that are required for transcription. Thus, the regulation of plant gene expression is strongly managed by the insertion of TEs into nearby genes. TEs can also lead to chromatin modifications and thereby affect gene expression in plants. TEs are able to generate new genes and modify existing gene structures by duplicating, mobilizing and recombining gene fragments. They can also facilitate cellular functions by sharing their transposase-coding regions. Hence, TE insertions can not only act as simple mutagens but can also alter the elementary functions of the plant genome. Here, we review recent discoveries concerning the contribution of TEs to gene expression in plant genomes and discuss the different mechanisms by which TEs can affect plant gene expression and reduce host defense mechanisms. Copyright © 2018 Elsevier B.V. All rights reserved.
How to infer relative fitness from a sample of genomic sequences.

PubMed

Dayarian, Adel; Shraiman, Boris I

2014-07-01

Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman's coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured, asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test in silico, using simulations of a Wright-Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator that identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with actual fitness, with a genome in the top 10% ranked being in the top 20% fittest with false discovery rate of 0.1-0.3, depending on the mutation/selection parameters. The ranking also enables us to predict the genotypes that future populations inherit from the present one. While the inference accuracy increases monotonically with sample size, samples of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks. Copyright © 2014 by the Genetics Society of America.
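The Wright-Fisher model mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' simulation or their inference algorithm: it only shows fitness-weighted resampling of a fixed-size asexual population, with no mutation. The genotype labels, the `fitness` mapping, and the function names are hypothetical.

```python
import random


def wright_fisher_step(population, fitness, rng):
    """Draw the next generation: each offspring picks a parent with
    probability proportional to the parent's fitness (assumed values)."""
    weights = [fitness[genotype] for genotype in population]
    return rng.choices(population, weights=weights, k=len(population))


def simulate(population, fitness, generations, seed=0):
    """Run several generations at constant population size (no mutation)."""
    rng = random.Random(seed)
    for _ in range(generations):
        population = wright_fisher_step(population, fitness, rng)
    return population


if __name__ == "__main__":
    start = ["A"] * 50 + ["B"] * 50
    final = simulate(start, {"A": 1.05, "B": 1.0}, generations=100)
    print(final.count("A"), final.count("B"))
```

Because sampling is stochastic, the counts printed depend on the seed; only the population size is guaranteed to stay constant.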
Homoplastic microinversions and the avian tree of life

PubMed Central

2011-01-01

Background Microinversions are cytologically undetectable inversions of DNA sequences that accumulate slowly in genomes. Like many other rare genomic changes (RGCs), microinversions are thought to be virtually homoplasy-free evolutionary characters, suggesting that they may be very useful for difficult phylogenetic problems such as the avian tree of life. However, few detailed surveys of these genomic rearrangements have been conducted, making it difficult to assess this hypothesis or understand the impact of microinversions upon genome evolution. Results We surveyed non-coding sequence data from a recent avian phylogenetic study and found substantially more microinversions than expected based upon prior information about vertebrate inversion rates, although this is likely due to underestimation of these rates in previous studies. Most microinversions were lineage-specific or united well-accepted groups. However, some homoplastic microinversions were evident among the informative characters. Hemiplasy, which reflects differences between gene trees and the species tree, did not explain the observed homoplasy. Two specific loci were microinversion hotspots, with high numbers of inversions that included both the homoplastic as well as some overlapping microinversions. Neither stem-loop structures nor detectable sequence motifs were associated with microinversions in the hotspots. Conclusions Microinversions can provide valuable phylogenetic information, although power analysis indicates that large amounts of sequence data will be necessary to identify enough inversions (and similar RGCs) to resolve short branches in the tree of life. Moreover, microinversions are not perfect characters and should be interpreted with caution, just as with any other character type. Independent of their use for phylogenetic analyses, microinversions are important because they have the potential to complicate alignment of non-coding sequences. 
Despite their low rate of accumulation, they have clearly contributed to genome evolution, suggesting that active identification of microinversions will prove useful in future phylogenomic studies. PMID:21612607
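As a toy illustration of what a microinversion looks like at the sequence level, the sketch below checks whether two equal-length, already-aligned sequences differ by a single reverse-complemented segment. This is an assumption-laden simplification, not the survey method used in the study: real data require gapped alignments, and the function names and the `min_len` threshold are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_microinversion(ref, query, min_len=4):
    """Return (start, end) if query equals ref except that ref[start:end]
    appears reverse-complemented; otherwise return None."""
    if len(ref) != len(query):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(ref, query)) if a != b]
    if not diffs:
        return None
    start, end = diffs[0], diffs[-1] + 1
    if end - start < min_len:
        return None  # too short to distinguish from point substitutions
    if query[start:end] == reverse_complement(ref[start:end]):
        return (start, end)
    return None
```

The reported span is the smallest mismatching window; a true inversion may extend further when its flanking bases happen to match after reverse complementation.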
Distribution and localization of microsatellites in the Perigord black truffle genome and identification of new molecular markers (2010) Fungal Genetics and Biology

DOE Office of Scientific and Technical Information (OSTI.GOV)

Murat, Claude; Riccioni, C; Belfiori, B

The level of genetic diversity and genetic structure in the Perigord black truffle (Tuber melanosporum Vittad.) has been debated for several years, mainly due to the lack of appropriate genetic markers. Microsatellites or simple sequence repeats (SSRs) are important for genome organisation and phenotypic diversity, and are among the most popular molecular markers. In this study, we surveyed the T. melanosporum genome (1) to characterise its SSR pattern; (2) to compare it with SSR patterns found in 48 other fungal and three oomycete genomes; and (3) to identify new polymorphic SSR markers for population genetics. The T. melanosporum genome is rich in SSRs, with 22,425 SSRs and mononucleotides being the most frequent motifs. SSRs were found in all genomic regions, although they are more frequent in non-coding regions (introns and intergenic regions). Sixty out of 135 PCR-amplified mono-, di-, tri-, tetra-, penta-, and hexanucleotides were polymorphic (44%) within black truffle populations, and 27 were randomly selected and analysed on 139 T. melanosporum isolates from France, Italy and Spain. The number of alleles varied from 2 to 18 and the expected heterozygosity from 0.124 to 0.815. One hundred and thirty-two different multilocus genotypes out of the 139 T. melanosporum isolates were identified and the genotypic diversity was high (0.999). Polymorphic SSRs were found in UTR regulatory regions of fruiting body and ectomycorrhiza regulated genes, suggesting that they may play a role in phenotypic variation. In conclusion, SSRs developed in this study were highly polymorphic and our results showed that T. melanosporum is a species with an important genetic diversity, which is in agreement with its recently uncovered heterothallic mating system.
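The expected heterozygosity reported per locus in this abstract is commonly computed as one minus the sum of squared allele frequencies. A minimal sketch follows, assuming the uncorrected form of the estimator (the abstract does not say which variant was used); the allele sizes in the usage example are invented for illustration.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Return He = 1 - sum(p_i ** 2) for the allele calls at one locus.

    This is the uncorrected estimator; a small-sample correction
    (multiplying by n / (n - 1)) is deliberately left out.
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no alleles supplied")
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Hypothetical SSR allele sizes (bp) scored in eight isolates.
    print(expected_heterozygosity([150, 150, 154, 154, 158, 158, 150, 154]))
```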
Statistical physics of nucleosome positioning and chromatin structure

NASA Astrophysics Data System (ADS)

Morozov, Alexandre

2012-02-01

Genomic DNA is packaged into chromatin in eukaryotic cells. The fundamental building block of chromatin is the nucleosome, a 147 bp-long DNA molecule wrapped around the surface of a histone octamer. Arrays of nucleosomes are positioned along DNA according to their sequence preferences and folded into higher-order chromatin fibers whose structure is poorly understood. We have developed a framework for predicting sequence-specific histone-DNA interactions and the effective two-body potential responsible for ordering nucleosomes into regular higher-order structures. Our approach is based on the analogy between nucleosomal arrays and a one-dimensional fluid of finite-size particles with nearest-neighbor interactions. We derive simple rules which allow us to predict nucleosome occupancy solely from the dinucleotide content of the underlying DNA sequences. Dinucleotide content determines the degree of stiffness of the DNA polymer and thus defines its ability to bend into the nucleosomal superhelix. As expected, the nucleosome positioning rules are universal for chromatin assembled in vitro on genomic DNA from baker's yeast and from the nematode worm C. elegans, where nucleosome placement follows intrinsic sequence preferences and steric exclusion. However, the positioning rules inferred from in vivo C. elegans chromatin are affected by global nucleosome depletion from chromosome arms relative to central domains, likely caused by the attachment of the chromosome arms to the nuclear membrane. Furthermore, intrinsic nucleosome positioning rules are overwritten in transcribed regions, indicating that chromatin organization is actively managed by the transcriptional and splicing machinery.
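The one-dimensional fluid analogy can be made concrete with a standard hard-rod lattice recursion. The sketch below is an assumption, not the author's model: it includes steric exclusion only, omits the two-body potential described in the abstract, and takes hypothetical Boltzmann weights as input rather than deriving them from dinucleotide content.

```python
def start_probabilities(weights, footprint):
    """Hard-rod lattice gas on a line.

    weights[i] is the (assumed) Boltzmann weight of a particle whose
    footprint starts at position i; particles may not overlap.
    Returns (Z, probs), where probs[i] is the probability that a
    particle starts at position i.
    """
    n = len(weights)
    length = n + footprint - 1  # number of lattice sites (base pairs)

    # fwd[i]: partition function of the segment [0, i)
    fwd = [1.0] * (length + 1)
    for i in range(1, length + 1):
        fwd[i] = fwd[i - 1]
        if i >= footprint:
            fwd[i] += weights[i - footprint] * fwd[i - footprint]

    # bwd[i]: partition function of the segment [i, length)
    bwd = [1.0] * (length + 1)
    for i in range(length - 1, -1, -1):
        bwd[i] = bwd[i + 1]
        if i + footprint <= length:
            bwd[i] += weights[i] * bwd[i + footprint]

    z = fwd[length]
    probs = [fwd[i] * weights[i] * bwd[i + footprint] / z for i in range(n)]
    return z, probs
```

For nucleosomes, `footprint` would be 147; occupancy at a base pair follows by summing the start probabilities of all footprints that cover it.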
Microsatellite diversity and genetic structure among common bean (Phaseolus vulgaris L.) landraces in Brazil, a secondary center of diversity

PubMed Central

Burle, Marília Lobo; Fonseca, Jaime Roberto; Kami, James A.

2010-01-01

Brazil is the largest producer and consumer of common bean (Phaseolus vulgaris L.), which is the most important source of human dietary protein in that country. This study assessed the genetic diversity and the structure of a sample of 279 geo-referenced common bean landraces from Brazil, using molecular markers. Sixty-seven microsatellite markers spread over the 11 linkage groups of the common bean genome, as well as Phaseolin, PvTFL1y, APA and four SCAR markers were used. As expected, the sample showed lower genetic diversity compared to the diversity in the primary center of diversification. Andean and Mesoamerican gene pools were both present but the latter gene pool was four times more frequent than the former. The two gene pools could be clearly distinguished; limited admixture was observed between these groups. The Mesoamerican group consisted of two sub-populations, with a high level of admixture between them leading to a large proportion of stabilized hybrids not observed in the centers of domestication. Thus, Brazil can be considered a secondary center of diversification of common bean. A high degree of genome-wide multilocus associations even among unlinked loci was observed, confirming the high level of structure in the sample and suggesting that association mapping should be conducted in separate Andean and Mesoamerican Brazilian samples. Electronic supplementary material The online version of this article (doi:10.1007/s00122-010-1350-5) contains supplementary material, which is available to authorized users. PMID:20502861
The 8p23 inversion polymorphism determines local recombination heterogeneity across human populations.

PubMed

Alves, Joao M; Chikhi, Lounès; Amorim, António; Lopes, Alexandra M

2014-04-01

For decades, chromosomal inversions have been regarded as fascinating evolutionary elements as they are expected to suppress recombination between chromosomes with opposite orientations, leading to the accumulation of genetic differences between the two configurations over time. Here, making use of publicly available population genotype data for the largest polymorphic inversion in the human genome (8p23-inv), we assessed whether this inhibitory effect of inversion rearrangements led to significant differences in the recombination landscape of two homologous DNA segments with opposite orientation. Our analysis revealed that the accumulation of genetic differentiation is positively correlated with the variation in recombination profiles. The observed recombination dissimilarity between inversion types is consistent across all populations analyzed and surpasses the effects of geographic structure, suggesting that both structures (orientations) have been evolving independently over an extended period of time, despite being subjected to the very same demographic history. Aside from this mainly independent evolution, we also identified a short segment (350 kb, <10% of the whole inversion) in the central region of the inversion where the genetic divergence between the two structural haplotypes is diminished. Although this is difficult to demonstrate, it could be due to gene flow (possibly via double crossing-over events), which is consistent with the higher recombination rates surrounding this segment. This study demonstrates for the first time that chromosomal inversions influence the recombination landscape at a fine scale and highlights the role of these rearrangements as drivers of genome evolution.
Phenetic Comparison of Prokaryotic Genomes Using k-mers

PubMed Central

Déraspe, Maxime; Raymond, Frédéric; Boisvert, Sébastien; Culley, Alexander; Roy, Paul H.; Laviolette, François; Corbeil, Jacques

2017-01-01

Abstract Bacterial genomics studies are getting more extensive and complex, requiring new ways to envision analyses. Using the Ray Surveyor software, we demonstrate that comparison of genomes based on their k-mer content allows reconstruction of phenetic trees without the need of prior data curation, such as core genome alignment of a species. We validated the methodology using simulated genomes and previously published phylogenomic studies of Streptococcus pneumoniae and Pseudomonas aeruginosa. We also investigated the relationship of specific genetic determinants with bacterial population structures. By comparing clusters from the complete genomic content of a genome population with clusters from specific functional categories of genes, we can determine how the population structures are correlated. Indeed, the strain clustering based on a subset of k-mers allows determination of its similarity with the whole genome clusters. We also applied this methodology on 42 species of bacteria to determine the correlational significance of five important bacterial genomic characteristics. For example, intrinsic resistance is more important in P. aeruginosa than in S. pneumoniae, and the former has increased correlation of its population structure with antibiotic resistance genes. The global view of the pangenome of bacteria also demonstrated the taxa-dependent interaction of population structure with antibiotic resistance, bacteriophage, plasmid, and mobile element k-mer data sets. PMID:28957508
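The idea of comparing genomes by k-mer content can be sketched with a simple set-based distance. This is not the Ray Surveyor implementation, whose internals are not described here; the Jaccard distance, the small `k`, and the function names are assumptions chosen for illustration. A matrix of such pairwise distances is the kind of input a tree-building method could use.

```python
def kmer_set(seq, k):
    """Return the set of all k-mers (overlapping substrings of length k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(seq_a, seq_b, k=4):
    """1 - |A & B| / |A | B| over the two k-mer sets (0 = identical content)."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(genomes, k=4):
    """Pairwise distances for a dict mapping genome name to sequence."""
    names = sorted(genomes)
    return {(x, y): jaccard_distance(genomes[x], genomes[y], k)
            for x in names for y in names}
```

Real genomes would need a larger `k` and canonical (strand-independent) k-mers; both are left out to keep the sketch short.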
I-motif DNA structures are formed in the nuclei of human cells

NASA Astrophysics Data System (ADS)

Zeraati, Mahdi; Langley, David B.; Schofield, Peter; Moye, Aaron L.; Rouet, Romain; Hughes, William E.; Bryan, Tracy M.; Dinger, Marcel E.; Christ, Daniel

2018-06-01

Human genome function is underpinned by the primary storage of genetic information in canonical B-form DNA, with a second layer of DNA structure providing regulatory control. I-motif structures are thought to form in cytosine-rich regions of the genome and to have regulatory functions; however, in vivo evidence for the existence of such structures has so far remained elusive. Here we report the generation and characterization of an antibody fragment (iMab) that recognizes i-motif structures with high selectivity and affinity, enabling the detection of i-motifs in the nuclei of human cells. We demonstrate that the in vivo formation of such structures is cell-cycle and pH dependent. Furthermore, we provide evidence that i-motif structures are formed in regulatory regions of the human genome, including promoters and telomeric regions. Our results support the notion that i-motif structures provide key regulatory roles in the genome.
Molecular analysis of vector genome structures after liver transduction by conventional and self-complementary adeno-associated viral serotype vectors in murine and nonhuman primate models.

PubMed

Sun, Xun; Lu, You; Bish, Lawrence T; Calcedo, Roberto; Wilson, James M; Gao, Guangping

2010-06-01

Vectors based on several new adeno-associated viral (AAV) serotypes demonstrated strong hepatocyte tropism and transduction efficiency in both small- and large-animal models for liver-directed gene transfer. Efficiency of liver transduction by AAV vectors can be further improved in both murine and nonhuman primate (NHP) animals when the vector genomes are packaged in a self-complementary (sc) format. In an attempt to understand potential molecular mechanism(s) responsible for enhanced transduction efficiency of the sc vector in liver, we performed extensive molecular studies of genome structures of conventional single-stranded (ss) and sc AAV vectors from liver after AAV gene transfer in both mice and NHPs. These included treatment with exonucleases with specific substrate preferences, single-cutter restriction enzyme digestion and polarity-specific hybridization-based vector genome mapping, and bacteriophage phi29 DNA polymerase-mediated and double-stranded circular template-specific rescue of persisted circular genomes. In mouse liver, vector genomes of both genome formats seemed to persist primarily as episomal circular forms, but sc vectors converted into circular forms more rapidly and efficiently. However, the overall differences in vector genome abundance and structure in the liver between ss and sc vectors could not account for the remarkable differences in transduction. Molecular structures of persistent genomes of both ss and sc vectors were significantly more heterogeneous in macaque liver, with noticeable structural rearrangements that warrant further characterizations.
Molecular Analysis of Vector Genome Structures After Liver Transduction by Conventional and Self-Complementary Adeno-Associated Viral Serotype Vectors in Murine and Nonhuman Primate Models

PubMed Central

Sun, Xun; Lu, You; Bish, Lawrence T.; Calcedo, Roberto; Wilson, James M.

2010-01-01

Vectors based on several new adeno-associated viral (AAV) serotypes demonstrated strong hepatocyte tropism and transduction efficiency in both small- and large-animal models for liver-directed gene transfer. Efficiency of liver transduction by AAV vectors can be further improved in both murine and nonhuman primate (NHP) animals when the vector genomes are packaged in a self-complementary (sc) format. In an attempt to understand potential molecular mechanism(s) responsible for enhanced transduction efficiency of the sc vector in liver, we performed extensive molecular studies of genome structures of conventional single-stranded (ss) and sc AAV vectors from liver after AAV gene transfer in both mice and NHPs. These included treatment with exonucleases with specific substrate preferences, single-cutter restriction enzyme digestion and polarity-specific hybridization-based vector genome mapping, and bacteriophage ϕ29 DNA polymerase-mediated and double-stranded circular template-specific rescue of persisted circular genomes. In mouse liver, vector genomes of both genome formats seemed to persist primarily as episomal circular forms, but sc vectors converted into circular forms more rapidly and efficiently. However, the overall differences in vector genome abundance and structure in the liver between ss and sc vectors could not account for the remarkable differences in transduction. Molecular structures of persistent genomes of both ss and sc vectors were significantly more heterogeneous in macaque liver, with noticeable structural rearrangements that warrant further characterizations. PMID:20113166
Chromatin Insulators and Topological Domains: Adding New Dimensions to 3D Genome Architecture

PubMed Central

Matharu, Navneet K.; Ahanger, Sajad H.

2015-01-01

The spatial organization of metazoan genomes has a direct influence on fundamental nuclear processes that include transcription, replication, and DNA repair. It is imperative to understand the mechanisms that shape the 3D organization of the eukaryotic genomes. Chromatin insulators have emerged as one of the central components of the genome organization tool-kit across species. Recent advancements in chromatin conformation capture technologies have provided important insights into the architectural role of insulators in genomic structuring. Insulators are involved in 3D genome organization at multiple spatial scales and are important for dynamic reorganization of chromatin structure during reprogramming and differentiation. In this review, we will discuss the classical view and our renewed understanding of insulators as global genome organizers. We will also discuss the plasticity of chromatin structure and its re-organization during pluripotency and differentiation and in situations of cellular stress. PMID:26340639
Analysis of MHC class I genes across horse MHC haplotypes

PubMed Central

Tallmadge, Rebecca L.; Campbell, Julie A.; Miller, Donald C.; Antczak, Douglas F.

2010-01-01

The genomic sequences of 15 horse Major Histocompatibility Complex (MHC) class I genes and a collection of MHC class I homozygous horses of five different haplotypes were used to investigate the genomic structure and polymorphism of the equine MHC. A combination of conserved and locus-specific primers was used to amplify horse MHC class I genes with classical and non-classical characteristics. Multiple clones from each haplotype identified three to five classical sequences per homozygous animal, and two to three non-classical sequences. Phylogenetic analysis was applied to these sequences and groups were identified which appear to be allelic series, but some sequences were left ungrouped. Sequences determined from MHC class I heterozygous horses and previously described MHC class I sequences were then added, representing a total of ten horse MHC haplotypes. These results were consistent with those obtained from the MHC homozygous horses alone, and 30 classical sequences were assigned to four previously confirmed loci and three new provisional loci. The non-classical genes had few alleles and the classical genes had higher levels of allelic polymorphism. Alleles for two classical loci with the expected pattern of polymorphism were found in the majority of haplotypes tested, but alleles at two other commonly detected loci had more variation outside of the hypervariable region than within. Our data indicate that the equine Major Histocompatibility Complex is characterized by variation in the complement of class I genes expressed in different haplotypes in addition to the expected allelic polymorphism within loci. PMID:20099063
Structure of the germline genome of Tetrahymena thermophila and relationship to the massively rearranged somatic genome

PubMed Central

Hamilton, Eileen P; Kapusta, Aurélie; Huvos, Piroska E; Bidwell, Shelby L; Zafar, Nikhat; Tang, Haibao; Hadjithomas, Michalis; Krishnakumar, Vivek; Badger, Jonathan H; Caler, Elisabet V; Russ, Carsten; Zeng, Qiandong; Fan, Lin; Levin, Joshua Z; Shea, Terrance; Young, Sarah K; Hegarty, Ryan; Daza, Riza; Gujja, Sharvari; Wortman, Jennifer R; Birren, Bruce W; Nusbaum, Chad; Thomas, Jainy; Carey, Clayton M; Pritham, Ellen J; Feschotte, Cédric; Noto, Tomoko; Mochizuki, Kazufumi; Papazyan, Romeo; Taverna, Sean D; Dear, Paul H; Cassidy-Hanley, Donna M; Xiong, Jie; Miao, Wei; Orias, Eduardo; Coyne, Robert S

2016-01-01

The germline genome of the binucleated ciliate Tetrahymena thermophila undergoes programmed chromosome breakage and massive DNA elimination to generate the somatic genome. Here, we present a complete sequence assembly of the germline genome and analyze multiple features of its structure and its relationship to the somatic genome, shedding light on the mechanisms of genome rearrangement as well as the evolutionary history of this remarkable germline/soma differentiation. Our results strengthen the notion that a complex, dynamic, and ongoing interplay between mobile DNA elements and the host genome has shaped Tetrahymena chromosome structure, locally and globally. Non-standard outcomes of rearrangement events, including the generation of short-lived somatic chromosomes and excision of DNA interrupting protein-coding regions, may represent novel forms of developmental gene regulation. We also compare Tetrahymena’s germline/soma differentiation to that of other characterized ciliates, illustrating the wide diversity of adaptations that have occurred within this phylum. DOI: http://dx.doi.org/10.7554/eLife.19090.001 PMID:27892853
The PiGeOn project: protocol for a longitudinal study examining psychosocial, behavioural and ethical issues and outcomes in cancer tumour genomic profiling.

PubMed

Best, Megan; Newson, Ainsley J; Meiser, Bettina; Juraskova, Ilona; Goldstein, David; Tucker, Kathy; Ballinger, Mandy L; Hess, Dominique; Schlub, Timothy E; Biesecker, Barbara; Vines, Richard; Vines, Kate; Thomas, David; Young, Mary-Anne; Savard, Jacqueline; Jacobs, Chris; Butow, Phyllis

2018-04-05

Genomic sequencing in cancer (both tumour and germline), and development of therapies targeted to tumour genetic status, hold great promise for improvement of patient outcomes. However, the imminent introduction of genomics into clinical practice calls for better understanding of how patients value, experience, and cope with this novel technology and its often complex results. Here we describe a protocol for a novel mixed-methods, prospective study (PiGeOn) that aims to examine patients' psychosocial, cognitive, affective and behavioural responses to tumour genomic profiling and to integrate a parallel critical ethical analysis of returning results. This is a cohort sub-study of a parent tumour genomic profiling programme enrolling patients with advanced cancer. One thousand patients will be recruited for the parent study in Sydney, Australia from 2016 to 2019. They will be asked to complete surveys at baseline, three months, and five months. Primary outcomes are: knowledge, preferences, attitudes and values. A purposively sampled subset of patients will be asked to participate in three semi-structured interviews (at each time point) to provide deeper data interpretation. Relevant ethical themes will be critically analysed to iteratively develop or refine normative ethical concepts or frameworks currently used in the return of genetic information. This will be the first Australian study to collect longitudinal data on cancer patients' experience of tumour genomic profiling. Findings will be used to inform ongoing ethical debates on issues such as how to effectively obtain informed consent for genomic profiling, return results, distinguish between research and clinical practice and manage patient expectations. The combination of quantitative and qualitative methods will provide comprehensive and critical data on how patients cope with 'actionable' and 'non-actionable' results. This information is needed to ensure that when tumour genomic profiling becomes part of routine clinical care, ethical considerations are embedded, and patients are adequately prepared and supported during and after receiving results. Trial registration: not required for this sub-study; parent trial registration ACTRN12616000908437.
Equilibrium properties of DNA and other semiflexible polymers confined in nanochannels

NASA Astrophysics Data System (ADS)

Muralidhar, Abhiram

Recent developments in next-generation sequencing (NGS) techniques have opened the door for low-cost, high-throughput sequencing of genomes. However, these developments have also exposed the inability of NGS to track large-scale genomic information, which is extremely important for understanding the relationship between genotype and phenotype. Genome mapping offers a reliable way to obtain information about large-scale structural variations in a given genome. A promising variant of genome mapping involves confining single DNA molecules in nanochannels whose cross-sectional dimensions are approximately 50 nm. Despite the development and commercialization of nanochannel-based genome mapping technology, the polymer physics of DNA in confinement is only beginning to be understood. Apart from its biological relevance, DNA is also used as a model polymer in experiments by polymer physicists. Indeed, the seminal experiments by Reisner et al. (2005) of DNA confined in nanochannels of different widths revealed discrepancies with the classical theories of Odijk and de Gennes for polymer confinement. Picking up from the conclusions of the dissertation of Tree (2014), this dissertation addresses a number of key outstanding problems in the area of nanoconfined DNA. Adopting a Monte Carlo chain growth technique known as the pruned-enriched Rosenbluth method, we examine the equilibrium and near-equilibrium properties of DNA and other semiflexible polymers in nanochannel confinement. We begin by analyzing the dependence of various thermodynamic properties of confined semiflexible polymers on molecular weight. This allows us to point out the finite size effects that can occur when using low molecular weight DNA in experiments. We then analyze the statistics of backfolding and hairpin formation in the context of existing theories and discuss how our results can be used to engineer better conditions for genome mapping. Finally, we elucidate the diffusion behavior of confined semiflexible polymers by comparing and contrasting our results for asymptotically long chains with other similar studies in the literature. We expect our findings to be not only beneficial to the design of better genome mapping devices, but also to the fundamental understanding of semiflexible polymers in confinement.
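For orientation, the two classical confinement limits named in this abstract can be compared numerically. The sketch below is not taken from the dissertation. It assumes the standard scaling forms for the fractional extension of a chain with persistence length P and effective width w in a channel of size D: the de Gennes (blob) form for wide channels and the Odijk (deflection) form for narrow channels. The function names, the order-one de Gennes prefactor, and the Odijk coefficient (an assumed literature value for a square channel) are illustrative assumptions.

```python
def de_gennes_extension(D, P, w, prefactor=1.0):
    """Fractional extension X/L in the de Gennes (blob) regime, D >> P.

    Scaling form X/L ~ (w * P / D**2) ** (1/3). The prefactor is an assumed
    order-one constant, not a value given in the abstract.
    """
    return prefactor * (w * P / D**2) ** (1.0 / 3.0)


def odijk_extension(D, P, alpha=0.18274):
    """Fractional extension X/L in the Odijk (deflection) regime, D << P.

    Form X/L = 1 - alpha * (D / P) ** (2/3). The default alpha is an assumed
    literature value for a square channel; treat it as a parameter.
    """
    return 1.0 - alpha * (D / P) ** (2.0 / 3.0)
```

With assumed values P = 50 nm and w = 5 nm, `odijk_extension(30.0, 50.0)` applies to a channel narrower than the persistence length and `de_gennes_extension(400.0, 50.0, 5.0)` to a much wider one. Channels of roughly 50 nm sit between the two limits, where neither form is expected to hold exactly.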
Exploring the role of genome and structural ions in preventing viral capsid collapse during dehydration

NASA Astrophysics Data System (ADS)

Martín-González, Natalia; Guérin Darvas, Sofía M.; Durana, Aritz; Marti, Gerardo A.; Guérin, Diego M. A.; de Pablo, Pedro J.

2018-03-01

Even though viruses evolve mainly in liquid milieu, their horizontal transmission routes often include episodes of dry environment. Along their life cycle, some insect viruses, such as viruses from the Dicistroviridae family, withstand dehydrated conditions with presently unknown consequences to their structural stability. Here, we use atomic force microscopy to monitor the structural changes of viral particles of Triatoma virus (TrV) after desiccation. Our results demonstrate that TrV capsids preserve their genome inside, conserving their height after exposure to dehydrating conditions, which is in stark contrast with other viruses that expel their genome when desiccated. Moreover, empty capsids (without genome) resulted in collapsed particles after desiccation. We also explored the role of structural ions in the dehydration process of the virions (capsid containing genome) by chelating the accessible cations from the external solvent milieu. We observed that ion suppression helps to keep the virus height upon desiccation. Our results show that under drying conditions, the genome of TrV prevents the capsid from collapsing during dehydration, while the structural ions are responsible for promoting solvent exchange through the virion wall.

Terminal structures of West Nile virus genomic RNA and their interactions with viral NS5 protein

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dong Hongping; Zhang Bo; Shi Peiyong

2008-11-10

Genome cyclization is essential for flavivirus replication. We used RNases to probe the structures formed by the 5'-terminal 190 nucleotides and the 3'-terminal 111 nucleotides of the West Nile virus (WNV) genomic RNA. When analyzed individually, the two RNAs adopt stem-loop structures as predicted by the thermodynamic-folding program. However, when mixed together, the two RNAs form a duplex that is mediated through base-pairings of two sets of RNA elements (5'CS/3'CSI and 5'UAR/3'UAR). Formation of the RNA duplex facilitates a conformational change that leaves the 3'-terminal nucleotides of the genome (positions -8 to -16) single-stranded. Viral NS5 binds specifically to the 5'-terminal stem-loop (SL1) of the genomic RNA. The 5'SL1 RNA structure is essential for WNV replication. The study has provided further evidence to suggest that flavivirus genome cyclization and NS5/5'SL1 RNA interaction facilitate NS5 binding to the 3' end of the genome for the initiation of viral minus-strand RNA synthesis.
proGenomes: a resource for consistent functional and taxonomic annotations of prokaryotic genomes.

PubMed

Mende, Daniel R; Letunic, Ivica; Huerta-Cepas, Jaime; Li, Simone S; Forslund, Kristoffer; Sunagawa, Shinichi; Bork, Peer

2017-01-04

The availability of microbial genomes has opened many new avenues of research within microbiology. This has been driven primarily by comparative genomics approaches, which rely on accurate and consistent characterization of genomic sequences. It is nevertheless difficult to obtain consistent taxonomic and integrated functional annotations for defined prokaryotic clades. Thus, we developed proGenomes, a resource that provides user-friendly access to currently 25 038 high-quality genomes whose sequences and consistent annotations can be retrieved individually or by taxonomic clade. These genomes are assigned to 5306 consistent and accurate taxonomic species clusters based on previously established methodology. proGenomes also contains functional information for almost 80 million protein-coding genes, including a comprehensive set of general annotations and more focused annotations for carbohydrate-active enzymes and antibiotic resistance genes. Additionally, broad habitat information is provided for many genomes. All genomes and associated information can be downloaded by user-selected clade or multiple habitat-specific sets of representative genomes. We expect that the availability of high-quality genomes with comprehensive functional annotations will promote advances in clinical microbial genomics, functional evolution and other subfields of microbiology. proGenomes is available at http://progenomes.embl.de. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
APPLaUD: access for patients and participants to individual level uninterpreted genomic data.

PubMed

Thorogood, Adrian; Bobe, Jason; Prainsack, Barbara; Middleton, Anna; Scott, Erick; Nelson, Sarah; Corpas, Manuel; Bonhomme, Natasha; Rodriguez, Laura Lyman; Murtagh, Madeleine; Kleiderman, Erika

2018-02-17

There is growing support for the stance that patients and research participants should have better and easier access to their raw (uninterpreted) genomic sequence data in both clinical and research contexts. We review legal frameworks and literature on the benefits, risks, and practical barriers of providing individuals access to their data. We also survey genomic sequencing initiatives that provide or plan to provide individual access. Many patients and research participants expect to be able to access their health and genomic data. Individuals have a legal right to access their genomic data in some countries and contexts. Moreover, increasing numbers of participatory research projects, direct-to-consumer genetic testing companies, and now major national sequencing initiatives grant individuals access to their genomic sequence data upon request. Drawing on current practice and regulatory analysis, we outline legal, ethical, and practical guidance for genomic sequencing initiatives seeking to offer interested patients and participants access to their raw genomic data.
Primers for polymerase chain reaction to detect genomic DNA of Toxocara canis and T. cati.

PubMed

Wu, Z; Nagano, I; Xu, D; Takahashi, Y

1997-03-01

Primers for polymerase chain reaction to amplify genomic DNA of both Toxocara canis and T. cati were constructed by adapting cloning and sequencing random amplified polymorphic DNA. The primers are expected to detect eggs and/or larvae of T. canis and T. cati, both of which are known to cause toxocariasis in humans.
Consumer Health Informatics Aspects of Direct-to-Consumer Personal Genomic Testing.

PubMed

Gray, Kathleen; Stephen, Remya; Terrill, Bronwyn; Wilson, Brenda; Middleton, Anna; Tytherleigh, Rigan; Turbitt, Erin; Gaff, Clara; Savard, Jacqueline; Hickerton, Chriselle; Newson, Ainsley; Metcalfe, Sylvia

2017-01-01

This paper uses consumer health informatics as a framework to explore whether and how direct-to-consumer personal genomic testing can be regarded as a form of information which assists consumers to manage their health. It presents findings from qualitative content analysis of web sites that offer testing services, and of transcripts from focus groups conducted as part of a study of the Australian public's expectations of personal genomics. Content analysis showed that service offerings have some features of consumer health information but lack consistency. Focus group participants were mostly unfamiliar with the specifics of test reports and related information services. Some of their ideas about aids to knowledge were in line with the benefits described on provider web sites, but some expectations were inflated. People were ambivalent about whether these services would address consumers' health needs, interests and contexts and whether they would support consumers' health self-management decisions and outcomes. There is scope for consumer health informatics approaches to refine the usage and the utility of direct-to-consumer personal genomic testing. Further research may focus on how uptake is affected by consumers' health literacy or by services' engagement with consumers about what they really want.
A Genomic Selection Index Applied to Simulated and Real Data

PubMed Central

Ceron-Rojas, J. Jesus; Crossa, José; Arief, Vivi N.; Basford, Kaye; Rutkoski, Jessica; Jarquín, Diego; Alvarado, Gregorio; Beyene, Yoseph; Semagn, Kassa; DeLacy, Ian

2015-01-01

A genomic selection index (GSI) is a linear combination of genomic estimated breeding values that uses genomic markers to predict the net genetic merit and select parents from a nonphenotyped testing population. Some authors have proposed a GSI; however, they have not used simulated or real data to validate the GSI theory and have not explained how to estimate the GSI selection response and the GSI expected genetic gain per selection cycle for the unobserved traits after the first selection cycle to obtain information about the genetic gains in each subsequent selection cycle. In this paper, we develop the theory of a GSI and apply it to two simulated and four real data sets with four traits. Also, we numerically compare its efficiency with that of the phenotypic selection index (PSI) by using the ratio of the GSI response over the PSI response, and the PSI and GSI expected genetic gain per selection cycle for observed and unobserved traits, respectively. In addition, we used the Technow inequality to compare GSI vs. PSI efficiency. Results from the simulated data were confirmed by the real data, indicating that GSI was more efficient than PSI per unit of time. PMID:26290571
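The definition in this abstract, a linear combination of genomic estimated breeding values (GEBVs) used to rank candidates, can be illustrated with a minimal sketch. The GEBVs, economic weights, selected proportion, and function names below are hypothetical and are not taken from the paper. The sketch covers only the ranking step, not the estimation of GEBVs or of the selection response.

```python
def gsi_values(gebvs, weights):
    """Index value per candidate: sum over traits of weight * GEBV."""
    return [sum(w * g for w, g in zip(weights, row)) for row in gebvs]


def select_parents(gebvs, weights, proportion):
    """Indices of the top `proportion` of candidates, best first."""
    scores = gsi_values(gebvs, weights)
    n_keep = max(1, round(proportion * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_keep]


# Hypothetical GEBVs for four candidates (rows) and two traits (columns).
gebvs = [
    [1.2, -0.3],
    [0.4, 0.9],
    [-0.5, 0.1],
    [0.8, 0.6],
]
weights = [1.0, 0.5]  # hypothetical economic weights, one per trait

selected = select_parents(gebvs, weights, proportion=0.5)
```

Here `selected` holds the indices of the two candidates with the highest index values. Because only marker-derived GEBVs are needed, the candidates can come from a nonphenotyped testing population.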
Self-similarity analysis of eubacteria genome based on weighted graph.

PubMed

Qi, Zhao-Hui; Li, Ling; Zhang, Zhi-Meng; Qi, Xiao-Qin

2011-07-07

We introduce a weighted graph model to investigate the self-similarity characteristics of eubacteria genomes. Similarity comparisons between genomes are usually made to estimate the evolutionary distance among different genomes. Few studies have focused on the overall statistical characteristics of each gene compared with the other genes in the same genome. In our model, each genome is represented as a weighted graph whose topology describes the similarity relationships among genes in the same genome. Based on weighted graph theory, we extract some quantified statistical variables from the topology, and give the distribution of some variables derived from the largest social structure in the topology. The 23 eubacteria recently studied by Sorimachi and Okayasu are clearly classified into two different groups by their double logarithmic point-plots describing the similarity relationship among genes of the largest social structure in the genome. The results show that the proposed model may provide new insights into the structures and evolution patterns determined from complete genomes. Copyright © 2011 Elsevier Ltd. All rights reserved.
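As a rough illustration of the kind of model described, the sketch below builds a weighted gene-similarity graph for one genome and extracts two simple quantities. It rests on assumptions that the abstract does not confirm: the "largest social structure" is read here as the largest connected component, edges are kept only above a similarity threshold, and node strength (the sum of incident edge weights) stands in for the statistical variables. The gene names and similarity scores are hypothetical.

```python
from collections import defaultdict


def build_graph(similarities, threshold):
    """Adjacency map {gene: {neighbour: weight}} for scores >= threshold."""
    graph = defaultdict(dict)
    for (a, b), score in similarities.items():
        if score >= threshold:
            graph[a][b] = score
            graph[b][a] = score
    return graph


def largest_component(graph):
    """Largest connected set of genes (assumed reading of 'social structure')."""
    seen, best = set(), set()
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(set(graph[node]) - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def strength(graph, node):
    """Weighted degree: sum of the similarity weights on a gene's edges."""
    return sum(graph[node].values())


# Hypothetical pairwise similarity scores between genes of one genome.
similarities = {
    ("g1", "g2"): 0.9,
    ("g2", "g3"): 0.7,
    ("g4", "g5"): 0.8,
    ("g1", "g5"): 0.2,
}
graph = build_graph(similarities, threshold=0.5)
hub = largest_component(graph)
```

A double logarithmic plot such as the one mentioned in the abstract could then be drawn from the distribution of `strength` values within `hub`. The choice of variable is again an assumption.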
Structural and functional analysis of the finished genome of the recently isolated toxic Anabaena sp. WA102.

PubMed

Brown, Nathan M; Mueller, Ryan S; Shepardson, Jonathan W; Landry, Zachary C; Morré, Jeffrey T; Maier, Claudia S; Hardy, F Joan; Dreher, Theo W

2016-06-13

Very few closed genomes of the cyanobacteria that commonly produce toxic blooms in lakes and reservoirs are available, limiting our understanding of the properties of these organisms. A new anatoxin-a-producing member of the Nostocaceae, Anabaena sp. WA102, was isolated from a freshwater lake in Washington State, USA, in 2013 and maintained in non-axenic culture. The Anabaena sp. WA102 5.7 Mbp genome assembly has been closed with long-read, single-molecule sequencing and separately a draft genome assembly has been produced with short-read sequencing technology. The closed and draft genome assemblies are compared, showing a correlation between long repeats in the genome and the many gaps in the short-read assembly. Anabaena sp. WA102 encodes anatoxin-a biosynthetic genes, as does its close relative Anabaena sp. AL93 (also introduced in this study). These strains are distinguished by differences in the genes for light-harvesting phycobilins, with Anabaena sp. AL93 possessing a phycoerythrocyanin operon. Biologically relevant structural variants in the Anabaena sp. WA102 genome were detected only by long-read sequencing: a tandem triplication of the anaBCD promoter region in the anatoxin-a synthase gene cluster (not triplicated in Anabaena sp. AL93) and a 5-kbp deletion variant present in two-thirds of the population. The genome has a large number of mobile elements (160). Strikingly, there was no synteny with the genome of its nearest fully assembled relative, Anabaena sp. 90. Structural and functional genome analyses indicate that Anabaena sp. WA102 has a flexible genome. Genome closure, which can be readily achieved with long-read sequencing, reveals large scale (e.g., gene order) and local structural features that should be considered in understanding genome evolution and function.
The complete mitochondrial genome of the sandbar shark Carcharhinus plumbeus.

PubMed

Blower, Dean C; Ovenden, Jennifer R

2016-01-01

The sandbar shark, Carcharhinus plumbeus, a major representative species in shark fisheries worldwide, is now considered vulnerable to overfishing. A pool of 774,234 Roche 454 shotgun sequences from one individual was assembled into a 16,706 bp mitogenome with 33× average coverage depth. It comprised 13 protein-coding genes, 22 transfer RNAs, 2 ribosomal genes and 2 non-coding regions, typical of a vertebrate mitogenome. As expected for sharks, an A-T nucleotide bias was evident. This adds to the rapidly growing number of mitogenome assemblies for the economically important Carcharhinidae family. The C. plumbeus mitogenome will assist researchers, fisheries and conservation managers interested in shark molecular systematics, phylogeography, conservation genetics, population and stock structure.
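The two summary statistics quoted in this abstract, average coverage depth and A-T bias, are simple ratios. The sketch below shows how they are typically computed. The function names are illustrative, and the number of aligned bases is not reported in the abstract: the figure used in the usage note is back-calculated from 33× over 16,706 bp.

```python
def mean_coverage(total_aligned_bases, assembly_length):
    """Average depth: aligned bases divided by assembly length."""
    return total_aligned_bases / assembly_length


def at_fraction(sequence):
    """Fraction of A and T among unambiguous bases (A, C, G, T)."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "AT" for base in counted) / len(counted)
```

For example, `mean_coverage(551298, 16706)` gives 33.0, so a 33× depth over this mitogenome would correspond to roughly 551,000 aligned bases. An `at_fraction` above 0.5 for the assembled sequence would indicate the A-T bias mentioned.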
An Integrative Breakage Model of genome architecture, reshuffling and evolution: The Integrative Breakage Model of genome evolution, a novel multidisciplinary hypothesis for the study of genome plasticity.

PubMed

Farré, Marta; Robinson, Terence J; Ruiz-Herrera, Aurora

2015-05-01

Our understanding of genomic reorganization, the mechanics of genomic transmission to offspring during germ line formation, and how these structural changes contribute to the speciation process and genetic disease is far from complete. Earlier attempts to understand the mechanism(s) and constraints that govern genome remodeling suffered from being too narrowly focused, and failed to provide a unified and encompassing view of how genomes are organized and regulated inside cells. Here, we propose a new multidisciplinary Integrative Breakage Model for the study of genome evolution. The analysis of the high-level structural organization of genomes (nucleome), together with the functional constraints that accompany genome reshuffling, provides insights into the origin and plasticity of genome organization that may assist with the detection and isolation of therapeutic targets for the treatment of complex human disorders. © 2015 WILEY Periodicals, Inc.
PIK3CA mutant tumors depend on oxoglutarate dehydrogenase | Office of Cancer Genomics

Cancer.gov

Oncogenic PIK3CA mutations are found in a significant fraction of human cancers, but therapeutic inhibition of PI3K has only shown limited success in clinical trials. To understand how mutant PIK3CA contributes to cancer cell proliferation, we used genome scale loss-of-function screening in a large number of genomically annotated cancer cell lines. As expected, we found that PIK3CA mutant cancer cells require PIK3CA but also require the expression of the TCA cycle enzyme 2-oxoglutarate dehydrogenase (OGDH).
The beliefs, motivations, and expectations of parents who have enrolled their children in a genetic biorepository.

PubMed

Harris, Erin D; Ziniel, Sonja I; Amatruda, Jonathan G; Clinton, Catherine M; Savage, Sarah K; Taylor, Patrick L; Huntington, Noelle L; Green, Robert C; Holm, Ingrid A

2012-03-01

Little is known about parental attitudes toward return of individual research results (IRRs) in pediatric genomic research. The aim of this study was to understand the views of the parents who enrolled their children in a genomic repository in which IRRs will be returned. We conducted focus groups with parents of children with developmental disorders enrolled in the Gene Partnership (GP), a genomic research repository that offers to return IRRs, to learn about their understanding of the GP, motivations for enrolling their children, and expectations regarding the return of IRRs. Parents hoped to receive IRRs that would help them better understand their children's condition(s). They understood that this outcome was unlikely, but hoped that their children's participation in the GP would contribute to scientific knowledge. Most parents wanted to receive all IRRs about their child, even for diseases that were severe and untreatable, citing reasons of personal utility. Parents preferred electronic delivery of the results and wanted to designate their preferences regarding what information they would receive. It is important for researchers to understand participant expectations in enrolling in a research repository that offers to disclose children's IRRs in order to effectively communicate the implications to parents during the consenting process.
Statistical correction of the Winner’s Curse explains replication variability in quantitative trait genome-wide association studies

PubMed Central

Pe’er, Itsik

2017-01-01

Genome-wide association studies (GWAS) have identified hundreds of SNPs responsible for variation in human quantitative traits. However, genome-wide-significant associations often fail to replicate across independent cohorts, in apparent inconsistency with their strong effects in discovery cohorts. This limited success of replication raises pervasive questions about the utility of the GWAS field. We identify all 332 studies of quantitative traits from the NHGRI-EBI GWAS Database with attempted replication. We find that the majority of studies provide insufficient data to evaluate replication rates. The remaining papers replicate significantly worse than expected (p < 10^-14), even when adjusting for regression-to-the-mean of effect size between discovery and replication cohorts, termed the Winner’s Curse (p < 10^-16). We show this is due in part to misreporting replication cohort size as a maximum number, rather than a per-locus one. In 39 studies accurately reporting per-locus cohort size for attempted replication of 707 loci in samples with similar ancestry, replication rate matched expectation (predicted 458, observed 457, p = 0.94). In contrast, ancestry differences between replication and discovery (13 studies, 385 loci) cause the most highly-powered decile of loci to replicate worse than expected, due to difference in linkage disequilibrium. PMID:28715421
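The comparison of predicted and observed replication counts rests on a simple idea: the expected number of replicating loci is the sum of the per-locus replication powers. The sketch below illustrates that idea under stated assumptions and is not the paper's procedure. It assumes each replication estimate is normally distributed around the true effect `beta` with standard error `se`, uses a one-sided test in the discovery direction, and applies no Winner's Curse correction to `beta`. All names and numbers are hypothetical.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


def replication_power(beta, se, alpha=0.05):
    """P(one-sided z-test reaches p < alpha) when the estimate ~ N(beta, se^2)."""
    z_crit = _STANDARD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - _STANDARD_NORMAL.cdf(z_crit - abs(beta) / se)


def expected_replications(loci, alpha=0.05):
    """Expected count of replicating loci: sum of per-locus powers."""
    return sum(replication_power(beta, se, alpha) for beta, se in loci)


# Hypothetical (effect size, replication standard error) pairs, one per locus.
loci = [(0.10, 0.03), (0.05, 0.03), (0.02, 0.03)]
predicted = expected_replications(loci)
```

Because the standard error shrinks with the number of samples genotyped at each locus, using a study's maximum cohort size instead of the per-locus size makes `se` too small and the predicted count too high. This is the kind of misreporting effect the abstract describes.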
Coordinates and intervals in graph-based reference genomes.

PubMed

Rand, Knut D; Grytten, Ivar; Nederbragt, Alexander J; Storvik, Geir O; Glad, Ingrid K; Sandve, Geir K

2017-05-18

It has been proposed that future reference genomes should be graph structures in order to better represent the sequence diversity present in a species. However, there is currently no standard method to represent genomic intervals, such as the positions of genes or transcription factor binding sites, on graph-based reference genomes. We formalize offset-based coordinate systems on graph-based reference genomes and introduce methods for representing intervals on these reference structures. We show the advantage of our methods by representing genes on a graph-based representation of the newest assembly of the human genome (GRCh38) and its alternative loci for regions that are highly variable. More complex reference genomes, containing alternative loci, require methods to represent genomic data on these structures. Our proposed notation for genomic intervals makes it possible to fully utilize the alternative loci of the GRCh38 assembly and potential future graph-based reference genomes. We have made a Python package for representing such intervals on offset-based coordinate systems, available at https://github.com/uio-cels/offsetbasedgraph . An interactive web-tool using this Python package to visualize genes on a graph created from GRCh38 is available at https://github.com/uio-cels/genomicgraphcoords .
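To make the idea of offset-based coordinates concrete, the sketch below represents a position as a (block, offset) pair and an interval as a start position, an end position, and the ordered path of blocks between them. This is a self-contained illustration under assumed conventions (0-based offsets, exclusive end), not the API of the authors' Python package. The class names, block identifiers, and block lengths are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Position:
    block_id: str  # identifier of a sequence block (graph node)
    offset: int    # 0-based offset from the start of that block


@dataclass(frozen=True)
class Interval:
    start: Position
    end: Position      # end offset is exclusive
    blocks: List[str]  # ordered path of block ids from start to end

    def length(self, block_lengths: Dict[str, int]) -> int:
        """Number of bases covered along the path."""
        if len(self.blocks) == 1:
            return self.end.offset - self.start.offset
        first = block_lengths[self.blocks[0]] - self.start.offset
        middle = sum(block_lengths[b] for b in self.blocks[1:-1])
        return first + middle + self.end.offset


# Hypothetical graph: a main path with one alternative-locus block.
block_lengths = {"main_1": 100, "alt_1": 40, "main_3": 200}
gene = Interval(
    start=Position("main_1", 90),
    end=Position("main_3", 15),
    blocks=["main_1", "alt_1", "main_3"],
)
```

In this example `gene.length(block_lengths)` counts 10 bases in `main_1`, all 40 in `alt_1`, and 15 in `main_3`. Storing the path explicitly is what lets the same start and end positions describe different intervals when more than one route exists between them.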
Genome Pool Strategy for Structural Coverage of Protein Families

PubMed Central

Jaroszewski, Lukasz; Slabinski, Lukasz; Wooley, John; Deacon, Ashley M.; Lesley, Scott A.; Wilson, Ian A.; Godzik, Adam

2010-01-01

As noticed by generations of structural biologists, closely homologous proteins may have substantially different crystallization properties and propensities. These observations can be used to systematically introduce additional dimensionality into crystallization trials by targeting homologous proteins from multiple genomes in a “genome pool” strategy. Through extensive use of our recently introduced “crystallization feasibility score” (Slabinski et al., 2007a), we can explain that the genome pool strategy works well because the crystallization feasibility scores are surprisingly broad within families of homologous proteins, with most families containing a range of optimal to very difficult targets. We also show that some families can be regarded as relatively “easy”, where a significant number of proteins are predicted to have optimal crystallization features, and others are “very difficult”, where almost none are predicted to result in a crystal structure. Thus, such variable distributions of “crystallizability” preferences lead to uneven structural coverage of known families, with “easier” or “optimal” families having several times more solved structures than “very difficult” ones. Nevertheless, this latter category can be successfully targeted by increasing the number of genomes that are used to select targets from a given family. On average, adding 10 new genomes to the “genome pool” provides more promising targets for 7 “very difficult” families. In contrast, our crystallization feasibility score does not indicate that any specific microbial genomes can be readily classified as “easier” or “very difficult” with respect to providing suitable candidates for crystallization and structure determination.
Finally, our analyses show that specific physicochemical properties of the protein sequence favor successful outcomes for structure determination and, hence, the group of proteins with known 3D structures is systematically different from the general pool of known proteins. We, therefore, assess the structural consequences of these differences in protein sequence and protein biophysical properties. PMID:19000818
Honey Bee Deformed Wing Virus Structures Reveal that Conformational Changes Accompany Genome Release.

PubMed

Organtini, Lindsey J; Shingler, Kristin L; Ashley, Robert E; Capaldi, Elizabeth A; Durrani, Kulsoom; Dryden, Kelly A; Makhov, Alexander M; Conway, James F; Pizzorno, Marie C; Hafenstein, Susan

2017-01-15

The picornavirus-like deformed wing virus (DWV) has been directly linked to colony collapse; however, little is known about the mechanisms of host attachment or entry for DWV or its molecular and structural details. Here we report the three-dimensional (3-D) structures of DWV capsids isolated from infected honey bees, including the immature procapsid, the genome-filled virion, the putative entry intermediate (A-particle), and the empty capsid that remains after genome release. The capsids are decorated by large spikes around the 5-fold vertices. The 5-fold spikes had an open flower-like conformation for the procapsid and genome-filled capsids, whereas the putative A-particle and empty capsids that had released the genome had a closed tube-like spike conformation. Between the two conformations, the spikes undergo a significant hinge-like movement that we predicted using a Robetta model of the structure comprising the spike. We conclude that the spike structures likely serve a function during host entry, changing conformation to release the genome, and that the genome may escape from a 5-fold vertex to initiate infection. Finally, the structures illustrate that, similarly to picornaviruses, DWV forms alternate particle conformations implicated in assembly, host attachment, and RNA release. Honey bees are critical for global agriculture, but dramatic losses of entire hives have been reported in numerous countries since 2006. Deformed wing virus (DWV) and infestation with the ectoparasitic mite Varroa destructor have been linked to colony collapse disorder. DWV was purified from infected adult worker bees to pursue biochemical and structural studies that allowed the first glimpse into the conformational changes that may be required during transmission and genome release for DWV. Copyright © 2017 American Society for Microbiology.
Divergence of Mammalian Higher Order Chromatin Structure Is Associated with Developmental Loci

PubMed Central

Chambers, Emily V.; Bickmore, Wendy A.; Semple, Colin A.

2013-01-01

Several recent studies have examined different aspects of mammalian higher order chromatin structure (replication timing, lamina association and Hi-C inter-locus interactions) and have suggested that most of these features of genome organisation are conserved over evolution. However, the extent of evolutionary divergence in higher order structure has not been rigorously measured across the mammalian genome, and until now little has been known about the characteristics of any divergent loci present. Here, we generate a dataset combining multiple measurements of chromatin structure and organisation over many embryonic cell types for both human and mouse that, for the first time, allows a comprehensive assessment of the extent of structural divergence between mammalian genomes. Comparison of orthologous regions confirms that all measurable facets of higher order structure are conserved between human and mouse, across the vast majority of the detectably orthologous genome. This broad similarity is observed in spite of many loci possessing cell type specific structures. However, we also identify hundreds of regions (from 100 Kb to 2.7 Mb in size) showing consistent evidence of divergence between these species, constituting at least 10% of the orthologous mammalian genome and encompassing many hundreds of human and mouse genes. These regions show unusual shifts in human GC content, are unevenly distributed across both genomes, and are enriched in human subtelomeric regions. Divergent regions are also relatively enriched for genes showing divergent expression patterns between human and mouse ES cells, implying these regions cause divergent regulation. Particular divergent loci are strikingly enriched in genes implicated in vertebrate development, suggesting important roles for structural divergence in the evolution of mammalian developmental programmes.
These data suggest that, though relatively rare in the mammalian genome, divergence in higher order chromatin structure has played important roles during evolution. PMID:23592965
Reduced Mutation Rate and Increased Transformability of Transposon-Free Acinetobacter baylyi ADP1-ISx.

PubMed

Suárez, Gabriel A; Renda, Brian A; Dasgupta, Aurko; Barrick, Jeffrey E

2017-09-01

The genomes of most bacteria contain mobile DNA elements that can contribute to undesirable genetic instability in engineered cells. In particular, transposable insertion sequence (IS) elements can rapidly inactivate genes that are important for a designed function. We deleted all six copies of IS1236 from the genome of the naturally transformable bacterium Acinetobacter baylyi ADP1. The natural competence of ADP1 made it possible to rapidly repair deleterious point mutations that arose during strain construction. In the resulting ADP1-ISx strain, the rates of mutations inactivating a reporter gene were reduced by 7- to 21-fold. This reduction was higher than expected from the incidence of new IS1236 insertions found during a 300-day mutation accumulation experiment with wild-type ADP1 that was used to estimate spontaneous mutation rates in the strain. The extra improvement appears to be due in part to eliminating large deletions caused by IS1236 activity, as the point mutation rate was unchanged in ADP1-ISx. Deletion of an error-prone polymerase (dinP) and a DNA damage response regulator (umuDAb [the umuD gene of A. baylyi]) from the ADP1-ISx genome did not further reduce mutation rates. Surprisingly, ADP1-ISx exhibited increased transformability. This improvement may be due to less autolysis and aggregation of the engineered cells than of the wild type. Thus, deleting IS elements from the ADP1 genome led to a greater than expected increase in evolutionary reliability and unexpectedly enhanced other key strain properties, as has been observed for other clean-genome bacterial strains. ADP1-ISx is an improved chassis for metabolic engineering and other applications. IMPORTANCE: Acinetobacter baylyi ADP1 has been proposed as a next-generation bacterial host for synthetic biology and genome engineering due to its ability to efficiently take up DNA from its environment during normal growth.
From the ADP1 genome, we deleted transposable elements that are capable of copying themselves, inserting into other genes, and thereby inactivating them. The resulting "clean-genome" ADP1-ISx strain exhibited larger reductions in the rates of inactivating mutations than expected from spontaneous mutation rates measured via whole-genome sequencing of lineages evolved under relaxed selection. Surprisingly, we also found that IS element activity reduces transformability and is a major cause of cell aggregation and death in wild-type ADP1 grown under normal laboratory conditions. More generally, our results demonstrate that domesticating a bacterial genome by removing mobile DNA elements that have accumulated during evolution in the wild can have unanticipated benefits. Copyright © 2017 American Society for Microbiology.
Efficient isolation method for high-quality genomic DNA from cicada exuviae.

PubMed

Nguyen, Hoa Quynh; Kim, Ye Inn; Borzée, Amaël; Jang, Yikweon

2017-10-01

In recent years, animal ethics issues have led researchers to explore nondestructive methods to access materials for genetic studies. Cicada exuviae are among those materials because they are cast skins that individuals leave behind after molting and are easily collected. In this study, we aim to identify the most efficient extraction method to obtain a high quantity and quality of DNA from cicada exuviae. We compared relative DNA yield and purity of six extraction protocols, including both manual protocols and available commercial kits, extracting from four different exoskeleton parts. Furthermore, amplification and sequencing of genomic DNA were evaluated in terms of whether sequences of the expected size were obtained. Both the choice of protocol and exuvia part significantly affected DNA yield and purity. Only samples that were extracted using the PowerSoil DNA Isolation kit generated gel bands of expected size as well as successful sequencing results. The failed attempts to extract DNA using other protocols could be explained partly by a low DNA yield from cicada exuviae and partly by contamination with humic acids that exist in the soil where cicada nymphs reside before emergence, as shown by spectroscopic measurements. Genomic DNA extracted from cicada exuviae could provide valuable information for species identification, allowing the investigation of genetic diversity across consecutive broods, or spatiotemporal variation among various populations. Consequently, we hope to provide a simple method to acquire pure genomic DNA applicable for multiple research purposes.
Single cell Hi-C reveals cell-to-cell variability in chromosome structure

PubMed Central

Schoenfelder, Stefan; Yaffe, Eitan; Dean, Wendy; Laue, Ernest D.; Tanay, Amos; Fraser, Peter

2013-01-01

Large-scale chromosome structure and spatial nuclear arrangement have been linked to control of gene expression and DNA replication and repair. Genomic techniques based on chromosome conformation capture assess contacts for millions of loci simultaneously, but do so by averaging chromosome conformations from millions of nuclei. Here we introduce single cell Hi-C, combined with genome-wide statistical analysis and structural modeling of single copy X chromosomes, to show that individual chromosomes maintain domain organisation at the megabase scale, but show variable cell-to-cell chromosome territory structures at larger scales. Despite this structural stochasticity, localisation of active gene domains to boundaries of territories is a hallmark of chromosomal conformation. Single cell Hi-C data bridge current gaps between genomics and microscopy studies of chromosomes, demonstrating how modular organisation underlies dynamic chromosome structure, and how this structure is probabilistically linked with genome activity patterns. PMID:24067610

A HapMap harvest of insights into the genetics of common disease

PubMed Central

Manolio, Teri A.; Brooks, Lisa D.; Collins, Francis S.

2008-01-01

The International HapMap Project was designed to create a genome-wide database of patterns of human genetic variation, with the expectation that these patterns would be useful for genetic association studies of common diseases. This expectation has been amply fulfilled with just the initial output of genome-wide association studies, identifying nearly 100 loci for nearly 40 common diseases and traits. These associations provided new insights into pathophysiology, suggesting previously unsuspected etiologic pathways for common diseases that will be of use in identifying new therapeutic targets and developing targeted interventions based on genetically defined risk. In addition, HapMap-based discoveries have shed new light on the impact of evolutionary pressures on the human genome, suggesting multiple loci important for adapting to disease-causing pathogens and new environments. In this review we examine the origin, development, and current status of the HapMap; its prospects for continued evolution; and its current and potential future impact on biomedical science. PMID:18451988
Identification of novel RNA secondary structures within the hepatitis C virus genome reveals a cooperative involvement in genome packaging

PubMed Central

Stewart, H.; Bingham, R.J.; White, S. J.; Dykeman, E. C.; Zothner, C.; Tuplin, A. K.; Stockley, P. G.; Twarock, R.; Harris, M.

2016-01-01

The specific packaging of the hepatitis C virus (HCV) genome is hypothesised to be driven by Core-RNA interactions. To identify the regions of the viral genome involved in this process, we used SELEX (systematic evolution of ligands by exponential enrichment) to identify RNA aptamers which bind specifically to Core in vitro. Comparison of these aptamers to multiple HCV genomes revealed the presence of a conserved terminal loop motif within short RNA stem-loop structures. We postulated that interactions of these motifs, as well as sub-motifs which were present in HCV genomes at statistically significant levels, with the Core protein may drive virion assembly. We mutated 8 of these predicted motifs within the HCV infectious molecular clone JFH-1, thereby producing a range of mutant viruses predicted to possess altered RNA secondary structures. RNA replication and viral titre were unaltered in viruses possessing only one mutated structure. However, infectivity titres were decreased in viruses possessing a higher number of mutated regions. This work thus identified multiple novel RNA motifs which appear to contribute to genome packaging. We suggest that these structures act as cooperative packaging signals to drive specific RNA encapsidation during HCV assembly. PMID:26972799
Molecular models of NS3 protease variants of the Hepatitis C virus.

PubMed

da Silveira, Nelson J F; Arcuri, Helen A; Bonalumi, Carlos E; de Souza, Fátima P; Mello, Isabel M V G C; Rahal, Paula; Pinho, João R R; de Azevedo, Walter F

2005-01-21

Hepatitis C virus (HCV) currently infects approximately three percent of the world population. In view of the lack of vaccines against HCV, there is an urgent need for an efficient treatment of the disease by an effective antiviral drug. Rational drug design has not been the primary way for discovering major therapeutics. Nevertheless, there are reports of success in the development of inhibitors using a structure-based approach. One of the possible targets for drug development against HCV is the NS3 protease variants. Based on the three-dimensional structure of these variants we expect to identify new NS3 protease inhibitors. In order to speed up the modeling process all NS3 protease variant models were generated in a Beowulf cluster. The potential of structural bioinformatics for the development of new antiviral drugs is discussed. The atomic coordinates of crystallographic structures 1CU1 and 1DY9 were used as starting models for modeling of the NS3 protease variant structures. The NS3 protease variant structures are composed of six subdomains, which occur in sequence along the polypeptide chain. The protease domain exhibits the dual beta-barrel fold that is common among members of the chymotrypsin serine protease family. The helicase domain contains two structurally related beta-alpha-beta subdomains and a third subdomain of seven helices and three short beta strands. The latter domain is usually referred to as the helicase alpha-helical subdomain. The rmsd value of bond lengths and bond angles, the average G-factor and Verify 3D values are presented for NS3 protease variant structures. This project increases the certainty that homology modeling is a useful tool in structural biology and that it can be very valuable in annotating genome sequence information and contributing to structural and functional genomics of viruses.
The structural models will be used to guide future efforts in the structure-based drug design of a new generation of NS3 protease variant inhibitors. All models in the database are publicly accessible via our interactive website, providing us with a large number of structural models for use in protein-ligand docking analysis.
Complete Genomic Sequence and Comparative Analysis of the Genome Segments of Sweet Potato Chlorotic Stunt Virus in China

PubMed Central

Qin, Yanhong; Wang, Li; Zhang, Zhenchen; Qiao, Qi; Zhang, Desheng; Tian, Yuting; Wang, Shuang; Wang, Yongjiang; Yan, Zhaoling

2014-01-01

Background: Sweet potato chlorotic stunt virus (SPCSV; family Closteroviridae, genus Crinivirus) features a large bipartite, single-stranded, positive-sense RNA genome. To date, only three complete genomic sequences of SPCSV can be accessed through GenBank. SPCSV was first detected in China in 2011, but only partial genomic sequences have been determined in the country. No report on the complete genomic sequence and genome structure of Chinese SPCSV isolates or the genetic relation between isolates from China and other countries is available. Methodology/Principal Findings: The complete genomic sequences of five isolates from different areas in China were characterized. This study is the first to report the complete genome sequences of SPCSV from whitefly vectors. Genome structure analysis showed that isolates of WA and EA strains from China have the same coding protein as isolates Can181-9 and m2-47, respectively. Twenty cp genes and four RNA1 partial segments were sequenced and analyzed, and the nucleotide identities of complete genomic, cp, and RNA1 partial sequences were determined. Results indicated high conservation among strains and significant differences between WA and EA strains. Genetic analysis demonstrated that, except for isolates from Guangdong Province, SPCSVs from other areas belong to the WA strain. Genome organization analysis showed that the isolates in this study lack the p22 gene. Conclusions/Significance: We presented the complete genome sequences of SPCSV in China. Comparison of nucleotide identities and genome structures between these isolates and previously reported isolates showed slight differences. The nucleotide identities of different SPCSV isolates showed high conservation among strains and significant differences between strains. All nine isolates in this study lacked the p22 gene. WA strains were more extensively distributed than EA strains in China.
These data provide important insights into the molecular variation and genomic structure of SPCSV in China as well as genetic relationships among isolates from China and other countries. PMID:25170926
The nucleotide composition of microbial genomes indicates differential patterns of selection on core and accessory genomes.

PubMed

Bohlin, Jon; Eldholm, Vegard; Pettersson, John H O; Brynildsrud, Ola; Snipen, Lars

2017-02-10

The core genome consists of genes shared by the vast majority of a species and is therefore assumed to have been subjected to substantially stronger purifying selection than the more mobile elements of the genome, also known as the accessory genome. Here we examine intragenic base composition differences in core genomes and corresponding accessory genomes in 36 species, represented by the genomes of 731 bacterial strains, to assess the impact of selective forces on base composition in microbes. We also explore, in turn, how these results compare with findings for whole genome intragenic regions. We found that GC content in coding regions is significantly higher in core genomes than accessory genomes and whole genomes. Likewise, GC content variation within coding regions was significantly lower in core genomes than in accessory genomes and whole genomes. Relative entropy in coding regions, measured as the difference between observed and expected trinucleotide frequencies estimated from mononucleotide frequencies, was significantly higher in the core genomes than in accessory and whole genomes. Relative entropy was positively associated with coding region GC content within the accessory genomes, but not within the corresponding coding regions of core or whole genomes. The higher intragenic GC content and relative entropy, as well as the lower GC content variation, observed in the core genomes is most likely associated with selective constraints. It is unclear whether the positive association between GC content and relative entropy in the more mobile accessory genomes constitutes signatures of selection or selective neutral processes.
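The two base-composition measures described above can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' pipeline: it assumes overlapping trinucleotide windows, a sequence containing only A, C, G and T, and a log base of 2; the function names are hypothetical.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def relative_entropy(seq: str) -> float:
    """Kullback-Leibler divergence (in bits) between observed trinucleotide
    frequencies and those expected from mononucleotide frequencies alone."""
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    # Assumption: trinucleotides are counted in overlapping windows.
    tri = Counter(seq[i:i + 3] for i in range(n - 2))
    total = n - 2
    divergence = 0.0
    for word, count in tri.items():
        p_obs = count / total
        # Expected frequency under independence of the three positions.
        p_exp = math.prod(mono[base] / n for base in word)
        divergence += p_obs * math.log2(p_obs / p_exp)
    return divergence


gc_example = gc_content("GGCC")
flat = relative_entropy("A" * 30)           # no structure beyond composition
periodic = relative_entropy("ACGT" * 50)    # strongly ordered sequence
```

A sequence whose trinucleotide usage is fully explained by its base composition scores zero, while a highly ordered sequence scores well above zero.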
The Qatar genome project: translation of whole-genome sequencing into clinical practice.

PubMed

Zayed, Hatem

2016-10-01

The Qatar Genome Project was launched in 2013 with the intent to sequence the genome of each Qatari citizen in an effort to protect Qataris from the high rate of indigenous genetic diseases by allowing the mapping of disease-causing variants/rare variants and establishing a Qatari reference genome. Indeed, this project is expected to have numerous global benefits because of the elevated homogeneity of the Qatari population, which will make Qatar an excellent genetic laboratory. The project will generate a wealth of data that will allow us to make sense of the genotype-phenotype correlations of many diseases, especially the complex multifactorial diseases, and will pave the way for changing the traditional medical practice of looking first at the phenotype rather than the genotype. © 2016 John Wiley & Sons Ltd.
Whole Genome Amplification and Reduced-Representation Genome Sequencing of Schistosoma japonicum Miracidia

PubMed Central

Shortt, Jonathan A.; Card, Daren C.; Schield, Drew R.; Liu, Yang; Zhong, Bo; Castoe, Todd A.

2017-01-01

Background: In areas where schistosomiasis control programs have been implemented, morbidity and prevalence have been greatly reduced. However, to sustain these reductions and move towards interruption of transmission, new tools for disease surveillance are needed. Genomic methods have the potential to help trace the sources of new infections, and allow us to monitor drug resistance. Large-scale genotyping efforts for schistosome species have been hindered by cost, limited numbers of established target loci, and the small amount of DNA obtained from miracidia, the life stage most readily acquired from humans. Here, we present a method using next generation sequencing to provide high-resolution genomic data from S. japonicum for population-based studies. Methodology/Principal Findings: We applied whole genome amplification followed by double digest restriction site associated DNA sequencing (ddRADseq) to individual S. japonicum miracidia preserved on Whatman FTA cards. We found that we could effectively and consistently survey hundreds of thousands of variants from 10,000 to 30,000 loci from archived miracidia as old as six years. An analysis of variation from eight miracidia obtained from three hosts in two villages in Sichuan showed clear population structuring by village and host even within this limited sample. Conclusions/Significance: This high-resolution sequencing approach yields three orders of magnitude more information than microsatellite genotyping methods that have been employed over the last decade, creating the potential to answer detailed questions about the sources of human infections and to monitor drug resistance. Costs per sample range from $50 to $200, depending on the amount of sequence information desired, and we expect these costs can be reduced further given continued reductions in sequencing costs, improvement of protocols, and parallelization. This approach provides new promise for applying modern genome-scale sampling to S. japonicum surveillance, and could be applied to other schistosome species and other parasitic helminths. PMID:28107347
Molecular structure and chromosome distribution of three repetitive DNA families in Anemone hortensis L. (Ranunculaceae).

PubMed

Mlinarec, Jelena; Chester, Mike; Siljak-Yakovlev, Sonja; Papes, Drazena; Leitch, Andrew R; Besendorfer, Visnja

2009-01-01

The structure, abundance and location of repetitive DNA sequences on chromosomes can characterize the nature of higher plant genomes. Here we report on three new repeat DNA families isolated from Anemone hortensis L.; (i) AhTR1, a family of satellite DNA (stDNA) composed of a 554-561 bp long EcoRV monomer; (ii) AhTR2, a stDNA family composed of a 743 bp long HindIII monomer and; (iii) AhDR, a repeat family composed of a 945 bp long HindIII fragment that exhibits some sequence similarity to Ty3/gypsy-like retroelements. Fluorescence in-situ hybridization (FISH) to metaphase chromosomes of A. hortensis (2n = 16) revealed that both AhTR1 and AhTR2 sequences co-localized with DAPI-positive AT-rich heterochromatic regions. AhTR1 sequences occur at intercalary DAPI bands while AhTR2 sequences occur at 8-10 terminally located heterochromatic blocks. In contrast AhDR sequences are dispersed over all chromosomes as expected of a Ty3/gypsy-like element. AhTR2 and AhTR1 repeat families include polyA- and polyT-tracks, AT/TA-motifs and a pentanucleotide sequence (CAAAA) that may have consequences for chromatin packing and sequence homogeneity. AhTR2 repeats also contain TTTAGGG motifs and degenerate variants. We suggest that they arose by interspersion of telomeric repeats with subtelomeric repeats, before hybrid unit(s) amplified through the heterochromatic domain. The three repetitive DNA families together occupy approximately 10% of the A. hortensis genome. Comparative analyses of eight Anemone species revealed that the divergence of the A. hortensis genome was accompanied by considerable modification and/or amplification of repeats.
A 14-3-3 Family Protein from Wild Soybean (Glycine Soja) Regulates ABA Sensitivity in Arabidopsis

PubMed Central

Sun, Xiaoli; Sun, Mingzhe; Jia, Bowei; Chen, Chao; Qin, Zhiwei; Yang, Kejun; Shen, Yang; Meiping, Zhang; Mingyang, Cong; Zhu, Yanming

2015-01-01

It is widely accepted that the 14-3-3 family proteins are key regulators of multiple stress signal transduction cascades. By conducting genome-wide analysis, researchers have identified the soybean 14-3-3 family proteins; however, until now, there has been no direct genetic evidence showing the involvement of soybean 14-3-3s in ABA responses. Hence, in this study, based on the latest Glycine max genome on Phytozome v10.3, we initially analyzed the evolutionary relationship, genome organization, gene structure and duplication, and three-dimensional structure of soybean 14-3-3 family proteins systematically. Our results suggested that the soybean 14-3-3 family was highly evolutionarily conserved and underwent segmental duplication during evolution. Then, based on our previous functional characterization of a Glycine soja 14-3-3 protein GsGF14o in drought stress responses, we further investigated the expression characteristics of GsGF14o in detail, and demonstrated its positive roles in ABA sensitivity. Quantitative real-time PCR analyses in Glycine soja seedlings and GUS activity assays in PGsGF14O:GUS transgenic Arabidopsis showed that GsGF14o expression was moderately and rapidly induced by ABA treatment. As expected, GsGF14o overexpression in Arabidopsis augmented the ABA inhibition of seed germination and seedling growth, promoted the ABA induced stomata closure, and up-regulated the expression levels of ABA induced genes. Moreover, through yeast two hybrid analyses, we further demonstrated that GsGF14o physically interacted with the AREB/ABF transcription factors in yeast cells. Taken together, results presented in this study strongly suggested that GsGF14o played an important role in regulation of ABA sensitivity in Arabidopsis. PMID:26717241
A saturated SSR/DArT linkage map of Musa acuminata addressing genome rearrangements among bananas.

PubMed

Hippolyte, Isabelle; Bakry, Frederic; Seguin, Marc; Gardes, Laetitia; Rivallan, Ronan; Risterucci, Ange-Marie; Jenny, Christophe; Perrier, Xavier; Carreel, Françoise; Argout, Xavier; Piffanelli, Pietro; Khan, Imtiaz A; Miller, Robert N G; Pappas, Georgios J; Mbéguié-A-Mbéguié, Didier; Matsumoto, Takashi; De Bernardinis, Veronique; Huttner, Eric; Kilian, Andrzej; Baurens, Franc-Christophe; D'Hont, Angélique; Cote, François; Courtois, Brigitte; Glaszmann, Jean-Christophe

2010-04-13

The genus Musa is a large species complex which includes cultivars at diploid and triploid levels. These sterile and vegetatively propagated cultivars are based on the A genome from Musa acuminata, exclusively for sweet bananas such as Cavendish, or associated with the B genome (Musa balbisiana) in cooking bananas such as Plantain varieties. In M. acuminata cultivars, structural heterozygosity is thought to be one of the main causes of sterility, which is essential for obtaining seedless fruits but hampers breeding. Only partial genetic maps are presently available due to chromosomal rearrangements within the parents of the mapping populations. This causes large segregation distortions inducing pseudo-linkages and difficulties in ordering markers in the linkage groups. The present study aims at producing a saturated linkage map of M. acuminata, taking into account hypotheses on the structural heterozygosity of the parents. An F1 progeny of 180 individuals was obtained from a cross between two genetically distant accessions of M. acuminata, 'Borneo' and 'Pisang Lilin' (P. Lilin). Based on the gametic recombination of each parent, two parental maps composed of SSR and DArT markers were established. A significant proportion of the markers (21.7%) deviated (p < 0.05) from the expected Mendelian ratios. These skewed markers were distributed in different linkage groups for each parent. To solve some complex ordering of the markers on linkage groups, we associated tools such as tree-like graphic representations, recombination frequency statistics and cytogenetical studies to identify structural rearrangements and build parsimonious linkage group order. An illustration of such an approach is given for the P. Lilin parent. We propose a synthetic map with 11 linkage groups containing 489 markers (167 SSRs and 322 DArTs) covering 1197 cM. This first saturated map is proposed as a "reference Musa map" for further analyses. 
We also propose two complete parental maps with interpretations of structural rearrangements localized on the linkage groups. The structural heterozygosity in P. Lilin is hypothesized to result from a duplication likely accompanied by an inversion on another chromosome. This paper also illustrates a methodological approach, transferable to other species, to investigate the mapping of structural rearrangements and determine their consequences on marker segregation.
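The screen for skewed markers mentioned in this abstract (deviation from Mendelian ratios at p < 0.05) can be illustrated with a chi-square goodness-of-fit test. This is a minimal sketch, assuming a 1:1 expected ratio such as that of a marker heterozygous in only one parent of an F1 cross; the function names and genotype counts are hypothetical and are not taken from the study.

```python
# Chi-square test for segregation distortion at a single marker.
# Assumption: expected 1:1 segregation; the counts below are illustrative only.

CHI2_CRIT_DF1_P05 = 3.841  # critical value for 1 degree of freedom at p = 0.05


def segregation_chi2(n_a: int, n_b: int) -> float:
    """Chi-square statistic for observed counts against a 1:1 expectation."""
    total = n_a + n_b
    if total == 0:
        raise ValueError("no genotyped individuals")
    expected = total / 2
    return ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected


def is_distorted(n_a: int, n_b: int) -> bool:
    """True when the marker deviates from 1:1 at the 5% level."""
    return segregation_chi2(n_a, n_b) > CHI2_CRIT_DF1_P05


# A progeny of 180 individuals, as in the abstract; the splits are made up.
print(segregation_chi2(105, 75), is_distorted(105, 75))
print(segregation_chi2(95, 85), is_distorted(95, 85))
```

Markers with other expected ratios (for example 1:2:1 or 3:1) would need the expectation and the degrees of freedom adjusted accordingly.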
Minimal Absent Words in Four Human Genome Assemblies

PubMed Central

Garcia, Sara P.; Pinho, Armando J.

2011-01-01

Minimal absent words have been computed in genomes of organisms from all domains of life. Here, we aim to contribute to the catalogue of human genomic variation by investigating the variation in number and content of minimal absent words within a species, using four human genome assemblies. We compare the reference human genome GRCh37 assembly, the HuRef assembly of the genome of Craig Venter, the NA12878 assembly from cell line GM12878, and the YH assembly of the genome of a Han Chinese individual. We find the variation in number and content of minimal absent words between assemblies more significant for large and very large minimal absent words, where the biases of sequencing and assembly methodologies become more pronounced. Moreover, we find generally greater similarity between the human genome assemblies sequenced with capillary-based technologies (GRCh37 and HuRef) than between the human genome assemblies sequenced with massively parallel technologies (NA12878 and YH). Finally, as expected, we find the overall variation in number and content of minimal absent words within a species to be generally smaller than the variation between species. PMID:22220210
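For readers unfamiliar with the term, a minimal absent word is a word that does not occur in a sequence although its longest proper prefix and longest proper suffix both do. The brute-force sketch below illustrates that definition on a toy string. It is not the method used in the study: it considers a single strand only, words of length 2 and above, and the function name and example sequence are hypothetical. It is practical only for short sequences and small word lengths.

```python
from itertools import product


def minimal_absent_words(seq: str, max_len: int, alphabet: str = "ACGT") -> list[str]:
    """Brute-force minimal absent words of length 2..max_len.

    A word w is reported when w does not occur in seq but both
    w[:-1] and w[1:] do occur.
    """
    # Collect every substring of seq with length 1..max_len.
    present = {
        seq[i:i + k]
        for k in range(1, max_len + 1)
        for i in range(len(seq) - k + 1)
    }
    found = []
    for k in range(2, max_len + 1):
        for letters in product(alphabet, repeat=k):
            word = "".join(letters)
            if word not in present and word[:-1] in present and word[1:] in present:
                found.append(word)
    return found


print(minimal_absent_words("AAC", 3, alphabet="AC"))
```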
An update on the genetic architecture of hyperuricemia and gout.

PubMed

Merriman, Tony R

2015-04-10

Genome-wide association studies that scan the genome for common genetic variants associated with phenotype have greatly advanced medical knowledge. Hyperuricemia is no exception, with 28 loci identified. However, genetic control of pathways determining gout in the presence of hyperuricemia is still poorly understood. Two important pathways determining hyperuricemia have been confirmed (renal and gut excretion of uric acid with glycolysis now firmly implicated). Major urate loci are SLC2A9 and ABCG2. Recent studies show that SLC2A9 is involved in renal and gut excretion of uric acid and is implicated in antioxidant defense. Although etiological variants at SLC2A9 are yet to be identified, it is clear that considerable genetic complexity exists at the SLC2A9 locus, with multiple statistically independent genetic variants and local epistatic interactions. The positions of implicated genetic variants within or near chromatin regions involved in transcriptional control suggest that this mechanism (rather than structural changes in SLC2A9) is important in regulating the activity of SLC2A9. ABCG2 is involved primarily in extra-renal uric acid under-excretion with the etiological variant influencing expression. At the other 26 loci, probable causal genes can be identified at three (PDZK1, SLC22A11, and INHBB) with strong candidates at a further 10 loci. Confirmation of the causal gene will require a combination of re-sequencing, trans-ancestral mapping, and correlation of genetic association data with expression data. As expected, the urate loci associate with gout, although inconsistent effect sizes for gout require investigation. Finally, there has been no genome-wide association study using clinically ascertained cases to investigate the causes of gout in the presence of hyperuricemia. In such a study, use of asymptomatic hyperuricemic controls would be expected to increase the ability to detect genetic associations with gout.
Allele frequencies of variants in ultra conserved elements identify selective pressure on transcription factor binding.

PubMed

Silla, Toomas; Kepp, Katrin; Tai, E Shyong; Goh, Liang; Davila, Sonia; Catela Ivkovic, Tina; Calin, George A; Voorhoeve, P Mathijs

2014-01-01

Ultra-conserved genes or elements (UCGs/UCEs) in the human genome are extreme examples of conservation. We characterized natural variations in 2884 UCEs and UCGs in two distinct populations, Singaporean Chinese (n = 280) and Italian (n = 501), by using a pooled-sample, targeted-capture sequencing approach. We identify, with high confidence, an abundance of rare SNVs (MAF<0.5%) in these regions, of which 75% are not present in dbSNP137. UCE association studies for complex human traits can use this information to model expected background variation and thus the necessary power for association studies. By combining our data with 1000 Genomes Project data, we show in three independent datasets that prevalent UCE variants (MAF>5%) are more often found in relatively less-conserved nucleotides within UCEs, compared to rare variants. Moreover, prevalent variants are less likely to overlap transcription factor binding sites. Using SNPfold we found no significant influence of RNA secondary structure on UCE conservation. Altogether, these results suggest UCEs are not under selective pressure as a stretch of DNA but are under differential evolutionary pressure on the single nucleotide level.
Exploring metabolic pathways in genome-scale networks via generating flux modes.

PubMed

Rezola, A; de Figueiredo, L F; Brock, M; Pey, J; Podhorski, A; Wittmann, C; Schuster, S; Bockmayr, A; Planes, F J

2011-02-15

The reconstruction of metabolic networks at the genome scale has allowed the analysis of metabolic pathways at an unprecedented level of complexity. Elementary flux modes (EFMs) are an appropriate concept for such analysis. However, their number grows in a combinatorial fashion as the size of the metabolic network increases, which renders the application of the EFM approach to large metabolic networks difficult. Novel methods are expected to deal with such complexity. In this article, we present a novel optimization-based method for determining a minimal generating set of EFMs, i.e. a convex basis. We show that a subset of elements of this convex basis can be effectively computed even in large metabolic networks. Our method was applied to examine the structure of pathways producing lysine in Escherichia coli. We obtained a more varied and informative set of pathways in comparison with existing methods. In addition, an alternative pathway to produce lysine was identified using a detour via propionyl-CoA, which shows the predictive power of our novel approach. The source code in C++ is available upon request.
A decade after the first full human genome sequencing: when will we understand our own genome?

PubMed

Eisenhaber, Frank

2012-10-01

The contrast between the pomp of celebrating the first full human genome sequencing in 2000 and the cautious tone of recollections a decade thereafter could hardly be greater. The promises with regard to medical cures and biotechnology applications have fallen far short of expectations. Understanding the human genome means knowing the genes' and proteins' functions and their interconnectedness via biomolecular mechanisms. This article estimates how long it will take to achieve this goal if we extrapolate from the previous decade (indeed, a century!) and considers the possible disruptive trends in science, technology and society that may accelerate the pace of progress dramatically.
Informational laws of genome structures

PubMed Central

Bonnici, Vincenzo; Manca, Vincenzo

2016-01-01

In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined. PMID:27354155
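The choice of k described in this abstract can be made concrete with a short sketch. It assumes that lg2 denotes the base-2 logarithm and that k is rounded to the nearest integer; it then computes the Shannon entropy of the empirical k-mer distribution. The function names and the toy sequence are hypothetical, and the entropy shown here is only the basic quantity, not the specific indexes defined in the paper.

```python
import math
from collections import Counter


def kmer_length(n: int) -> int:
    """Integer k closest to log2(n); the rounding is an assumption of this sketch."""
    return max(1, round(math.log2(n)))


def kmer_entropy(seq: str, k: int) -> float:
    """Shannon entropy (in bits) of the empirical k-mer distribution of seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


toy = "ACGTACGTTGCAACGT"   # hypothetical 16-base sequence
k = kmer_length(len(toy))  # log2(16) = 4
print(k, kmer_entropy(toy, k))
```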
Informational laws of genome structures

NASA Astrophysics Data System (ADS)

Bonnici, Vincenzo; Manca, Vincenzo

2016-06-01

In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined.
Whole-genome resequencing of 292 pigeonpea accessions identifies genomic regions associated with domestication and agronomic traits.

PubMed

Varshney, Rajeev K; Saxena, Rachit K; Upadhyaya, Hari D; Khan, Aamir W; Yu, Yue; Kim, Changhoon; Rathore, Abhishek; Kim, Dongseon; Kim, Jihun; An, Shaun; Kumar, Vinay; Anuradha, Ghanta; Yamini, Kalinati Narasimhan; Zhang, Wei; Muniswamy, Sonnappa; Kim, Jong-So; Penmetsa, R Varma; von Wettberg, Eric; Datta, Swapan K

2017-07-01

Pigeonpea (Cajanus cajan), a tropical grain legume with low input requirements, is expected to continue to have an important role in supplying food and nutritional security in developing countries in Asia, Africa and the tropical Americas. From whole-genome resequencing of 292 Cajanus accessions encompassing breeding lines, landraces and wild species, we characterize genome-wide variation. On the basis of a scan for selective sweeps, we find several genomic regions that were likely targets of domestication and breeding. Using genome-wide association analysis, we identify associations between several candidate genes and agronomically important traits. Candidate genes for these traits in pigeonpea have sequence similarity to genes functionally characterized in other plants for flowering time control, seed development and pod dehiscence. Our findings will allow acceleration of genetic gains for key traits to improve yield and sustainability in pigeonpea.
Genome Sequencing and Assembly by Long Reads in Plants

PubMed Central

Li, Changsheng; Lin, Feng; An, Dong; Huang, Ruidong

2017-01-01

Plant genomes generated by Sanger and Next Generation Sequencing (NGS) have provided insight into species diversity and evolution. However, Sanger sequencing is limited in its applications due to high cost, labor intensity, and low throughput, while NGS reads are too short to resolve abundant repeats and polyploidy, leading to incomplete or ambiguous assemblies. The advent and improvement of long-read sequencing by Third Generation Sequencing (TGS) methods such as PacBio and Nanopore have shown promise in producing high-quality assemblies for complex genomes. Here, we review the development of sequencing, introducing the application as well as considerations of experimental design in TGS of plant genomes. We also introduce recent revolutionary scaffolding technologies including BioNano, Hi-C, and 10× Genomics. We expect that the informative guidance for genome sequencing and assembly by long reads will benefit the initiation of scientists’ projects. PMID:29283420
Reassessment of the Genome Size in Elaeis guineensis and Elaeis oleifera, and Its Interspecific Hybrid

PubMed Central

Camillo, Julceia; Leão, André P; Alves, Alexandre A; Formighieri, Eduardo F; Azevedo, Ana LS; Nunes, Juliana D; de Capdeville, Guy; de A Mattos, Jean K; Souza, Manoel T

2014-01-01

Aiming at generating a comprehensive genomic database on Elaeis spp., our group is leading several R&D initiatives with Elaeis guineensis (African oil palm) and Elaeis oleifera (American oil palm), including the whole-genome sequencing of the latter. Genome size estimates currently available for this genus are controversial, as they indicate that the American oil palm genome is about half the size of the African oil palm genome and that the genome of the interspecific hybrid is bigger than both parental species genomes. We estimated the genome size of three E. guineensis genotypes, five E. oleifera genotypes, and two interspecific hybrid genotypes. On average, the genome size of E. guineensis is 4.32 ± 0.173 pg, while that of E. oleifera is 4.43 ± 0.018 pg. This indicates that both genomes are similar in size, even though E. oleifera is in fact bigger. As expected, the hybrid genome size is around the average of the two genomes, 4.40 ± 0.016 pg. Additionally, we demonstrate that both species present around 38% of GC content. As our results contradict the currently available data on Elaeis spp. genome sizes, we propose that the actual genome size of the Elaeis species is around 4 pg and that American oil palm possesses a larger genome than African oil palm. PMID:26203259
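The arithmetic behind the statement that the hybrid sits near the average of the two parental genomes can be checked directly from the reported means. The sketch below also converts picograms to megabase pairs using the widely used factor of 978 Mbp per pg, which is not stated in the abstract and is introduced here as an assumption; the abstract likewise does not say whether the values are 1C or 2C, so the converted figures inherit that ambiguity.

```python
MBP_PER_PG = 978.0  # common conversion factor; an assumption, not from the abstract


def pg_to_mbp(pg: float) -> float:
    """Convert a DNA mass in picograms to megabase pairs."""
    return pg * MBP_PER_PG


# Mean genome sizes reported in the abstract (pg).
e_guineensis_pg = 4.32
e_oleifera_pg = 4.43
hybrid_pg = 4.40

midparent_pg = (e_guineensis_pg + e_oleifera_pg) / 2
print(f"mid-parent value: {midparent_pg:.3f} pg")
print(f"hybrid minus mid-parent: {hybrid_pg - midparent_pg:.3f} pg")
for name, pg in [("E. guineensis", e_guineensis_pg), ("E. oleifera", e_oleifera_pg)]:
    print(f"{name}: {pg_to_mbp(pg):.0f} Mbp")
```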

DNA is structured as a linear "jigsaw puzzle" in the genomes of Arabidopsis, rice, and budding yeast.

PubMed

Liu, Yun-Hua; Zhang, Meiping; Wu, Chengcang; Huang, James J; Zhang, Hong-Bin

2014-01-01

Knowledge of how a genome is structured and organized from its constituent elements is crucial to understanding its biology and evolution. Here, we report the genome structuring and organization pattern as revealed by systems analysis of the sequences of three model species, Arabidopsis, rice and yeast, at the whole-genome and chromosome levels. We found that all fundamental function elements (FFE) constituting the genomes, including genes (GEN), DNA transposable elements (DTE), retrotransposable elements (RTE), simple sequence repeats (SSR), and (or) low complexity repeats (LCR), are structured in a nonrandom and correlative manner, thus leading to a hypothesis that the DNA of the species is structured as a linear "jigsaw puzzle". Furthermore, we showed that different FFE differ in their importance in the formation and evolution of the DNA jigsaw puzzle structure between species. DTE and RTE play more important roles than GEN, LCR, and SSR in Arabidopsis, whereas GEN and RTE play more important roles than LCR, SSR, and DTE in rice. The genes having multiple recognized functions play more important roles than those having single functions. These results provide useful knowledge necessary for better understanding genome biology and evolution of the species and for effective molecular breeding of rice.
Genome resilience and prevalence of segmental duplications following fast neutron irradiation of soybean

USDA-ARS's Scientific Manuscript database

Fast neutron radiation has been used as a mutagen to develop extensive mutant collections. However, the genome-wide structural consequences of fast neutron radiation are not well understood. Here, we examine the genome-wide structural variants observed among 264 soybean (Glycine max (L.) Merrill) pl...
Comparative genetic mapping between clementine, pummelo and sweet orange and the interspecific structure of the Clementine genome

USDA-ARS's Scientific Manuscript database

The availability of a saturated genetic map of Clementine was identified by the International Citrus Genome Consortium as an essential prerequisite to assist the assembly...
Draft Genome Sequence of Telmatospirillum siberiense 26-4b1, an Acidotolerant Peatland Alphaproteobacterium Potentially Involved in Sulfur Cycling

PubMed Central

Schreck, Katharina; Herbold, Craig W.; Daims, Holger; Wagner, Michael; Loy, Alexander

2018-01-01

The facultative anaerobic chemoorganoheterotrophic alphaproteobacterium Telmatospirillum siberiense 26-4b1 was isolated from a Siberian peatland. We report here a 6.20-Mbp near-complete high-quality draft genome sequence of T. siberiense that reveals expected and novel metabolic potential for the genus Telmatospirillum, including genes for sulfur oxidation. PMID:29371357
Complete Genome Sequence of a Putative New Bacterial Strain, I507, Isolated from the Indian Ocean

PubMed Central

Wang, Shu-yan; Wei, Jia-qiang

2018-01-01

Bacterial strain I507 was isolated from the central Indian Ocean and may be a potential novel species, according to the 16S rRNA gene sequence. Here, we present its complete genome sequence and expect that it will provide researchers with valuable information to further understand its classification and function in the future. PMID:29674539
Coverage of whole proteome by structural genomics observed through protein homology modeling database

PubMed Central

Yamaguchi, Akihiro; Go, Mitiko

2006-01-01

We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE (http://daisy.nagahama-i-bio.ac.jp/Famsbase/), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total and the current FAMSBASE contains protein 3D structures of approximately 50% of the ORF products. However, cases in which a modeled 3D structure covers an entire ORF product are rare. When the portion of an ORF covered by a 3D structure is compared across the three kingdoms of life, approximately 60% of the ORFs in archaebacteria and eubacteria have modeled 3D structures covering almost the entire amino acid sequence, whereas the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structure are calculated, the fraction of modeled 3D structures of soluble protein for archaebacteria increased by 5%, and that for eubacteria by 7%, in the last 3 years. Assuming that this rate would be maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes without the putative disordered regions will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structures of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting spatial arrangements of structural domains in an ORF will then be a coming issue of structural genomics. PMID:17146617
Australians' views on personal genomic testing: focus group findings from the Genioz study.

PubMed

Metcalfe, Sylvia A; Hickerton, Chriselle; Savard, Jacqueline; Terrill, Bronwyn; Turbitt, Erin; Gaff, Clara; Gray, Kathleen; Middleton, Anna; Wilson, Brenda; Newson, Ainsley J

2018-04-30

Personal genomic testing provides healthy individuals with access to information about their genetic makeup for purposes including ancestry, paternity, sporting ability and health. Such tests are available commercially and globally, with accessibility expected to continue to grow, including in Australia; yet little is known of the views/expectations of Australians. Focus groups were conducted within a multi-stage, cross-disciplinary project (Genioz) to explore this. In mid-2015, 56 members of the public participated in seven focus groups, allocated into three age groups: 18-24, 25-49, and ≥50 years. Three researchers coded transcripts independently and generated themes. Awareness of personal genomic testing was low, but most could deduce what "personal genomics" might entail. Very few had heard of the term "direct-to-consumer" testing, which has implications for organisations developing information to support individuals in their decision-making. Participants' understanding of genetics was varied and drawn from several sources. There were diverse perceptions of the relative influence of genetics and environment on health, mental health, behavior, talent, or personality. Views about having a personal genomic test were mixed, with greater interest in health-related tests if participants believed there was a reason for doing so. However, many expressed scepticism about the types of tests available, and how the information might be used; concerns were also raised about privacy and the potential for discrimination. These exploratory findings inform subsequent stages of the Genioz study, thereby contributing to strategies of supporting Australians to understand and make meaningful and well-considered decisions about the benefits, harms, and implications of personal genomic tests.
Pathgroups, a dynamic data structure for genome reconstruction problems.

PubMed

Zheng, Chunfang

2010-07-01

Ancestral gene order reconstruction problems, including the median problem, quartet construction, small phylogeny, guided genome halving and genome aliquoting, are NP-hard. Available heuristics dedicated to each of these problems are computationally costly for even small instances. We present a data structure enabling rapid heuristic solution to all these ancestral genome reconstruction problems. A generic greedy algorithm with look-ahead based on an automatically generated priority system suffices for all the problems using this data structure. The efficiency of the algorithm is due to fast updating of the structure during run time and to the simplicity of the priority scheme. We illustrate with the first rapid algorithm for quartet construction and apply this to a set of yeast genomes to corroborate a recent gene sequence-based phylogeny. Availability: http://albuquerque.bioinformatics.uottawa.ca/pathgroup/Quartet.html Contact: chunfang313@gmail.com Supplementary data are available at Bioinformatics online.
The contribution of co-transcriptional RNA:DNA hybrid structures to DNA damage and genome instability

PubMed Central

Hamperl, Stephan; Cimprich, Karlene A.

2014-01-01

Accurate DNA replication and DNA repair are crucial for the maintenance of genome stability, and it is generally accepted that failure of these processes is a major source of DNA damage in cells. Intriguingly, recent evidence suggests that DNA damage is more likely to occur at genomic loci with high transcriptional activity. Furthermore, loss of certain RNA processing factors in eukaryotic cells is associated with increased formation of co-transcriptional RNA:DNA hybrid structures known as R-loops, resulting in double-strand breaks (DSBs) and DNA damage. However, the molecular mechanisms by which R-loop structures ultimately lead to DNA breaks and genome instability are not well understood. In this review, we summarize the current knowledge about the formation, recognition and processing of RNA:DNA hybrids, and discuss possible mechanisms by which these structures contribute to DNA damage and genome instability in the cell. PMID:24746923
Evaluation of Genetic Diversity, Population Structure, and Relationship Between Legendary Vechur Cattle and Crossbred Cattle of Kerala State, India.

PubMed

Radhika, G; Aravindakshan, T V; Jinty, S; Ramya, K

2018-01-02

The legendary Vechur cattle of Kerala, described as a very short breed, and the crossbred (CB) Sunandini cattle population exhibited great phenotypic variation; hence, the present study attempted to analyze the genetic diversity existing between them. A set of 14 polymorphic microsatellites was chosen from the FAO-ISAG panel and amplified from genomic DNA isolated from blood samples of 30 Vechur and 64 unrelated crossbred cattle, using fluorescently labeled primers. Both populations revealed high genetic diversity, as evidenced by the high observed number of alleles, Polymorphic Information Content and expected heterozygosity. Observed heterozygosity was lower (0.699) than expected (0.752) in the Vechur population, which was further supported by a positive F_IS value of 0.1149, indicating a slight level of inbreeding in the Vechur population. The overall F_ST value was 0.065, which means genetic differentiation between the crossbred and Vechur populations was 6.5%, indicating that the crossbred cattle must have differentiated into a definite population that is different from the indigenous Vechur cows. Structure analysis indicated that the two populations showed distinct differences, with two underlying clusters. The present study supports the separation between Taurine and Zebu cattle and throws light onto the genetic diversity and relationship between native Vechur and crossbred cattle populations in Kerala state.
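The heterozygosity figures in this abstract can be related to each other with two textbook formulas: expected heterozygosity at a locus, He = 1 - sum(p_i^2), and the inbreeding coefficient F = 1 - Ho/He. The sketch below is illustrative only; the function names and allele counts are hypothetical, and the simple ratio of population means gives roughly 0.07 rather than the reported F_IS of 0.1149, which was presumably obtained with a different (for example, per-locus) estimator.

```python
def expected_heterozygosity(allele_counts: list[int]) -> float:
    """He = 1 - sum(p_i^2) for one locus (no small-sample correction)."""
    total = sum(allele_counts)
    return 1.0 - sum((c / total) ** 2 for c in allele_counts)


def inbreeding_coefficient(h_obs: float, h_exp: float) -> float:
    """F = 1 - Ho/He; positive values indicate a heterozygote deficit."""
    return 1.0 - h_obs / h_exp


# Hypothetical allele counts at one microsatellite locus.
print(expected_heterozygosity([30, 20, 10]))
# Mean observed and expected heterozygosity reported for the Vechur population.
print(inbreeding_coefficient(0.699, 0.752))
```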
Nutrigenomics in the modern era.

PubMed

Mathers, John C

2017-08-01

The concept that interactions between nutrition and genetics determine phenotype was established by Garrod at the beginning of the 20th century through his ground-breaking work on inborn errors of metabolism. A century later, the science and technologies involved in sequencing of the human genome stimulated development of the scientific discipline which we now recognise as nutritional genomics (nutrigenomics). Much of the early hype around possible applications of this new science was unhelpful and raised expectations, which have not been realised as quickly as some would have hoped. However, major advances have been made in quantifying the contribution of genetic variation to a wide range of phenotypes and it is now clear that for nutrition-related phenotypes, such as obesity and common complex diseases, the genetic contribution made by SNP alone is often modest. There is much scope for innovative research to understand the roles of less well explored types of genomic structural variation, e.g. copy number variants, and of interactions between genotype and dietary factors, in phenotype determination. New tools and models, including stem cell-based approaches and genome editing, have huge potential to transform mechanistic nutrition research. Finally, the application of nutrigenomics research offers substantial potential to improve public health e.g. through the use of metabolomics approaches to identify novel biomarkers of food intake, which will lead to more objective and robust measures of dietary exposure. In addition, nutrigenomics may have applications in the development of personalised nutrition interventions, which may facilitate larger, more appropriate and sustained changes in eating (and other lifestyle) behaviours and help to reduce health inequalities.
Effectiveness of Genomic Prediction of Maize Hybrid Performance in Different Breeding Populations and Environments

PubMed Central

Windhausen, Vanessa S.; Atlin, Gary N.; Hickey, John M.; Crossa, Jose; Jannink, Jean-Luc; Sorrells, Mark E.; Raman, Babu; Cairns, Jill E.; Tarekegne, Amsal; Semagn, Kassa; Beyene, Yoseph; Grudloyma, Pichet; Technow, Frank; Riedelsheimer, Christian; Melchinger, Albrecht E.

2012-01-01

Genomic prediction is expected to considerably increase genetic gains by increasing selection intensity and accelerating the breeding cycle. In this study, marker effects estimated in 255 diverse maize (Zea mays L.) hybrids were used to predict grain yield, anthesis date, and anthesis-silking interval within the diversity panel and testcross progenies of 30 F2-derived lines from each of five populations. Although up to 25% of the genetic variance could be explained by cross validation within the diversity panel, the prediction of testcross performance of F2-derived lines using marker effects estimated in the diversity panel was on average zero. Hybrids in the diversity panel could be grouped into eight breeding populations differing in mean performance. When performance was predicted separately for each breeding population on the basis of marker effects estimated in the other populations, predictive ability was low (i.e., 0.12 for grain yield). These results suggest that prediction resulted mostly from differences in mean performance of the breeding populations and less from the relationship between the training and validation sets or linkage disequilibrium with causal variants underlying the predicted traits. Potential uses for genomic prediction in maize hybrid breeding are discussed, emphasizing the need for (1) a clear definition of the breeding scenario in which genomic prediction should be applied (i.e., prediction among or within populations), (2) a detailed analysis of the population structure before performing cross validation, and (3) larger training sets with strong genetic relationship to the validation set. PMID:23173094
Cascade of chromosomal rearrangements caused by a heterogeneous T-DNA integration supports the double-stranded break repair model for T-DNA integration.

PubMed

Hu, Yufei; Chen, Zhiyu; Zhuang, Chuxiong; Huang, Jilei

2017-06-01

Transferred DNA (T-DNA) from Agrobacterium tumefaciens can be integrated into the plant genome. The double-stranded break repair (DSBR) pathway is a major model for T-DNA integration. From this model, we expect that two ends of a T-DNA molecule would invade a single DNA double-stranded break (DSB) or independent DSBs in the plant genome. We call the latter phenomenon a heterogeneous T-DNA integration, which has never been observed. In this work, we demonstrated it in an Arabidopsis T-DNA insertion mutant seb19. To resolve the chromosomal structural changes caused by T-DNA integration at both the nucleotide and chromosome levels, we performed inverse PCR, genome resequencing, fluorescence in situ hybridization and linkage analysis. We found, in seb19, a single T-DNA connected two different chromosomal loci and caused complex chromosomal rearrangements. The specific break-junction pattern in seb19 is consistent with the result of heterogeneous T-DNA integration but not of recombination between two T-DNA insertions. We demonstrated that, in seb19, heterogeneous T-DNA integration evoked a cascade of incorrect repair of seven DSBs on chromosomes 4 and 5, and then produced translocation, inversion, duplication and deletion. Heterogeneous T-DNA integration supports the DSBR model and suggests that two ends of a T-DNA molecule could be integrated into the plant genome independently. Our results also show a new origin of chromosomal abnormalities. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.
Oncodomains: A protein domain-centric framework for analyzing rare variants in tumor samples

PubMed Central

Peterson, Thomas A.; Park, Junyong

2017-01-01

The fight against cancer is hindered by its highly heterogeneous nature. Genome-wide sequencing studies have shown that individual malignancies contain many mutations that range from those commonly found in tumor genomes to rare somatic variants present only in a small fraction of lesions. Such rare somatic variants dominate the landscape of genomic mutations in cancer, yet efforts to correlate somatic mutations found in one or few individuals with functional roles have been largely unsuccessful. Traditional methods for identifying somatic variants that drive cancer are ‘gene-centric’ in that they consider only somatic variants within a particular gene and make no comparison to other similar genes in the same family that may play a similar role in cancer. In this work, we present oncodomain hotspots, a new ‘domain-centric’ method for identifying clusters of somatic mutations across entire gene families using protein domain models. Our analysis confirms that our approach creates a framework for leveraging structural and functional information encapsulated by protein domains into the analysis of somatic variants in cancer, enabling the assessment of even rare somatic variants by comparison to similar genes. Our results reveal a vast landscape of somatic variants that act at the level of domain families altering pathways known to be involved with cancer such as protein phosphorylation, signaling, gene regulation, and cell metabolism. Due to oncodomain hotspots’ unique ability to assess rare variants, we expect our method to become an important tool for the analysis of sequenced tumor genomes, complementing existing methods. PMID:28426665
Protein family clustering for structural genomics.

PubMed

Yan, Yongpan; Moult, John

2005-10-28

A major goal of structural genomics is the provision of a structural template for a large fraction of protein domains. The magnitude of this task depends on the number and nature of protein sequence families. With a large number of bacterial genomes now fully sequenced, it is possible to obtain improved estimates of the number and diversity of families in that kingdom. We have used an automated clustering procedure to group all sequences in a set of genomes into protein families. Bench-marking shows the clustering method is sensitive at detecting remote family members, and has a low level of false positives. This comprehensive protein family set has been used to address the following questions. (1) What is the structure coverage for currently known families? (2) How will the number of known apparent families grow as more genomes are sequenced? (3) What is a practical strategy for maximizing structure coverage in future? Our study indicates that approximately 20% of known families with three or more members currently have a representative structure. The study indicates also that the number of apparent protein families will be considerably larger than previously thought: We estimate that, by the criteria of this work, there will be about 250,000 protein families when 1000 microbial genomes have been sequenced. However, the vast majority of these families will be small, and it will be possible to obtain structural templates for 70-80% of protein domains with an achievable number of representative structures, by systematically sampling the larger families.
Deep transcriptome sequencing provides new insights into the structural and functional organization of the wheat genome.

PubMed

Pingault, Lise; Choulet, Frédéric; Alberti, Adriana; Glover, Natasha; Wincker, Patrick; Feuillet, Catherine; Paux, Etienne

2015-02-10

Because of its size, allohexaploid nature, and high repeat content, the bread wheat genome is a good model to study the impact of the genome structure on gene organization, function, and regulation. However, because of the lack of a reference genome sequence, such studies have long been hampered and our knowledge of the wheat gene space is still limited. The access to the reference sequence of the wheat chromosome 3B provided us with an opportunity to study the wheat transcriptome and its relationships to genome and gene structure at a level that has never been reached before. By combining this sequence with RNA-seq data, we construct a fine transcriptome map of the chromosome 3B. More than 8,800 transcription sites are identified, that are distributed throughout the entire chromosome. Expression level, expression breadth, alternative splicing as well as several structural features of genes, including transcript length, number of exons, and cumulative intron length are investigated. Our analysis reveals a non-monotonic relationship between gene expression and structure and leads to the hypothesis that gene structure is determined by its function, whereas gene expression is subject to energetic cost. Moreover, we observe a recombination-based partitioning at the gene structure and function level. Our analysis provides new insights into the relationships between gene and genome structure and function. It reveals mechanisms conserved with other plant species as well as superimposed evolutionary forces that shaped the wheat gene space, likely participating in wheat adaptation.
Establishing gene models from the Pinus pinaster genome using gene capture and BAC sequencing.

PubMed

Seoane-Zonjic, Pedro; Cañas, Rafael A; Bautista, Rocío; Gómez-Maldonado, Josefa; Arrillaga, Isabel; Fernández-Pozo, Noé; Claros, M Gonzalo; Cánovas, Francisco M; Ávila, Concepción

2016-02-27

In the era of DNA throughput sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to understand the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that can aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution. In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures, and a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequences were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members. The experimental approach used in this study is a valuable combined technique to study genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences that can be used for transcriptional studies.
A bioinformatics algorithm (GeneAssembler) is also provided as a Ruby gem for this class of analyses.
Chemical biology on the genome.

PubMed

Balasubramanian, Shankar

2014-08-15

In this article I discuss studies towards understanding the structure and function of DNA in the context of genomes from the perspective of a chemist. The first area I describe concerns the studies that led to the invention and subsequent development of a method for sequencing DNA on a genome scale at high speed and low cost, now known as Solexa/Illumina sequencing. The second theme will feature the four-stranded DNA structure known as a G-quadruplex with a focus on its fundamental properties, its presence in cellular genomic DNA and the prospects for targeting such a structure in cells with small molecules. The final topic for discussion is naturally occurring chemically modified DNA bases with an emphasis on chemistry for decoding (or sequencing) such modifications in genomic DNA. The genome is a fruitful topic to be further elucidated by the creation and application of chemical approaches. Copyright © 2014 Elsevier Ltd. All rights reserved.
A sequence-based survey of the complex structural organization of tumor genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav

2008-04-03

The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.
[Landscape and ecological genomics].

PubMed

Tetushkin, E Ia

2013-10-01

Landscape genomics is the modern version of landscape genetics, a discipline that arose approximately 10 years ago as a combination of population genetics, landscape ecology, and spatial statistics. It studies the effects of environmental variables on gene flow and other microevolutionary processes that determine genetic connectivity and variations in populations. In contrast to population genetics, it operates at the level of individual specimens rather than at the level of population samples. Another important difference between landscape genetics and genomics and population genetics is that, in the former, the analysis of gene flow and local adaptations takes quantitative account of landforms and features of the matrix, i.e., hostile spaces that separate species habitats. Landscape genomics is a part of population ecogenomics, which, along with community genomics, is a major part of ecological genomics. One of the principal purposes of landscape genomics is the identification and differentiation of various genome-wide and locus-specific effects. The approaches and computation tools developed for combined analysis of genomic and landscape variables make it possible to detect adaptation-related genome fragments, which facilitates the planning of conservation efforts and the prediction of species' fate in response to expected changes in the environment.

Effect of genomics-related literacy on non-communicable diseases.

PubMed

Nakamura, Sho; Narimatsu, Hiroto; Katayama, Kayoko; Sho, Ri; Yoshioka, Takashi; Fukao, Akira; Kayama, Takamasa

2017-09-01

Recent progress in genomic research has raised expectations for the development of personalized preventive medicine, although genomics-related literacy of patients will be essential. Thus, enhancing genomics-related literacy is crucial, particularly for individuals with low genomics-related literacy because they might otherwise miss the opportunity to receive personalized preventive care. This should be especially emphasized when a lack of genomics-related literacy is associated with elevated disease risk, because patients could therefore be deprived of the added benefits of preventive interventions; however, whether such an association exists is unclear. Association between genomics-related literacy, calculated as the genomics literacy score (GLS), and the prevalence of non-communicable diseases was assessed using propensity score matching on 4646 participants (males: 1891; 40.7%). Notably, the low-GLS group (score below median) presented a higher risk of hypertension (relative risk (RR) 1.09, 95% confidence interval (CI) 1.03-1.16) and obesity (RR 1.11, 95% CI 1.01-1.22) than the high-GLS group. Our results suggest that a low level of genomics-related literacy could represent a risk factor for hypertension and obesity. Evaluating genomics-related literacy could be used to identify a more appropriate population for health and educational interventions.
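For readers unfamiliar with how a relative risk and its 95% confidence interval are obtained, the following is a minimal sketch using the standard log-scale (Katz) approximation for two groups. It is not the study's analysis, which used propensity score matching; the function name and the counts in the usage line are hypothetical and are not taken from the paper.

```python
import math

def relative_risk(a, n1, c, n2, z=1.96):
    """Relative risk of group 1 versus group 2 with a Wald-type CI.

    a, n1 : cases and group size in the exposed group (e.g. low GLS)
    c, n2 : cases and group size in the reference group (e.g. high GLS)
    z     : normal quantile (1.96 for an approximate 95% interval)
    """
    rr = (a / n1) / (c / n2)
    # Standard error of ln(RR) under the usual large-sample approximation.
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
rr, lower, upper = relative_risk(50, 100, 25, 100)
```

An interval that excludes 1, as in the hypertension and obesity results quoted above, indicates that the difference between groups is unlikely to be due to sampling variation alone at the chosen level.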
MIPSPlantsDB—plant database resource for integrative and comparative plant genome research

PubMed Central

Spannagl, Manuel; Noubibou, Octave; Haase, Dirk; Yang, Li; Gundlach, Heidrun; Hindemitt, Tobias; Klee, Kathrin; Haberer, Georg; Schoof, Heiko; Mayer, Klaus F. X.

2007-01-01

Genome-oriented plant research delivers a rapidly increasing amount of plant genome data. Comprehensive and structured information resources are required to structure and communicate genome and associated analytical data for model organisms as well as for crops. The increase in available plant genomic data enables powerful comparative analysis and integrative approaches. PlantsDB aims to provide data and information resources for individual plant species and in addition to build a platform for integrative and comparative plant genome research. PlantsDB is constituted from genome databases for Arabidopsis, Medicago, Lotus, rice, maize and tomato. Complementary data resources for cis elements, repetitive elements and extensive cross-species comparisons are implemented. The PlantsDB portal can be reached at . PMID:17202173
A first-principles model of early evolution: emergence of gene families, species, and preferred protein folds.

PubMed

Zeldovich, Konstantin B; Chen, Peiqiu; Shakhnovich, Boris E; Shakhnovich, Eugene I

2007-07-01

In this work we develop a microscopic physical model of early evolution where phenotype--organism life expectancy--is directly related to genotype--the stability of its proteins in their native conformations--which can be determined exactly in the model. Simulating the model on a computer, we consistently observe the "Big Bang" scenario whereby exponential population growth ensues as soon as favorable sequence-structure combinations (precursors of stable proteins) are discovered. Upon that, random diversity of the structural space abruptly collapses into a small set of preferred proteins. We observe that protein folds remain stable and abundant in the population at timescales much greater than mutation or organism lifetime, and the distribution of the lifetimes of dominant folds in a population approximately follows a power law. The separation of evolutionary timescales between discovery of new folds and generation of new sequences gives rise to emergence of protein families and superfamilies whose sizes are power-law distributed, closely matching the same distributions for real proteins. On the population level we observe emergence of species--subpopulations that carry similar genomes. Further, we present a simple theory that relates stability of evolving proteins to the sizes of emerging genomes. Together, these results provide a microscopic first-principles picture of how first-gene families developed in the course of early evolution.
Supervised multiblock sparse multivariable analysis with application to multimodal brain imaging genetics.

PubMed

Kawaguchi, Atsushi; Yamashita, Fumio

2017-10-01

This article proposes a procedure for describing the relationship between high-dimensional data sets, such as multimodal brain images and genetic data. We propose a supervised technique to incorporate the clinical outcome to determine a score, which is a linear combination of variables with hierarchical structures to multimodalities. This approach is expected to obtain interpretable and predictive scores. The proposed method was applied to a study of Alzheimer's disease (AD). We propose a diagnostic method for AD that involves using whole-brain magnetic resonance imaging (MRI) and positron emission tomography (PET), and we select effective brain regions for the diagnostic probability and investigate the genome-wide association with the regions using single nucleotide polymorphisms (SNPs). The two-step dimension reduction method, which we previously introduced, was considered applicable to such a study and allows us to partially incorporate the proposed method. We show that the proposed method offers classification functions with feasibility and reasonable prediction accuracy based on the receiver operating characteristic (ROC) analysis and reasonable regions of the brain and genomes. Our simulation study based on the synthetic structured data set showed that the proposed method outperformed the original method and provided the characteristic for the supervised feature. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336”

PubMed Central

Kumar, Ranjit; Lawrence, Mark L.; Watt, James; Cooksey, Amanda M.; Burgess, Shane C.; Nanduri, Bindu

2012-01-01

Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify “novel” genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations. PMID:22276113
RNA-seq based transcriptional map of bovine respiratory disease pathogen "Histophilus somni 2336".

PubMed

Kumar, Ranjit; Lawrence, Mark L; Watt, James; Cooksey, Amanda M; Burgess, Shane C; Nanduri, Bindu

2012-01-01

Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify "novel" genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations.
Functional RNA structures throughout the Hepatitis C Virus genome.

PubMed

Adams, Rebecca L; Pirakitikulr, Nathan; Pyle, Anna Marie

2017-06-01

The single-stranded Hepatitis C Virus (HCV) genome adopts a set of elaborate RNA structures that are involved in every stage of the viral lifecycle. Recent advances in chemical probing, sequencing, and structural biology have facilitated analysis of RNA folding on a genome-wide scale, revealing novel structures and networks of interactions. These studies have underscored the active role played by RNA in every function of HCV and they open the door to new types of RNA-targeted therapeutics. Copyright © 2017 Elsevier B.V. All rights reserved.
Pseudoscorpion mitochondria show rearranged genes and genome-wide reductions of RNA gene sizes and inferred structures, yet typical nucleotide composition bias

PubMed Central

2012-01-01

Background Pseudoscorpions are chelicerates and have historically been viewed as being most closely related to solifuges, harvestmen, and scorpions. No mitochondrial genomes of pseudoscorpions have been published, but the mitochondrial genomes of some lineages of Chelicerata possess unusual features, including short rRNA genes and tRNA genes that lack sequence to encode arms of the canonical cloverleaf-shaped tRNA. Additionally, some chelicerates possess an atypical guanine-thymine nucleotide bias on the major coding strand of their mitochondrial genomes. Results We sequenced the mitochondrial genomes of two divergent taxa from the chelicerate order Pseudoscorpiones. We find that these genomes possess unusually short tRNA genes that do not encode cloverleaf-shaped tRNA structures. Indeed, in one genome, all 22 tRNA genes lack sequence to encode canonical cloverleaf structures. We also find that the large ribosomal RNA genes are substantially shorter than those of most arthropods. We inferred secondary structures of the LSU rRNAs from both pseudoscorpions, and find that they have lost multiple helices. Based on comparisons with the crystal structure of the bacterial ribosome, two of these helices were likely contact points with tRNA T-arms or D-arms as they pass through the ribosome during protein synthesis. The mitochondrial gene arrangements of both pseudoscorpions differ from the ancestral chelicerate gene arrangement. One genome is rearranged with respect to the location of protein-coding genes, the small rRNA gene, and at least 8 tRNA genes. The other genome contains 6 tRNA genes in novel locations. Most chelicerates with rearranged mitochondrial genes show a genome-wide reversal of the CA nucleotide bias typical for arthropods on their major coding strand, and instead possess a GT bias. Yet despite their extensive rearrangement, these pseudoscorpion mitochondrial genomes possess a CA bias on the major coding strand. 
Phylogenetic analyses of all 13 mitochondrial protein-coding gene sequences consistently yield trees that place pseudoscorpions as sister to acariform mites. Conclusion The well-supported phylogenetic placement of pseudoscorpions as sister to Acariformes differs from some previous analyses based on morphology. However, these two lineages share multiple molecular evolutionary traits, including substantial mitochondrial genome rearrangements, extensive nucleotide substitution, and loss of helices in their inferred tRNA and rRNA structures. PMID:22409411
Apophysomyces variabilis: draft genome sequence and comparison of predictive virulence determinants with other medically important Mucorales.

PubMed

Prakash, Hariprasath; Rudramurthy, Shivaprakash Mandya; Gandham, Prasad S; Ghosh, Anup Kumar; Kumar, Milner M; Badapanda, Chandan; Chakrabarti, Arunaloke

2017-09-18

Apophysomyces species are prevalent in tropical countries and A. variabilis is the second most frequent agent causing mucormycosis in India. Among Apophysomyces species, A. elegans, A. trapeziformis and A. variabilis are commonly incriminated in human infections. The genome sequences of A. elegans and A. trapeziformis are available in public databases, but not that of A. variabilis. We, therefore, performed whole genome sequencing of A. variabilis to explore its genomic structure and possible genes determining the virulence of the organism. The whole genome of A. variabilis NCCPF 102052 was sequenced and the genomic structure of A. variabilis was compared with already available genome structures of A. elegans, A. trapeziformis and other medically important Mucorales. The total size of genome assembly of A. variabilis was 39.38 Mb with 12,764 protein-coding genes. The transposable elements (TEs) were low in Apophysomyces genome and the retrotransposon Ty3-gypsy was the common TE. Phylogenetically, Apophysomyces species were grouped closely with Phycomyces blakesleeanus. OrthoMCL analysis revealed 3025 orthologous proteins, which were common in those three pathogenic Apophysomyces species. Expansion of multiple gene families/duplication was observed in Apophysomyces genomes. Approximately 6% of Apophysomyces genes were predicted to be associated with virulence on PHIbase analysis. The virulence determinants included the protein families of CotH proteins (invasins), proteases, iron utilisation pathways, siderophores and signal transduction pathways. Serine proteases were the major group of proteases found in all Apophysomyces genomes. The carbohydrate active enzymes (CAZymes) constitute the majority of the secretory proteins. The present study is the maiden attempt to sequence and analyze the genomic structure of A. variabilis. Together with available genome sequence of A. elegans and A. trapeziformis, the study helped to indicate the possible virulence determinants of pathogenic Apophysomyces species. The presence of unique CAZymes in cell wall might be exploited in future for antifungal drug development.
Proteomics in the genome engineering era.

PubMed

Vandemoortele, Giel; Gevaert, Kris; Eyckerman, Sven

2016-01-01

Genome engineering experiments used to be lengthy, inefficient, and often expensive, preventing a widespread adoption of such experiments for the full assessment of endogenous protein functions. With the revolutionary clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 technology, genome engineering became accessible to the broad life sciences community and is now implemented in several research areas. One particular field that can benefit significantly from this evolution is proteomics where a substantial impact on experimental design and general proteome biology can be expected. In this review, we describe the main applications of genome engineering in proteomics, including the use of engineered disease models and endogenous epitope tagging. In addition, we provide an overview on current literature and highlight important considerations when launching genome engineering technologies in proteomics workflows. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Reduced Mutation Rate and Increased Transformability of Transposon-Free Acinetobacter baylyi ADP1-ISx

PubMed Central

Suárez, Gabriel A.; Renda, Brian A.; Dasgupta, Aurko

2017-01-01

ABSTRACT The genomes of most bacteria contain mobile DNA elements that can contribute to undesirable genetic instability in engineered cells. In particular, transposable insertion sequence (IS) elements can rapidly inactivate genes that are important for a designed function. We deleted all six copies of IS1236 from the genome of the naturally transformable bacterium Acinetobacter baylyi ADP1. The natural competence of ADP1 made it possible to rapidly repair deleterious point mutations that arose during strain construction. In the resulting ADP1-ISx strain, the rates of mutations inactivating a reporter gene were reduced by 7- to 21-fold. This reduction was higher than expected from the incidence of new IS1236 insertions found during a 300-day mutation accumulation experiment with wild-type ADP1 that was used to estimate spontaneous mutation rates in the strain. The extra improvement appears to be due in part to eliminating large deletions caused by IS1236 activity, as the point mutation rate was unchanged in ADP1-ISx. Deletion of an error-prone polymerase (dinP) and a DNA damage response regulator (umuDAb [the umuD gene of A. baylyi]) from the ADP1-ISx genome did not further reduce mutation rates. Surprisingly, ADP1-ISx exhibited increased transformability. This improvement may be due to less autolysis and aggregation of the engineered cells than of the wild type. Thus, deleting IS elements from the ADP1 genome led to a greater than expected increase in evolutionary reliability and unexpectedly enhanced other key strain properties, as has been observed for other clean-genome bacterial strains. ADP1-ISx is an improved chassis for metabolic engineering and other applications. IMPORTANCE Acinetobacter baylyi ADP1 has been proposed as a next-generation bacterial host for synthetic biology and genome engineering due to its ability to efficiently take up DNA from its environment during normal growth. 
We deleted transposable elements that are capable of copying themselves, inserting into other genes, and thereby inactivating them from the ADP1 genome. The resulting “clean-genome” ADP1-ISx strain exhibited larger reductions in the rates of inactivating mutations than expected from spontaneous mutation rates measured via whole-genome sequencing of lineages evolved under relaxed selection. Surprisingly, we also found that IS element activity reduces transformability and is a major cause of cell aggregation and death in wild-type ADP1 grown under normal laboratory conditions. More generally, our results demonstrate that domesticating a bacterial genome by removing mobile DNA elements that have accumulated during evolution in the wild can have unanticipated benefits. PMID:28667117
gmos: Rapid Detection of Genome Mosaicism over Short Evolutionary Distances.

PubMed

Domazet-Lošo, Mirjana; Domazet-Lošo, Tomislav

2016-01-01

Prokaryotic and viral genomes are often altered by recombination and horizontal gene transfer. The existing methods for detecting recombination are primarily aimed at viral genomes or sets of loci, since the expensive computation of underlying statistical models often hinders the comparison of complete prokaryotic genomes. As an alternative, alignment-free solutions are more efficient, but cannot map (align) a query to subject genomes. To address this problem, we have developed gmos (Genome MOsaic Structure), a new program that determines the mosaic structure of query genomes when compared to a set of closely related subject genomes. The program first computes local alignments between query and subject genomes and then reconstructs the query mosaic structure by choosing the best local alignment for each query region. To accomplish the analysis quickly, the program mostly relies on pairwise alignments and constructs multiple sequence alignments over short overlapping subject regions only when necessary. This fine-tuned implementation achieves an efficiency comparable to an alignment-free tool. The program performs well for simulated and real data sets of closely related genomes and can be used for fast recombination detection; for instance, when a new prokaryotic pathogen is discovered. As an example, gmos was used to detect genome mosaicism in a pathogenic Enterococcus faecium strain compared to seven closely related genomes. The analysis took less than two minutes on a single 2.1 GHz processor. The output is available in fasta format and can be visualized using an accessory program, gmosDraw (freely available with gmos).
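The reconstruction step described above, choosing the best local alignment for each query region, can be sketched as follows. This is an illustration under stated assumptions and not the gmos implementation: the `Hit` record, the `mosaic` function, and the use of a single numeric score to rank alignments are hypothetical, and the local alignments are assumed to have been computed already.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    q_start: int   # 0-based, inclusive query coordinate
    q_end: int     # exclusive query coordinate
    subject: str   # name of the subject genome
    score: float   # alignment score (assumed: higher is better)

def mosaic(hits, query_len):
    """Return [(start, end, subject_or_None)] segments covering the query."""
    # Hit boundaries are the only places where the best subject can change.
    cuts = {0, query_len}
    for h in hits:
        cuts.update((h.q_start, h.q_end))
    cuts = sorted(c for c in cuts if 0 <= c <= query_len)

    segments = []
    for start, end in zip(cuts, cuts[1:]):
        covering = [h for h in hits if h.q_start <= start and h.q_end >= end]
        best = max(covering, key=lambda h: h.score).subject if covering else None
        if segments and segments[-1][2] == best:
            # Extend the previous segment when the best subject is unchanged.
            segments[-1] = (segments[-1][0], end, best)
        else:
            segments.append((start, end, best))
    return segments

# Hypothetical hits against two subject genomes, A and B.
hits = [Hit(0, 60, "A", 50.0), Hit(40, 100, "B", 80.0)]
segments = mosaic(hits, 120)
```

A change of subject between adjacent segments marks a candidate recombination breakpoint, and segments labelled `None` are query regions with no alignment to any subject.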
gmos: Rapid Detection of Genome Mosaicism over Short Evolutionary Distances

PubMed Central

Domazet-Lošo, Mirjana; Domazet-Lošo, Tomislav

2016-01-01

Prokaryotic and viral genomes are often altered by recombination and horizontal gene transfer. The existing methods for detecting recombination are primarily aimed at viral genomes or sets of loci, since the expensive computation of underlying statistical models often hinders the comparison of complete prokaryotic genomes. As an alternative, alignment-free solutions are more efficient, but cannot map (align) a query to subject genomes. To address this problem, we have developed gmos (Genome MOsaic Structure), a new program that determines the mosaic structure of query genomes when compared to a set of closely related subject genomes. The program first computes local alignments between query and subject genomes and then reconstructs the query mosaic structure by choosing the best local alignment for each query region. To accomplish the analysis quickly, the program mostly relies on pairwise alignments and constructs multiple sequence alignments over short overlapping subject regions only when necessary. This fine-tuned implementation achieves an efficiency comparable to an alignment-free tool. The program performs well for simulated and real data sets of closely related genomes and can be used for fast recombination detection; for instance, when a new prokaryotic pathogen is discovered. As an example, gmos was used to detect genome mosaicism in a pathogenic Enterococcus faecium strain compared to seven closely related genomes. The analysis took less than two minutes on a single 2.1 GHz processor. The output is available in fasta format and can be visualized using an accessory program, gmosDraw (freely available with gmos). PMID:27846272
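The second step described in the abstract, choosing the best local alignment for each query region, can be illustrated with a minimal Python sketch. This is not the gmos implementation: the `LocalAlignment` record, the `mosaic_structure` function, the half-open coordinate convention, and the use of a single numeric score are all assumptions made for illustration, and the local alignments are taken as already computed.

```python
from typing import List, NamedTuple, Optional, Tuple


class LocalAlignment(NamedTuple):
    """Hypothetical record for one query-to-subject local alignment."""
    start: int      # query start, inclusive
    end: int        # query end, exclusive
    subject: str    # name of the subject genome
    score: float    # alignment score (higher is better)


Segment = Tuple[int, int, Optional[str]]


def mosaic_structure(alignments: List[LocalAlignment],
                     query_length: int) -> List[Segment]:
    """Assign each query region to its best-scoring subject genome."""
    # Every alignment boundary is a point where the best subject can change.
    cuts = {0, query_length}
    for a in alignments:
        cuts.update((a.start, a.end))
    ordered = sorted(c for c in cuts if 0 <= c <= query_length)

    segments: List[Segment] = []
    for left, right in zip(ordered, ordered[1:]):
        # Alignments that span the whole elementary interval [left, right).
        covering = [a for a in alignments if a.start <= left and a.end >= right]
        best = max(covering, key=lambda a: a.score, default=None)
        subject = best.subject if best else None
        # Merge with the previous segment when the chosen subject is unchanged.
        if segments and segments[-1][2] == subject:
            segments[-1] = (segments[-1][0], right, subject)
        else:
            segments.append((left, right, subject))
    return segments


if __name__ == "__main__":
    toy = [
        LocalAlignment(0, 100, "A", 90.0),
        LocalAlignment(50, 150, "B", 120.0),
        LocalAlignment(140, 200, "A", 80.0),
    ]
    for seg in mosaic_structure(toy, 220):
        print(seg)
```

Regions with no covering alignment are reported with subject `None`, which in this toy model would correspond to query sequence absent from all subject genomes.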
Draft Genome Sequence of Telmatospirillum siberiense 26-4b1, an Acidotolerant Peatland Alphaproteobacterium Potentially Involved in Sulfur Cycling.

PubMed

Hausmann, Bela; Pjevac, Petra; Schreck, Katharina; Herbold, Craig W; Daims, Holger; Wagner, Michael; Loy, Alexander

2018-01-25

The facultative anaerobic chemoorganoheterotrophic alphaproteobacterium Telmatospirillum siberiense 26-4b1 was isolated from a Siberian peatland. We report here a 6.20-Mbp near-complete high-quality draft genome sequence of T. siberiense that reveals expected and novel metabolic potential for the genus Telmatospirillum , including genes for sulfur oxidation. Copyright © 2018 Hausmann et al.
Systematic bias of correlation coefficient may explain negative accuracy of genomic prediction.

PubMed

Zhou, Yao; Vales, M Isabel; Wang, Aoxue; Zhang, Zhiwu

2017-09-01

Accuracy of genomic prediction is commonly calculated as the Pearson correlation coefficient between the predicted and observed phenotypes in the inference population by using cross-validation analysis. More frequently than expected, significant negative accuracies of genomic prediction have been reported in genomic selection studies. These negative values are surprising, given that the minimum value for prediction accuracy should hover around zero when randomly permuted data sets are analyzed. We reviewed the two common approaches for calculating the Pearson correlation and hypothesized that these negative accuracy values reflect potential bias owing to artifacts caused by the mathematical formulas used to calculate prediction accuracy. The first approach, Instant accuracy, calculates correlations for each fold and reports prediction accuracy as the mean of correlations across folds. The other approach, Hold accuracy, predicts all phenotypes in all folds and calculates the correlation between the observed and predicted phenotypes at the end of the cross-validation process. Using simulated and real data, we demonstrated that our hypothesis is true. Both approaches are biased downward under certain conditions. The biases become larger when more folds are employed and when the expected accuracy is low. The bias of Instant accuracy can be corrected using a modified formula. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
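The difference between the two accuracy definitions can be made concrete with a short Python sketch. It follows only the verbal definitions in the abstract; the function names `instant_accuracy` and `hold_accuracy` and the toy data are assumptions, and the modified (bias-corrected) formula mentioned by the authors is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Fold = Tuple[Sequence[float], Sequence[float]]  # (observed, predicted)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def instant_accuracy(folds: List[Fold]) -> float:
    """Mean of the per-fold correlations."""
    return sum(pearson(obs, pred) for obs, pred in folds) / len(folds)


def hold_accuracy(folds: List[Fold]) -> float:
    """Single correlation over the predictions pooled from all folds."""
    observed = [v for obs, _ in folds for v in obs]
    predicted = [v for _, pred in folds for v in pred]
    return pearson(observed, predicted)


if __name__ == "__main__":
    # Toy folds: perfect agreement in the first, perfect reversal in the second.
    toy_folds = [([1, 2, 3], [1, 2, 3]), ([4, 5, 6], [6, 5, 4])]
    print("Instant:", instant_accuracy(toy_folds))
    print("Hold:   ", hold_accuracy(toy_folds))
```

On these toy folds the two definitions disagree (the per-fold correlations cancel, while the pooled correlation is positive because the fold means differ), which shows only that the estimators are not interchangeable; it does not by itself demonstrate the downward bias reported in the abstract.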
Statistical Significance of Optical Map Alignments

PubMed Central

Sarkar, Deepayan; Goldstein, Steve; Schwartz, David C.

2012-01-01

The Optical Mapping System constructs ordered restriction maps spanning entire genomes through the assembly and analysis of large datasets comprising individually analyzed genomic DNA molecules. Such restriction maps uniquely reveal mammalian genome structure and variation, but also raise computational and statistical questions beyond those that have been solved in the analysis of smaller, microbial genomes. We address the problem of how to filter maps that align poorly to a reference genome. We obtain map-specific thresholds that control errors and improve iterative assembly. We also show how an optimal self-alignment score provides an accurate approximation to the probability of alignment, which is useful in applications seeking to identify structural genomic abnormalities. PMID:22506568
Rose spring dwarf-associated virus has RNA structural and gene-expression features like those of Barley yellow dwarf virus

PubMed Central

Salem, Nida’ M.; Miller, W. Allen; Rowhani, Adib; Golino, Deborah A.; Moyne, Anne-Laure; Falk, Bryce W.

2015-01-01

We determined the complete nucleotide sequence of the Rose spring dwarf-associated virus (RSDaV) genomic RNA (GenBank accession no. EU024678) and compared its predicted RNA structural characteristics affecting gene expression. A cDNA library was derived from RSDaV double-stranded RNAs (dsRNAs) purified from infected tissue. Nucleotide sequence analysis of the cloned cDNAs, plus clones generated by 5′- and 3′-RACE, showed the RSDaV genomic RNA to be 5,808 nucleotides. The genomic RNA contains five major open reading frames (ORFs), and three small ORFs in the 3′-terminal 800 nucleotides, typical for viruses of genus Luteovirus in the family Luteoviridae. Northern blot hybridization analysis revealed the genomic RNA and two prominent subgenomic RNAs of approximately 3 kb and 1 kb. Putative 5′ ends of the sgRNAs were predicted by identification of conserved sequences and secondary structures which resembled the Barley yellow dwarf virus (BYDV) genomic RNA 5′ end and subgenomic RNA promoter sequences. Secondary structures of the BYDV-like ribosomal frameshift elements and cap-independent translation elements, including long-distance base pairing spanning four kb, were identified. These contain similarities but also informative differences with the BYDV structures, including a strikingly different structure predicted for the 3′ cap-independent translation element. These analyses of the RSDaV genomic RNA show more complexity for the RNA structural elements for members of the Luteoviridae. PMID:18329064
Rose spring dwarf-associated virus has RNA structural and gene-expression features like those of Barley yellow dwarf virus.

PubMed

Salem, Nida' M; Miller, W Allen; Rowhani, Adib; Golino, Deborah A; Moyne, Anne-Laure; Falk, Bryce W

2008-06-05

We determined the complete nucleotide sequence of the Rose spring dwarf-associated virus (RSDaV) genomic RNA (GenBank accession no. EU024678) and compared its predicted RNA structural characteristics affecting gene expression. A cDNA library was derived from RSDaV double-stranded RNAs (dsRNAs) purified from infected tissue. Nucleotide sequence analysis of the cloned cDNAs, plus clones generated by 5'- and 3'-RACE, showed the RSDaV genomic RNA to be 5808 nucleotides. The genomic RNA contains five major open reading frames (ORFs), and three small ORFs in the 3'-terminal 800 nucleotides, typical for viruses of genus Luteovirus in the family Luteoviridae. Northern blot hybridization analysis revealed the genomic RNA and two prominent subgenomic RNAs of approximately 3 kb and 1 kb. Putative 5' ends of the sgRNAs were predicted by identification of conserved sequences and secondary structures which resembled the Barley yellow dwarf virus (BYDV) genomic RNA 5' end and subgenomic RNA promoter sequences. Secondary structures of the BYDV-like ribosomal frameshift elements and cap-independent translation elements, including long-distance base pairing spanning four kb, were identified. These contain similarities but also informative differences with the BYDV structures, including a strikingly different structure predicted for the 3' cap-independent translation element. These analyses of the RSDaV genomic RNA show more complexity for the RNA structural elements for members of the Luteoviridae.
Solving Hard Computational Problems Efficiently: Asymptotic Parametric Complexity 3-Coloring Algorithm

PubMed Central

Martín H., José Antonio

2013-01-01

Many practical problems in almost all scientific and technological disciplines have been classified as computationally hard (NP-hard or even NP-complete). In life sciences, combinatorial optimization problems frequently arise in molecular biology, e.g., genome sequencing; global alignment of multiple genomes; identifying siblings or discovery of dysregulated pathways. In almost all of these problems, there is the need for proving a hypothesis about certain property of an object that can be present if and only if it adopts some particular admissible structure (an NP-certificate) or be absent (no admissible structure), however, none of the standard approaches can discard the hypothesis when no solution can be found, since none can provide a proof that there is no admissible structure. This article presents an algorithm that introduces a novel type of solution method to “efficiently” solve the graph 3-coloring problem; an NP-complete problem. The proposed method provides certificates (proofs) in both cases: present or absent, so it is possible to accept or reject the hypothesis on the basis of a rigorous proof. It provides exact solutions and is polynomial-time (i.e., efficient) however parametric. The only requirement is sufficient computational power, which is controlled by the parameter . Nevertheless, here it is proved that the probability of requiring a value of to obtain a solution for a random graph decreases exponentially: , making tractable almost all problem instances. Thorough experimental analyses were performed. The algorithm was tested on random graphs, planar graphs and 4-regular planar graphs. The obtained experimental results are in accordance with the theoretical expected results. PMID:23349711
Protein domain organisation: adding order.

PubMed

Kummerfeld, Sarah K; Teichmann, Sarah A

2009-01-29

Domains are the building blocks of proteins. During evolution, they have been duplicated, fused and recombined, to produce proteins with novel structures and functions. Structural and genome-scale studies have shown that pairs or groups of domains observed together in a protein are almost always found in only one N to C terminal order and are the result of a single recombination event that has been propagated by duplication of the multi-domain unit. Previous studies of domain organisation have used graph theory to represent the co-occurrence of domains within proteins. We build on this approach by adding directionality to the graphs and connecting nodes based on their relative order in the protein. Most of the time, the linear order of domains is conserved. However, using the directed graph representation we have identified non-linear features of domain organization that are over-represented in genomes. Recognising these patterns and unravelling how they have arisen may allow us to understand the functional relationships between domains and understand how the protein repertoire has evolved. We identify groups of domains that are not linearly conserved, but instead have been shuffled during evolution so that they occur in multiple different orders. We consider 192 genomes across all three kingdoms of life and use domain and protein annotation to understand their functional significance. To identify these features and assess their statistical significance, we represent the linear order of domains in proteins as a directed graph and apply graph theoretical methods. We describe two higher-order patterns of domain organisation: clusters and bi-directionally associated domain pairs and explore their functional importance and phylogenetic conservation. Taking into account the order of domains, we have derived a novel picture of global protein organization. 
We found that all genomes have a higher than expected degree of clustering and more domain pairs in forward and reverse orientation in different proteins relative to random graphs with identical degree distributions. While these features were statistically over-represented, they are still fairly rare. Looking in detail at the proteins involved, we found strong functional relationships within each cluster. In addition, the domains tended to be involved in protein-protein interaction and are able to function as independent structural units. A particularly striking example was the human Jak-STAT signalling pathway which makes use of a set of domains in a range of orders and orientations to provide nuanced signaling functionality. This illustrated the importance of functional and structural constraints (or lack thereof) on domain organisation.
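The directed-graph representation of domain order, and the "bi-directionally associated domain pairs" it exposes, can be sketched in a few lines of Python. This is an illustrative toy, not the authors' pipeline: it assumes that edges connect sequentially adjacent domains in N-to-C order, and the function names and example domain architectures are hypothetical. The comparison against random graphs with identical degree distributions is not attempted.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (N-terminal domain, C-terminal neighbour)


def build_domain_graph(proteins: Dict[str, List[str]]) -> Dict[Edge, Set[str]]:
    """Map each directed edge to the set of proteins in which it occurs.

    `proteins` maps a protein identifier to its domains in N-to-C order.
    """
    edges: Dict[Edge, Set[str]] = defaultdict(set)
    for protein_id, domains in proteins.items():
        for n_term, c_term in zip(domains, domains[1:]):
            edges[(n_term, c_term)].add(protein_id)
    return edges


def bidirectional_pairs(edges: Dict[Edge, Set[str]]) -> List[Tuple[str, str]]:
    """Domain pairs seen in both orders, with the orders in different proteins."""
    pairs = set()
    for (a, b) in edges:
        if a == b or (b, a) not in edges:
            continue
        # Require at least two distinct proteins across the two orientations.
        if len(edges[(a, b)] | edges[(b, a)]) >= 2:
            pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical domain architectures, for illustration only.
    toy = {
        "P1": ["SH2", "SH3", "Kinase"],
        "P2": ["SH3", "SH2"],
        "P3": ["Kinase", "SH2"],
    }
    graph = build_domain_graph(toy)
    print(bidirectional_pairs(graph))
```

Storing the supporting proteins on each edge, rather than a bare count, is what allows the "in different proteins" condition from the abstract to be checked.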

The Physcomitrella patens chromosome-scale assembly reveals moss genome structure and evolution

DOE PAGES

Lang, Daniel; Ullrich, Kristian K.; Murat, Florent; ...

2017-12-13

Here, the draft genome of the moss model, Physcomitrella patens, comprised approximately 2000 unordered scaffolds. In order to enable analyses of genome structure and evolution we generated a chromosome-scale genome assembly using genetic linkage as well as (end) sequencing of long DNA fragments. We find that 57% of the genome comprises transposable elements (TEs), some of which may be actively transposing during the life cycle. Unlike in flowering plant genomes, gene- and TE-rich regions show an overall even distribution along the chromosomes. However, the chromosomes are mono-centric with peaks of a class of Copia elements potentially coinciding with centromeres. Gene body methylation is evident in 5.7% of the protein-coding genes, typically coinciding with low GC and low expression. Some giant virus insertions are transcriptionally active and might protect gametes from viral infection via siRNA-mediated silencing. Structure-based detection methods show that the genome evolved via two rounds of whole genome duplications (WGDs), apparently common in mosses but not in liverworts and hornworts. Several hundred genes are present in colinear regions conserved since the last common ancestor of plants. These syntenic regions are enriched for functions related to plant-specific cell growth and tissue organization. The P. patens genome lacks the TE-rich pericentromeric and gene-rich distal regions typical for most flowering plant genomes. More non-seed plant genomes are needed to unravel how plant genomes evolve, and to understand whether the P. patens genome structure is typical for mosses or bryophytes.
Behind Every Good Metabolite there is a Great Enzyme (and perhaps a structure)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Buchko, Garry W.; Phan, Isabelle; Cron, Lisabeth

Today, due to great technological advancements, it is possible to study everything at the same time. This ability has given birth to “totality” studies in the fields of genomics, transcriptomics, proteomics, and metabolomics. In turn, the combined study of all these global analyses gave birth to the field of systems biology. Another “totality” field brought to life with new emerging technologies is structural genomics, an effort to determine the three-dimensional structure of every protein encoded in a genome. The Seattle Structural Genomics Center for Infectious Disease (SSGCID) is a specialized structural genomics effort composed of academic (University of Washington), government (Pacific Northwest National Laboratory), not-for-profit (Seattle BioMed), and commercial (Emerald BioStructures) institutions that is funded by the National Institute of Allergy and Infectious Diseases (Federal Contract: HHSN272200700057C and HHSN27220120025C) to apply genome-scale approaches in solving protein structures from biodefense organisms, as well as those causing emerging and re-emerging disease. In five years over 540 structures have been deposited into the Protein Data Bank (PDB) by SSGCID. About one third of all SSGCID structures contain bound ligands, many of which are metabolites or metabolite analogues present in the cell. These protein structures are the blueprints for the structure-based design of the next generation of drugs against bacterial pathogens and other infectious diseases. Many of the selected SSGCID targets are annotated enzymes from known metabolomic pathways essential to cellular vitality since selectively “knocking-out” one of the enzymes in an important pathway with a drug may be fatal to the organism. One reason metabolomic pathways are important is because of the small molecules, or metabolites, produced at various steps in these pathways and identified by metabolomic studies. Unlike genomics, transcriptomics, and proteomics that may be influenced by epigenetic, post-transcriptional, and post-translational modifications, respectively, the metabolites present in the cell at any one time represent downstream biochemical endproducts, and therefore, metabolite profiles may be most closely associated with a phenotype and provide valuable information for infectious disease research. Metabolomic data would be even more useful if it could be linked to the vast amount of structural genomics data. Towards this goal SSGCID has created an automated website (http://apps.sbri.org/SSGCIDTargetStatus/Pathway) that assigns selected SSGCID target proteins to MetaCyc pathways (http://metacyc.org/). Details of this website will be provided here. The SSGCID-Pathway website represents a first big step towards linking metabolites and metabolic pathways to structural genomic data with the goal of accelerating the discovery of new agents to battle infectious diseases.
Tree decomposition based fast search of RNA structures including pseudoknots in genomes.

PubMed

Song, Yinglei; Liu, Chunmei; Malmberg, Russell; Pan, Fangfang; Cai, Liming

2005-01-01

Searching genomes for RNA secondary structure with computational methods has become an important approach to the annotation of non-coding RNAs. However, due to the lack of efficient algorithms for accurate RNA structure-sequence alignment, computer programs capable of quickly and effectively searching genomes for RNA secondary structures have not been available. In this paper, a novel RNA structure profiling model is introduced based on the notion of a conformational graph to specify the consensus structure of an RNA family. Tree decomposition yields a small tree width t for such conformational graphs (e.g., t = 2 for stem loops and only a slight increase for pseudoknots). Within this modelling framework, the optimal alignment of a sequence to the structure model corresponds to finding a maximum valued isomorphic subgraph and consequently can be accomplished through dynamic programming on the tree decomposition of the conformational graph in time O(k^t N^2), where k is a small parameter and N is the size of the profiled RNA structure. Experiments show that the application of the alignment algorithm to search in genomes yields the same search accuracy as methods based on a Covariance model with a significant reduction in computation time. In particular, very accurate searches of tmRNAs in bacterial genomes and of telomerase RNAs in yeast genomes can be accomplished in days, as opposed to months required by other methods. The tree decomposition based searching tool is free upon request and can be downloaded at our site http://w.uga.edu/RNA-informatics/software/index.php.
Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq.

PubMed

Barrick, Jeffrey E; Colburn, Geoffrey; Deatherage, Daniel E; Traverse, Charles C; Strand, Matthew D; Borges, Jordan J; Knoester, David B; Reba, Aaron; Meyer, Austin G

2014-11-29

Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. 
In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold). Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.
LINE-1 Elements in Structural Variation and Disease

PubMed Central

Beck, Christine R.; Garcia-Perez, José Luis; Badge, Richard M.; Moran, John V.

2014-01-01

The completion of the human genome reference sequence ushered in a new era for the study and discovery of human transposable elements. It now is undeniable that transposable elements, historically dismissed as junk DNA, have had an instrumental role in sculpting the structure and function of our genomes. In particular, long interspersed element-1 (LINE-1 or L1) and short interspersed elements (SINEs) continue to affect our genome, and their movement can lead to sporadic cases of disease. Here, we briefly review the types of transposable elements present in the human genome and their mechanisms of mobility. We next highlight how advances in DNA sequencing and genomic technologies have enabled the discovery of novel retrotransposons in individual genomes. Finally, we discuss how L1-mediated retrotransposition events impact human genomes. PMID:21801021
Genome-wide diversity and selective pressure in the human rhinovirus

PubMed Central

Kistler, Amy L; Webster, Dale R; Rouskin, Silvi; Magrini, Vince; Credle, Joel J; Schnurr, David P; Boushey, Homer A; Mardis, Elaine R; Li, Hao; DeRisi, Joseph L

2007-01-01

Background: The human rhinoviruses (HRV) are one of the most common and diverse respiratory pathogens of humans. Over 100 distinct HRV serotypes are known, yet only 6 genomes are available. Due to the paucity of HRV genome sequence, little is known about the genetic diversity within HRV or the forces driving this diversity. Previous comparative genome sequence analyses indicate that recombination drives diversification in multiple genera of the picornavirus family, yet it remains unclear if this holds for HRV. Results: To resolve this and gain insight into the forces driving diversification in HRV, we generated a representative set of 34 fully sequenced HRVs. Analysis of these genomes shows consistent phylogenies across the genome, conserved non-coding elements, and only limited recombination. However, spikes of genetic diversity at both the nucleotide and amino acid level are detectable within every locus of the genome. Despite this, the HRV genome as a whole is under purifying selective pressure, with islands of diversifying pressure in the VP1, VP2, and VP3 structural genes and two non-structural genes, the 3C protease and 3D polymerase. Mapping diversifying residues in these factors onto available 3-dimensional structures revealed the diversifying capsid residues partition to the external surface of the viral particle in statistically significant proximity to antigenic sites. Diversifying pressure in the pleconaril binding site is confined to a single residue known to confer drug resistance (VP1 191). In contrast, diversifying pressure in the non-structural genes is less clear, mapping both nearby and beyond characterized functional domains of these factors. Conclusion: This work provides a foundation for understanding HRV genetic diversity and insight into the underlying biology driving evolution in HRV. 
It expands our knowledge of the genome sequence space that HRV reference serotypes occupy and how the pattern of genetic diversity across HRV genomes differs from other picornaviruses. It also reveals evidence of diversifying selective pressure in both structural genes known to interact with the host immune system and in domains of unassigned function in the non-structural 3C and 3D genes, raising the possibility that diversification of undiscovered functions in these essential factors may influence HRV fitness and evolution. PMID:17477878
Perspective: Role of structure prediction in materials discovery and design

NASA Astrophysics Data System (ADS)

Needs, Richard J.; Pickard, Chris J.

2016-05-01

Materials informatics owes much to bioinformatics and the Materials Genome Initiative has been inspired by the Human Genome Project. But there is more to bioinformatics than genomes, and the same is true for materials informatics. Here we describe the rapidly expanding role of searching for structures of materials using first-principles electronic-structure methods. Structure searching has played an important part in unraveling structures of dense hydrogen and in identifying the record-high-temperature superconducting component in hydrogen sulfide at high pressures. We suggest that first-principles structure searching has already demonstrated its ability to determine structures of a wide range of materials and that it will play a central and increasing part in materials discovery and design.
The Microbial Genomes Atlas (MiGA) webserver: taxonomic and gene diversity analysis of Archaea and Bacteria at the whole genome level.

PubMed

Rodriguez-R, Luis M; Gunturu, Santosh; Harvey, William T; Rosselló-Mora, Ramon; Tiedje, James M; Cole, James R; Konstantinidis, Konstantinos T

2018-06-14

The small subunit ribosomal RNA gene (16S rRNA) has been successfully used to catalogue and study the diversity of prokaryotic species and communities but it offers limited resolution at the species and finer levels, and cannot represent the whole-genome diversity and fluidity. To overcome these limitations, we introduced the Microbial Genomes Atlas (MiGA), a webserver that allows the classification of an unknown query genomic sequence, complete or partial, against all taxonomically classified taxa with available genome sequences, as well as comparisons to other related genomes including uncultivated ones, based on the genome-aggregate Average Nucleotide and Amino Acid Identity (ANI/AAI) concepts. MiGA integrates best practices in sequence quality trimming and assembly and allows input to be raw reads or assemblies from isolate genomes, single-cell sequences, and metagenome-assembled genomes (MAGs). Further, MiGA can take as input hundreds of closely related genomes of the same or closely related species (a so-called 'Clade Project') to assess their gene content diversity and evolutionary relationships, and calculate important clade properties such as the pangenome and core gene sets. Therefore, MiGA is expected to facilitate a range of genome-based taxonomic and diversity studies, and quality assessment across environmental and clinical settings. MiGA is available at http://microbial-genomes.org/.
Phylogenetic analysis and expression profiling of the pattern recognition receptors: Insights into molecular recognition of invading pathogens in Manduca sexta.

PubMed

Zhang, Xiufeng; He, Yan; Cao, Xiaolong; Gunaratna, Ramesh T; Chen, Yun-ru; Blissard, Gary; Kanost, Michael R; Jiang, Haobo

2015-07-01

Pattern recognition receptors (PRRs) detect microbial pathogens and trigger innate immune responses. Previous biochemical studies have elucidated the physiological functions of eleven PRRs in Manduca sexta but our understanding of the recognition process is still limited, lacking genomic perspectives. While 34 C-type lectin-domain proteins and 16 Toll-like receptors are reported in the companion papers, we present here 120 other putative PRRs identified through the genome annotation. These include 76 leucine-rich repeat (LRR) proteins, 14 peptidoglycan recognition proteins, 6 EGF/Nim-domain proteins, 5 β-1,3-glucanase-related proteins, 4 galectins, 4 fibrinogen-related proteins, 3 thioester proteins, 5 immunoglobulin-domain proteins, 2 hemocytins, and 1 Reeler. Sequence alignment and phylogenetic analysis reveal the evolution history of a diverse repertoire of proteins for pathogen recognition. While functions of insect LRR proteins are mostly unknown, their structure diversification is phenomenal: In addition to the Toll homologs, 22 LRR proteins with a signal peptide are expected to be secreted; 18 LRR proteins lacking signal peptides may be cytoplasmic; 36 LRRs with a signal peptide and a transmembrane segment may be non-Toll receptors on the surface of cells. Expression profiles of the 120 genes in 52 tissue samples reflect complex regulation in various developmental stages and physiological states, including some likely by Rel family transcription factors via κB motifs in the promoter regions. This collection of information is expected to facilitate future biochemical studies detailing their respective roles in this model insect. Copyright © 2015 Elsevier Ltd. All rights reserved.
Phylogenetic analysis and expression profiling of the pattern recognition receptors: insights into molecular recognition of invading pathogens in Manduca sexta

PubMed Central

Zhang, Xiufeng; He, Yan; Cao, Xiaolong; Gunaratna, Ramesh T.; Chen, Yun-ru; Blissard, Gary; Kanost, Michael R.; Jiang, Haobo

2015-01-01

Pattern recognition receptors (PRRs) detect microbial pathogens and trigger innate immune responses. Previous biochemical studies have elucidated the physiological functions of eleven PRRs in Manduca sexta but our understanding of the recognition process is still limited, lacking genomic perspectives. While 34 C-type lectin-domain proteins and 16 Toll-like receptors are reported in the companion papers, we present here 120 other putative PRRs identified through the genome annotation. These include 76 leucine-rich repeat (LRR) proteins, 14 peptidoglycan recognition proteins, 6 EGF/Nim-domain proteins, 5 β-1,3-glucanase-related proteins, 4 galectins, 4 fibrinogen-related proteins, 3 thioester proteins, 5 immunoglobulin-domain proteins, 2 hemocytins, and 1 Reeler. Sequence alignment and phylogenetic analysis reveal the evolution history of a diverse repertoire of proteins for pathogen recognition. While functions of insect LRR proteins are mostly unknown, their structure diversification is phenomenal: In addition to the Toll homologs, 22 LRR proteins with a signal peptide are expected to be secreted; 18 LRR proteins lacking signal peptides may be cytoplasmic; 36 LRRs with a signal peptide and a transmembrane segment may be non-Toll receptors on the surface of cells. Expression profiles of the 120 genes in 52 tissue samples reflect complex regulation in various developmental stages and physiological states, including some likely by Rel family transcription factors via κB motifs in the promoter regions. This collection of information is expected to facilitate future biochemical studies detailing their respective roles in this model insect. PMID:25701384
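The family counts quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below only tallies the numbers as stated in the abstract; the dictionary keys and variable names are illustrative, not taken from the paper.

```python
# Putative PRR families and their counts, as listed in the abstract above.
prr_families = {
    "leucine-rich repeat (LRR) proteins": 76,
    "peptidoglycan recognition proteins": 14,
    "EGF/Nim-domain proteins": 6,
    "beta-1,3-glucanase-related proteins": 5,
    "galectins": 4,
    "fibrinogen-related proteins": 4,
    "thioester proteins": 3,
    "immunoglobulin-domain proteins": 5,
    "hemocytins": 2,
    "Reeler": 1,
}

# Predicted localization classes of the LRR proteins, also from the abstract.
lrr_classes = {
    "signal peptide only (likely secreted)": 22,
    "no signal peptide (possibly cytoplasmic)": 18,
    "signal peptide + transmembrane segment": 36,
}

total_prrs = sum(prr_families.values())
total_lrr = sum(lrr_classes.values())

print(f"putative PRRs: {total_prrs}")
print(f"LRR proteins across localization classes: {total_lrr}")
```

The ten families sum to the 120 putative PRRs reported, and the three LRR classes sum to the 76 LRR proteins.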
From genomics to chemical genomics: new developments in KEGG

PubMed Central

Kanehisa, Minoru; Goto, Susumu; Hattori, Masahiro; Aoki-Kinoshita, Kiyoko F.; Itoh, Masumi; Kawashima, Shuichi; Katayama, Toshiaki; Araki, Michihiro; Hirakawa, Mika

2006-01-01

The increasing amount of genomic and molecular information is the basis for understanding higher-order biological systems, such as the cell and the organism, and their interactions with the environment, as well as for medical, industrial and other practical applications. The KEGG resource provides a reference knowledge base for linking genomes to biological systems, categorized as building blocks in the genomic space (KEGG GENES) and the chemical space (KEGG LIGAND), and wiring diagrams of interaction networks and reaction networks (KEGG PATHWAY). A fourth component, KEGG BRITE, has been formally added to the KEGG suite of databases. This reflects our attempt to computerize functional interpretations as part of the pathway reconstruction process based on the hierarchically structured knowledge about the genomic, chemical and network spaces. In accordance with the new chemical genomics initiatives, the scope of KEGG LIGAND has been significantly expanded to cover both endogenous and exogenous molecules. Specifically, RPAIR contains curated chemical structure transformation patterns extracted from known enzymatic reactions, which would enable analysis of genome-environment interactions, such as the prediction of new reactions and new enzyme genes that would degrade new environmental compounds. Additionally, drug information is now stored separately and linked to new KEGG DRUG structure maps. PMID:16381885
Chromosomal distribution of microsatellite repeats in Amazon cichlids genome (Pisces, Cichlidae)

PubMed Central

Schneider, Carlos Henrique; Gross, Maria Claudia; Terencio, Maria Leandra; de Tavares, Édika Sabrina Girão Mitozo; Martins, Cesar; Feldberg, Eliana

2015-01-01

Fish of the family Cichlidae are recognized as an excellent model for evolutionary studies because of their morphological and behavioral adaptations to a wide diversity of explored ecological niches. In addition, the family has a dynamic genome with variable structure, composition and karyotype organization. Microsatellites represent the most dynamic genomic component and a better understanding of their organization may help clarify the role of repetitive DNA elements in the mechanisms of chromosomal evolution. Thus, in this study, microsatellite sequences were mapped in the chromosomes of Cichla monoculus Agassiz, 1831, Pterophyllum scalare Schultze, 1823, and Symphysodon discus Heckel, 1840. Four microsatellites demonstrated positive results in the genome of Cichla monoculus and Symphysodon discus, and five demonstrated positive results in the genome of Pterophyllum scalare. In most cases, the microsatellite was dispersed in the chromosome with conspicuous markings in the centromeric or telomeric regions, which suggests that sequences contribute to chromosome structure and may have played a role in the evolution of this fish family. The comparative genome mapping data presented here provide novel information on the structure and organization of the repetitive DNA region of the cichlid genome and contribute to a better understanding of this fish family’s genome. PMID:26753076
Computational characterization of chromatin domain boundary-associated genomic elements

PubMed Central

Hong, Seungpyo

2017-01-01

Topologically associated domains (TADs) are 3D genomic structures with high internal interactions that play important roles in genome compaction and gene regulation. Their genomic locations and their association with CCCTC-binding factor (CTCF)-binding sites and transcription start sites (TSSs) were recently reported. However, the relationship between TADs and other genomic elements has not been systematically evaluated. This was addressed in the present study, with a focus on the enrichment of these genomic elements and their ability to predict the TAD boundary region. We found that consensus CTCF-binding sites were strongly associated with TAD boundaries as well as with the transcription factors (TFs) Zinc finger protein (ZNF)143 and Yin Yang (YY)1. TAD boundary-associated genomic elements include DNase I-hypersensitive sites, H3K36 trimethylation, TSSs, RNA polymerase II, and TFs such as Specificity protein 1, ZNF274 and SIX homeobox 5. Computational modeling with these genomic elements suggests that they have distinct roles in TAD boundary formation. We propose a structural model of TAD boundaries based on these findings that provides a basis for studying the mechanism of chromatin structure formation and gene regulation. PMID:28977568
In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features.

PubMed

Ding, Yiliang; Tang, Yin; Kwok, Chun Kit; Zhang, Yu; Bevilacqua, Philip C; Assmann, Sarah M

2014-01-30

RNA structure has critical roles in processes ranging from ligand sensing to the regulation of translation, polyadenylation and splicing. However, a lack of genome-wide in vivo RNA structural data has limited our understanding of how RNA structure regulates gene expression in living cells. Here we present a high-throughput, genome-wide in vivo RNA structure probing method, structure-seq, in which dimethyl sulphate methylation of unprotected adenines and cytosines is identified by next-generation sequencing. Application of this method to Arabidopsis thaliana seedlings yielded the first in vivo genome-wide RNA structure map at nucleotide resolution for any organism, with quantitative structural information across more than 10,000 transcripts. Our analysis reveals a three-nucleotide periodic repeat pattern in the structure of coding regions, as well as a less-structured region immediately upstream of the start codon, and shows that these features are strongly correlated with translation efficiency. We also find patterns of strong and weak secondary structure at sites of alternative polyadenylation, as well as strong secondary structure at 5' splice sites that correlates with unspliced events. Notably, in vivo structures of messenger RNAs annotated for stress responses are poorly predicted in silico, whereas mRNA structures of genes related to cell function maintenance are well predicted. Global comparison of several structural features between these two categories shows that the mRNAs associated with stress responses tend to have more single-strandedness, longer maximal loop length and higher free energy per nucleotide, features that may allow these RNAs to undergo conformational changes in response to environmental conditions. Structure-seq allows the RNA structurome and its biological roles to be interrogated on a genome-wide scale and should be applicable to any organism.
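The three-nucleotide periodicity mentioned above can be illustrated with a very simple summary: averaging a per-nucleotide reactivity profile by codon position. This is a minimal sketch and not the authors' analysis; the function name and the toy profile are hypothetical, and it assumes the profile starts at the first nucleotide of the coding region.

```python
def mean_by_codon_position(reactivity):
    """Average a per-nucleotide reactivity profile by codon position (0, 1, 2).

    Assumes index 0 is the first nucleotide of the coding region.
    A position with no values yields NaN.
    """
    totals = [0.0, 0.0, 0.0]
    counts = [0, 0, 0]
    for index, value in enumerate(reactivity):
        totals[index % 3] += value
        counts[index % 3] += 1
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]


# Toy profile (made-up values) with a perfect period-3 pattern.
toy_profile = [1, 2, 3] * 4
print(mean_by_codon_position(toy_profile))
```

If the three averages differ consistently across many transcripts, the profile carries a period-3 signal; a flat profile gives three similar averages.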
A robust TALENs system for highly efficient mammalian genome editing.

PubMed

Feng, Yuanxi; Zhang, Siliang; Huang, Xin

2014-01-10

Recently, transcription activator-like effector nucleases (TALENs) have emerged as a highly effective tool for genomic editing. A pair of TALENs binds to two DNA recognition sites separated by a spacer sequence, and the dimerized FokI nucleases at the C terminal then cleave DNA in the spacer. Because of its modular design and capacity to precisely target almost any desired genomic locus, TALEN is a technology that can revolutionize the entire biomedical research field. Currently, for genomic editing in cultured cells, two plasmids encoding a pair of TALENs are co-transfected, followed by limited dilution to isolate cell colonies with the intended genomic manipulation. However, uncertain transfection efficiency becomes a bottleneck, especially in hard-to-transfect cells, reducing the overall efficiency of genome editing. We have developed a robust TALENs system in which each TALEN plasmid also encodes a fluorescence protein. Thus, cells transfected with both TALEN plasmids, a prerequisite for genomic editing, can be isolated by fluorescence-activated cell sorting. Our improved TALENs system can be applied to all cultured cells to achieve highly efficient genomic editing. Furthermore, an optimized procedure for genomic editing using TALENs is also presented. We expect our system to be widely adopted by the scientific community.
Oncogenomic portals for the visualization and analysis of genome-wide cancer data

PubMed Central

Klonowska, Katarzyna; Czubak, Karol; Wojciechowska, Marzena; Handschuh, Luiza; Zmienko, Agnieszka; Figlerowicz, Marek; Dams-Kozlowska, Hanna; Kozlowski, Piotr

2016-01-01

Somatically acquired genomic alterations that drive oncogenic cellular processes are of great scientific and clinical interest. Since the initiation of large-scale cancer genomic projects (e.g., the Cancer Genome Project, The Cancer Genome Atlas, and the International Cancer Genome Consortium cancer genome projects), a number of web-based portals have been created to facilitate access to multidimensional oncogenomic data and assist with the interpretation of the data. The portals provide the visualization of small-size mutations, copy number variations, methylation, and gene/protein expression data that can be correlated with the available clinical, epidemiological, and molecular features. Additionally, the portals enable analysis of the gathered data with various user-friendly statistical tools. Herein, we present a highly illustrated review of seven portals, i.e., Tumorscape, UCSC Cancer Genomics Browser, ICGC Data Portal, COSMIC, cBioPortal, IntOGen, and BioProfiling.de. All of the selected portals are user-friendly and can be exploited by scientists from different cancer-associated fields, including those without a bioinformatics background. It is expected that the use of the portals will contribute to a better understanding of cancer molecular etiology and will ultimately accelerate the translation of genomic knowledge into clinical practice. PMID:26484415
Oncogenomic portals for the visualization and analysis of genome-wide cancer data.

PubMed

Klonowska, Katarzyna; Czubak, Karol; Wojciechowska, Marzena; Handschuh, Luiza; Zmienko, Agnieszka; Figlerowicz, Marek; Dams-Kozlowska, Hanna; Kozlowski, Piotr

2016-01-05

Somatically acquired genomic alterations that drive oncogenic cellular processes are of great scientific and clinical interest. Since the initiation of large-scale cancer genomic projects (e.g., the Cancer Genome Project, The Cancer Genome Atlas, and the International Cancer Genome Consortium cancer genome projects), a number of web-based portals have been created to facilitate access to multidimensional oncogenomic data and assist with the interpretation of the data. The portals provide the visualization of small-size mutations, copy number variations, methylation, and gene/protein expression data that can be correlated with the available clinical, epidemiological, and molecular features. Additionally, the portals enable analysis of the gathered data with various user-friendly statistical tools. Herein, we present a highly illustrated review of seven portals, i.e., Tumorscape, UCSC Cancer Genomics Browser, ICGC Data Portal, COSMIC, cBioPortal, IntOGen, and BioProfiling.de. All of the selected portals are user-friendly and can be exploited by scientists from different cancer-associated fields, including those without a bioinformatics background. It is expected that the use of the portals will contribute to a better understanding of cancer molecular etiology and will ultimately accelerate the translation of genomic knowledge into clinical practice.
Conifer genomics and adaptation: at the crossroads of genetic diversity and genome function.

PubMed

Prunier, Julien; Verta, Jukka-Pekka; MacKay, John J

2016-01-01

Conifers have been understudied at the genomic level despite their worldwide ecological and economic importance but the situation is rapidly changing with the development of next generation sequencing (NGS) technologies. With NGS, genomics research has simultaneously gained in speed, magnitude and scope. In just a few years, genomes of 20-24 gigabases have been sequenced for several conifers, with several others expected in the near future. Biological insights have resulted from recent sequencing initiatives as well as genetic mapping, gene expression profiling and gene discovery research over nearly two decades. We review the knowledge arising from conifer genomics research emphasizing genome evolution and the genomic basis of adaptation, and outline emerging questions and knowledge gaps. We discuss future directions in three areas with potential inputs from NGS technologies: the evolutionary impacts of adaptation in conifers based on the adaptation-by-speciation model; the contributions of genetic variability of gene expression in adaptation; and the development of a broader understanding of genetic diversity and its impacts on genome function. These research directions promise to sustain research aimed at addressing the emerging challenges of adaptation that face conifer trees. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
[Comparative analysis of clustered regularly interspaced short palindromic repeats (CRISPRs) loci in the genomes of halophilic archaea].

PubMed

Zhang, Fan; Zhang, Bing; Xiang, Hua; Hu, Songnian

2009-11-01

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a widespread system that provides acquired resistance against phages in bacteria and archaea. Here we aim to analyze, on a genome-wide scale, the CRISPRs of extreme halophilic archaea for which whole-genome sequences are currently available. We used bioinformatics methods including alignment, conservation analysis, GC content and RNA structure prediction to analyze the CRISPR structures of 7 haloarchaeal genomes. We identified the CRISPR structures in 5 halophilic archaea and revealed a conserved palindromic motif in the flanking regions of these CRISPR structures. In addition, we found that the repeat sequences of large CRISPR structures in halophilic archaea were greatly conserved, and two types of predicted RNA secondary structures derived from the repeat sequences were likely determined by the fourth base of the repeat sequence. Our results support the proposal that the leader sequence may function as a recognition site by having palindromic structures in flanking regions, and the stem-loop secondary structure formed by repeat sequences may function in mediating the interaction between foreign genetic elements and CAS-encoded proteins.
The Jujube Genome Provides Insights into Genome Evolution and the Domestication of Sweetness/Acidity Taste in Fruit Trees

PubMed Central

Wan, KangKang; Zhang, Zhong; Pang, Xiaoming; Yin, Xiao; Bai, Yang; Sun, Xiaoqing; Gao, Lizhi; Li, Ruiqiang; Zhang, Jinbo

2016-01-01

Jujube (Ziziphus jujuba Mill.) belongs to the Rhamnaceae family and is a popular fruit tree species with immense economic and nutritional value. Here, we report a draft genome of the dry jujube cultivar ‘Junzao’ and the genome resequencing of 31 geographically diverse accessions of cultivated and wild jujubes (Ziziphus jujuba var. spinosa). Comparative analysis revealed that the genome of ‘Dongzao’, a fresh jujube, was ~86.5 Mb larger than that of the ‘Junzao’, partially due to the recent insertions of transposable elements in the ‘Dongzao’ genome. We constructed eight proto-chromosomes of the common ancestor of Rhamnaceae and Rosaceae, two sister families in the order Rosales, and elucidated the evolutionary processes that have shaped the genome structures of modern jujubes. Population structure analysis revealed the complex genetic background of jujubes resulting from extensive hybridizations between jujube and its wild relatives. Notably, several key genes that control fruit organic acid metabolism and sugar content were identified in the selective sweep regions. We also identified S-locus genes controlling gametophytic self-incompatibility and investigated haplotype patterns of the S locus in the jujube genomes, which would provide a guideline for parent selection for jujube crossbreeding. This study provides valuable genomic resources for jujube improvement, and offers insights into jujube genome evolution and its population structure and domestication. PMID:28005948
Translating the "Banana Genome" to Delineate Stress Resistance, Dwarfing, Parthenocarpy and Mechanisms of Fruit Ripening.

PubMed

Dash, Prasanta K; Rai, Rhitu

2016-01-01

Evolutionarily frozen, genetically sterile and globally iconic fruit "Banana" remained untouched by the green revolution and, as of today, researchers face intrinsic impediments for its varietal improvement. Recently, this wonder crop entered the genomics era with decoding of the structural genome of the double haploid Pahang (AA genome constitution) genotype of Musa acuminata. Its complex genome, decoded by hybrid sequencing strategies, revealed a panoply of genes and transcription factors involved in the process of sucrose conversion that imparts sweetness to its fruit. Historically, banana has faced the wrath of pandemic bacterial, fungal, and viral diseases and a multitude of abiotic stresses that have ruined the livelihood of small/marginal farmers and destroyed commercial plantations. Decoding the structural genome of this climacteric fruit has given impetus to a deeper understanding of the repertoire of genes involved in disease resistance, understanding the mechanism of dwarfing to develop an ideal plant type, unraveling the process of parthenocarpy, and fruit ripening for better fruit quality. Further, injunction of comparative genomics will usher in integration of information from its decoded genome and other monocots into field applications in banana related but not limited to yield enhancement, food security, livelihood assurance, and energy sustainability. In this mini review, we discuss pre- and post-genomic discoveries and highlight accomplishments in structural genomics, genetic engineering and forward genetic accomplishments with an aim to target genes and transcription factors for translational research in banana.
The Jujube Genome Provides Insights into Genome Evolution and the Domestication of Sweetness/Acidity Taste in Fruit Trees.

PubMed

Huang, Jian; Zhang, Chunmei; Zhao, Xing; Fei, Zhangjun; Wan, KangKang; Zhang, Zhong; Pang, Xiaoming; Yin, Xiao; Bai, Yang; Sun, Xiaoqing; Gao, Lizhi; Li, Ruiqiang; Zhang, Jinbo; Li, Xingang

2016-12-01

Jujube (Ziziphus jujuba Mill.) belongs to the Rhamnaceae family and is a popular fruit tree species with immense economic and nutritional value. Here, we report a draft genome of the dry jujube cultivar 'Junzao' and the genome resequencing of 31 geographically diverse accessions of cultivated and wild jujubes (Ziziphus jujuba var. spinosa). Comparative analysis revealed that the genome of 'Dongzao', a fresh jujube, was ~86.5 Mb larger than that of the 'Junzao', partially due to the recent insertions of transposable elements in the 'Dongzao' genome. We constructed eight proto-chromosomes of the common ancestor of Rhamnaceae and Rosaceae, two sister families in the order Rosales, and elucidated the evolutionary processes that have shaped the genome structures of modern jujubes. Population structure analysis revealed the complex genetic background of jujubes resulting from extensive hybridizations between jujube and its wild relatives. Notably, several key genes that control fruit organic acid metabolism and sugar content were identified in the selective sweep regions. We also identified S-locus genes controlling gametophytic self-incompatibility and investigated haplotype patterns of the S locus in the jujube genomes, which would provide a guideline for parent selection for jujube crossbreeding. This study provides valuable genomic resources for jujube improvement, and offers insights into jujube genome evolution and its population structure and domestication.
A score-statistic approach for determining threshold values in QTL mapping.

PubMed

Kao, Chen-Hung; Ho, Hsiang-An

2012-06-01

Issues in determining the threshold values for QTL mapping have so far mostly been investigated for backcross and F2 populations, which have relatively simple genome structures. Investigations of these issues in the progeny populations after F2 (advanced populations), which have relatively more complicated genomes, are generally inadequate. As these advanced populations have been well implemented in QTL mapping, it is important to address these issues for them in more detail. Due to an increasing number of meiosis cycles, the genomes of the advanced populations can be very different from the backcross and F2 genomes. Therefore, special devices that consider the specific genome structures present in the advanced populations are required to resolve these issues. By considering the differences in genome structure between populations, we formulate more general score test statistics and Gaussian processes to evaluate their threshold values. In general, we found that, given a significance level and a genome size, threshold values for QTL detection are higher in the denser marker maps and in the more advanced populations. Simulations were performed to validate our approach.
Reconstructing Past Admixture Processes from Local Genomic Ancestry Using Wavelet Transformation

PubMed Central

Sanderson, Jean; Sudoyo, Herawati; Karafet, Tatiana M.; Hammer, Michael F.; Cox, Murray P.

2015-01-01

Admixture between long-separated populations is a defining feature of the genomes of many species. The mosaic block structure of admixed genomes can provide information about past contact events, including the time and extent of admixture. Here, we describe an improved wavelet-based technique that better characterizes ancestry block structure from observed genomic patterns. Principal components analysis is first applied to genomic data to identify the primary population structure, followed by wavelet decomposition to develop a new characterization of local ancestry information along the chromosomes. For testing purposes, this method is applied to human genome-wide genotype data from Indonesia, as well as virtual genetic data generated using genome-scale sequential coalescent simulations under a wide range of admixture scenarios. Time of admixture is inferred using an approximate Bayesian computation framework, providing robust estimates of both admixture times and their associated levels of uncertainty. Crucially, we demonstrate that this revised wavelet approach, which we have released as the R package adwave, provides improved statistical power over existing wavelet-based techniques and can be used to address a broad range of admixture questions. PMID:25852078
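To give a feel for why wavelet decomposition is informative about ancestry block structure, the sketch below applies a plain, unnormalized Haar transform to a toy local-ancestry signal. It is an illustration only and not the method implemented in adwave; the function names and the toy signals are hypothetical, and the signal length is assumed to be a power of two.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (approximation) and half-differences (detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail


def haar_decompose(signal):
    """Full Haar decomposition of a signal whose length is a power of two.

    Returns (overall_mean, details), with details ordered from the finest
    scale (adjacent markers) to the coarsest (two halves of the chromosome).
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    current = list(signal)
    details = []
    while len(current) > 1:
        current, detail = haar_step(current)
        details.append(detail)
    return current[0], details


# Toy ancestry calls along a chromosome (1 = population A, 0 = population B).
long_blocks = [1, 1, 1, 1, 0, 0, 0, 0]   # few, long ancestry blocks
short_blocks = [1, 0, 1, 0, 1, 0, 1, 0]  # many, short ancestry blocks

print(haar_decompose(long_blocks))
print(haar_decompose(short_blocks))
```

Long blocks put their signal into the coarsest detail level and short blocks into the finest one. That scale dependence is what allows block lengths, and hence time since admixture, to be read from a wavelet summary.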
Visualizing the global secondary structure of a viral RNA genome with cryo-electron microscopy

PubMed Central

Garmann, Rees F.; Gopal, Ajaykumar; Athavale, Shreyas S.; Knobler, Charles M.; Gelbart, William M.; Harvey, Stephen C.

2015-01-01

The lifecycle, and therefore the virulence, of single-stranded (ss)-RNA viruses is regulated not only by their particular protein gene products, but also by the secondary and tertiary structure of their genomes. The secondary structure of the entire genomic RNA of satellite tobacco mosaic virus (STMV) was recently determined by selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE). The SHAPE analysis suggested a single highly extended secondary structure with much less branching than occurs in the ensemble of structures predicted by purely thermodynamic algorithms. Here we examine the solution-equilibrated STMV genome by direct visualization with cryo-electron microscopy (cryo-EM), using an RNA of similar length transcribed from the yeast genome as a control. The cryo-EM data reveal an ensemble of branching patterns that are collectively consistent with the SHAPE-derived secondary structure model. Thus, our results both elucidate the statistical nature of the secondary structure of large ss-RNAs and give visual support for modern RNA structure determination methods. Additionally, this work introduces cryo-EM as a means to distinguish between competing secondary structure models if the models differ significantly in terms of the number and/or length of branches. Furthermore, with the latest advances in cryo-EM technology, we suggest the possibility of developing methods that incorporate restraints from cryo-EM into the next generation of algorithms for the determination of RNA secondary and tertiary structures. PMID:25752599
Single-molecule optical genome mapping of a human HapMap and a colorectal cancer cell line.

PubMed

Teo, Audrey S M; Verzotto, Davide; Yao, Fei; Nagarajan, Niranjan; Hillmer, Axel M

2015-01-01

Next-generation sequencing (NGS) technologies have changed our understanding of the variability of the human genome. However, the identification of genome structural variations based on NGS approaches with read lengths of 35-300 bases remains a challenge. Single-molecule optical mapping technologies allow the analysis of DNA molecules of up to 2 Mb and as such are suitable for the identification of large-scale genome structural variations, and for de novo genome assemblies when combined with short-read NGS data. Here we present optical mapping data for two human genomes: the HapMap cell line GM12878 and the colorectal cancer cell line HCT116. High molecular weight DNA was obtained by embedding GM12878 and HCT116 cells, respectively, in agarose plugs, followed by DNA extraction under mild conditions. Genomic DNA was digested with KpnI and 310,000 and 296,000 DNA molecules (≥ 150 kb and 10 restriction fragments), respectively, were analyzed per cell line using the Argus optical mapping system. Maps were aligned to the human reference by OPTIMA, a new glocal alignment method. Genome coverage of 6.8× and 5.7× was obtained, respectively; 2.9× and 1.7× more than the coverage obtained with previously available software. Optical mapping allows the resolution of large-scale structural variations of the genome, and the scaffold extension of NGS-based de novo assemblies. OPTIMA is an efficient new alignment method; our optical mapping data provide a resource for genome structure analyses of the human HapMap reference cell line GM12878, and the colorectal cancer cell line HCT116.
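An optical map is essentially an ordered list of restriction fragment sizes, so the reference side of such an alignment can be pictured as an in silico digest. The sketch below is a simplified illustration and not part of OPTIMA or the Argus software. It assumes the commonly cited KpnI recognition site GGTACC with top-strand cleavage after the fifth base, and the function name and toy sequence are hypothetical.

```python
KPNI_SITE = "GGTACC"   # KpnI recognition sequence (assumed here)
KPNI_CUT_OFFSET = 5    # top-strand cut after GGTAC, i.e. GGTAC^C

def fragment_lengths(sequence, site=KPNI_SITE, cut_offset=KPNI_CUT_OFFSET):
    """Return ordered fragment lengths from a complete digest of a linear sequence."""
    sequence = sequence.upper()
    cuts = []
    position = sequence.find(site)
    while position != -1:
        cuts.append(position + cut_offset)
        position = sequence.find(site, position + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy sequence with two KpnI sites (made-up, far shorter than real fragments).
toy = "AAAAGGTACCTTTTTTGGTACCAA"
print(fragment_lengths(toy))
```

On real data the fragments are kilobases long and measured with sizing error, which is why dedicated alignment methods are needed rather than exact matching of lengths.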
The eukaryotic genome is structurally and functionally more like a social insect colony than a book.

PubMed

Qiu, Guo-Hua; Yang, Xiaoyan; Zheng, Xintian; Huang, Cuiqin

2017-11-01

Traditionally, the genome has been described as the 'book of life'. However, the metaphor of a book may not reflect the dynamic nature of the structure and function of the genome. In the eukaryotic genome, the number of centrally located protein-coding sequences is relatively constant across species, but the amount of noncoding DNA increases considerably with the increase of organismal evolutional complexity. Therefore, it has been hypothesized that the abundant peripheral noncoding DNA protects the genome and the central protein-coding sequences in the eukaryotic genome. Upon comparison with the habitation, sociality and defense mechanisms of a social insect colony, it is found that the genome is similar to a social insect colony in various aspects. A social insect colony may thus be a better metaphor than a book to describe the spatial organization and physical functions of the genome. The potential implications of the metaphor are also discussed.
Nonclinical and Clinical Enterococcus faecium Strains, but Not Enterococcus faecalis Strains, Have Distinct Structural and Functional Genomic Features

PubMed Central

Kim, Eun Bae

2014-01-01

Certain strains of Enterococcus faecium and Enterococcus faecalis contribute beneficially to animal health and food production, while others are associated with nosocomial infections. To determine whether there are structural and functional genomic features that are distinct between nonclinical (NC) and clinical (CL) strains of those species, we analyzed the genomes of 31 E. faecium and 38 E. faecalis strains. Hierarchical clustering of 7,017 orthologs found in the E. faecium pangenome revealed that NC strains clustered into two clades and are distinct from CL strains. NC E. faecium genomes are significantly smaller than CL genomes, and this difference was partly explained by significantly fewer mobile genetic elements (ME), virulence factors (VF), and antibiotic resistance (AR) genes. E. faecium ortholog comparisons identified 68 and 153 genes that are enriched for NC and CL strains, respectively. Proximity analysis showed that CL-enriched loci, and not NC-enriched loci, are more frequently colocalized on the genome with ME. In CL genomes, AR genes are also colocalized with ME, and VF are more frequently associated with CL-enriched loci. Genes in 23 functional groups are also differentially enriched between NC and CL E. faecium genomes. In contrast, differences were not observed between NC and CL E. faecalis genomes despite their having larger genomes than E. faecium. Our findings show that unlike E. faecalis, NC and CL E. faecium strains are equipped with distinct structural and functional genomic features indicative of adaptation to different environments. PMID:24141120
The Tc1/mariner transposable element family shapes genetic variation and gene expression in the protist Trichomonas vaginalis

PubMed Central

2014-01-01

Background: Trichomonas vaginalis is the most prevalent non-viral sexually transmitted parasite. Although the protist is presumed to reproduce asexually, 60% of its haploid genome contains transposable elements (TEs), known contributors to genome variability. The availability of a draft genome sequence and our collection of >200 global isolates of T. vaginalis facilitate the study and analysis of TE population dynamics and their contribution to genomic variability in this protist. Results: We present here a pilot study of a subset of class II Tc1/mariner TEs that belong to the T. vaginalis Tvmar1 family. We report the genetic structure of 19 Tvmar1 loci, their ability to encode a full-length transposase protein, and their insertion frequencies in 94 global isolates from seven regions of the world. While most of the Tvmar1 elements studied exhibited low insertion frequencies, two of the 19 loci (locus 1 and locus 9) show high insertion frequencies of 1.00 and 0.96, respectively. The genetic structuring of the global populations identified by principal component analysis (PCA) of the Tvmar1 loci is in general agreement with published data based on genotyping, showing that Tvmar1 polymorphisms are a robust indicator of T. vaginalis genetic history. Analysis of expression of 22 genes flanking 13 Tvmar1 loci indicated significantly altered expression of six of the genes next to five Tvmar1 insertions, suggesting that the insertions have functional implications for T. vaginalis gene expression. Conclusions: Our study is the first in T. vaginalis to describe Tvmar1 population dynamics and its contribution to genetic variability of the parasite. We show that a majority of our studied Tvmar1 insertion loci exist at very low frequencies in the global population, and insertions are variable between geographical isolates. In addition, we observe that low frequency insertion is related to reduced or abolished expression of flanking genes. While low insertion frequencies might be expected, we identified two Tvmar1 insertion loci that are fixed across global populations. This observation indicates that Tvmar1 insertion may have differing impacts and fitness costs in the host genome and may play varying roles in the adaptive evolution of T. vaginalis. PMID:24834134
Production of Human Albumin in Pigs Through CRISPR/Cas9-Mediated Knockin of Human cDNA into Swine Albumin Locus in the Zygotes.

PubMed

Peng, Jin; Wang, Yong; Jiang, Junyi; Zhou, Xiaoyang; Song, Lei; Wang, Lulu; Ding, Chen; Qin, Jun; Liu, Liping; Wang, Weihua; Liu, Jianqiao; Huang, Xingxu; Wei, Hong; Zhang, Pumin

2015-11-12

Precise genome modification in large domesticated animals is desirable under many circumstances. In the past, it was possible only through lengthy and burdensome cloning procedures. Here we attempted to achieve that goal through the use of the newest genome-modifying tool, CRISPR/Cas9. We set out to knock human albumin cDNA into the pig Alb locus for the production of recombinant human serum albumin (rHSA). HSA is a widely used human blood product and is in high demand. We show that homologous recombination can occur highly efficiently in swine zygotes. All 16 piglets born from the manipulated zygotes carry the expected knockin allele, and we demonstrated the presence of human albumin in the blood of these piglets. Furthermore, the knockin allele was successfully transmitted through the germline. This success in precision genomic engineering is expected to spur exploration of pigs and other large domesticated animals as bioreactors for the production of biomedical products or for the creation of livestock strains with more desirable traits.
High-throughput crystal-optimization strategies in the South Paris Yeast Structural Genomics Project: one size fits all?

PubMed

Leulliot, Nicolas; Trésaugues, Lionel; Bremang, Michael; Sorel, Isabelle; Ulryck, Nathalie; Graille, Marc; Aboulfath, Ilham; Poupon, Anne; Liger, Dominique; Quevillon-Cheruel, Sophie; Janin, Joël; van Tilbeurgh, Herman

2005-06-01

Crystallization has long been regarded as one of the major bottlenecks in high-throughput structural determination by X-ray crystallography. Structural genomics projects have addressed this issue by using robots to set up automated crystal screens using nanodrop technology. This has moved the bottleneck from obtaining the first crystal hit to obtaining diffraction-quality crystals, as crystal optimization is a notoriously slow process that is difficult to automate. This article describes the high-throughput optimization strategies used in the Yeast Structural Genomics project, with selected successful examples.
Binning of shallowly sampled metagenomic sequence fragments reveals that low abundance bacteria play important roles in sulfur cycling and degradation of complex organic polymers in an acid mine drainage community

NASA Astrophysics Data System (ADS)

Dick, G. J.; Andersson, A.; Banfield, J. F.

2007-12-01

Our understanding of environmental microbiology has been greatly enhanced by community genome sequencing of DNA recovered directly from the environment. Community genomics provides insights into the diversity, community structure, metabolic function, and evolution of natural populations of uncultivated microbes, thereby revealing dynamics of how microorganisms interact with each other and their environment. Recent studies have demonstrated the potential for reconstructing near-complete genomes from natural environments while highlighting the challenges of analyzing community genomic sequence, especially from diverse environments. A major challenge of shotgun community genome sequencing is identification of DNA fragments from minor community members for which only low coverage of genomic sequence is present. We analyzed community genome sequence retrieved from biofilms in an acid mine drainage (AMD) system in the Richmond Mine at Iron Mountain, CA, with an emphasis on identification and assembly of DNA fragments from low-abundance community members. The Richmond mine hosts an extensive, relatively low diversity subterranean chemolithoautotrophic community that is sustained entirely by oxidative dissolution of pyrite. The activity of these microorganisms greatly accelerates the generation of AMD. Previous and ongoing work in our laboratory has focused on reconstructing genomes of dominant community members, including several bacteria and archaea. We binned contigs from several samples (including one new sample and two that had been previously analyzed) by tetranucleotide frequency with clustering by Self-Organizing Maps (SOM). The binning, evaluated by comparison with information from the manually curated assembly of the dominant organisms, was found to be very effective: fragments were correctly assigned with 95% accuracy. Improperly assigned fragments often contained sequences that are either evolutionarily constrained (e.g. 16S rRNA genes) or mobile elements that are not expected to reflect the tetranucleotide frequency signature of the host genome. Four unknown tetranucleotide frequency clusters with significant sequence (6 Mb total) were noted and analyzed further. Based on phylogenetic markers and BLAST results, these clusters represent low-abundance bacteria including Actinobacteria, Firmicutes, and Proteobacteria. Functional analysis of these clusters revealed that the low-abundance bacteria harbor genes that could potentially encode important ecosystem functions such as sulfur utilization (e.g. polysulfide reductase) and polymer degradation (e.g. chitinase and glycoside hydrolase). We conclude that ESOM clustering of tetranucleotide frequency patterns is an effective method for rapidly binning shotgun community genomic sequences and a valuable tool for analyzing minor community members, which despite their low abundance may play crucial ecological roles.
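The binning described in the record above starts from a tetranucleotide frequency vector for each contig. The following is a minimal Python sketch of that first step only, under stated assumptions: the function name `tetranucleotide_frequencies` and the toy sequence are hypothetical, only one strand is counted, and windows containing ambiguous bases are skipped. The abstract does not describe strand handling or normalization, and the SOM/ESOM clustering itself is not reproduced here.

```python
from itertools import product

# All 256 possible tetranucleotides, in a fixed order so vectors are comparable.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(sequence):
    """Return the normalized tetranucleotide frequency vector of a DNA sequence.

    Assumption: a single strand is counted with overlapping windows; the
    source abstract does not specify these details.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skip windows containing N or other ambiguity codes
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[k] / total for k in TETRAMERS]


contig = "ACGTACGTACGT"  # toy sequence for illustration, not real data
freqs = tetranucleotide_frequencies(contig)
```

Each contig then becomes a 256-dimensional vector that a clustering method can group, which is why fragments with atypical composition (such as mobile elements) may land in the wrong bin.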
Identification of structural variation in mouse genomes.

PubMed

Keane, Thomas M; Wong, Kim; Adams, David J; Flint, Jonathan; Reymond, Alexandre; Yalcin, Binnaz

2014-01-01

Structural variation is variation in structure of DNA regions affecting DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with the recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variation and their respective association to human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation on transposable elements. We conclude with future directions on the study of structural variation in mouse genomes that will increase our understanding of molecular architecture and functional consequences of structural variation.
[Three-dimensional genome organization: a lesson from the Polycomb-Group proteins].

PubMed

Bantignies, Frédéric

2013-01-01

As more and more genomes are being explored and annotated, important features of three-dimensional (3D) genome organization are just being uncovered. In the light of what we know about Polycomb group (PcG) proteins, we will present the latest findings on this topic. The PcG proteins are well-conserved chromatin factors that repress transcription of numerous target genes. They bind the genome at specific sites, forming chromatin domains of associated histone modifications as well as higher-order chromatin structures. These 3D chromatin structures involve the interactions between PcG-bound regulatory regions at short- and long-range distances, and may significantly contribute to PcG function. Recent high throughput "Chromosome Conformation Capture" (3C) analyses have revealed many other higher order structures along the chromatin fiber, partitioning the genomes into well demarcated topological domains. This revealed an unprecedented link between linear epigenetic domains and chromosome architecture, which might be intimately connected to genome function. © Société de Biologie, 2013.
Genome Structure of the Legume, Lotus japonicus

PubMed Central

Sato, Shusei; Nakamura, Yasukazu; Kaneko, Takakazu; Asamizu, Erika; Kato, Tomohiko; Nakao, Mitsuteru; Sasamoto, Shigemi; Watanabe, Akiko; Ono, Akiko; Kawashima, Kumiko; Fujishiro, Tsunakazu; Katoh, Midori; Kohara, Mitsuyo; Kishida, Yoshie; Minami, Chiharu; Nakayama, Shinobu; Nakazaki, Naomi; Shimizu, Yoshimi; Shinpo, Sayaka; Takahashi, Chika; Wada, Tsuyuko; Yamada, Manabu; Ohmido, Nobuko; Hayashi, Makoto; Fukui, Kiichi; Baba, Tomoya; Nakamichi, Tomoko; Mori, Hirotada; Tabata, Satoshi

2008-01-01

The legume Lotus japonicus has been widely used as a model system to investigate the genetic background of legume-specific phenomena such as symbiotic nitrogen fixation. Here, we report structural features of the L. japonicus genome. The 315.1-Mb sequences determined in this and previous studies correspond to 67% of the genome (472 Mb), and are likely to cover 91.3% of the gene space. Linkage mapping anchored 130-Mb sequences onto the six linkage groups. A total of 10 951 complete and 19 848 partial structures of protein-encoding genes were assigned to the genome. Comparative analysis of these genes revealed the expansion of several functional domains and gene families that are characteristic of L. japonicus. Synteny analysis detected traces of whole-genome duplication and the presence of synteny blocks with other plant genomes to various degrees. This study provides the first opportunity to look into the complex and unique genetic system of legumes. PMID:18511435
Evolutionary genomics and population structure of Entamoeba histolytica

PubMed Central

Das, Koushik; Ganguly, Sandipan

2014-01-01

Amoebiasis caused by the gastrointestinal parasite Entamoeba histolytica has diverse disease outcomes. Study of the genome and evolution of this fascinating parasite will help us to understand the basis of its virulence and explain why, when and how it causes disease. In this review, we have summarized current knowledge regarding evolutionary genomics of E. histolytica and discussed its association with parasite phenotypes and differential pathogenic behavior. How genetic diversity reveals parasite population structure has also been discussed. Open questions concerning its evolution and population structure that remain to be addressed have also been highlighted. This significantly large amount of genomic data will improve our knowledge about this pathogenic species of Entamoeba. PMID:25505504
Comparison and correlation of Simple Sequence Repeats distribution in genomes of Brucella species

PubMed Central

Kiran, Jangampalli Adi Pradeep; Chakravarthi, Veeraraghavulu Praveen; Kumar, Yellapu Nanda; Rekha, Somesula Swapna; Kruti, Srinivasan Shanthi; Bhaskar, Matcha

2011-01-01

Computational genomics is one of the important tools to understand the distribution of closely related genomes including simple sequence repeats (SSRs) in an organism, which gives valuable information regarding genetic variations. The central objective of the present study was to screen the SSRs distributed in coding and non-coding regions among different human Brucella species which are involved in a range of pathological disorders. Computational analysis of the SSRs in Brucella indicates few deviations from expected random models. Statistical analysis also reveals that tri-nucleotide SSRs are overrepresented and tetranucleotide SSRs underrepresented in Brucella genomes. From the data, it can be suggested that overrepresented tri-nucleotide SSRs in genomic and coding regions might be responsible for generating functional variation in the expressed proteins, which in turn may lead to different pathogenicity, virulence determinants, stress response genes, transcription regulators and host adaptation proteins of Brucella genomes. Abbreviations SSRs - Simple Sequence Repeats, ORFs - Open Reading Frames. PMID:21738309
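Screening a genome for SSRs of a given unit length, as in the record above, can be illustrated with a regular-expression scan. This is a minimal sketch, assuming perfect (uninterrupted) repeats only; the function name `find_ssrs`, the minimum repeat count, and the toy sequence are hypothetical choices for illustration and are not taken from the study, which does not state its detection tool or thresholds in the abstract.

```python
import re


def find_ssrs(sequence, unit_length, min_repeats):
    """Find perfect tandem repeats whose motif is `unit_length` bases long.

    Returns a list of (start, motif, repeat_count) tuples with 0-based starts.
    """
    seq = sequence.upper()
    # A motif of the requested length followed by at least (min_repeats - 1)
    # further copies of itself.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_length, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        # Skip motifs that are themselves repeats of a shorter unit
        # (e.g. "AAA" is a mononucleotide run, not a trinucleotide SSR).
        is_compound = any(
            motif == motif[:k] * (unit_length // k)
            for k in range(1, unit_length)
            if unit_length % k == 0
        )
        if is_compound:
            continue
        hits.append((match.start(), motif, len(match.group(0)) // unit_length))
    return hits


toy = "TT" + "CAG" * 4 + "TT"  # toy sequence for illustration, not Brucella data
tri_hits = find_ssrs(toy, unit_length=3, min_repeats=3)
```

Counting hits per unit length and comparing them with counts expected under a random sequence model is one way to judge over- or under-representation; the specific null model used in the study is not given in the abstract.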
Permanent Draft Genome of Strain ESFC-1: Ecological Genomics of a Newly Discovered Lineage of Filamentous Diazotrophic Cyanobacteria

NASA Technical Reports Server (NTRS)

Everroad, R. Craig; Stuart, Rhona K.; Bebout, Brad M.; Detweiler, Angela M.; Lee, Jackson Zan; Woebken, Dagmar; Bebout, Leslie E.; Pett-Ridge, Jennifer

2016-01-01

The nonheterocystous filamentous cyanobacterium, strain ESFC-1, is a recently described member of the order Oscillatoriales within the Cyanobacteria. ESFC-1 has been shown to be a major diazotroph in the intertidal microbial mat system at Elkhorn Slough, CA, USA. Based on phylogenetic analyses of the 16S RNA gene, ESFC-1 appears to belong to a unique, genus-level divergence; the draft genome sequence of this strain has now been determined. Here we report features of this genome as they relate to the ecological functions and capabilities of strain ESFC-1. The 5,632,035 bp genome sequence encodes 4914 protein-coding genes and 92 RNA genes. One striking feature of this cyanobacterium is the apparent lack of either uptake or bi-directional hydrogenases typically expected within a diazotroph. Additionally, a large genomic island is found that contains numerous low GC-content genes and genes related to extracellular polysaccharide production and cell wall synthesis and maintenance.
Permanent draft genome of strain ESFC-1: ecological genomics of a newly discovered lineage of filamentous diazotrophic cyanobacteria

DOE PAGES

Everroad, R. Craig; Stuart, Rhona K.; Bebout, Brad M.; ...

2016-08-24

The nonheterocystous filamentous cyanobacterium, strain ESFC-1, is a recently described member of the order Oscillatoriales within the Cyanobacteria. ESFC-1 has been shown to be a major diazotroph in the intertidal microbial mat system at Elkhorn Slough, CA, USA. Based on phylogenetic analyses of the 16S RNA gene, ESFC-1 appears to belong to a unique, genus-level divergence; the draft genome sequence of this strain has now been determined. Here we report features of this genome as they relate to the ecological functions and capabilities of strain ESFC-1. The 5,632,035 bp genome sequence encodes 4914 protein-coding genes and 92 RNA genes. One striking feature of this cyanobacterium is the apparent lack of either uptake or bi-directional hydrogenases typically expected within a diazotroph. In addition, a large genomic island is found that contains numerous low GC-content genes and genes related to extracellular polysaccharide production and cell wall synthesis and maintenance.

Genomic islands of divergence are not affected by geography of speciation in sunflowers.

PubMed

Renaut, S; Grassa, C J; Yeaman, S; Moyers, B T; Lai, Z; Kane, N C; Bowers, J E; Burke, J M; Rieseberg, L H

2013-01-01

Genomic studies of speciation often report the presence of highly differentiated genomic regions interspersed within a milieu of weakly diverged loci. The formation of these speciation islands is generally attributed to reduced inter-population gene flow near loci under divergent selection, but few studies have critically evaluated this hypothesis. Here, we report on transcriptome scans among four recently diverged pairs of sunflower (Helianthus) species that vary in the geographical context of speciation. We find that genetic divergence is lower in sympatric and parapatric comparisons, consistent with a role for gene flow in eroding neutral differences. However, genomic islands of divergence are numerous and small in all comparisons, and contrary to expectations, island number and size are not significantly affected by levels of interspecific gene flow. Rather, island formation is strongly associated with reduced recombination rates. Overall, our results indicate that the functional architecture of genomes plays a larger role in shaping genomic divergence than does the geography of speciation.
A future scenario of the global regulatory landscape regarding genome-edited crops

PubMed Central

Araki, Motoko

2017-01-01

The global agricultural landscape regarding the commercial cultivation of genetically modified (GM) crops is a mosaic. Meanwhile, a new plant breeding technique, genome editing, is expected to make genetic engineering-mediated crop breeding more socially acceptable because it can be used to develop crop varieties without introducing transgenes, which have hampered the regulatory review and public acceptance of GM crops. The present study revealed that product- and process-based concepts have been implemented to regulate GM crops in 30 countries. Moreover, this study analyzed the regulatory responses to genome-edited crops in the USA, Argentina, Sweden and New Zealand. The findings suggested that countries will likely be divided in their policies on genome-edited crops: Some will deregulate transgene-free crops, while others will regulate all types of crops that have been modified by genome editing. These implications are discussed from the viewpoint of public acceptance. PMID:27960622
A genetically anchored physical framework for Theobroma cacao cv. Matina 1-6

PubMed Central

2011-01-01

Background The fermented dried seeds of Theobroma cacao (cacao tree) are the main ingredient in chocolate. World cocoa production was estimated to be 3 million tons in 2010 with an annual estimated average growth rate of 2.2%. The cacao bean production industry is currently under threat from a rise in fungal diseases including black pod, frosty pod, and witches' broom. In order to address these issues, genome-sequencing efforts have been initiated recently to facilitate identification of genetic markers and genes that could be utilized to accelerate the release of robust T. cacao cultivars. However, problems inherent with assembly and resolution of distal regions of complex eukaryotic genomes, such as gaps, chimeric joins, and unresolvable repeat-induced compressions, have been unavoidably encountered with the sequencing strategies selected. Results Here, we describe the construction of a BAC-based integrated genetic-physical map of the T. cacao cultivar Matina 1-6 which is designed to augment and enhance these sequencing efforts. Three BAC libraries, each comprised of 10× coverage, were constructed and fingerprinted. 230 genetic markers from a high-resolution genetic recombination map and 96 Arabidopsis-derived conserved ortholog set (COS) II markers were anchored using pooled overgo hybridization. A dense tile path consisting of 29,383 BACs was selected and end-sequenced. The physical map consists of 154 contigs and 4,268 singletons. Forty-nine contigs are genetically anchored and ordered to chromosomes for a total span of 307.2 Mbp. The unanchored contigs (105) span 67.4 Mbp and therefore the estimated genome size of T. cacao is 374.6 Mbp. A comparative analysis with A. thaliana, V. vinifera, and P. trichocarpa suggests that comparisons of the genome assemblies of these distantly related species could provide insights into genome structure, evolutionary history, conservation of functional sites, and improvements in physical map assembly. 
A comparison between the two T. cacao cultivars Matina 1-6 and Criollo indicates a high degree of collinearity in their genomes, yet rearrangements were also observed. Conclusions The results presented in this study are a stand-alone resource for functional exploitation and enhancement of Theobroma cacao but are also expected to complement and augment ongoing genome-sequencing efforts. This resource will serve as a template for refinement of the T. cacao genome through gap-filling, targeted re-sequencing, and resolution of repetitive DNA arrays. PMID:21846342
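The genome-size estimate in the record above is the sum of the anchored and unanchored contig spans. The short Python check below reproduces that arithmetic using only the figures quoted in the abstract; the variable names are illustrative, and the anchored fraction is a derived value that the abstract does not state.

```python
# Figures quoted in the abstract above.
anchored_contigs = 49
unanchored_contigs = 105
anchored_span_mbp = 307.2
unanchored_span_mbp = 67.4

# 49 anchored + 105 unanchored contigs give the 154 contigs of the physical map.
total_contigs = anchored_contigs + unanchored_contigs

# The estimated genome size is the sum of both spans, rounded to 0.1 Mbp.
estimated_genome_mbp = round(anchored_span_mbp + unanchored_span_mbp, 1)

# Derived here for illustration: share of the estimate that is genetically anchored.
anchored_fraction = anchored_span_mbp / estimated_genome_mbp
```

Under these figures, roughly 82% of the estimated 374.6 Mbp is anchored to chromosomes, with the remainder held in contigs that have no genetic position.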
A genetically anchored physical framework for Theobroma cacao cv. Matina 1-6.

PubMed

Saski, Christopher A; Feltus, Frank A; Staton, Margaret E; Blackmon, Barbara P; Ficklin, Stephen P; Kuhn, David N; Schnell, Raymond J; Shapiro, Howard; Motamayor, Juan Carlos

2011-08-16

The fermented dried seeds of Theobroma cacao (cacao tree) are the main ingredient in chocolate. World cocoa production was estimated to be 3 million tons in 2010 with an annual estimated average growth rate of 2.2%. The cacao bean production industry is currently under threat from a rise in fungal diseases including black pod, frosty pod, and witches' broom. In order to address these issues, genome-sequencing efforts have been initiated recently to facilitate identification of genetic markers and genes that could be utilized to accelerate the release of robust T. cacao cultivars. However, problems inherent with assembly and resolution of distal regions of complex eukaryotic genomes, such as gaps, chimeric joins, and unresolvable repeat-induced compressions, have been unavoidably encountered with the sequencing strategies selected. Here, we describe the construction of a BAC-based integrated genetic-physical map of the T. cacao cultivar Matina 1-6 which is designed to augment and enhance these sequencing efforts. Three BAC libraries, each comprised of 10× coverage, were constructed and fingerprinted. 230 genetic markers from a high-resolution genetic recombination map and 96 Arabidopsis-derived conserved ortholog set (COS) II markers were anchored using pooled overgo hybridization. A dense tile path consisting of 29,383 BACs was selected and end-sequenced. The physical map consists of 154 contigs and 4,268 singletons. Forty-nine contigs are genetically anchored and ordered to chromosomes for a total span of 307.2 Mbp. The unanchored contigs (105) span 67.4 Mbp and therefore the estimated genome size of T. cacao is 374.6 Mbp. A comparative analysis with A. thaliana, V. vinifera, and P. trichocarpa suggests that comparisons of the genome assemblies of these distantly related species could provide insights into genome structure, evolutionary history, conservation of functional sites, and improvements in physical map assembly. A comparison between the two T. cacao cultivars Matina 1-6 and Criollo indicates a high degree of collinearity in their genomes, yet rearrangements were also observed. The results presented in this study are a stand-alone resource for functional exploitation and enhancement of Theobroma cacao but are also expected to complement and augment ongoing genome-sequencing efforts. This resource will serve as a template for refinement of the T. cacao genome through gap-filling, targeted re-sequencing, and resolution of repetitive DNA arrays.
Genome Editing of Structural Variations: Modeling and Gene Correction.

PubMed

Park, Chul-Yong; Sung, Jin Jea; Kim, Dong-Wook

2016-07-01

The analysis of chromosomal structural variations (SVs), such as inversions and translocations, was made possible by the completion of the human genome project and the development of genome-wide sequencing technologies. SVs contribute to genetic diversity and evolution, although some SVs can cause diseases such as hemophilia A in humans. Genome engineering technology using programmable nucleases (e.g., ZFNs, TALENs, and CRISPR/Cas9) has been rapidly developed, enabling precise and efficient genome editing for SV research. Here, we review advances in modeling and gene correction of SVs, focusing on inversion, translocation, and nucleotide repeat expansion. Copyright © 2016 Elsevier Ltd. All rights reserved.
Development of a fluorescence-activated cell sorting method coupled with whole genome amplification to analyze minority and trace Dehalococcoides genomes in microbial communities.

PubMed

Lee, Patrick K H; Men, Yujie; Wang, Shanquan; He, Jianzhong; Alvarez-Cohen, Lisa

2015-02-03

Dehalococcoides mccartyi are functionally important bacteria that catalyze the reductive dechlorination of chlorinated ethenes. However, these anaerobic bacteria are fastidious to isolate, making downstream genomic characterization challenging. In order to facilitate genomic analysis, a fluorescence-activated cell sorting (FACS) method was developed in this study to separate D. mccartyi cells from a microbial community, and the DNA of the isolated cells was processed by whole genome amplification (WGA) and hybridized onto a D. mccartyi microarray for comparative genomics against four sequenced strains. First, FACS was successfully applied to a D. mccartyi isolate as positive control, and then microarray results verified that WGA from 10^6 cells or ∼1 ng of genomic DNA yielded high-quality coverage detecting nearly all genes across the genome. As expected, some inter- and intrasample variability in WGA was observed, but these biases were minimized by performing multiple parallel amplifications. Subsequent application of the FACS and WGA protocols to two enrichment cultures containing ∼10% and ∼1% D. mccartyi cells successfully enabled genomic analysis. As proof of concept, this study demonstrates that coupling FACS with WGA and microarrays is a promising tool to expedite genomic characterization of target strains in environmental communities where the relative concentrations are low.
Aquatic Plant Genomics: Advances, Applications, and Prospects

PubMed Central

Li, Gaojie; Yang, Jingjing

2017-01-01

Genomics is a discipline in genetics that studies the genome composition of organisms and the precise structure of genes and their expression and regulation. Genomics research has resolved many problems where other biological methods have failed. Here, we summarize advances in aquatic plant genomics with a focus on molecular markers, the genes related to photosynthesis and stress tolerance, comparative study of genomes and genome/transcriptome sequencing technology. PMID:28900619
Reference-quality genome sequence of Aegilops tauschii, the source of wheat D genome, shows that recombination shapes genome structure and evolution

USDA-ARS's Scientific Manuscript database

Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat and an important genetic resource for wheat. A reference-quality sequence for the Ae. tauschii genome was produced with a combination of ordered-clone sequencing, whole-genome shotgun sequencing, and BioNano optical geno...
Alignment of Common Wheat and Other Grass Genomes Establishes a Comparative Genomics Research Platform

PubMed Central

Sun, Sangrong; Wang, Jinpeng; Yu, Jigao; Meng, Fanbo; Xia, Ruiyan; Wang, Li; Wang, Zhenyi; Ge, Weina; Liu, Xiaojian; Li, Yuxian; Liu, Yinzhe; Yang, Nanshan; Wang, Xiyin

2017-01-01

Grass genomes are complicated structures as they share a common tetraploidization, and particular genomes have been further affected by extra polyploidizations. These events and the following genomic re-patternings have resulted in a complex, interweaving gene homology both within a genome, and between genomes. Accurately deciphering the structure of these complicated plant genomes would help us better understand their compositional and functional evolution at multiple scales. Here, we build on our previous research by performing a hierarchical alignment of the common wheat genome vis-à-vis eight other sequenced grass genomes with most up-to-date assemblies, and annotations. With this data, we constructed a list of the homologous genes, and then, in a layer-by-layer process, separated their orthology, and paralogy that were established by speciations and recursive polyploidizations, respectively. Compared with the other grasses, the far fewer collinear outparalogous genes within each of three subgenomes of common wheat suggest that homoeologous recombination, and genomic fractionation should have occurred after its formation. In sum, this work contributes to the establishment of an important and timely comparative genomics platform for researchers in the grass community and possibly beyond. Homologous gene list can be found in Supplemental material. PMID:28912789
Genome Modification Technologies and Their Applications in Avian Species.

PubMed

Lee, Hong Jo; Kim, Young Min; Ono, Tamao; Han, Jae Yong

2017-10-26

The rapid development of genome modification technology has provided many great benefits in diverse areas of research and industry. Genome modification technologies have also been actively used in a variety of research areas and fields of industry in avian species. Transgenic technologies such as lentiviral systems and piggyBac transposition have been used to produce transgenic birds for diverse purposes. In recent years, newly developed programmable genome editing tools such as transcription activator-like effector nuclease (TALEN) and clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (CRISPR/Cas9) have also been successfully adopted in avian systems with primordial germ cell (PGC)-mediated genome modification. These genome modification technologies are expected to be applied to practical uses beyond system development itself. The technologies could be used to enhance economic traits in poultry such as acquiring a disease resistance or producing functional proteins in eggs. Furthermore, novel avian models of human diseases or embryonic development could also be established for research purposes. In this review, we discuss diverse genome modification technologies used in avian species, and future applications of avian biotechnology.
Genome Modification Technologies and Their Applications in Avian Species

PubMed Central

Lee, Hong Jo; Kim, Young Min; Ono, Tamao

2017-01-01

The rapid development of genome modification technology has provided many great benefits in diverse areas of research and industry. Genome modification technologies have also been actively used in a variety of research areas and fields of industry in avian species. Transgenic technologies such as lentiviral systems and piggyBac transposition have been used to produce transgenic birds for diverse purposes. In recent years, newly developed programmable genome editing tools such as transcription activator-like effector nuclease (TALEN) and clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (CRISPR/Cas9) have also been successfully adopted in avian systems with primordial germ cell (PGC)-mediated genome modification. These genome modification technologies are expected to be applied to practical uses beyond system development itself. The technologies could be used to enhance economic traits in poultry such as acquiring a disease resistance or producing functional proteins in eggs. Furthermore, novel avian models of human diseases or embryonic development could also be established for research purposes. In this review, we discuss diverse genome modification technologies used in avian species, and future applications of avian biotechnology. PMID:29072628
Comparative Phylogeography in a Specific and Obligate Pollination Antagonism

PubMed Central

Espíndola, Anahí; Alvarez, Nadir

2011-01-01

In specific and obligate interactions the nature and abundance of a given species can have important effects on the survival and population dynamics of associated organisms. In a phylogeographic framework, we therefore expect that the fates of organisms interacting specifically are also tightly interrelated. Here we investigate such a scenario by analyzing the genetic structures of species interacting in an obligate plant-insect pollination lure-and-trap antagonism, involving Arum maculatum (Araceae) and its specific psychodid (Diptera) visitors Psychoda phalaenoides and Psycha grisescens. Because the interaction is asymmetric (i.e., only the plant depends on the insect), we expect the genetic structure of the plant to be related with the historical pollinator availability, yielding incongruent phylogeographic patterns between the interacting organisms. Using insect mtDNA sequences and plant AFLP genome fingerprinting, we inferred the large-scale phylogeographies of each species and the distribution of genetic diversities throughout the sampled range, and evaluated the congruence in their respective genetic structures using hierarchical analyses of molecular variances (AMOVA). Because the composition of pollinator species varies in Europe, we also examined its association with the spatial genetic structure of the plant. Our findings indicate that while the plant presents a spatially well-defined genetic structure, this is not the case in the insects. Patterns of genetic diversities also show dissimilar distributions among the three interacting species. Phylogeographic histories of the plant and its pollinating insects are thus not congruent, a result that would indicate that plant and insect lineages do not share the same glacial and postglacial histories. However, the genetic structure of the plant can, at least partially, be explained by the type of pollinators available at a regional scale. 
Differences in life-history traits of available pollinators might therefore have influenced the genetic structure of the plant, the dependent organism, in this antagonistic interaction. PMID:22216104
Multiple capsid-stabilizing interactions revealed in a high-resolution structure of an emerging picornavirus causing neonatal sepsis

NASA Astrophysics Data System (ADS)

Shakeel, Shabih; Westerhuis, Brenda M.; Domanska, Ausra; Koning, Roman I.; Matadeen, Rishi; Koster, Abraham J.; Bakker, Arjen Q.; Beaumont, Tim; Wolthers, Katja C.; Butcher, Sarah J.

2016-07-01

The poorly studied picornavirus, human parechovirus 3 (HPeV3), causes neonatal sepsis with no therapies available. Our 4.3-Å resolution structure of HPeV3 on its own and at 15 Å resolution in complex with human monoclonal antibody Fabs demonstrates the expected picornavirus capsid structure with three distinct features. First, 25% of the HPeV3 RNA genome in 60 sites is highly ordered as confirmed by asymmetric reconstruction, and interacts with conserved regions of the capsid proteins VP1 and VP3. Second, the VP0 N terminus stabilizes the capsid inner surface, in contrast to other picornaviruses where, on expulsion as VP4, it forms an RNA translocation channel. Last, VP1's hydrophobic pocket, the binding site for the antipicornaviral drug pleconaril, is blocked and thus inappropriate for antiviral development. Together, these results suggest a direction for development of neutralizing antibodies, antiviral drugs based on targeting the RNA-protein interactions and dissection of virus assembly on the basis of RNA nucleation.
A transthyretin-related protein is functionally expressed in Herbaspirillum seropedicae.

PubMed

Matiollo, Camila; Vernal, Javier; Ecco, Gabriela; Bertoldo, Jean Borges; Razzera, Guilherme; de Souza, Emanuel M; Pedrosa, Fábio O; Terenzi, Hernán

2009-10-02

Transthyretin-related proteins (TRPs) constitute a family of proteins structurally related to transthyretin (TTR) and are found in a large range of bacterial, fungal, plant, invertebrate, and vertebrate species. However, it was recently recognized that both prokaryotic and eukaryotic members of this family are not functionally related to transthyretins. TRPs are in fact involved in the purine catabolic pathway and function as hydroxyisourate hydrolases. An open reading frame encoding a protein similar to the Escherichia coli TRP was identified in Herbaspirillum seropedicae genome (Hs_TRP). It was cloned, overexpressed in E. coli, and purified to homogeneity. Mass spectrometry data confirmed the identity of this protein, and circular dichroism spectrum indicated a predominance of beta-sheet structure, as expected for a TRP. We have demonstrated that Hs_TRP is a 5-hydroxyisourate hydrolase and by site-directed mutagenesis the importance of three conserved catalytic residues for Hs_TRP activity was further confirmed. The production of large quantities of this recombinant protein opens up the possibility of obtaining its 3D-structure and will help further investigations into purine catabolism.
Discovery of novel targets for multi-epitope vaccines: Screening of HIV-1 genomes using association rule mining

PubMed Central

Paul, Sinu; Piontkivska, Helen

2009-01-01

Background Studies have shown that in the genome of human immunodeficiency virus (HIV-1) regions responsible for interactions with the host's immune system, namely, cytotoxic T-lymphocyte (CTL) epitopes tend to cluster together in relatively conserved regions. On the other hand, "epitope-less" regions or regions with relatively low density of epitopes tend to be more variable. However, very little is known about relationships among epitopes from different genes, in other words, whether particular epitopes from different genes would occur together in the same viral genome. To identify CTL epitopes in different genes that co-occur in HIV genomes, association rule mining was used. Results Using a set of 189 best-defined HIV-1 CTL/CD8+ epitopes from 9 different protein-coding genes, as described by Frahm, Linde & Brander (2007), we examined the complete genomic sequences of 62 reference HIV sequences (including 13 subtypes and sub-subtypes with approximately 4 representative sequences for each subtype or sub-subtype, and 18 circulating recombinant forms). The results showed that despite inclusion of recombinant sequences that would be expected to break-up associations of epitopes in different genes when two different genomes are recombined, there exist particular combinations of epitopes (epitope associations) that occur repeatedly across the world-wide population of HIV-1. For example, Pol epitope LFLDGIDKA is found to be significantly associated with epitopes GHQAAMQML and FLKEKGGL from Gag and Nef, respectively, and this association rule is observed even among circulating recombinant forms. Conclusion We have identified CTL epitope combinations co-occurring in HIV-1 genomes including different subtypes and recombinant forms. Such co-occurrence has important implications for design of complex vaccines (multi-epitope vaccines) and/or drugs that would target multiple HIV-1 regions at once and, thus, may be expected to overcome challenges associated with viral escape. 
PMID:19580659
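As a rough illustration of the association rule mining mentioned in the abstract above, the sketch below computes the two standard rule measures, support and confidence, over presence/absence sets. The function names, the epitope labels (`pol_ep`, `gag_ep`, `nef_ep`) and the four toy genomes are hypothetical placeholders, not data or code from the study.

```python
def support(transactions, items):
    """Fraction of transactions (here: genomes) that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    s_a = support(transactions, antecedent)
    if s_a == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(transactions, set(antecedent) | set(consequent)) / s_a


# Toy data (invented for illustration): each set lists the epitopes
# present in one hypothetical genome.
genomes = [
    {"pol_ep", "gag_ep", "nef_ep"},
    {"pol_ep", "gag_ep", "nef_ep"},
    {"pol_ep", "gag_ep"},
    {"gag_ep"},
]

# Rule: pol_ep -> {gag_ep, nef_ep}
rule_support = support(genomes, {"pol_ep", "gag_ep", "nef_ep"})
rule_confidence = confidence(genomes, {"pol_ep"}, {"gag_ep", "nef_ep"})
print(rule_support, rule_confidence)
```

In practice a rule would be reported only if both measures exceed chosen thresholds; the thresholds used in the study are not given in the abstract.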
Functional Genomics Analysis of Big Data Identifies Novel Peroxisome Proliferator-Activated Receptor γ Target Single Nucleotide Polymorphisms Showing Association With Cardiometabolic Outcomes.

PubMed

Richardson, Kris; Schnitzler, Gavin R; Lai, Chao-Qiang; Ordovas, Jose M

2015-12-01

Cardiovascular disease and type 2 diabetes mellitus represent overlapping diseases where a large portion of the variation attributable to genetics remains unexplained. An important player in their pathogenesis is peroxisome proliferator-activated receptor γ (PPARγ) that is involved in lipid and glucose metabolism and maintenance of metabolic homeostasis. We used a functional genomics methodology to interrogate human chromatin immunoprecipitation-sequencing, genome-wide association studies, and expression quantitative trait locus data to inform selection of candidate functional single nucleotide polymorphisms (SNPs) falling in PPARγ motifs. We derived 27 328 chromatin immunoprecipitation-sequencing peaks for PPARγ in human adipocytes through meta-analysis of 3 data sets. The PPARγ consensus motif showed greatest enrichment and mapped to 8637 peaks. We identified 146 SNPs in these motifs. This number was significantly less than would be expected by chance, and Inference of Natural Selection from Interspersed Genomically coHerent elemenTs analysis indicated that these motifs are under weak negative selection. A screen of these SNPs against genome-wide association studies for cardiometabolic traits revealed significant enrichment with 16 SNPs. A screen against the MuTHER expression quantitative trait locus data revealed 8 of these were significantly associated with altered gene expression in human adipose, more than would be expected by chance. Several SNPs fall close, or are linked by expression quantitative trait locus to lipid-metabolism loci including CYP26A1. We demonstrated the use of functional genomics to identify SNPs of potential function. Specifically, that SNPs within PPARγ motifs that bind PPARγ in adipocytes are significantly associated with cardiometabolic disease and with the regulation of transcription in adipose. 
This method may be used to uncover functional SNPs that do not reach significance thresholds in the agnostic approach of genome-wide association studies. © 2015 American Heart Association, Inc.
Natural parameter values for generalized gene adjacency.

PubMed

Yang, Zhenyu; Sankoff, David

2010-09-01

Given the gene orders in two modern genomes, it may be difficult to decide if some genes are close enough in both genomes to infer some ancestral proximity or some functional relationship. Current methods all depend on arbitrary parameters. We explore a class of gene proximity criteria and find two kinds of natural values for their parameters. One kind has to do with the parameter value where the expected information contained in two genomes about each other is maximized. The other kind of natural value has to do with parameter values beyond which all genes are clustered. We analyze these using combinatorial and probabilistic arguments as well as simulations.
The rice genome revolution: from an ancient grain to Green Super Rice.

PubMed

Wing, Rod A; Purugganan, Michael D; Zhang, Qifa

2018-06-05

Rice is a staple crop for half the world's population, which is expected to grow by 3 billion over the next 30 years. It is also a key model for studying the genomics of agroecosystems. This dual role places rice at the centre of an enormous challenge facing agriculture: how to leverage genomics to produce enough food to feed an expanding global population. Scientists worldwide are investigating the genetic variation among domesticated rice species and their wild relatives with the aim of identifying loci that can be exploited to breed a new generation of sustainable crops known as Green Super Rice.
Genome-wide copy number variation in the bovine genome detected using low coverage sequence of popular beef breeds

USDA-ARS's Scientific Manuscript database

Genomic structural variations are an important source of genetic diversity. Copy number variations (CNVs), gains and losses of large regions of genomic sequence between individuals of a species, are known to be associated with both diseases and phenotypic traits. Deeply sequenced genomes are often u...
Sequence-Based Genotyping for Marker Discovery and Co-Dominant Scoring in Germplasm and Populations

PubMed Central

Truong, Hoa T.; Ramos, A. Marcos; Yalcin, Feyruz; de Ruiter, Marjo; van der Poel, Hein J. A.; Huvenaars, Koen H. J.; Hogers, René C. J.; van Enckevort, Leonora. J. G.; Janssen, Antoine; van Orsouw, Nathalie J.; van Eijk, Michiel J. T.

2012-01-01

Conventional marker-based genotyping platforms are widely available, but not without their limitations. In this context, we developed Sequence-Based Genotyping (SBG), a technology for simultaneous marker discovery and co-dominant scoring, using next-generation sequencing. SBG offers users several advantages including a generic sample preparation method, a highly robust genome complexity reduction strategy to facilitate de novo marker discovery across entire genomes, and a uniform bioinformatics workflow strategy to achieve genotyping goals tailored to individual species, regardless of the availability of a reference sequence. The most distinguishing features of this technology are the ability to genotype any population structure, regardless of whether parental data are included, and the ability to co-dominantly score SNP markers segregating in populations. To demonstrate the capabilities of SBG, we performed marker discovery and genotyping in Arabidopsis thaliana and lettuce, two plant species of diverse genetic complexity and backgrounds. Initially we obtained 1,409 SNPs for arabidopsis, and 5,583 SNPs for lettuce. Further filtering of the SNP dataset produced over 1,000 high quality SNP markers for each species. We obtained a genotyping rate of 201.2 genotypes/SNP and 58.3 genotypes/SNP for arabidopsis (n = 222 samples) and lettuce (n = 87 samples), respectively. Linkage mapping using these SNPs resulted in stable map configurations. We have therefore shown that the SBG approach presented provides users with the utmost flexibility in garnering high quality markers that can be directly used for genotyping and downstream applications. Until advances and costs allow for routine whole-genome sequencing of populations, we expect that sequence-based genotyping technologies such as SBG will be essential for genotyping of model and non-model genomes alike. PMID:22662172

Investigation of inversion polymorphisms in the human genome using principal components analysis.

PubMed

Ma, Jianzhong; Amos, Christopher I

2012-01-01

Despite the significant advances made over the last few years in mapping inversions with the advent of paired-end sequencing approaches, our understanding of the prevalence and spectrum of inversions in the human genome has lagged behind other types of structural variants, mainly due to the lack of a cost-efficient method applicable to large-scale samples. We propose a novel method based on principal components analysis (PCA) to characterize inversion polymorphisms using high-density SNP genotype data. Our method applies to non-recurrent inversions for which recombination between the inverted and non-inverted segments in inversion heterozygotes is suppressed due to the loss of unbalanced gametes. Inside such an inversion region, an effect similar to population substructure is thus created: two distinct "populations" of inversion homozygotes of different orientations and their 1:1 admixture, namely the inversion heterozygotes. This kind of substructure can be readily detected by performing PCA locally in the inversion regions. Using simulations, we demonstrated that the proposed method can be used to detect and genotype inversion polymorphisms using unphased genotype data. We applied our method to the phase III HapMap data and inferred the inversion genotypes of known inversion polymorphisms at 8p23.1 and 17q21.31. These inversion genotypes were validated by comparing with literature results and by checking Mendelian consistency using the family data whenever available. Based on the PCA-approach, we also performed a preliminary genome-wide scan for inversions using the HapMap data, which resulted in 2040 candidate inversions, 169 of which overlapped with previously reported inversions. Our method can be readily applied to the abundant SNP data, and is expected to play an important role in developing human genome maps of inversions and exploring associations between inversions and susceptibility of diseases.
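The idea in the abstract above, that suppressed recombination makes an inversion region look like two "populations" plus their 1:1 admixture, can be sketched with a small simulation. This is a toy illustration under stated assumptions, not the authors' implementation: SNPs are simulated independently, the two orientations are simply given different (randomly drawn) allele frequencies, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ind, n_snp = 120, 200

# Assumption: each orientation has its own allele frequency at every SNP
# inside the region, because the two arrangements rarely recombine.
p_std = rng.uniform(0.05, 0.95, n_snp)
p_inv = rng.uniform(0.05, 0.95, n_snp)

# Each individual carries two haplotypes; True marks an inverted haplotype.
inverted = rng.random((n_ind, 2)) < 0.5
freqs = np.where(inverted[:, :, None], p_inv, p_std)  # shape (n_ind, 2, n_snp)
haplotypes = rng.random((n_ind, 2, n_snp)) < freqs

# Unphased genotypes (0/1/2 copies of the alternate allele), as a SNP array would report.
genotypes = haplotypes.sum(axis=1).astype(float)
true_inv_count = inverted.sum(axis=1)  # 0, 1 or 2 inverted copies

# Local PCA: centre each SNP, then take the first principal component via SVD.
centered = genotypes - genotypes.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
pc1 = u[:, 0] * s[0]

# PC1 should track the number of inverted copies (its sign is arbitrary).
corr = abs(np.corrcoef(pc1, true_inv_count)[0, 1])
print(f"|corr(PC1, inversion genotype)| = {corr:.3f}")
```

In this simulation the first principal component separates individuals into three groups along a line, with heterozygotes midway between the two homozygote groups, which is the pattern the method exploits to genotype inversions from unphased data.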
Isolation, genome sequencing and functional analysis of two T7-like coliphages of avian pathogenic Escherichia coli.

PubMed

Chen, Mianmian; Xu, Juntian; Yao, Huochun; Lu, Chengping; Zhang, Wei

2016-05-10

Avian pathogenic Escherichia coli (APEC) causes colibacillosis, which results in significant economic losses to the poultry industry worldwide. Due to the drug residues and increased antibiotic resistance caused by antibiotic use, bacteriophages and other alternative therapeutic agents are expected to control APEC infection in poultry. Two APEC phages, named P483 and P694, were isolated from the feces from the farmers market in China. We then studied their biological properties, and carried out high-throughput genome sequencing and homology analyses of these phages. Assembly results of high-throughput sequencing showed that the structures of both P483 and P694 genomes consist of linear and double-stranded DNA. Results of the electron microscopy and homology analysis revealed that both P483 and P694 belong to T7-like virus which is a member of the Podoviridae family of the Caudovirales order. Comparative genomic analysis showed that most of the predicted proteins of these two phages showed strongest sequence similarity to the Enterobacteria phages BA14 and 285P, Erwinia phage FE44, and Kluyvera phage Kvp1; however, some proteins such as gp0.6a, gp1.7 and gp17 showed lower similarity (<85%) with the homologs of other phages in the T7 subgroup. We also found some unique characteristics of P483 and P694, such as the two types of the genes of P694 and no lytic activity of P694 against its host bacteria in liquid medium. Our results serve to further our understanding of phage evolution of T7-like coliphages and provide the potential application of the phages as therapeutic agents for the treatment of diseases. Copyright © 2016 Elsevier B.V. All rights reserved.
Insights into structural variations and genome rearrangements in prokaryotic genomes.

PubMed

Periwal, Vinita; Scaria, Vinod

2015-01-01

Structural variations (SVs) are genomic rearrangements that affect fairly large fragments of DNA. Most of the SVs such as inversions, deletions and translocations have been largely studied in context of genetic diseases in eukaryotes. However, recent studies demonstrate that genome rearrangements can also have profound impact on prokaryotic genomes, leading to altered cell phenotype. In contrast to single-nucleotide variations, SVs provide a much deeper insight into organization of bacterial genomes at a much better resolution. SVs can confer change in gene copy number, creation of new genes, altered gene expression and many other functional consequences. High-throughput technologies have now made it possible to explore SVs at a much refined resolution in bacterial genomes. Through this review, we aim to highlight the importance of the less explored field of SVs in prokaryotic genomes and their impact. We also discuss its potential applicability in the emerging fields of synthetic biology and genome engineering where targeted SVs could serve to create sophisticated and accurate genome editing. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Genomic Quantitative Genetics to Study Evolution in the Wild.

PubMed

Gienapp, Phillip; Fior, Simone; Guillaume, Frédéric; Lasky, Jesse R; Sork, Victoria L; Csilléry, Katalin

2017-12-01

Quantitative genetic theory provides a means of estimating the evolutionary potential of natural populations. However, this approach was previously only feasible in systems where the genetic relatedness between individuals could be inferred from pedigrees or experimental crosses. The genomic revolution opened up the possibility of obtaining the realized proportion of genome shared among individuals in natural populations of virtually any species, which could promise (more) accurate estimates of quantitative genetic parameters in virtually any species. Such a 'genomic' quantitative genetics approach relies on fewer assumptions, offers a greater methodological flexibility, and is thus expected to greatly enhance our understanding of evolution in natural populations, for example, in the context of adaptation to environmental change, eco-evolutionary dynamics, and biodiversity conservation. Copyright © 2017 Elsevier Ltd. All rights reserved.
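One common way to estimate the realized genome sharing mentioned in the abstract above is a marker-based genomic relationship matrix. The sketch below uses a widely used formulation (centre 0/1/2 genotypes by twice the allele frequency, then scale by 2Σp(1−p)); the abstract does not say which estimator the authors have in mind, so this choice, the function name, and the toy genotypes are assumptions made for illustration only.

```python
import numpy as np


def genomic_relationship_matrix(genotypes):
    """Marker-based relationship matrix from an (individuals x SNPs) 0/1/2 matrix."""
    m = np.asarray(genotypes, dtype=float)
    p = m.mean(axis=0) / 2.0           # sample allele frequency per SNP
    z = m - 2.0 * p                    # centre each SNP column
    denom = 2.0 * np.sum(p * (1.0 - p))
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    return z @ z.T / denom


# Toy genotypes (invented): rows are individuals, columns are SNPs.
toy = [
    [0, 1, 2, 1],
    [0, 1, 2, 1],   # identical to the first individual
    [2, 1, 0, 1],
    [1, 0, 1, 2],
]
G = genomic_relationship_matrix(toy)
print(np.round(G, 3))
```

The resulting matrix can stand in for a pedigree-derived relationship matrix in a mixed model, which is what makes quantitative genetic parameters estimable in populations without known pedigrees.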
Exploiting genotyping by sequencing to characterize the genomic structure of the American cranberry through high-density linkage mapping.

PubMed

Covarrubias-Pazaran, Giovanny; Diaz-Garcia, Luis; Schlautman, Brandon; Deutsch, Joseph; Salazar, Walter; Hernandez-Ochoa, Miguel; Grygleski, Edward; Steffan, Shawn; Iorizzo, Massimo; Polashock, James; Vorsa, Nicholi; Zalapa, Juan

2016-06-13

The application of genotyping by sequencing (GBS) approaches, combined with data imputation methodologies, is narrowing the genetic knowledge gap between major and understudied, minor crops. GBS is an excellent tool to characterize the genomic structure of recently domesticated (~200 years) and understudied species, such as cranberry (Vaccinium macrocarpon Ait.), by generating large numbers of markers for genomic studies such as genetic mapping. We identified 10842 potentially mappable single nucleotide polymorphisms (SNPs) in a cranberry pseudo-testcross population wherein 5477 SNPs and 211 short sequence repeats (SSRs) were used to construct a high density linkage map in cranberry of which a total of 4849 markers were mapped. Recombination frequency, linkage disequilibrium (LD), and segregation distortion at the genomic level in the parental and integrated linkage maps were characterized for the first time in cranberry. SSR markers, used as the backbone in the map, revealed high collinearity with previously published linkage maps. The 4849-point map consisted of twelve linkage groups spanning 1112 cM, which anchored 2381 nuclear scaffolds accounting for ~13 Mb of the estimated 470 Mb cranberry genome. Bin mapping identified 592 and 672 unique bins in the parentals and a total of 1676 unique marker positions in the integrated map. Synteny analyses comparing the order of anchored cranberry scaffolds to their homologous positions in kiwifruit, grape, and coffee genomes provided initial evidence of homology between cranberry and closely related species. GBS data was used to rapidly saturate the cranberry genome with markers in a pseudo-testcross population. Collinearity between the present saturated genetic map and previous cranberry SSR maps suggests that the SNP locations represent accurate marker order and chromosome structure of the cranberry genome. 
SNPs greatly improved current marker genome coverage, which allowed for genome-wide structure investigations such as segregation distortion, recombination, linkage disequilibrium, and synteny analyses. In the future, GBS can be used to accelerate cranberry molecular breeding through QTL mapping and genome-wide association studies (GWAS).
Deep ancestry of programmed genome rearrangement in lampreys.

PubMed

Timoshevskiy, Vladimir A; Lampman, Ralph T; Hess, Jon E; Porter, Laurie L; Smith, Jeramiah J

2017-09-01

In most multicellular organisms, the structure and content of the genome is rigorously maintained over the course of development. However some species have evolved genome biologies that permit, or require, developmentally regulated changes in the physical structure and content of the genome (programmed genome rearrangement: PGR). Relatively few vertebrates are known to undergo PGR, although all agnathans surveyed to date (several hagfish and one lamprey: Petromyzon marinus) show evidence of large scale PGR. To further resolve the ancestry of PGR within vertebrates, we developed probes that allow simultaneous tracking of nearly all sequences eliminated by PGR in P. marinus and a second lamprey species (Entosphenus tridentatus). These comparative analyses reveal conserved subcellular structures (lagging chromatin and micronuclei) associated with PGR and provide the first comparative embryological evidence in support of the idea that PGR represents an ancient and evolutionarily stable strategy for regulating inherent developmental/genetic conflicts between germline and soma. Copyright © 2017 Elsevier Inc. All rights reserved.
Genome-wide Association Study Identifies Loci for the Polled Phenotype in Yak

PubMed Central

Wu, Xiaoyun; Wang, Kun; Ding, Xuezhi; Wang, Mingcheng; Chu, Min; Xie, Xiuyue; Qiu, Qiang; Yan, Ping

2016-01-01

The absence of horns, known as the polled phenotype, is an economically important trait in modern yak husbandry, but the genomic structure and genetic basis of this phenotype have yet to be discovered. Here, we conducted a genome-wide association study with a panel of 10 horned and 10 polled yaks using whole genome sequencing. We mapped the POLLED locus to a 200-kb interval, which comprises three protein-coding genes. Further characterization of the candidate region showed recent artificial selection signals resulting from the breeding process. We suggest that expressional variations rather than structural variations in protein probably contribute to the polled phenotype. Our results not only represent the first and important step in establishing the genomic structure of the polled region in yak, but also add to our understanding of the polled trait in bovid species. PMID:27389700
Genome sequencing of disease and carriage isolates of nontypeable Haemophilus influenzae identifies discrete population structure

PubMed Central

De Chiara, Matteo; Hood, Derek; Muzzi, Alessandro; Pickard, Derek J.; Perkins, Tim; Pizza, Mariagrazia; Dougan, Gordon; Rappuoli, Rino; Moxon, E. Richard; Soriani, Marco; Donati, Claudio

2014-01-01

One of the main hurdles for the development of an effective and broadly protective vaccine against nonencapsulated isolates of Haemophilus influenzae (NTHi) lies in the genetic diversity of the species, which renders extremely difficult the identification of cross-protective candidate antigens. To assess whether a population structure of NTHi could be defined, we performed genome sequencing of a collection of diverse clinical isolates representative of both carriage and disease and of the diversity of the natural population. Analysis of the distribution of polymorphic sites in the core genome and of the composition of the accessory genome defined distinct evolutionary clades and supported a predominantly clonal evolution of NTHi, with the majority of genetic information transmitted vertically within lineages. A correlation between the population structure and the presence of selected surface-associated proteins and lipooligosaccharide structure, known to contribute to virulence, was found. This high-resolution, genome-based population structure of NTHi provides the foundation to obtain a better understanding of NTHi adaptation to the host, as well as its commensal and virulence behavior, that could facilitate intervention strategies against disease caused by this important human pathogen. PMID:24706866
Genome alignment with graph data structures: a comparison

PubMed Central

2014-01-01

Background Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference. Graphs have proven to be a powerful tool for coping with the complexity of genome-scale sequence alignments. The potential of graphs to intuitively represent all aspects of genome alignments led to the development of graph-based approaches for genome alignment. These approaches construct a graph from a set of local alignments, and derive a genome alignment through identification and removal of graph substructures that indicate errors in the alignment. Results We compare the structures of commonly used graphs in terms of their abilities to represent alignment information. We describe how the graphs can be transformed into each other, and identify and classify graph substructures common to one or more graphs. Based on previous approaches, we compile a list of modifications that remove these substructures. Conclusion We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs. If we neglect vertex or edge labels, the graphs differ in their information content. Still, many ideas are shared among all graph-based approaches. Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools. PMID:24712884
The Geography of Recent Genetic Ancestry across Europe

PubMed Central

Ralph, Peter; Coop, Graham

2013-01-01

The recent genealogical history of human populations is a complex mosaic formed by individual migration, large-scale population movements, and other demographic events. Population genomics datasets can provide a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long segments of shared genomic material. We make use of genomic data for 2,257 Europeans (in the Population Reference Sample [POPRES] dataset) to conduct one of the first surveys of recent genealogical ancestry over the past 3,000 years at a continental scale. We detected 1.9 million shared long genomic segments, and used the lengths of these to infer the distribution of shared ancestors across time and geography. We find that a pair of modern Europeans living in neighboring populations share around 2–12 genetic common ancestors from the last 1,500 years, and upwards of 100 genetic ancestors from the previous 1,000 years. These numbers drop off exponentially with geographic distance, but since these genetic ancestors are a tiny fraction of common genealogical ancestors, individuals from opposite ends of Europe are still expected to share millions of common genealogical ancestors over the last 1,000 years. There is also substantial regional variation in the number of shared genetic ancestors. For example, there are especially high numbers of common ancestors shared between many eastern populations that date roughly to the migration period (which includes the Slavic and Hunnic expansions into that region). Some of the lowest levels of common ancestry are seen in the Italian and Iberian peninsulas, which may indicate different effects of historical population expansions in these areas and/or more stably structured populations. Population genomic datasets have considerable power to uncover recent demographic history, and will allow a much fuller picture of the close genealogical kinship of individuals across the world. PMID:23667324
Contrasting Patterns of rDNA Homogenization within the Zygosaccharomyces rouxii Species Complex

PubMed Central

Chand Dakal, Tikam; Giudici, Paolo; Solieri, Lisa

2016-01-01

Arrays of repetitive ribosomal DNA (rDNA) sequences are generally expected to evolve as a coherent family, where repeats within such a family are more similar to each other than to orthologs in related species. The continuous homogenization of repeats within individual genomes is a recombination process termed concerted evolution. Here, we investigated the extent and the direction of concerted evolution in 43 yeast strains of the Zygosaccharomyces rouxii species complex (Z. rouxii, Z. sapae, Z. mellis), by analyzing two portions of the 35S rDNA cistron, namely the D1/D2 domains at the 5’ end of the 26S rRNA gene and the segment including the internal transcribed spacers (ITS) 1 and 2 (ITS regions). We demonstrate that intra-genomic rDNA sequence variation is unusually frequent in this clade and that rDNA arrays in single genomes consist of an intermixing of Z. rouxii, Z. sapae and Z. mellis-like sequences, putatively evolved by reticulate evolutionary events that involved repeated hybridization between lineages. The levels and distribution of sequence polymorphisms vary across rDNA repeats in different individuals, reflecting four patterns of rDNA evolution: I) rDNA repeats that are homogeneous within a genome but are chimeras derived from two parental lineages via recombination: Z. rouxii in the ITS region and Z. sapae in the D1/D2 region; II) intra-genomic rDNA repeats that retain polymorphisms only in ITS regions; III) rDNA repeats that vary only in their D1/D2 domains; IV) heterogeneous rDNA arrays that have both polymorphic ITS and D1/D2 regions. We argue that an ongoing process of homogenization following allodiplodization or incomplete lineage sorting gave rise to divergent evolutionary trajectories in different strains, depending upon temporal, structural and functional constraints. We discuss the consequences of these findings for Zygosaccharomyces species delineation and, more in general, for yeast barcoding. PMID:27501051
GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences.

PubMed

Yu, Ning; Guo, Xuan; Zelikovsky, Alexander; Pan, Yi

2017-05-24

As crucial markers in identifying biological elements and processes in mammalian genomes, CpG islands (CGI) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nucleosome retention. The generally accepted criteria for CGI are: (a) %G+C content is ≥ 50%, (b) the ratio of the observed CpG content to the expected CpG content is ≥ 0.6, and (c) the length of the CGI is greater than 200 nucleotides. Most existing computational methods for the prediction of CpG islands are programmed on these rules. However, many experimentally verified CpG islands deviate from these artificial criteria. Experiments indicate that in many cases %G+C is < 50%, CpGobs/CpGexp varies, and the length of CGI ranges from eight nucleotides to a few thousand nucleotides. This implies that CGI detection is not a purely statistical task and that some as yet unrevealed rules are probably hidden. A novel Gaussian model, GaussianCpG, is developed for detection of CpG islands in the human genome. We analyze the energy distribution over genomic primary structure for each CpG site and adopt the parameters from statistics of the human genome. The evaluation results show that the new model can predict CpG islands efficiently by balancing both sensitivity and specificity over known human CGI data sets. Compared with other models, GaussianCpG can achieve better performance in CGI detection. Our Gaussian model aims to simplify the complex interaction between nucleotides. The model is computed not by the linear statistical method but by the Gaussian energy distribution and accumulation. The parameters of the Gaussian function are not arbitrarily designated but deliberately chosen by optimizing the biological statistics. By using the pseudopotential analysis on CpG islands, the novel model is validated on both the real and artificial data sets.
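The three rule-based criteria quoted in this abstract can be illustrated with a minimal sketch. It is not the GaussianCpG model itself, only the conventional check that the abstract says many verified islands fail. The function names are hypothetical, and the expected CpG count is assumed to follow the commonly used formula (number of C × number of G) / sequence length, which the abstract does not spell out.

```python
def cpg_stats(seq):
    """Return (%G+C, observed/expected CpG ratio) for a DNA sequence.

    Assumption: expected CpG = (#C * #G) / length, the commonly used
    definition; the abstract does not state the formula.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    observed = seq.count("CG")      # "CG" cannot overlap itself
    expected = c * g / n
    gc_percent = 100.0 * (c + g) / n
    ratio = observed / expected if expected > 0 else 0.0
    return gc_percent, ratio


def meets_rule_based_criteria(seq):
    """Apply criteria (a)-(c): %G+C >= 50, obs/exp >= 0.6, length > 200."""
    gc_percent, ratio = cpg_stats(seq)
    return gc_percent >= 50.0 and ratio >= 0.6 and len(seq) > 200


if __name__ == "__main__":
    candidate = "CG" * 101          # toy sequence, 202 nt
    print(cpg_stats(candidate), meets_rule_based_criteria(candidate))
```

A sequence that is GC-rich but contains almost no CpG dinucleotides fails criterion (b) even when (a) and (c) hold, which is why the ratio is computed separately from the G+C percentage.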
The Aspergillus Genome Database: multispecies curation and incorporation of RNA-Seq data to improve structural gene annotations.

PubMed

Cerqueira, Gustavo C; Arnaud, Martha B; Inglis, Diane O; Skrzypek, Marek S; Binkley, Gail; Simison, Matt; Miyasato, Stuart R; Binkley, Jonathan; Orvis, Joshua; Shah, Prachi; Wymore, Farrell; Sherlock, Gavin; Wortman, Jennifer R

2014-01-01

The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available web-based resource that was designed for Aspergillus researchers and is also a valuable source of information for the entire fungal research community. In addition to being a repository and central point of access to genome, transcriptome and polymorphism data, AspGD hosts a comprehensive comparative genomics toolbox that facilitates the exploration of precomputed orthologs among the 20 currently available Aspergillus genomes. AspGD curators perform gene product annotation based on review of the literature for four key Aspergillus species: Aspergillus nidulans, Aspergillus oryzae, Aspergillus fumigatus and Aspergillus niger. We have iteratively improved the structural annotation of Aspergillus genomes through the analysis of publicly available transcription data, mostly expressed sequence tags, as described in a previous NAR Database article (Arnaud et al. 2012). In this update, we report substantive structural annotation improvements for A. nidulans, A. oryzae and A. fumigatus genomes based on recently available RNA-Seq data. Over 26 000 loci were updated across these species; although those primarily comprise the addition and extension of untranslated regions (UTRs), the new analysis also enabled over 1000 modifications affecting the coding sequence of genes in each target genome.
A Mitochondrial Genome of Rhyparochromidae (Hemiptera: Heteroptera) and a Comparative Analysis of Related Mitochondrial Genomes.

PubMed

Li, Teng; Yang, Jie; Li, Yinwan; Cui, Ying; Xie, Qiang; Bu, Wenjun; Hillis, David M

2016-10-19

The Rhyparochromidae, the largest family of Lygaeoidea, encompasses more than 1,850 described species, but no mitochondrial genome has been sequenced to date. Here we describe the first mitochondrial genome for Rhyparochromidae: a complete mitochondrial genome of Panaorus albomaculatus (Scott, 1874). This mitochondrial genome is comprised of 16,345 bp, and contains the expected 37 genes and control region. The majority of the control region is made up of a large tandem-repeat region, which has a novel pattern not previously observed in other insects. The tandem-repeats region of P. albomaculatus consists of 53 tandem duplications (including one partial repeat), which is the largest number of tandem repeats among all the known insect mitochondrial genomes. Slipped-strand mispairing during replication is likely to have generated this novel pattern of tandem repeats. Comparative analysis of tRNA gene families in sequenced Pentatomomorpha and Lygaeoidea species shows that the pattern of nucleotide conservation is markedly higher on the J-strand. Phylogenetic reconstruction based on mitochondrial genomes suggests that Rhyparochromidae is not the sister group to all the remaining Lygaeoidea, and supports the monophyly of Lygaeoidea.
MPD: a pathogen genome and metagenome database

PubMed Central

Zhang, Tingting; Miao, Jiaojiao; Han, Na; Qiang, Yujun; Zhang, Wen

2018-01-01

Advances in high-throughput sequencing have led to unprecedented growth in the amount of available genome sequencing data, especially for bacterial genomes, which has been accompanied by a challenge for the storage and management of such huge datasets. To facilitate bacterial research and related studies, we have developed the Mypathogen database (MPD), which provides access to users for searching, downloading, storing and sharing bacterial genomics data. The MPD represents the first pathogenic database for microbial genomes and metagenomes, and currently covers pathogenic microbial genomes (6604 genera, 11 071 species, 41 906 strains) and metagenomic data from host, air, water and other sources (28 816 samples). The MPD also functions as a management system for statistical and storage data that can be used by different organizations, thereby facilitating data sharing among different organizations and research groups. A user-friendly local client tool is provided to maintain the steady transmission of big sequencing data. The MPD is a useful tool for analysis and management in genomic research, especially for clinical Centers for Disease Control and epidemiological studies, and is expected to contribute to advancing knowledge on pathogenic bacteria genomes and metagenomes. Database URL: http://data.mypathogen.org PMID:29917040
Structure and variation of the mitochondrial genome of fishes.

PubMed

Satoh, Takashi P; Miya, Masaki; Mabuchi, Kohji; Nishida, Mutsumi

2016-09-07

The mitochondrial (mt) genome has been used as an effective tool for phylogenetic and population genetic analyses in vertebrates. However, the structure and variability of the vertebrate mt genome are not well understood. A potential strategy for improving our understanding is to conduct a comprehensive comparative study of large mt genome data. The aim of this study was to characterize the structure and variability of the fish mt genome through comparative analysis of large datasets. An analysis of the secondary structure of proteins for 250 fish species (248 ray-finned and 2 cartilaginous fishes) illustrated that cytochrome c oxidase subunits (COI, COII, and COIII) and a cytochrome bc1 complex subunit (Cyt b) had substantial amino acid conservation. Among the four proteins, COI was the most conserved, as more than half of all amino acid sites were invariable among the 250 species. Our models identified 43 and 58 stems within 12S rRNA and 16S rRNA, respectively, with larger numbers than proposed previously for vertebrates. The models also identified 149 and 319 invariable sites in 12S rRNA and 16S rRNA, respectively, in all fishes. In particular, the present result verified that a region corresponding to the peptidyl transferase center in prokaryotic 23S rRNA, which is homologous to mt 16S rRNA, is also conserved in fish mt 16S rRNA. Concerning the gene order, we found 35 variations (in 32 families) that deviated from the common gene order in vertebrates. These gene rearrangements were mostly observed in the area spanning the ND5 gene to the control region as well as two tRNA gene cluster regions (IQM and WANCY regions). Although many of such gene rearrangements were unique to a specific taxon, some were shared polyphyletically between distantly related species. Through a large-scale comparative analysis of 250 fish species mt genomes, we elucidated various structural aspects of the fish mt genome and the encoded genes. 
The present results will be important for understanding functions of the mt genome and developing programs for nucleotide sequence analysis. This study demonstrated the significance of extensive comparisons for understanding the structure of the mt genome.
RNA 3D Modules in Genome-Wide Predictions of RNA 2D Structure

PubMed Central

Theis, Corinna; Zirbel, Craig L.; zu Siederdissen, Christian Höner; Anthon, Christian; Hofacker, Ivo L.; Nielsen, Henrik; Gorodkin, Jan

2015-01-01

Recent experimental and computational progress has revealed a large potential for RNA structure in the genome. This has been driven by computational strategies that exploit multiple genomes of related organisms to identify common sequences and secondary structures. However, these computational approaches have two main challenges: they are computationally expensive and they have a relatively high false discovery rate (FDR). Simultaneously, RNA 3D structure analysis has revealed modules composed of non-canonical base pairs which occur in non-homologous positions, apparently by independent evolution. These modules can, for example, occur inside structural elements which in RNA 2D predictions appear as internal loops. Hence one question is if the use of such RNA 3D information can improve the prediction accuracy of RNA secondary structure at a genome-wide level. Here, we use RNAz in combination with 3D module prediction tools and apply them on a 13-way vertebrate sequence-based alignment. We find that RNA 3D modules predicted by metaRNAmodules and JAR3D are significantly enriched in the screened windows compared to their shuffled counterparts. The initially estimated FDR of 47.0% is lowered to below 25% when certain 3D module predictions are present in the window of the 2D prediction. We discuss the implications and prospects for further development of computational strategies for detection of RNA 2D structure in genomic sequence. PMID:26509713
Evidence of pervasive biologically functional secondary structures within the genomes of eukaryotic single-stranded DNA viruses.

PubMed

Muhire, Brejnev Muhizi; Golden, Michael; Murrell, Ben; Lefeuvre, Pierre; Lett, Jean-Michel; Gray, Alistair; Poon, Art Y F; Ngandu, Nobubelo Kwanele; Semegni, Yves; Tanov, Emil Pavlov; Monjane, Adérito Luis; Harkins, Gordon William; Varsani, Arvind; Shepherd, Dionne Natalie; Martin, Darren Patrick

2014-02-01

Single-stranded DNA (ssDNA) viruses have genomes that are potentially capable of forming complex secondary structures through Watson-Crick base pairing between their constituent nucleotides. A few of the structural elements formed by such base pairings are, in fact, known to have important functions during the replication of many ssDNA viruses. Unknown, however, are (i) whether numerous additional ssDNA virus genomic structural elements predicted to exist by computational DNA folding methods actually exist and (ii) whether those structures that do exist have any biological relevance. We therefore computationally inferred lists of the most evolutionarily conserved structures within a diverse selection of animal- and plant-infecting ssDNA viruses drawn from the families Circoviridae, Anelloviridae, Parvoviridae, Nanoviridae, and Geminiviridae and analyzed these for evidence of natural selection favoring the maintenance of these structures. While we find evidence that is consistent with purifying selection being stronger at nucleotide sites that are predicted to be base paired than at sites predicted to be unpaired, we also find strong associations between sites that are predicted to pair with one another and site pairs that are apparently coevolving in a complementary fashion. Collectively, these results indicate that natural selection actively preserves much of the pervasive secondary structure that is evident within eukaryote-infecting ssDNA virus genomes and, therefore, that much of this structure is biologically functional. Lastly, we provide examples of various highly conserved but completely uncharacterized structural elements that likely have important functions within some of the ssDNA virus genomes analyzed here.
Evidence of Pervasive Biologically Functional Secondary Structures within the Genomes of Eukaryotic Single-Stranded DNA Viruses

PubMed Central

Muhire, Brejnev Muhizi; Golden, Michael; Murrell, Ben; Lefeuvre, Pierre; Lett, Jean-Michel; Gray, Alistair; Poon, Art Y. F.; Ngandu, Nobubelo Kwanele; Semegni, Yves; Tanov, Emil Pavlov; Monjane, Adérito Luis; Harkins, Gordon William; Varsani, Arvind; Shepherd, Dionne Natalie

2014-01-01

Single-stranded DNA (ssDNA) viruses have genomes that are potentially capable of forming complex secondary structures through Watson-Crick base pairing between their constituent nucleotides. A few of the structural elements formed by such base pairings are, in fact, known to have important functions during the replication of many ssDNA viruses. Unknown, however, are (i) whether numerous additional ssDNA virus genomic structural elements predicted to exist by computational DNA folding methods actually exist and (ii) whether those structures that do exist have any biological relevance. We therefore computationally inferred lists of the most evolutionarily conserved structures within a diverse selection of animal- and plant-infecting ssDNA viruses drawn from the families Circoviridae, Anelloviridae, Parvoviridae, Nanoviridae, and Geminiviridae and analyzed these for evidence of natural selection favoring the maintenance of these structures. While we find evidence that is consistent with purifying selection being stronger at nucleotide sites that are predicted to be base paired than at sites predicted to be unpaired, we also find strong associations between sites that are predicted to pair with one another and site pairs that are apparently coevolving in a complementary fashion. Collectively, these results indicate that natural selection actively preserves much of the pervasive secondary structure that is evident within eukaryote-infecting ssDNA virus genomes and, therefore, that much of this structure is biologically functional. Lastly, we provide examples of various highly conserved but completely uncharacterized structural elements that likely have important functions within some of the ssDNA virus genomes analyzed here. PMID:24284329

Uronic polysaccharide degrading enzymes.

PubMed

Garron, Marie-Line; Cygler, Miroslaw

2014-10-01

In the past several years progress has been made in the field of structure and function of polysaccharide lyases (PLs). The number of classified polysaccharide lyase families has increased to 23 and more detailed analysis has allowed the identification of more closely related subfamilies, leading to stronger correlation between each subfamily and a unique substrate. The number of as yet unclassified polysaccharide lyases has also increased and we expect that sequencing projects will allow many of these unclassified sequences to emerge as new families. The progress in structural analysis of PLs has led to having at least one representative structure for each of the families and for two unclassified enzymes. The newly determined structures have folds observed previously in other PL families and their catalytic mechanisms follow either metal-assisted or Tyr/His mechanisms characteristic for other PL enzymes. Comparison of PLs with glycoside hydrolases (GHs) shows several folds common to both classes but only for the β-helix fold is there strong indication of divergent evolution from a common ancestor. Analysis of bacterial genomes identified gene clusters containing multiple polysaccharide cleaving enzymes, the Polysaccharides Utilization Loci (PULs), and their gene complement suggests that they are organized to process completely a specific polysaccharide. Copyright © 2014 Elsevier Ltd. All rights reserved.
Variation block-based genomics method for crop plants.

PubMed

Kim, Yul Ho; Park, Hyang Mi; Hwang, Tae-Young; Lee, Seuk Ki; Choi, Man Soo; Jho, Sungwoong; Hwang, Seungwoo; Kim, Hak-Min; Lee, Dongwoo; Kim, Byoung-Chul; Hong, Chang Pyo; Cho, Yun Sung; Kim, Hyunmin; Jeong, Kwang Ho; Seo, Min Jung; Yun, Hong Tai; Kim, Sun Lim; Kwon, Young-Up; Kim, Wook Han; Chun, Hye Kyung; Lim, Sang Jong; Shin, Young-Ah; Choi, Ik-Young; Kim, Young Sun; Yoon, Ho-Sung; Lee, Suk-Ha; Lee, Sunghoon

2014-06-15

In contrast with wild species, cultivated crop genomes consist of reshuffled recombination blocks, which arose through crossing and selection. Accordingly, recombination block-based genomics analysis can be an effective approach for the screening of target loci for agricultural traits. We propose the variation block method, which is a three-step process for recombination block detection and comparison. The first step is to detect variations by comparing the short-read DNA sequences of the cultivar to the reference genome of the target crop. Next, sequence blocks with variation patterns are examined and defined. The boundaries between the variation-containing sequence blocks are regarded as recombination sites. All the assumed recombination sites in the cultivar set are used to split the genomes, and the resulting sequence regions are termed variation blocks. Finally, the genomes are compared using the variation blocks. The variation block method identified recurring recombination blocks accurately and successfully represented block-level diversities in the publicly available genomes of 31 soybean and 23 rice accessions. The practicality of this approach was demonstrated by the identification of a putative locus determining soybean hilum color. We suggest that the variation block method is an efficient genomics method for the recombination block-level comparison of crop genomes. We expect that this method will facilitate the development of crop genomics by bringing genomics technologies to the field of crop breeding.
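The splitting and comparison steps described in this abstract might be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the variation-containing blocks of each cultivar are assumed to be already available as half-open (start, end) intervals on one chromosome, and the function names and the 0/1 block profile are hypothetical.

```python
def variation_blocks(chrom_length, cultivar_intervals):
    """Split a chromosome at every assumed recombination site.

    cultivar_intervals: dict mapping cultivar name -> list of (start, end)
    half-open intervals that contain variation relative to the reference.
    Interval boundaries are treated as recombination sites, pooled over
    all cultivars, and used to cut the chromosome into variation blocks.
    """
    cuts = {0, chrom_length}
    for intervals in cultivar_intervals.values():
        for start, end in intervals:
            cuts.update((start, end))
    edges = sorted(cuts)
    return list(zip(edges[:-1], edges[1:]))


def block_profile(blocks, intervals):
    """Mark each variation block 1 if it lies inside a variation-containing
    interval of this cultivar, else 0 (reference-like)."""
    return [
        int(any(s <= b0 and b1 <= e for s, e in intervals))
        for b0, b1 in blocks
    ]


if __name__ == "__main__":
    cultivars = {"A": [(10, 40)], "B": [(30, 70)]}   # toy data
    blocks = variation_blocks(100, cultivars)
    for name, intervals in cultivars.items():
        print(name, block_profile(blocks, intervals))
```

Because every cultivar is described over the same set of blocks, two genomes can be compared block by block instead of variant by variant.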
Influence of genome and bio-ecology on the prevalence of genome exchange in unisexuals of the Ambystoma complex.

PubMed

Beauregard, France; Angers, Bernard

2018-05-31

Unisexuals of the blue-spotted salamander complex are thought to reproduce by kleptogenesis. Genome exchanges associated with this sperm-dependent mode of reproduction are expected to result in a higher genetic variation and multiple ploidy levels compared to clonality. However, the existence of some populations exclusively formed of genetically identical individuals suggests that factors could prevent genome exchanges. This study aimed at assessing the prevalence of genome exchange among unisexuals of the Ambystoma laterale-jeffersonianum complex from 10 sites in the northern part of their distribution. A total of 235 individuals, including 207 unisexuals, were genotyped using microsatellite loci and AFLP. Unisexual individuals could be sorted into five genetically distinct groups, likely derived from the same paternal A. jeffersonianum haplome. One of these groups exclusively reproduced clonally, even when found in sympatry with lineages presenting signatures of genome exchange. Genome exchange was site-dependent for another group. Genome exchange was detected at all sites for the three remaining groups. Prevalence of genome exchange appears to be associated with ecological conditions such as availability of effective sperm donors. Intrinsic genomic factors may also affect this process, since different lineages in sympatry present highly variable rates of genome exchange. The coexistence of clonal and genetically diversified lineages opens the door to further research on alternatives to genetic variation.
Contrasting support for alternative models of genomic variation based on microhabitat preference: species-specific effects of climate change in alpine sedges.

PubMed

Massatti, Rob; Knowles, L Lacey

2016-08-01

Deterministic processes may uniquely affect codistributed species' phylogeographic patterns such that discordant genetic variation among taxa is predicted. Yet, explicitly testing expectations of genomic discordance in a statistical framework remains challenging. Here, we construct spatially and temporally dynamic models to investigate the hypothesized effect of microhabitat preferences on the permeability of glaciated regions to gene flow in two closely related montane species. Utilizing environmental niche models from the Last Glacial Maximum and the present to inform demographic models of changes in habitat suitability over time, we evaluate the relative probabilities of two alternative models using approximate Bayesian computation (ABC) in which glaciated regions are either (i) permeable or (ii) a barrier to gene flow. Results based on the fit of the empirical data to data sets simulated using a spatially explicit coalescent under alternative models indicate that genomic data are consistent with predictions about the hypothesized role of microhabitat in generating discordant patterns of genetic variation among the taxa. Specifically, a model in which glaciated areas acted as a barrier was much more probable based on patterns of genomic variation in Carex nova, a wet-adapted species. However, in the dry-adapted Carex chalciolepis, the permeable model was more probable, although the difference in the support of the models was small. This work highlights how statistical inferences can be used to distinguish deterministic processes that are expected to result in discordant genomic patterns among species, including species-specific responses to climate change. © 2016 John Wiley & Sons Ltd.
Translational research in genomics of Alzheimer's disease: a review of current practice and future perspectives.

PubMed

Mihaescu, Raluca; Detmar, Symone B; Cornel, Martina C; van der Flier, Wiesje M; Heutink, Peter; Hol, Elly M; Rikkert, Marcel G M Olde; van Duijn, Cornelia M; Janssens, A Cecile J W

2010-01-01

Alzheimer's disease (AD) is the most prevalent form of dementia and the number of cases is expected to increase exponentially worldwide. Three highly penetrant genes (AbetaPP, PSEN1, and PSEN2) explain only a small number of AD cases with a Mendelian transmission pattern. Many genes have been analyzed for association with non-Mendelian AD, but the only consistently replicated finding is APOE. At present, possibilities for prevention, early detection, and treatment of the disease are limited. Predictive and diagnostic genetic testing is available only in Mendelian forms of AD. Currently, APOE genotyping is not considered clinically useful for screening, presymptomatic testing, or clinical diagnosis of non-Mendelian AD. However, clinical management of the disease is expected to benefit from the rapid pace of discoveries in the genomics of AD. Following a recently developed framework for the continuum of translation research that is needed to move genetic discoveries to health applications, this paper reviews recent genetic discoveries as well as translational research on genomic applications in the prevention, early detection, and treatment of AD. The four phases of translation research include: 1) translation of basic genomics research into a potential health care application; 2) evaluation of the application for the development of evidence-based guidelines; 3) evaluation of the implementation and use of the application in health care practice; and 4) evaluation of the achieved population health impact. Most research on genome-based applications in AD is still in the first phase of the translational research framework, which means that further research is still needed before their implementation can be considered.
Genome-scale dynamic modeling of the competition between Rhodoferax and Geobacter in anoxic subsurface environments.

PubMed

Zhuang, Kai; Izallalen, Mounir; Mouser, Paula; Richter, Hanno; Risso, Carla; Mahadevan, Radhakrishnan; Lovley, Derek R

2011-02-01

The advent of rapid complete genome sequencing, and the potential to capture this information in genome-scale metabolic models, provide the possibility of comprehensively modeling microbial community interactions. For example, Rhodoferax and Geobacter species are acetate-oxidizing Fe(III)-reducers that compete in anoxic subsurface environments and this competition may have an influence on the in situ bioremediation of uranium-contaminated groundwater. Therefore, genome-scale models of Geobacter sulfurreducens and Rhodoferax ferrireducens were used to evaluate how Geobacter and Rhodoferax species might compete under diverse conditions found in a uranium-contaminated aquifer in Rifle, CO. The model predicted that at the low rates of acetate flux expected under natural conditions at the site, Rhodoferax will outcompete Geobacter as long as sufficient ammonium is available. The model also predicted that when high concentrations of acetate are added during in situ bioremediation, Geobacter species would predominate, consistent with field-scale observations. This can be attributed to the higher expected growth yields of Rhodoferax and the ability of Geobacter to fix nitrogen. The modeling predicted relative proportions of Geobacter and Rhodoferax in geochemically distinct zones of the Rifle site that were comparable to those that were previously documented with molecular techniques. The model also predicted that under nitrogen fixation, higher carbon and electron fluxes would be diverted toward respiration rather than biomass formation in Geobacter, providing a potential explanation for enhanced in situ U(VI) reduction in low-ammonium zones. These results show that genome-scale modeling can be a useful tool for predicting microbial interactions in subsurface environments and shows promise for designing bioremediation strategies.
Characterizing polymorphic inversions in human genomes by single-cell sequencing

PubMed Central

Sanders, Ashley D.; Hills, Mark; Porubský, David; Guryev, Victor; Falconer, Ester; Lansdorp, Peter M.

2016-01-01

Identifying genomic features that differ between individuals and cells can help uncover the functional variants that drive phenotypes and disease susceptibilities. For this, single-cell studies are paramount, as it becomes increasingly clear that the contribution of rare but functional cellular subpopulations is important for disease prognosis, management, and progression. Until now, studying these associations has been challenged by our inability to map structural rearrangements accurately and comprehensively. To overcome this, we coupled single-cell sequencing of DNA template strands (Strand-seq) with custom analysis software to rapidly discover, map, and genotype genomic rearrangements at high resolution. This allowed us to explore the distribution and frequency of inversions in a heterogeneous cell population, identify several polymorphic domains in complex regions of the genome, and locate rare alleles in the reference assembly. We then mapped the entire genomic complement of inversions within two unrelated individuals to characterize their distinct inversion profiles and built a nonredundant global reference of structural rearrangements in the human genome. The work described here provides a powerful new framework to study structural variation and genomic heterogeneity in single-cell samples, whether from individuals for population studies or tissue types for biomarker discovery. PMID:27472961
Structure-seq2: sensitive and accurate genome-wide profiling of RNA structure in vivo

PubMed Central

Ritchey, Laura E.; Su, Zhao; Tang, Yin; Tack, David C.

2017-01-01

RNA serves many functions in biology such as splicing, temperature sensing, and innate immunity. These functions are often determined by the structure of RNA. There is thus a pressing need to understand RNA structure and how it changes during diverse biological processes both in vivo and genome-wide. Here, we present Structure-seq2, which provides nucleotide-resolution RNA structural information in vivo and genome-wide. This optimized version of our original Structure-seq method increases sensitivity by at least 4-fold and improves data quality by minimizing formation of a deleterious by-product, reducing ligation bias, and improving read coverage. We also present a variation of Structure-seq2 in which a biotinylated nucleotide is incorporated during reverse transcription, which greatly facilitates the protocol by eliminating two PAGE purification steps. We benchmark Structure-seq2 on both mRNA and rRNA structure in rice (Oryza sativa). We demonstrate that Structure-seq2 can lead to new biological insights. Our Structure-seq2 datasets uncover hidden breaks in chloroplast rRNA and identify a previously unreported N1-methyladenosine (m1A) in a nuclear-encoded Oryza sativa rRNA. Overall, Structure-seq2 is a rapid, sensitive, and unbiased method to probe RNA in vivo and genome-wide that facilitates new insights into RNA biology. PMID:28637286
Comparative Reannotation of 21 Aspergillus Genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Salamov, Asaf; Riley, Robert; Kuo, Alan

2013-03-08

We used comparative gene modeling to reannotate 21 Aspergillus genomes. Initial automatic annotation of individual genomes may contain errors of different kinds, e.g. missing genes, incorrect exon-intron structures, 'chimeras' that fuse 2 or more real genes, or, alternatively, real genes split into 2 or more models. The main premise behind the comparative modeling approach is that for closely related genomes most orthologous families have the same conserved gene structure. The algorithm maps all gene models predicted in each individual Aspergillus genome to the other genomes and, for each locus, selects from potentially many competing models the one which most closely resembles the orthologous genes from other genomes. This procedure is iterated until no further change in gene models is observed. For Aspergillus genomes we predicted in total 4503 new gene models (~2% per genome), supported by comparative analysis, additionally correcting ~18% of old gene models. This resulted in a total of 4065 more genes with annotated PFAM domains (~3% increase per genome). Analysis of a few genomes with EST/transcriptomics data shows that the new annotation sets also have a higher number of EST-supported splice sites at exon-intron boundaries.
Normalization of Complete Genome Characteristics: Application to Evolution from Primitive Organisms to Homo sapiens.

PubMed

Sorimachi, Kenji; Okayasu, Teiji; Ohhira, Shuji

2015-04-01

Normalized nucleotide and amino acid contents of complete genome sequences can be visualized as radar charts. The shapes of these charts depict the characteristics of an organism's genome. The normalized values calculated from the genome sequence theoretically exclude experimental errors. Further, because normalization is independent of both target size and kind, this procedure is applicable not only to single genes but also to whole genomes, which consist of a huge number of different genes. In this review, we discuss the applications of the normalization of the nucleotide and predicted amino acid contents of complete genomes to the investigation of genome structure and to evolutionary research from primitive organisms to Homo sapiens. Some of the results could never have been obtained from the analysis of individual nucleotide or amino acid sequences but were revealed only after the normalization of nucleotide and amino acid contents was applied to genome research. The discovery that genome structure was homogeneous was obtained only after normalization methods were applied to the nucleotide or predicted amino acid contents of genome sequences. Normalization procedures are also applicable to evolutionary research. Thus, normalization of the contents of whole genomes is a useful procedure that can help to characterize organisms.
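As a rough illustration of why normalized contents are independent of target size, the sketch below divides each nucleotide count by the total count. The abstract does not give the exact normalization used in the review, so the fraction-of-total definition and the function name are assumptions made for this example.

```python
from collections import Counter


def normalized_content(seq, alphabet="ACGT"):
    """Return each symbol's share of the total symbol count.

    Assumption: "normalization" means dividing each count by the sum of
    counts over the alphabet; symbols outside the alphabet are ignored.
    """
    counts = Counter(seq.upper())
    total = sum(counts[symbol] for symbol in alphabet)
    if total == 0:
        raise ValueError("sequence contains no symbols from the alphabet")
    return {symbol: counts[symbol] / total for symbol in alphabet}


if __name__ == "__main__":
    gene = "AACGTT"            # toy single gene
    genome = gene * 1000       # toy "genome" with the same composition
    print(normalized_content(gene))
    print(normalized_content(genome))
```

The two printed profiles are identical although the inputs differ a thousandfold in length, which is the property that lets single genes and whole genomes be placed on the same radar chart. Passing a 20-letter amino acid alphabet applies the same calculation to predicted protein sequences.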
Pangenome Analysis of Burkholderia pseudomallei: Genome Evolution Preserves Gene Order despite High Recombination Rates.

PubMed

Spring-Pearson, Senanu M; Stone, Joshua K; Doyle, Adina; Allender, Christopher J; Okinaka, Richard T; Mayo, Mark; Broomall, Stacey M; Hill, Jessica M; Karavis, Mark A; Hubbard, Kyle S; Insalaco, Joseph M; McNew, Lauren A; Rosenzweig, C Nicole; Gibbons, Henry S; Currie, Bart J; Wagner, David M; Keim, Paul; Tuanyok, Apichai

2015-01-01

The pangenomic diversity in Burkholderia pseudomallei is high, with approximately 5.8% of the genome consisting of genomic islands. Genomic islands are known hotspots for recombination driven primarily by site-specific recombination associated with tRNAs. However, recombination rates in other portions of the genome are also high, a feature we expected to disrupt gene order. We analyzed the pangenome of 37 isolates of B. pseudomallei and demonstrate that the pangenome is 'open', with approximately 136 new genes identified with each new genome sequenced, and that the global core genome consists of 4568±16 homologs. Genes associated with metabolism were statistically overrepresented in the core genome, and genes associated with mobile elements, disease, and motility were primarily associated with accessory portions of the pangenome. The frequency distribution of genes present in between 1 and 37 of the genomes analyzed matches well with a model of genome evolution in which 96% of the genome has very low recombination rates but 4% of the genome recombines readily. Using homologous genes among pairs of genomes, we found that gene order was highly conserved among strains, despite the high recombination rates previously observed. High rates of gene transfer and recombination are incompatible with retaining gene order unless these processes are either highly localized to specific sites within the genome, or are characterized by symmetrical gene gain and loss. Our results demonstrate that both processes occur: localized recombination introduces many new genes at relatively few sites, and recombination throughout the genome generates the novel multi-locus sequence types previously observed while preserving gene order.
Genetic diversity and population structure of Musa accessions in ex situ conservation

PubMed Central

2013-01-01

Background Banana cultivars are mostly derived from hybridization between wild diploid subspecies of Musa acuminata (A genome) and M. balbisiana (B genome), and they exhibit various levels of ploidy and genomic constitution. The Embrapa ex situ Musa collection contains over 220 accessions, of which only a few have been genetically characterized. Knowledge regarding the genetic relationships and diversity between modern cultivars and wild relatives would assist in conservation and breeding strategies. Our objectives were to determine the genomic constitution based on Internal Transcribed Spacer (ITS) region polymorphism and the ploidy of all accessions by flow cytometry and to investigate the population structure of the collection using Simple Sequence Repeat (SSR) loci as co-dominant markers based on Structure software, not previously performed in Musa. Results From the 221 accessions analyzed by flow cytometry, the correct ploidy was confirmed or established for 212 (95.9%), whereas digestion of the ITS region confirmed the genomic constitution of 209 (94.6%). Neighbor-joining clustering analysis derived from SSR binary data allowed the detection of two major groups, essentially distinguished by the presence or absence of the B genome, while subgroups were formed according to the genomic composition and commercial classification. The co-dominant nature of SSR was explored to analyze the structure of the population based on a Bayesian approach, detecting 21 subpopulations. Most of the subpopulations were in agreement with the clustering analysis. Conclusions The data generated by flow cytometry, ITS and SSR supported the hypothesis about the occurrence of homeologue recombination between A and B genomes, leading to discrepancies in the number of sets or portions from each parental genome. These phenomena have been largely disregarded in the evolution of banana, as the “single-step domestication” hypothesis had long predominated.
These findings will have an impact in future breeding approaches. Structure analysis enabled the efficient detection of ancestry of recently developed tetraploid hybrids by breeding programs, and for some triploids. However, for the main commercial subgroups, Structure appeared to be less efficient to detect the ancestry in diploid groups, possibly due to sampling restrictions. The possibility of inferring the membership among accessions to correct the effects of genetic structure opens possibilities for its use in marker-assisted selection by association mapping. PMID:23497122
Rapid Detection of Positive Selection in Genes and Genomes Through Variation Clusters

PubMed Central

Wagner, Andreas

2007-01-01

Positive selection in genes and genomes can point to the evolutionary basis for differences among species and among races within a species. The detection of positive selection can also help identify functionally important protein regions and thus guide protein engineering. Many existing tests for positive selection are excessively conservative, vulnerable to artifacts caused by demographic population history, or computationally very intensive. I here propose a simple and rapid test that is complementary to existing tests and that overcomes some of these problems. It relies on the null hypothesis that neutrally evolving DNA regions should show a Poisson distribution of nucleotide substitutions. The test detects significant deviations from this expectation in the form of variation clusters, highly localized groups of amino acid changes in a coding region. In applying this test to several thousand human–chimpanzee gene orthologs, I show that such variation clusters are not generally caused by relaxed selection. They occur in well-defined domains of a protein's tertiary structure and show a large excess of amino acid replacement over silent substitutions. I also identify multiple new human–chimpanzee orthologs subject to positive selection, among them genes that are involved in reproductive functions, immune defense, and the nervous system. PMID:17603100
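The Poisson null hypothesis described in this abstract can be illustrated with a minimal sketch. This is not the published test: the window-scanning scheme, the function names `poisson_tail` and `cluster_pvalue`, and the example positions are all assumptions made for illustration, and no correction for scanning multiple windows is applied.

```python
import math

def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def cluster_pvalue(positions, region_length, window):
    """Smallest Poisson tail probability over windows anchored at each substitution.

    positions: coordinates of observed substitutions in the region (hypothetical input).
    Under the null, a window of the given width is expected to contain
    lam = n * window / region_length substitutions.
    """
    pos = sorted(positions)
    lam = len(pos) * window / region_length
    best = 1.0
    for i, start in enumerate(pos):
        # Count substitutions falling in [start, start + window).
        k = sum(1 for p in pos[i:] if p < start + window)
        best = min(best, poisson_tail(k, lam))
    return best

# Illustrative data only: five substitutions packed into ~10 sites versus an even spread.
clustered = cluster_pvalue([100, 102, 105, 107, 110, 400, 800], 1000, 20)
spread = cluster_pvalue([100, 250, 400, 550, 700, 850, 950], 1000, 20)
```

With these made-up inputs the clustered case yields a far smaller tail probability than the evenly spread case, which is the kind of deviation from the Poisson expectation the abstract refers to as a variation cluster.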
Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies

PubMed Central

Schatz, Michael C.; Phillippy, Adam M.; Sommer, Daniel D.; Delcher, Arthur L.; Puiu, Daniela; Narzisi, Giuseppe; Salzberg, Steven L.; Pop, Mihai

2013-01-01

Since its launch in 2004, the open-source AMOS project has released several innovative DNA sequence analysis applications including: Hawkeye, a visual analytics tool for inspecting the structure of genome assemblies; the Assembly Forensics and FRCurve pipelines for systematically evaluating the quality of a genome assembly; and AMOScmp, the first comparative genome assembler. These applications have been used to assemble and analyze dozens of genomes ranging in complexity from simple microbial species through mammalian genomes. Recent efforts have been focused on enhancing support for new data characteristics brought on by second- and now third-generation sequencing. This review describes the major components of AMOS in light of these challenges, with an emphasis on methods for assessing assembly quality and the visual analytics capabilities of Hawkeye. These interactive graphical aspects are essential for navigating and understanding the complexities of a genome assembly, from the overall genome structure down to individual bases. Hawkeye and AMOS are available open source at http://amos.sourceforge.net. PMID:22199379
Construction of two whole genome radiation hybrid panels for dromedary (Camelus dromedarius): 5000RAD and 15000RAD.

PubMed

Perelman, Polina L; Pichler, Rudolf; Gaggl, Anna; Larkin, Denis M; Raudsepp, Terje; Alshanbari, Fahad; Holl, Heather M; Brooks, Samantha A; Burger, Pamela A; Periasamy, Kathiravan

2018-01-31

The availability of genomic resources including linkage information for camelids has been very limited. Here, we describe the construction of a set of two radiation hybrid (RH) panels (5000 RAD and 15000 RAD) for the dromedary (Camelus dromedarius) as a permanent genetic resource for camel genome researchers worldwide. For the 5000 RAD panel, a total of 245 female camel-hamster radiation hybrid clones were collected, of which 186 were screened with 44 custom designed marker loci distributed throughout the camel genome. The overall mean retention frequency (RF) of the final set of 93 hybrids was 47.7%. For the 15000 RAD panel, 238 male dromedary-hamster radiation hybrid clones were collected, of which 93 were tested using 44 PCR markers. The final set of 90 clones had a mean RF of 39.9%. This 15000 RAD panel is an important high-resolution complement to the main 5000 RAD panel and an indispensable tool for resolving complex genomic regions. This valuable genetic resource of dromedary RH panels is expected to be instrumental for constructing a high-resolution camel genome map. Construction of the set of RH panels is an essential step toward a chromosome-level, reference-quality genome assembly, which is critical for advancing camelid genomics and the development of custom genomic tools.
Genetic structure of Mount Huang honey bee (Apis cerana) populations: evidence from microsatellite polymorphism.

PubMed

Liu, Fang; Shi, Tengfei; Huang, Sisi; Yu, Linsheng; Bi, Shoudong

2016-01-01

The Mount Huang eastern honey bees (Apis cerana) are an endemic population that is well adapted to the local agricultural and ecological environment. In this study, the genetic structure of seven eastern honey bee (A. cerana) populations from Mount Huang in China was analyzed using SSR (simple sequence repeat) markers. The 16 primer pairs used amplified a total of 143 alleles. The number of alleles per locus ranged from 6 to 13, with a mean of 8.94 alleles per locus. Observed and expected heterozygosities had mean values of 0.446 and 0.831, respectively. UPGMA cluster analysis grouped the seven populations into three groups. The results show high genetic diversity in the honey bee populations studied at Mount Huang, and high differentiation among all the populations, suggesting that little exchange among honey bee populations has occurred at Mount Huang. Our study demonstrated that the Mount Huang honey bee populations still have a natural genome worth protecting.
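For readers unfamiliar with the two heterozygosity statistics quoted in this abstract, the following is a minimal sketch of how they are conventionally computed for a single locus. The function name `heterozygosity` and the toy genotypes are assumptions for illustration; the expected value uses the plain formula 1 - Σp², without the small-sample correction that many population-genetics packages apply, and nothing here reproduces the authors' actual pipeline.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    Observed: fraction of individuals carrying two different alleles.
    Expected: 1 - sum of squared allele frequencies (Hardy-Weinberg expectation).
    """
    n = len(genotypes)
    observed = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n  # two allele copies per diploid individual
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected

# Toy SSR genotypes (allele sizes in bp); values are made up.
ho, he = heterozygosity([(150, 150), (150, 154), (154, 154), (150, 150)])
```

In this toy example the observed value (0.25) falls below the expected value (0.46875), the same direction as the means reported in the abstract.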
Parallel Evolution and Horizontal Gene Transfer of the pst Operon in Firmicutes from Oligotrophic Environments

PubMed Central

Moreno-Letelier, Alejandra; Olmedo, Gabriela; Eguiarte, Luis E.; Martinez-Castilla, Leon; Souza, Valeria

2011-01-01

The high affinity phosphate transport system (pst) is crucial for phosphate uptake in oligotrophic environments. Cuatro Cienegas Basin (CCB) has extremely low P levels and its endemic Bacillus are closely related to oligotrophic marine Firmicutes. Thus, we expected the pst operon of CCB to share the same evolutionary history and protein similarity to marine Firmicutes. Orthologs of the pst operon were searched in 55 genomes of Firmicutes and 13 outgroups. Phylogenetic reconstructions were performed for the pst operon and 14 concatenated housekeeping genes using maximum likelihood methods. Conserved domains and 3D structures of the phosphate-binding protein (PstS) were also analyzed. The pst operon of Firmicutes shows two highly divergent clades with no correlation to the type of habitat nor a phylogenetic congruence, suggesting horizontal gene transfer. Despite sequence divergence, the PstS protein had a similar 3D structure, which could be due to parallel evolution after horizontal gene transfer events. PMID:21461370
The public goods hypothesis for the evolution of life on Earth

PubMed Central

2011-01-01

It is becoming increasingly difficult to reconcile the observed extent of horizontal gene transfers with the central metaphor of a great tree uniting all evolving entities on the planet. In this manuscript we describe the Public Goods Hypothesis and show that it is appropriate in order to describe biological evolution on the planet. According to this hypothesis, nucleotide sequences (genes, promoters, exons, etc.) are simply seen as goods, passed from organism to organism through both vertical and horizontal transfer. Public goods sequences are defined by having the properties of being largely non-excludable (no organism can be effectively prevented from accessing these sequences) and non-rival (while such a sequence is being used by one organism it is also available for use by another organism). The universal nature of genetic systems ensures that such non-excludable sequences exist and non-excludability explains why we see a myriad of genes in different combinations in sequenced genomes. There are three features of the public goods hypothesis. Firstly, segments of DNA are seen as public goods, available for all organisms to integrate into their genomes. Secondly, we expect the evolution of mechanisms for DNA sharing and of defense mechanisms against DNA intrusion in genomes. Thirdly, we expect that we do not see a global tree-like pattern. Instead, we expect local tree-like patterns to emerge from the combination of a commonage of genes and vertical inheritance of genomes by cell division. Indeed, while genes are theoretically public goods, in reality, some genes are excludable, particularly, though not only, when they have variant genetic codes or behave as coalition or club goods, available for all organisms of a coalition to integrate into their genomes, and non-rival within the club. 
We view the Tree of Life hypothesis as a regionalized instance of the Public Goods hypothesis, just like classical mechanics and euclidean geometry are seen as regionalized instances of quantum mechanics and Riemannian geometry respectively. We argue for this change using an axiomatic approach that shows that the Public Goods hypothesis is a better accommodation of the observed data than the Tree of Life hypothesis. PMID:21861918
The Public Goods Hypothesis for the evolution of life on Earth.

PubMed

McInerney, James O; Pisani, Davide; Bapteste, Eric; O'Connell, Mary J

2011-08-23

It is becoming increasingly difficult to reconcile the observed extent of horizontal gene transfers with the central metaphor of a great tree uniting all evolving entities on the planet. In this manuscript we describe the Public Goods Hypothesis and show that it is appropriate in order to describe biological evolution on the planet. According to this hypothesis, nucleotide sequences (genes, promoters, exons, etc.) are simply seen as goods, passed from organism to organism through both vertical and horizontal transfer. Public goods sequences are defined by having the properties of being largely non-excludable (no organism can be effectively prevented from accessing these sequences) and non-rival (while such a sequence is being used by one organism it is also available for use by another organism). The universal nature of genetic systems ensures that such non-excludable sequences exist and non-excludability explains why we see a myriad of genes in different combinations in sequenced genomes. There are three features of the public goods hypothesis. Firstly, segments of DNA are seen as public goods, available for all organisms to integrate into their genomes. Secondly, we expect the evolution of mechanisms for DNA sharing and of defense mechanisms against DNA intrusion in genomes. Thirdly, we expect that we do not see a global tree-like pattern. Instead, we expect local tree-like patterns to emerge from the combination of a commonage of genes and vertical inheritance of genomes by cell division. Indeed, while genes are theoretically public goods, in reality, some genes are excludable, particularly, though not only, when they have variant genetic codes or behave as coalition or club goods, available for all organisms of a coalition to integrate into their genomes, and non-rival within the club. 
We view the Tree of Life hypothesis as a regionalized instance of the Public Goods hypothesis, just like classical mechanics and euclidean geometry are seen as regionalized instances of quantum mechanics and Riemannian geometry respectively. We argue for this change using an axiomatic approach that shows that the Public Goods hypothesis is a better accommodation of the observed data than the Tree of Life hypothesis.
Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation

PubMed Central

Ojha, Sunil; Watson, Douglas S.; Bomar, Martha G.; Galande, Amit K.; Shearer, Alexander G.

2013-01-01

The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation. PMID:24386392

Rapid identification of sequences for orphan enzymes to power accurate protein annotation.

PubMed

Ramkissoon, Kevin R; Miller, Jennifer K; Ojha, Sunil; Watson, Douglas S; Bomar, Martha G; Galande, Amit K; Shearer, Alexander G

2013-01-01

The power of genome sequencing depends on the ability to understand what those genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology--"orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.
Structural Genomics of Protein Phosphatases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Almo, S.; Bonanno, J.; Sauder, J.

The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically-relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptional regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.
Genetic structure of the Newfoundland and Labrador population: founder effects modulate variability.

PubMed

Zhai, Guangju; Zhou, Jiayi; Woods, Michael O; Green, Jane S; Parfrey, Patrick; Rahman, Proton; Green, Roger C

2016-07-01

The population of the province of Newfoundland and Labrador (NL) has been a resource for genetic studies because of its historical isolation and increased prevalence of several monogenic disorders. Controversy remains regarding the genetic substructure and the extent of genetic homogeneity, which have implications for disease gene mapping. Population substructure has been reported from other isolated populations such as Iceland, Finland and Sardinia. We undertook this study to further our understanding of the genetic architecture of the NL population. We enrolled 494 individuals randomly selected from NL. Genome-wide SNP data were analyzed together with that from 14 other populations including HapMap3, Ireland, Britain and Native American samples from the Human Genome Diversity Project. Using multidimensional scaling and admixture analysis, we observed that the genetic structure of the NL population resembles that of the British population but can be divided into three clusters that correspond to religious/ethnic origins: Protestant English, Roman Catholic Irish and North American aboriginals. We observed reduced heterozygosity and an increased inbreeding coefficient (mean=0.005), which corresponds to that expected in the offspring of third-cousin marriages. We also found that the NL population has a significantly higher number of runs of homozygosity (ROH) and longer lengths of ROH segments. These results are consistent with our understanding of the population history and indicate that the NL population may be ideal for identifying recessive variants for complex diseases that affect populations of European origin.
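The comparison with third-cousin marriages in this abstract can be checked with Wright's path-counting formula for the inbreeding coefficient. The derivation below is a standard textbook calculation, not taken from the paper, and it assumes the shared ancestors are themselves non-inbred (F_A = 0) and that the parents are related only through one pair of great-great-grandparents.

```latex
% Wright's path-counting formula: sum over common ancestors A, where n_1 and n_2
% are the numbers of generations from each parent back to A.
F = \sum_{A} \left(\tfrac{1}{2}\right)^{n_1 + n_2 + 1} (1 + F_A)

% Third cousins share two great-great-grandparents, so n_1 = n_2 = 4 and F_A = 0:
F_{\text{third cousins}} = 2 \left(\tfrac{1}{2}\right)^{4 + 4 + 1}
                         = \tfrac{2}{512} = \tfrac{1}{256} \approx 0.0039

% For reference, the same formula gives 1/16 for first cousins (n_1 = n_2 = 2)
% and 1/64 for second cousins (n_1 = n_2 = 3).
```

The theoretical value of about 0.0039 is of the same order as the reported mean of 0.005.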
Molecular characterization of a gene POLR2H encoded an essential subunit for RNA polymerase II from the Giant Panda (Ailuropoda Melanoleuca).

PubMed

Du, Yu-Jie; Hou, Yi-Ling; Hou, Wan-Ru

2013-02-01

The Giant Panda is an endangered species and a valuable genetic resource; its functionally important gene POLR2H encodes peptide H, an essential subunit shared by the RNA polymerases. The genomic DNA and cDNA sequences were cloned successfully for the first time from the Giant Panda (Ailuropoda melanoleuca) using touchdown-PCR and reverse transcription polymerase chain reaction (RT-PCR), respectively. The genomic sequence is 3,285 bp long and includes five exons and four introns. The cloned cDNA fragment is 509 bp in length and contains an open reading frame of 453 bp encoding 150 amino acids. Alignment analysis indicated that both the cDNA and its deduced amino acid sequence were highly conserved. Protein structure prediction identified one protein kinase C phosphorylation site, four casein kinase II phosphorylation sites and one amidation site in the POLR2H protein, which may contribute to its higher-order structure. The cloned cDNA was expressed in Escherichia coli; the N-terminally His-tagged POLR2H fusion accumulated as a polypeptide of the expected 20.5 kDa, in line with the predicted protein. On the basis of these results, further in-depth research will be conducted, which is of both theoretical and practical value.
Statistical methods for detecting periodic fragments in DNA sequence data

PubMed Central

2011-01-01

Background Period 10 dinucleotides are structurally and functionally validated factors that influence the ability of DNA to form nucleosomes, histone core octamers. Robust identification of periodic signals in DNA sequences is therefore required to understand nucleosome organisation in genomes. While various techniques for identifying periodic components in genomic sequences have been proposed or adopted, the requirements for such techniques have not been considered in detail and confirmatory testing for a priori specified periods has not been developed. Results We compared the estimation accuracy and suitability for confirmatory testing of autocorrelation, discrete Fourier transform (DFT), integer period discrete Fourier transform (IPDFT) and a previously proposed Hybrid measure. A number of different statistical significance procedures were evaluated but a blockwise bootstrap proved superior. When applied to synthetic data whose period-10 signal had been eroded, or for which the signal was approximately period-10, the Hybrid technique exhibited superior properties during exploratory period estimation. In contrast, confirmatory testing using the blockwise bootstrap procedure identified IPDFT as having the greatest statistical power. These properties were validated on yeast sequences defined from a ChIP-chip study where the Hybrid metric confirmed the expected dominance of period-10 in nucleosome associated DNA but IPDFT identified more significant occurrences of period-10. Application to the whole genomes of yeast and mouse identified ~ 21% and ~ 19% respectively of these genomes as spanned by period-10 nucleosome positioning sequences (NPS). Conclusions For estimating the dominant period, we find the Hybrid period estimation method empirically to be the most effective for both eroded and approximate periodicity. 
The blockwise bootstrap was found to be effective as a significance measure, performing particularly well in the problem of period detection in the presence of eroded periodicity. The autocorrelation method was identified as poorly suited for use with the blockwise bootstrap. Application of our methods to the genomes of two model organisms revealed a striking proportion of the yeast and mouse genomes are spanned by NPS. Despite their markedly different sizes, roughly equivalent proportions (19-21%) of the genomes lie within period-10 spans of the NPS dinucleotides {AA, TT, TA}. The biological significance of these regions remains to be demonstrated. To facilitate this, the genomic coordinates are available as Additional files 1, 2, and 3 in a format suitable for visualisation as tracks on popular genome browsers. Reviewers This article was reviewed by Prof Tomas Radivoyevitch, Dr Vsevolod Makeev (nominated by Dr Mikhail Gelfand), and Dr Rob D Knight. PMID:21527008
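To make the idea of testing an a priori specified period concrete, here is a minimal sketch that measures spectral power at a chosen integer period in a dinucleotide indicator signal. It is a simplified stand-in rather than the paper's IPDFT or Hybrid measure, it omits the blockwise bootstrap entirely, and the function names and the synthetic sequence are assumptions made for illustration.

```python
import cmath

def dinucleotide_indicator(seq, dinucs=("AA", "TT", "TA")):
    """Return 1.0 at each position where one of the given dinucleotides starts, else 0.0."""
    seq = seq.upper()
    return [1.0 if seq[i:i + 2] in dinucs else 0.0 for i in range(len(seq) - 1)]

def power_at_period(signal, period):
    """Squared magnitude of the mean-centred Fourier coefficient at one period, divided by n."""
    n = len(signal)
    mean = sum(signal) / n
    coeff = sum((x - mean) * cmath.exp(-2j * cmath.pi * t / period)
                for t, x in enumerate(signal))
    return abs(coeff) ** 2 / n

# Synthetic sequence with an AA dinucleotide exactly every 10 bp (illustrative only).
synthetic = "AAGCGCGCGC" * 30
signal = dinucleotide_indicator(synthetic)
p10 = power_at_period(signal, 10)  # the pre-specified period under test
p7 = power_at_period(signal, 7)    # an unrelated period for comparison
```

On this synthetic input the power at period 10 is far larger than at period 7. Note that a perfectly regular impulse train also carries equal power at the harmonics of period 10 (periods 5 and 2), which is one reason a naive scan for the single largest peak is not a reliable period estimator; assessing significance on real data would additionally require a resampling scheme such as the blockwise bootstrap evaluated in the abstract.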
A domain-centric solution to functional genomics via dcGO Predictor

PubMed Central

2013-01-01

Background Computational/manual annotations of protein functions are one of the first routes to making sense of a newly sequenced genome. Protein domain predictions form an essential part of this annotation process. This is due to the natural modularity of proteins with domains as structural, evolutionary and functional units. Sometimes two, three, or more adjacent domains (called supra-domains) are the operational unit responsible for a function, e.g. via a binding site at the interface. These supra-domains have contributed to functional diversification in higher organisms. Traditionally functional ontologies have been applied to individual proteins, rather than families of related domains and supra-domains. We expect, however, to some extent functional signals can be carried by protein domains and supra-domains, and consequently used in function prediction and functional genomics. Results Here we present a domain-centric Gene Ontology (dcGO) perspective. We generalize a framework for automatically inferring ontological terms associated with domains and supra-domains from full-length sequence annotations. This general framework has been applied specifically to primary protein-level annotations from UniProtKB-GOA, generating GO term associations with SCOP domains and supra-domains. The resulting 'dcGO Predictor', can be used to provide functional annotation to protein sequences. The functional annotation of sequences in the Critical Assessment of Function Annotation (CAFA) has been used as a valuable opportunity to validate our method and to be assessed by the community. The functional annotation of all completely sequenced genomes has demonstrated the potential for domain-centric GO enrichment analysis to yield functional insights into newly sequenced or yet-to-be-annotated genomes. 
This generalized framework we have presented has also been applied to other domain classifications such as InterPro and Pfam, and other ontologies such as mammalian phenotype and disease ontology. The dcGO and its predictor are available at http://supfam.org/SUPERFAMILY/dcGO including an enrichment analysis tool. Conclusions As functional units, domains offer a unique perspective on function prediction regardless of whether proteins are multi-domain or single-domain. The 'dcGO Predictor' holds great promise for contributing to a domain-centric functional understanding of genomes in the next generation sequencing era. PMID:23514627
Cryo-electron Microscopy Study of the Genome Release of the Dicistrovirus Israeli Acute Bee Paralysis Virus.

PubMed

Mullapudi, Edukondalu; Füzik, Tibor; Přidal, Antonín; Plevka, Pavel

2017-02-15

Viruses of the family Dicistroviridae can cause substantial economic damage by infecting agriculturally important insects. Israeli acute bee paralysis virus (IAPV) causes honeybee colony collapse disorder in the United States. High-resolution molecular details of the genome delivery mechanism of dicistroviruses are unknown. Here we present a cryo-electron microscopy analysis of IAPV virions induced to release their genomes in vitro. We determined structures of full IAPV virions primed to release their genomes to a resolution of 3.3 Å and of empty capsids to a resolution of 3.9 Å. We show that IAPV does not form expanded A particles before genome release as in the case of related enteroviruses of the family Picornaviridae. The structural changes observed in the empty IAPV particles include detachment of the VP4 minor capsid proteins from the inner face of the capsid and partial loss of the structure of the N-terminal arms of the VP2 capsid proteins. Unlike the case for many picornaviruses, the empty particles of IAPV are not expanded relative to the native virions and do not contain pores in their capsids that might serve as channels for genome release. Therefore, rearrangement of a unique region of the capsid is probably required for IAPV genome release. Honeybee populations in Europe and North America are declining due to pressure from pathogens, including viruses. Israeli acute bee paralysis virus (IAPV), a member of the family Dicistroviridae, causes honeybee colony collapse disorder in the United States. The delivery of virus genomes into host cells is necessary for the initiation of infection. Here we present a structural cryo-electron microscopy analysis of IAPV particles induced to release their genomes. We show that genome release is not preceded by an expansion of IAPV virions as in the case of related picornaviruses that infect vertebrates. Furthermore, minor capsid proteins detach from the capsid upon genome release.
The genome leaves behind empty particles that have compact protein shells. Copyright © 2017 Mullapudi et al.
Functional RNA elements in the dengue virus genome.

PubMed

Gebhard, Leopoldo G; Filomatori, Claudia V; Gamarnik, Andrea V

2011-09-01

Dengue virus (DENV) genome amplification is a process that involves the viral RNA, cellular and viral proteins, and a complex architecture of cellular membranes. The viral RNA is not a passive template during this process; it plays an active role providing RNA signals that act as promoters, enhancers and/or silencers of the replication process. RNA elements that modulate RNA replication were found at the 5' and 3' UTRs and within the viral coding sequence. The promoter for DENV RNA synthesis is a large stem loop structure located at the 5' end of the genome. This structure specifically interacts with the viral polymerase NS5 and promotes RNA synthesis at the 3' end of a circularized genome. The circular conformation of the viral genome is mediated by long range RNA-RNA interactions that span thousands of nucleotides. Recent studies have provided new information about the requirement of alternative, mutually exclusive, structures in the viral RNA, highlighting the idea that the viral genome is flexible and exists in different conformations. In this article, we describe elements in the promoter SLA and other RNA signals involved in NS5 polymerase binding and activity, and provide new ideas of how dynamic secondary and tertiary structures of the viral RNA participate in the viral life cycle.
Functions of the 3′ and 5′ genome RNA regions of members of the genus Flavivirus

PubMed Central

Brinton, Margo A.; Basu, Mausumi

2015-01-01

The positive sense genomes of members of the genus Flavivirus in the family Flaviviridae are ~11 kb in length and have a 5′ type I cap but no 3′ poly A. The 5′ and 3′ terminal regions contain short conserved sequences that are proposed to be repeated remnants of an ancient sequence. However, the functions of most of these conserved sequences have not yet been determined. The terminal regions of the genome also contain multiple conserved RNA structures. Functional data for many of these structures have been obtained. Three sets of complementary 3′ and 5′ terminal region sequences, some of which are located in conserved RNA structures, interact to form a panhandle structure that is required for initiation of minus strand RNA synthesis with the 5′ terminal structure functioning as the promoter. How the switch from the terminal RNA structure base pairing to the long distance RNA-RNA interaction is triggered and regulated is not well understood, but evidence suggests involvement of a cell protein binding to three sites on the 3′ terminal RNA structures and a cis-acting metastable 3′ RNA element in the 3′ terminal structure. Cell proteins may also be involved in facilitating exponential replication of nascent genomic RNA within replication vesicles at later times in the infection cycle. Other conserved RNA structures and/or sequences in the 5′ and 3′ terminal regions have been proposed to regulate genome translation. Additional functions of the 5′ and 3′ terminal sequences have also been reported. PMID:25683510
Center for Cancer Genomics | Office of Cancer Genomics

Cancer.gov

The Center for Cancer Genomics (CCG) was established to unify the National Cancer Institute's activities in cancer genomics, with the goal of advancing genomics research and translating findings into the clinic to improve the precise diagnosis and treatment of cancers. In addition to promoting genomic sequencing approaches, CCG aims to accelerate structural, functional and computational research to explore cancer mechanisms, discover new cancer targets, and develop new therapeutics.
The complete mitochondrial genome of Arctic Calanus hyperboreus (Copepoda, Calanoida) reveals characteristic patterns in calanoid mitochondrial genome.

PubMed

Kim, Sanghee; Lim, Byung-Jin; Min, Gi-Sik; Choi, Han-Gu

2013-05-10

Copepoda is the most diverse and abundant group of crustaceans, but its phylogenetic relationships are ambiguous. Mitochondrial (mt) genomes are useful for studying evolutionary history, but only six complete Copepoda mt genomes have been made available and these have extremely rearranged genome structures. This study determined the mt genome of Calanus hyperboreus, making it the first reported Arctic copepod mt genome and the first complete mt genome of a calanoid copepod. The mt genome of C. hyperboreus is 17,910 bp in length and it contains the entire set of 37 mt genes, including 13 protein-coding genes, 2 rRNAs, and 22 tRNAs. It has a very unusual gene structure, including the longest control region reported for a crustacean, a large tRNA gene cluster, and reversed GC skews in 11 out of 13 protein-coding genes (84.6%). Despite the unusual features, comparing this genome to published copepod genomes revealed retained pan-crustacean features, as well as a conserved calanoid-specific pattern. Our data provide a foundation for exploring the calanoid pattern and the mechanisms of mt gene rearrangement in the evolutionary history of the copepod mt genome. Copyright © 2012 Elsevier B.V. All rights reserved.
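The GC skew mentioned in this abstract is conventionally computed per strand as (G - C) / (G + C). The snippet below is a minimal sketch of that calculation under stated assumptions: the function name `gc_skew` and the two toy sequences are hypothetical and are not taken from the C. hyperboreus genome or from the authors' pipeline. It also checks the reported share, 11 of 13 protein-coding genes.

```python
def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for one strand; 0.0 if there is no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


# Hypothetical toy fragments, for illustration only.
genes = {"geneA": "GGGCGATG", "geneB": "CCCGCATC"}
skews = {name: gc_skew(seq) for name, seq in genes.items()}

# A negative value means C outnumbers G on the strand that was scored.
negative = [name for name, value in skews.items() if value < 0]

# Share reported in the abstract: 11 of 13 protein-coding genes.
share_percent = round(100 * 11 / 13, 1)
```

Whether a skew counts as "reversed" depends on the reference pattern it is compared against, which the abstract does not specify, so the sketch reports only the sign.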
The complete genomic sequence of egg drop syndrome virus strain AAV-2.

PubMed

Jin, Q; Zeng, L; Yang, F; Li, M; Hou, Y

1999-12-01

To characterize the genome of egg drop syndrome virus (EDSV-76) Chinese strain AAV-2, part of its restriction endonuclease physical map was analyzed and a complete genomic library was constructed. On this basis, the complete genome nucleotide sequence (32,838 bp in length, including terminal structures) was determined. Analysis of the data shows that, compared with other adenoviruses, strain AAV-2 differs considerably in genomic structure and in the distribution of open reading frames (ORFs). There are no clear E1, E3 and E4 regions in the AAV-2 genome. Two segments located at the two ends of the genome (1.1 kb and 8.3 kb in length, respectively) have no homology with other adenovirus genomes. In addition, the AAV-2 genome lacks the ORFs encoding E1A, pV and pIX, which encode early and late proteins common among adenoviruses. This reveals, for the first time, differences between EDSV-76, the sole standard strain of group III avian adenoviruses, and the other avian adenoviruses. The sequence will aid research on avian adenoviruses and on adenoviruses in general.
The SGC beyond structural genomics: redefining the role of 3D structures by coupling genomic stratification with fragment-based discovery.

PubMed

Bradley, Anthony R; Echalier, Aude; Fairhead, Michael; Strain-Damerell, Claire; Brennan, Paul; Bullock, Alex N; Burgess-Brown, Nicola A; Carpenter, Elisabeth P; Gileadi, Opher; Marsden, Brian D; Lee, Wen Hwa; Yue, Wyatt; Bountra, Chas; von Delft, Frank

2017-11-08

The ongoing explosion in genomics data has long since outpaced the capacity of conventional biochemical methodology to verify the large number of hypotheses that emerge from the analysis of such data. In contrast, it is still a gold-standard for early phenotypic validation towards small-molecule drug discovery to use probe molecules (or tool compounds), notwithstanding the difficulty and cost of generating them. Rational structure-based approaches to ligand discovery have long promised the efficiencies needed to close this divergence; in practice, however, this promise remains largely unfulfilled, for a host of well-rehearsed reasons and despite the huge technical advances spearheaded by the structural genomics initiatives of the noughties. Therefore the current, fourth funding phase of the Structural Genomics Consortium (SGC), building on its extensive experience in structural biology of novel targets and design of protein inhibitors, seeks to redefine what it means to do structural biology for drug discovery. We developed the concept of a Target Enabling Package (TEP) that provides, through reagents, assays and data, the missing link between genetic disease linkage and the development of usefully potent compounds. There are multiple prongs to the ambition: rigorously assessing targets' genetic disease linkages through crowdsourcing to a network of collaborating experts; establishing a systematic approach to generate the protocols and data that comprise each target's TEP; developing new, X-ray-based fragment technologies for generating high quality chemical matter quickly and cheaply; and exploiting a stringently open access model to build multidisciplinary partnerships throughout academia and industry. By learning how to scale these approaches, the SGC aims to make structures finally serve genomics, as originally intended, and demonstrate how 3D structures systematically allow new modes of druggability to be discovered for whole classes of targets. 
© 2017 The Author(s).
SL1 revisited: functional analysis of the structure and conformation of HIV-1 genome RNA.

PubMed

Sakuragi, Sayuri; Yokoyama, Masaru; Shioda, Tatsuo; Sato, Hironori; Sakuragi, Jun-Ichi

2016-11-11

The dimer initiation site/dimer linkage sequence (DIS/DLS) region of HIV is located at the 5' end of the viral genome and is suggested to form complex secondary/tertiary structures. Within this structure, stem-loop 1 (SL1) is believed to be the most important element and essential for dimerization, since the sequence and predicted secondary structure of SL1 are highly stable and conserved among various virus subtypes. In particular, a six-base palindromic sequence is always present at the hairpin loop of SL1, and the formation of a kissing-loop structure at this position between the two strands of genomic RNA is suggested to trigger dimerization. Although the higher-order structure model of SL1 is well accepted and has rarely been questioned lately, there could still be room to reconsider the functional SL1 structure in vivo (in the virion or cell). In this study, we performed several analyses to identify the nucleotides and/or base pairing within SL1 that are necessary for HIV-1 genome dimerization, encapsidation, recombination and infectivity. We unexpectedly found that some nucleotides that are believed to contribute to the formation of the stem do not impact dimerization or infectivity. On the other hand, we found that one G-C base pair involved in stem formation may serve as an alternative dimer interactive site. We also report on our further investigation of the roles of the palindromic sequences in viral replication. Collectively, we aim to assemble a more comprehensive functional map of SL1 in the HIV-1 viral life cycle. We discovered several possibilities for a novel structure of SL1 in the HIV-1 DLS. The newly proposed structure model suggests that the hairpin loop of SL1 is larger, and that the genome dimerization process might involve a more complicated mechanism than previously understood. Further investigations are still required to fully understand the genome packaging and dimerization of HIV.
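A "palindromic" nucleic acid sequence, as in the six-base loop sequence described in this abstract, is one that equals its own reverse complement, which is what lets two identical loops base-pair with each other. The following is a minimal sketch of that check for RNA. The function names are hypothetical, and the example strings are illustrative six-base sequences chosen for the demonstration, not sequences quoted in the abstract.

```python
# Watson-Crick complement table for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Complement each base, then reverse the strand."""
    return rna.upper().translate(RNA_COMPLEMENT)[::-1]


def is_palindromic(rna: str) -> bool:
    """True if the sequence reads the same as its reverse complement."""
    return rna.upper() == reverse_complement(rna)


# Illustrative six-base strings (assumed examples).
examples = ["GCGCGC", "GUGCAC", "GCGCAA"]
results = {seq: is_palindromic(seq) for seq in examples}
```

The check covers only exact Watson-Crick self-complementarity; G-U wobble pairs and loop context are ignored in this sketch.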
Bacterial genome reduction using the progressive clustering of deletions via yeast sexual cycling

DOE PAGES

Suzuki, Yo; Assad-Garcia, Nacyra; Kostylev, Maxim; ...

2015-02-05

The availability of genetically tractable organisms with simple genomes is critical for the rapid, systems-level understanding of basic biological processes. Mycoplasma bacteria, with the smallest known genomes among free-living cellular organisms, are ideal models for this purpose, but the natural versions of these cells have genome complexities still too great to offer a comprehensive view of a fundamental life form. Here we describe an efficient method for reducing genomes from these organisms by identifying individually deletable regions using transposon mutagenesis and progressively clustering deleted genomic segments using meiotic recombination between the bacterial genomes harbored in yeast. Mycoplasmal genomes subjected to this process and transplanted into recipient cells yielded two mycoplasma strains. The first simultaneously lacked eight singly deletable regions of the genome, representing a total of 91 genes and ~10% of the original genome. The second strain lacked seven of the eight regions, representing 84 genes. Growth assay data revealed an absence of genetic interactions among the 91 genes under tested conditions. Despite predicted effects of the deletions on sugar metabolism and the proteome, growth rates were unaffected by the gene deletions in the seven-deletion strain. These results support the feasibility of using single-gene disruption data to design and construct viable genomes lacking multiple genes, paving the way toward genome minimization. The progressive clustering method is expected to be effective for the reorganization of any mega-sized DNA molecules cloned in yeast, facilitating the construction of designer genomes in microbes as well as genomic fragments for genetic engineering of higher eukaryotes.
Bacterial genome reduction using the progressive clustering of deletions via yeast sexual cycling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Suzuki, Yo; Assad-Garcia, Nacyra; Kostylev, Maxim

The availability of genetically tractable organisms with simple genomes is critical for the rapid, systems-level understanding of basic biological processes. Mycoplasma bacteria, with the smallest known genomes among free-living cellular organisms, are ideal models for this purpose, but the natural versions of these cells have genome complexities still too great to offer a comprehensive view of a fundamental life form. Here we describe an efficient method for reducing genomes from these organisms by identifying individually deletable regions using transposon mutagenesis and progressively clustering deleted genomic segments using meiotic recombination between the bacterial genomes harbored in yeast. Mycoplasmal genomes subjected to this process and transplanted into recipient cells yielded two mycoplasma strains. The first simultaneously lacked eight singly deletable regions of the genome, representing a total of 91 genes and ~10% of the original genome. The second strain lacked seven of the eight regions, representing 84 genes. Growth assay data revealed an absence of genetic interactions among the 91 genes under tested conditions. Despite predicted effects of the deletions on sugar metabolism and the proteome, growth rates were unaffected by the gene deletions in the seven-deletion strain. These results support the feasibility of using single-gene disruption data to design and construct viable genomes lacking multiple genes, paving the way toward genome minimization. The progressive clustering method is expected to be effective for the reorganization of any mega-sized DNA molecules cloned in yeast, facilitating the construction of designer genomes in microbes as well as genomic fragments for genetic engineering of higher eukaryotes.
Development and characterization of 32 microsatellite loci in Genipa americana (Rubiaceae)

PubMed Central

Manoel, Ricardo O.; Freitas, Miguel L. M.; Barreto, Mariana A.; Moraes, Mário L. T.; Souza, Anete P.; Sebbenn, Alexandre M.

2014-01-01

• Premise of the study: Microsatellite primers were developed for the tree species Genipa americana (Rubiaceae) for further population genetic studies. • Methods and Results: We identified 144 clones containing 65 repeat motifs from a genomic library enriched for (CT)8 and (GT)8 motifs. Primer pairs were developed for 32 microsatellite loci and validated in 40 individuals of two natural G. americana populations. Seventeen loci were polymorphic, revealing from three to seven alleles per locus. The observed and expected heterozygosities ranged from 0.24 to 1.00 and from 0.22 to 0.78, respectively. • Conclusions: The 17 primers identified as polymorphic loci are suitable to study the genetic diversity and structure, mating system, and gene flow in G. americana. PMID:25202610
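Observed and expected heterozygosity, as reported per locus in this abstract, can be illustrated with a short calculation. The sketch below assumes the standard definitions: observed heterozygosity is the fraction of heterozygous individuals, and expected heterozygosity is 1 minus the sum of squared allele frequencies. The abstract does not say which estimator was used (for example, whether a small-sample correction was applied), and the genotypes below are hypothetical toy data, not the G. americana results.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of diploid individuals carrying two different alleles."""
    het = sum(1 for a, b in genotypes if a != b)
    return het / len(genotypes)


def expected_heterozygosity(genotypes):
    """1 - sum(p_i^2) over allele frequencies p_i (no small-sample correction)."""
    alleles = Counter(allele for pair in genotypes for allele in pair)
    total = sum(alleles.values())
    return 1.0 - sum((n / total) ** 2 for n in alleles.values())


# Hypothetical genotypes at one microsatellite locus (allele labels are arbitrary).
locus = [("A", "B"), ("A", "A"), ("B", "C"), ("A", "B")]
ho = observed_heterozygosity(locus)
he = expected_heterozygosity(locus)
```

For this toy locus the allele frequencies are 0.5, 0.375 and 0.125, so the expected value is 1 - 0.40625 = 0.59375, while three of the four individuals are heterozygous.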
The genome and structural proteome of an ocean siphovirus: a new window into the cyanobacterial ‘mobilome’

PubMed Central

Sullivan, Matthew B; Krastins, Bryan; Hughes, Jennifer L; Kelly, Libusha; Chase, Michael; Sarracino, David; Chisholm, Sallie W

2009-01-01

Prochlorococcus, an abundant phototroph in the oceans, are infected by members of three families of viruses: myo-, podo- and siphoviruses. Genomes of myo- and podoviruses isolated on Prochlorococcus contain DNA replication machinery and virion structural genes homologous to those from coliphages T4 and T7 respectively. They also contain a suite of genes of cyanobacterial origin, most notably photosynthesis genes, which are expressed during infection and appear integral to the evolutionary trajectory of both host and phage. Here we present the first genome of a cyanobacterial siphovirus, P-SS2, which was isolated from Atlantic slope waters using a Prochlorococcus host (MIT9313). The P-SS2 genome is larger than, and considerably divergent from, previously sequenced siphoviruses. It appears most closely related to lambdoid siphoviruses, with which it shares 13 functional homologues. The ∼108 kb P-SS2 genome encodes 131 predicted proteins and notably lacks photosynthesis genes which have consistently been found in other marine cyanophage, but does contain 14 other cyanobacterial homologues. While only six structural proteins were identified from the genome sequence, 35 proteins were detected experimentally; these mapped onto capsid and tail structural modules in the genome. P-SS2 is potentially capable of integration into its host as inferred from bioinformatically identified genetic machinery int, bet, exo and a 53 bp attachment site. The host attachment site appears to be a genomic island that is tied to insertion sequence (IS) activity that could facilitate mobility of a gene involved in the nitrogen-stress response. The homologous region and a secondary IS-element hot-spot in Synechococcus RS9917 are further evidence of IS-mediated genome evolution coincident with a probable relic prophage integration event. This siphovirus genome provides a glimpse into the biology of a deep-photic zone phage as well as the ocean cyanobacterial prophage and IS element ‘mobilome’. 
PMID:19840100
The genome and structural proteome of an ocean siphovirus: a new window into the cyanobacterial 'mobilome'.

PubMed

Sullivan, Matthew B; Krastins, Bryan; Hughes, Jennifer L; Kelly, Libusha; Chase, Michael; Sarracino, David; Chisholm, Sallie W

2009-11-01

Prochlorococcus, an abundant phototroph in the oceans, are infected by members of three families of viruses: myo-, podo- and siphoviruses. Genomes of myo- and podoviruses isolated on Prochlorococcus contain DNA replication machinery and virion structural genes homologous to those from coliphages T4 and T7 respectively. They also contain a suite of genes of cyanobacterial origin, most notably photosynthesis genes, which are expressed during infection and appear integral to the evolutionary trajectory of both host and phage. Here we present the first genome of a cyanobacterial siphovirus, P-SS2, which was isolated from Atlantic slope waters using a Prochlorococcus host (MIT9313). The P-SS2 genome is larger than, and considerably divergent from, previously sequenced siphoviruses. It appears most closely related to lambdoid siphoviruses, with which it shares 13 functional homologues. The approximately 108 kb P-SS2 genome encodes 131 predicted proteins and notably lacks photosynthesis genes which have consistently been found in other marine cyanophage, but does contain 14 other cyanobacterial homologues. While only six structural proteins were identified from the genome sequence, 35 proteins were detected experimentally; these mapped onto capsid and tail structural modules in the genome. P-SS2 is potentially capable of integration into its host as inferred from bioinformatically identified genetic machinery int, bet, exo and a 53 bp attachment site. The host attachment site appears to be a genomic island that is tied to insertion sequence (IS) activity that could facilitate mobility of a gene involved in the nitrogen-stress response. The homologous region and a secondary IS-element hot-spot in Synechococcus RS9917 are further evidence of IS-mediated genome evolution coincident with a probable relic prophage integration event. 
This siphovirus genome provides a glimpse into the biology of a deep-photic zone phage as well as the ocean cyanobacterial prophage and IS element 'mobilome'.
Structural genomics reveals EVE as a new ASCH/PUA-related domain

PubMed Central

Bertonati, Claudia; Punta, Marco; Fischer, Markus; Yachdav, Guy; Forouhar, Farhad; Zhou, Weihong; Kuzin, Alexander P.; Seetharaman, Jayaraman; Abashidze, Mariam; Ramelot, Theresa A.; Kennedy, Michael A.; Cort, John R.; Belachew, Adam; Hunt, John F.; Tong, Liang; Montelione, Gaetano T.; Rost, Burkhard

2014-01-01

Summary: We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggest that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would not have been sufficient to reveal these evolutionary links. PMID:19191354

Structural Genomics Reveals EVE as a New ASCH/PUA-Related Domain

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bertonati, C.; Punta, M; Fischer, M

2008-01-01

We report on several proteins recently solved by structural genomics consortia, in particular by the Northeast Structural Genomics consortium (NESG). The proteins considered in this study differ substantially in their sequences but they share a similar structural core, characterized by a pseudobarrel five-stranded beta sheet. This core corresponds to the PUA domain-like architecture in the SCOP database. By connecting sequence information with structural knowledge, we characterize a new subgroup of these proteins that we propose to be distinctly different from previously described PUA domain-like domains such as PUA proper or ASCH. We refer to these newly defined domains as EVE. Although EVE may have retained the ability of PUA domains to bind RNA, the available experimental and computational data suggest that both the details of its molecular function and its cellular function differ from those of other PUA domain-like domains. This study of EVE and its relatives illustrates how the combination of structure and genomics creates new insights by connecting a cornucopia of structures that map to the same evolutionary potential. Primary sequence information alone would not have been sufficient to reveal these evolutionary links.
The Yak genome database: an integrative database for studying yak biology and high-altitude adaption

PubMed Central

2012-01-01

Background: The yak (Bos grunniens) is a long-haired bovine that lives at high altitudes and is an important source of milk, meat, fiber and fuel. The recent sequencing, assembly and annotation of its genome are expected to further our understanding of the means by which it has adapted to life at high altitudes and its ecologically important traits. Description: The Yak Genome Database (YGD) is an internet-based resource that provides access to genomic sequence data and predicted functional information concerning the genes and proteins of Bos grunniens. The curated data stored in the YGD include genome sequences, predicted genes and associated annotations, non-coding RNA sequences, transposable elements, single nucleotide variants, and three-way whole-genome alignments between human, cattle and yak. YGD offers useful searching and data mining tools, including the ability to search for genes by name or using function keywords as well as GBrowse genome browsers and/or BLAST servers, which can be used to visualize genome regions and identify similar sequences. Sequence data from the YGD can also be downloaded to perform local searches. Conclusions: A new yak genome database (YGD) has been developed to facilitate studies on high-altitude adaption and bovine genomics. The database will be continuously updated to incorporate new information such as transcriptome data and population resequencing data. The YGD can be accessed at http://me.lzu.edu.cn/yak. PMID:23134687
Molecular Innovation in Ciliates with Complex Genome Rearrangements

NASA Astrophysics Data System (ADS)

Neme, R.; Landweber, L. F.

2017-07-01

We study molecular innovation in several ciliate species with unique massive genome rearrangements to understand how a radically distinct genome architecture can shape the process of acquiring new functions, genes and structures.
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4).

PubMed

Huntemann, Marcel; Ivanova, Natalia N; Mavromatis, Konstantinos; Tripp, H James; Paez-Espino, David; Palaniappan, Krishnaveni; Szeto, Ernest; Pillay, Manoj; Chen, I-Min A; Pati, Amrita; Nielsen, Torben; Markowitz, Victor M; Kyrpides, Nikos C

2015-01-01

The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.
Translating the “Banana Genome” to Delineate Stress Resistance, Dwarfing, Parthenocarpy and Mechanisms of Fruit Ripening

PubMed Central

Dash, Prasanta K.; Rai, Rhitu

2016-01-01

Evolutionarily frozen, genetically sterile and globally iconic, the banana remained untouched by the green revolution and, as of today, researchers face intrinsic impediments to its varietal improvement. Recently, this wonder crop entered the genomics era with the decoding of the structural genome of the double haploid Pahang (AA genome constitution) genotype of Musa acuminata. Its complex genome, decoded by hybrid sequencing strategies, revealed a panoply of genes and transcription factors involved in the process of sucrose conversion that imparts sweetness to its fruit. Historically, banana has faced the wrath of pandemic bacterial, fungal, and viral diseases and a multitude of abiotic stresses that have ruined the livelihood of small/marginal farmers and destroyed commercial plantations. Decoding the structural genome of this climacteric fruit has given impetus to a deeper understanding of the repertoire of genes involved in disease resistance, understanding the mechanism of dwarfing to develop an ideal plant type, and unraveling the processes of parthenocarpy and fruit ripening for better fruit quality. Further, the use of comparative genomics will usher in integration of information from its decoded genome and other monocots into field applications in banana related, but not limited, to yield enhancement, food security, livelihood assurance, and energy sustainability. In this mini review, we discuss pre- and post-genomic discoveries and highlight accomplishments in structural genomics, genetic engineering and forward genetics with an aim to target genes and transcription factors for translational research in banana. PMID:27833619
Organizational heterogeneity of vertebrate genomes.

PubMed

Frenkel, Svetlana; Kirzhner, Valery; Korol, Abraham

2012-01-01

Genomes of higher eukaryotes are mosaics of segments with various structural, functional, and evolutionary properties. The availability of whole-genome sequences allows the investigation of their structure as "texts" using different statistical and computational methods. One such method, referred to as Compositional Spectra (CS) analysis, is based on scoring the occurrences of fixed-length oligonucleotides (k-mers) in the target DNA sequence. CS analysis allows generating species- or region-specific characteristics of the genome, regardless of their length and the presence of coding DNA. In this study, we consider the heterogeneity of vertebrate genomes as a joint effect of regional variation in sequence organization superimposed on the differences in nucleotide composition. We estimated compositional and organizational heterogeneity of genome and chromosome sequences separately and found that both heterogeneity types vary widely among genomes as well as among chromosomes in all investigated taxonomic groups. The high correspondence of heterogeneity scores obtained on three genome fractions, coding, repetitive, and the remaining part of the noncoding DNA (the genome dark matter--GDM) allows the assumption that CS-heterogeneity may have functional relevance to genome regulation. Of special interest for such interpretation is the fact that natural GDM sequences display the highest deviation from the corresponding reshuffled sequences.
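The scoring of fixed-length oligonucleotide (k-mer) occurrences that underlies this kind of analysis can be sketched in a few lines. What follows is a simplified illustration, not the authors' Compositional Spectra implementation: it counts exact matches only, normalizes the counts to frequencies over all 4^k words, and compares two spectra with a sum of absolute differences. The function names, the distance measure, and the toy sequences are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def kmer_spectrum(seq: str, k: int) -> dict:
    """Relative frequency of every A/C/G/T word of length k (exact matches only)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing other symbols (e.g. N) are ignored in the total.
    total = sum(counts[w] for w in words)
    return {w: (counts[w] / total if total else 0.0) for w in words}


def spectrum_distance(a: dict, b: dict) -> float:
    """Sum of absolute frequency differences between two spectra (assumed metric)."""
    return sum(abs(a[w] - b[w]) for w in a)


# Toy sequences for illustration only.
spec1 = kmer_spectrum("ACGTACGT", 2)
spec2 = kmer_spectrum("AAAAAAAA", 2)
distance = spectrum_distance(spec1, spec2)
```

Because the spectrum is a fixed-length frequency vector, sequences of different lengths, coding or noncoding, can be compared directly, which matches the property the abstract highlights.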
SCHEMA computational design of virus capsid chimeras: calibrating how genome packaging, protection, and transduction correlate with calculated structural disruption.

PubMed

Ho, Michelle L; Adler, Benjamin A; Torre, Michael L; Silberg, Jonathan J; Suh, Junghae

2013-12-20

Adeno-associated virus (AAV) recombination can result in chimeric capsid protein subunits whose ability to assemble into an oligomeric capsid, package a genome, and transduce cells depends on the inheritance of sequence from different AAV parents. To develop quantitative design principles for guiding site-directed recombination of AAV capsids, we have examined how capsid structural perturbations predicted by the SCHEMA algorithm correlate with experimental measurements of disruption in seventeen chimeric capsid proteins. In our small chimera population, created by recombining AAV serotypes 2 and 4, we found that protection of viral genomes and cellular transduction were inversely related to calculated disruption of the capsid structure. Interestingly, however, we did not observe a correlation between genome packaging and calculated structural disruption; a majority of the chimeric capsid proteins formed at least partially assembled capsids and more than half packaged genomes, including those with the highest SCHEMA disruption. These results suggest that the sequence space accessed by recombination of divergent AAV serotypes is rich in capsid chimeras that assemble into 60-mer capsids and package viral genomes. Overall, the SCHEMA algorithm may be useful for delineating quantitative design principles to guide the creation of libraries enriched in genome-protecting virus nanoparticles that can effectively transduce cells. Such improvements to the virus design process may help advance not only gene therapy applications but also other bionanotechnologies dependent upon the development of viruses with new sequences and functions.
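The "calculated structural disruption" in this abstract refers to a SCHEMA-style score. As that approach is commonly described, the score counts residue-residue contacts in the folded structure whose amino acid pair in the chimera does not occur at the same two positions in any parent. The sketch below illustrates that counting rule under stated assumptions: the function name, the six-residue parent sequences, and the contact list are hypothetical toy data, not AAV2 or AAV4 capsid sequences, and deriving the contact map from a structure is outside its scope.

```python
def schema_disruption(chimera: str, parents: list, contacts: list) -> int:
    """Count contacts (i, j) whose residue pair in the chimera is absent from every parent."""
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if not any((p[i], p[j]) == pair for p in parents):
            broken += 1
    return broken


# Hypothetical aligned parent sequences of equal length.
parents = ["ACDEFG", "AKDQFW"]

# Single crossover after position 2: first block from parent 0, rest from parent 1.
chimera = parents[0][:3] + parents[1][3:]

# Hypothetical contact map: index pairs assumed to be close in the 3D structure.
contacts = [(0, 1), (1, 3), (2, 5), (4, 5)]

disruption = schema_disruption(chimera, parents, contacts)
```

In this toy case only the contact between positions 1 and 3 pairs residues that never occur together in a parent, so the score is 1, and either parent scored against itself gives 0.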
SCHEMA computational design of virus capsid chimeras: calibrating how genome packaging, protection, and transduction correlate with calculated structural disruption

PubMed Central

Ho, Michelle L.; Adler, Benjamin A.; Torre, Michael L.; Silberg, Jonathan J.; Suh, Junghae

2013-01-01

Adeno-associated virus (AAV) recombination can result in chimeric capsid protein subunits whose ability to assemble into an oligomeric capsid, package a genome, and transduce cells depends on the inheritance of sequence from different AAV parents. To develop quantitative design principles for guiding site-directed recombination of AAV capsids, we have examined how capsid structural perturbations predicted by the SCHEMA algorithm correlate with experimental measurements of disruption in seventeen chimeric capsid proteins. In our small chimera population, created by recombining AAV serotypes 2 and 4, we found that protection of viral genomes and cellular transduction were inversely related to calculated disruption of the capsid structure. Interestingly, however, we did not observe a correlation between genome packaging and calculated structural disruption; a majority of the chimeric capsid proteins formed at least partially assembled capsids and more than half packaged genomes, including those with the highest SCHEMA disruption. These results suggest that the sequence space accessed by recombination of divergent AAV serotypes is rich in capsid chimeras that assemble into 60-mer capsids and package viral genomes. Overall, the SCHEMA algorithm may be useful for delineating quantitative design principles to guide the creation of libraries enriched in genome-protecting virus nanoparticles that can effectively transduce cells. Such improvements to the virus design process may help advance not only gene therapy applications, but also other bionanotechnologies dependent upon the development of viruses with new sequences and functions. PMID:23899192
Phytochemical genomics--a new trend.

PubMed

Saito, Kazuki

2013-06-01

Phytochemical genomics is a recently emerging field, which investigates the genomic basis of the synthesis and function of phytochemicals (plant metabolites), particularly based on advanced metabolomics. The chemical diversity of the model plant Arabidopsis thaliana is larger than previously expected, and the gene-to-metabolite correlations have been elucidated mostly by an integrated analysis of transcriptomes and metabolomes. For example, most genes involved in the biosynthesis of flavonoids in Arabidopsis have been characterized by this method. A similar approach has been applied to the functional genomics for production of phytochemicals in crops and medicinal plants. Great promise is seen in metabolic quantitative loci analysis in major crops such as rice and tomato, and identification of novel genes involved in the biosynthesis of bioactive specialized metabolites in medicinal plants. Copyright © 2013 The Author. Published by Elsevier Ltd. All rights reserved.
[Genomics basis of Arthrobacter spp. environmental adaptability– A review].

PubMed

Zhang, Xinjian; Zhang, Guangzhi; Yang, Hetong

2016-04-04

Arthrobacter species are ecologically diverse and can survive in a wide range of environments. Many strains are metabolically versatile and can degrade many environmental pollutants, so Arthrobacter species are thought to play important roles in the catabolism of environmental pollutants in nature. In recent years, the genomes of many Arthrobacter strains have been sequenced, providing comprehensive information for clarifying the molecular mechanisms behind the environmental adaptability of Arthrobacter species. These genomic findings revealed several features, commonly observed in Arthrobacter strains, that allow survival under stressful conditions. They include an array of genes associated with sigma factors and with responses to oxidative, osmotic, starvation, and temperature stresses. This review covers the genomic basis of their environmental adaptability, which is expected to provide useful information for applying Arthrobacter strains in pollution remediation and to shed light on environmental adaptability in other bacteria.
Fast and reliable prediction of domain-peptide binding affinity using coarse-grained structure models.

PubMed

Tian, Feifei; Tan, Rui; Guo, Tailin; Zhou, Peng; Yang, Li

2013-07-01

Domain-peptide recognition and interaction are fundamentally important for eukaryotic signaling and regulatory networks. It is thus essential to quantitatively infer the binding stability and specificity of such interactions from large-scale but low-accuracy complex structure models, which can be readily obtained from sophisticated molecular modeling procedures. In the present study, a new method is described for the fast and reliable prediction of domain-peptide binding affinity with coarse-grained structure models. This method is designed to tolerate the strong random noise present in domain-peptide complex structures and uses a statistical modeling approach to eliminate systematic bias associated with a group of investigated samples. As a paradigm, this method was employed to model and predict the binding behavior of various peptides to four evolutionarily unrelated peptide-recognition domains (PRDs), i.e. human amph SH3, human nherf PDZ, yeast syh GYF and yeast bmh 14-3-3. Moreover, we explored the molecular mechanism and biological implication underlying the binding of cognate and noncognate peptide ligands to their domain receptors. It is expected that the newly proposed method could be further used to perform genome-wide inference of domain-peptide binding at the three-dimensional structure level. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
New families of human regulatory RNA structures identified by comparative analysis of vertebrate genomes.

PubMed

Parker, Brian J; Moltke, Ida; Roth, Adam; Washietl, Stefan; Wen, Jiayu; Kellis, Manolis; Breaker, Ronald; Pedersen, Jakob Skou

2011-11-01

Regulatory RNA structures are often members of families with multiple paralogous instances across the genome. Family members share functional and structural properties, which allow them to be studied as a whole, facilitating both bioinformatic and experimental characterization. We have developed a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein-coding regions comprising 725 individual structures, including 48 families with known structural RNA elements. Known families identified include both noncoding RNAs, e.g., miRNAs and the recently identified MALAT1/MEN β lincRNA family; and cis-regulatory structures, e.g., iron-responsive elements. We also identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we identify potential new regulatory networks, including large families of short hairpins enriched in immunity-related genes, e.g., TNF, FOS, and CTLA4, which include known transcript destabilizing elements. Our findings exemplify the diversity of post-transcriptional regulation and provide a resource for further characterization of new regulatory mechanisms and families of noncoding RNAs.
Surface-enhanced Raman spectroscopy of genomic DNA from in vitro grown tomato (Lycopersicon esculentum Mill.) cultivars before and after plant cryopreservation.

PubMed

Muntean, Cristina M; Leopold, Nicolae; Tripon, Carmen; Coste, Ana; Halmagyi, Adela

2015-06-05

In this work, the surface-enhanced Raman scattering (SERS) spectra of five genomic DNAs from non-cryopreserved control tomato plants (Lycopersicon esculentum Mill. cultivars Siriana, Darsirius, Kristin, Pontica and Capriciu) were analyzed in the wavenumber range 400-1800 cm⁻¹. Structural changes induced in genomic DNAs upon cryopreservation are discussed in detail for four of these tomato cultivars. The surface-enhanced Raman vibrational modes for each of these cases, spectroscopic band assignments and structural interpretations of genomic DNAs are reported. We found that DNA isolated from Siriana cultivar leaf tissues undergoes the weakest structural changes upon cryogenic storage of tomato shoot apices. In contrast, genomic DNA extracted from the Pontica cultivar is the most responsive to the cryopreservation process. In particular, both C2'-endo-anti and C3'-endo-anti conformations were detected. As a general observation, the wavenumber range 1511-1652 cm⁻¹, attributed to dA, dG and dT residues, seems to be influenced by the cryopreservation process. These changes could reflect unstacking of DNA bases. However, no significant structural changes of genomic DNAs from Siriana, Darsirius and Kristin were found upon cryopreservation. Based on this work, specific plant DNA-ligand interactions or the accurate local structure of DNA in the proximity of a metallic surface might be further investigated using surface-enhanced Raman spectroscopy. Copyright © 2015 Elsevier B.V. All rights reserved.
Genome organization during the cell cycle: unity in division.

PubMed

Golloshi, Rosela; Sanders, Jacob T; McCord, Rachel Patton

2017-09-01

During the cell cycle, the genome must undergo dramatic changes in structure, from a decondensed, yet highly organized interphase structure to a condensed, generic mitotic chromosome and then back again. For faithful cell division, the genome must be replicated and chromosomes and sister chromatids physically segregated from one another. Throughout these processes, there is feedback and tension between the information-storing role and the physical properties of chromosomes. With a combination of recent techniques in fluorescence microscopy, chromosome conformation capture (Hi-C), biophysical experiments, and computational modeling, we can now attribute mechanisms to many long-observed features of chromosome structure changes during cell division. Apparent conflicts that arise when integrating the concepts from these different proposed mechanisms emphasize that orchestrating chromosome organization during cell division requires a complex system of factors rather than a simple pathway. Cell division is both essential for and threatening to proper genome organization. As interphase three-dimensional (3D) genome structure is quite static at a global level, cell division provides an important window of opportunity to make substantial changes in 3D genome organization in daughter cells, allowing for proper differentiation and development. Mistakes in the process of chromosome condensation or rebuilding the structure after mitosis can lead to diseases such as cancer, premature aging, and neurodegeneration. WIREs Syst Biol Med 2017, 9:e1389. doi: 10.1002/wsbm.1389 © 2017 Wiley Periodicals, Inc.
Definition of a high-affinity Gag recognition structure mediating packaging of a retroviral RNA genome

PubMed Central

Gherghe, Cristina; Lombo, Tania; Leonard, Christopher W.; Datta, Siddhartha A. K.; Bess, Julian W.; Gorelick, Robert J.; Rein, Alan; Weeks, Kevin M.

2010-01-01

All retroviral genomic RNAs contain a cis-acting packaging signal by which dimeric genomes are selectively packaged into nascent virions. However, it is not understood how Gag (the viral structural protein) interacts with these signals to package the genome with high selectivity. We probed the structure of murine leukemia virus RNA inside virus particles using SHAPE, a high-throughput RNA structure analysis technology. These experiments showed that NC (the nucleic acid binding domain derived from Gag) binds within the virus to the sequence UCUG-UR-UCUG. Recombinant Gag and NC proteins bound to this same RNA sequence in dimeric RNA in vitro; in all cases, interactions were strongest with the first U and final G in each UCUG element. The RNA structural context is critical: High-affinity binding requires base-paired regions flanking this motif, and two UCUG-UR-UCUG motifs are specifically exposed in the viral RNA dimer. Mutating the guanosine residues in these two motifs—only four nucleotides per genomic RNA—reduced packaging 100-fold, comparable to the level of nonspecific packaging. These results thus explain the selective packaging of dimeric RNA. This paradigm has implications for RNA recognition in general, illustrating how local context and RNA structure can create information-rich recognition signals from simple single-stranded sequence elements in large RNAs. PMID:20974908
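A sequence-level search for the UCUG-UR-UCUG element can be sketched with a regular expression, reading R as the IUPAC purine code (A or G) and the hyphens as visual separators only, which is an assumption about the notation. The toy sequence and the function name `find_motif` are hypothetical, and a sequence match alone does not capture the base-paired flanking context that the abstract reports as necessary for high-affinity binding.

```python
import re

# UCUG-UR-UCUG with R (IUPAC purine) expanded to A or G.
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(UCUGU[AG]UCUG))")


def find_motif(rna: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    return [m.start() for m in MOTIF.finditer(rna.upper())]


# Hypothetical toy sequence, not the murine leukemia virus genome.
positions = find_motif("GGUCUGUAUCUGAAUCUGUGUCUGCC")
```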
Applications of the 1000 Genomes Project resources

PubMed Central

Zheng-Bradley, Xiangqun

2017-01-01

The 1000 Genomes Project created a valuable, worldwide reference for human genetic variation. Common uses of the 1000 Genomes dataset include genotype imputation supporting Genome-wide Association Studies, mapping expression Quantitative Trait Loci, filtering non-pathogenic variants from exome, whole genome and cancer genome sequencing projects, and genetic analysis of population structure and molecular evolution. In this article, we will highlight some of the multiple ways that the 1000 Genomes data can be and has been utilized for genetic studies. PMID:27436001
Virus-like attachment sites as structural landmarks of plants retrotransposons.

PubMed

Ochoa Cruz, Edgar Andres; Cruz, Guilherme Marcello Queiroga; Vieira, Andréia Prata; Van Sluys, Marie-Anne

2016-01-01

The genomic data now available have enabled the study of repetitive sequences and their relationship to viruses. Among them, long terminal repeat retrotransposons (LTR-RTs) are the largest component of most plant genomes, the Gypsy and Copia superfamilies being the most common. Recently it has been found that the Del lineage, an LTR-RT of the Gypsy superfamily, has putative virus-like attachment (vl-att) sites. This signature, originally described for retroviruses, is recognized by retroviral integrase, conferring specificity to the integration process. Here we retrieved 26,092 putative complete LTR-RTs from 10 lineages found in 10 fully sequenced angiosperm genomes and found putative vl-att sites that are a conserved structural landmark across these genomes. Furthermore, we reveal that each plant genome has a distinguishable LTR-RT lineage amplification pattern that could be related to the diversity of vl-att sites. We used these patterns to generate a specific quick-response (QR) code for each genome that could be used as a barcode for identifying plants in the future. The universal distribution of vl-att sites represents a new structural feature common to plant LTR-RTs and retroviruses. This is an important finding that expands the information about the structural similarity between LTR-RTs and retroviruses. We speculate that the sequence diversity of vl-att sites could be important for the life cycle of retrotransposons, as has been shown for retroviruses. All the structural vl-att site signatures are strong candidates for further functional studies. Moreover, this is the first identification of specific LTR-RT content and their amplification patterns in a large dataset of LTR-RT lineages and angiosperm genomes. These distribution patterns could be used in the future for biotechnological identification purposes.
The genome of the Gulf pipefish enables understanding of evolutionary innovations.

PubMed

Small, C M; Bassham, S; Catchen, J; Amores, A; Fuiten, A M; Brown, R S; Jones, A G; Cresko, W A

2016-12-20

Evolutionary origins of derived morphologies ultimately stem from changes in protein structure, gene regulation, and gene content. A well-assembled, annotated reference genome is a central resource for pursuing these molecular phenomena underlying phenotypic evolution. We explored the genome of the Gulf pipefish (Syngnathus scovelli), which belongs to family Syngnathidae (pipefishes, seahorses, and seadragons). These fishes have dramatically derived bodies and a remarkable novelty among vertebrates, the male brood pouch. We produce a reference genome, condensed into chromosomes, for the Gulf pipefish. Gene losses and other changes have occurred in pipefish hox and dlx clusters and in the tbx and pitx gene families, candidate mechanisms for the evolution of syngnathid traits, including an elongated axis and the loss of ribs, pelvic fins, and teeth. We measure gene expression changes in pregnant versus non-pregnant brood pouch tissue and characterize the genomic organization of duplicated metalloprotease genes (patristacins) recruited into the function of this novel structure. Phylogenetic inference using ultraconserved sequences provides an alternative hypothesis for the relationship between orders Syngnathiformes and Scombriformes. Comparisons of chromosome structure among percomorphs show that chromosome number in a pipefish ancestor became reduced via chromosomal fusions. The collected findings from this first syngnathid reference genome open a window into the genomic underpinnings of highly derived morphologies, demonstrating that de novo production of high quality and useful reference genomes is within reach of even small research groups.
Secondary structure of the 3'-noncoding region of flavivirus genomes: comparative analysis of base pairing probabilities.

PubMed

Rauscher, S; Flamm, C; Mandl, C W; Heinz, F X; Stadler, P F

1997-07-01

The prediction of the complete matrix of base pairing probabilities was applied to the 3' noncoding region (NCR) of flavivirus genomes. This approach identifies not only well-defined secondary structure elements, but also regions of high structural flexibility. Flaviviruses, many of which are important human pathogens, have a common genomic organization, but exhibit a significant degree of RNA sequence diversity in the functionally important 3'-NCR. We demonstrate the presence of secondary structures shared by all flaviviruses, as well as structural features that are characteristic for groups of viruses within the genus reflecting the established classification scheme. The significance of most of the predicted structures is corroborated by compensatory mutations. The availability of infectious clones for several flaviviruses will allow the assessment of these structural elements in processes of the viral life cycle, such as replication and assembly.

Segregation distortion and genome-wide digenic interactions affect transmission of introgressed chromatin from wild cotton species.

PubMed

Chandnani, Rahul; Wang, Baohua; Draye, Xavier; Rainville, Lisa K; Auckland, Susan; Zhuang, Zhimin; Lubbers, Edward L; May, O Lloyd; Chee, Peng W; Paterson, Andrew H

2017-10-01

This study reports transmission genetics of chromosomal segments into Gossypium hirsutum from its most distant euploid relative, Gossypium mustelinum. Multilocus interactions and structural rearrangements affect introgression and segregation of donor chromatin. Wild allotetraploid relatives of cotton are a rich source of genetic diversity that can be used in genetic improvement, but linkage drag and non-Mendelian transmission genetics are prevalent in interspecific crosses. These problems necessitate knowledge of transmission patterns of chromatin from wild donor species in cultivated recipient species. From an interspecific cross, Gossypium hirsutum × Gossypium mustelinum, we studied G. mustelinum (the most distant tetraploid relative of Upland cotton) allele retention in 35 BC3F1 plants and segregation patterns in BC3F2 populations totaling 3202 individuals, using 216 DNA marker loci. The average retention of donor alleles across BC3F1 plants was higher than expected and the average frequency of G. mustelinum alleles in BC3F2 segregating families was less than expected. Despite surprisingly high retention of G. mustelinum alleles in BC3F1, 46 genomic regions showed no introgression. Regions on chromosomes 3 and 15 lacking introgression were closely associated with possible small inversions previously reported. Nonlinear two-locus interactions are abundant among loci with single-locus segregation distortion, and among loci originating from one of the two subgenomes. Comparison of the present results with those of prior studies indicates different permeability of Upland cotton for donor chromatin from different allotetraploid relatives. Different contributions of subgenomes to two-locus interactions suggest different fates of subgenomes in the evolution of allotetraploid cottons. Transmission genetics of G. hirsutum × G. mustelinum crosses reveals allelic interactions, constraints on fixation and selection of donor alleles, and challenges with retention of introgressed chromatin for crop improvement.
Genomewide Association Studies for 50 Agronomic Traits in Peanut Using the ‘Reference Set’ Comprising 300 Genotypes from 48 Countries of the Semi-Arid Tropics of the World

PubMed Central

Pandey, Manish K.; Upadhyaya, Hari D.; Rathore, Abhishek; Vadez, Vincent; Sheshshayee, M. S.; Sriswathi, Manda; Govil, Mansee; Kumar, Ashish; Gowda, M. V. C.; Sharma, Shivali; Hamidou, Falalou; Kumar, V. Anil; Khera, Pawan; Bhat, Ramesh S.; Khan, Aamir W.; Singh, Sube; Li, Hongjie; Monyo, Emmanuel; Nadaf, H. L.; Mukri, Ganapati; Jackson, Scott A.; Guo, Baozhu; Liang, Xuanqiang; Varshney, Rajeev K.

2014-01-01

Peanut is an important and nutritious agricultural commodity and a livelihood of many small-holder farmers in the semi-arid tropics (SAT) of the world, which are facing serious production threats. Integration of genomics tools with on-going genetic improvement approaches is expected to facilitate accelerated development of improved cultivars. Therefore, high-resolution genotyping and multiple season phenotyping data for 50 important agronomic, disease and quality traits were generated on the ‘reference set’ of peanut. This study reports comprehensive analyses of allelic diversity, population structure, linkage disequilibrium (LD) decay and marker-trait association (MTA) in peanut. Distinctness of all the genotypes can be established by using either a unique allele detected by a single SSR or a combination of unique alleles from two or more SSR markers. As expected, DArT features (2.0 alleles/locus, 0.125 PIC) showed lower allele frequency and polymorphic information content (PIC) than SSRs (22.21 alleles/locus, 0.715 PIC). Both marker types clearly differentiated the genotypes of diploids from tetraploids. Multi-allelic SSRs identified three sub-groups (K = 3), while the LD simulation trend line based on squared-allele frequency correlations (r²) predicted LD decay of 15–20 cM in the peanut genome. Detailed analysis identified a total of 524 highly significant MTAs (p-value >2.1×10⁻⁶) with a wide phenotypic variance (PV) range (5.81–90.09%) for 36 traits. These MTAs, after validation, may be deployed in improving biotic resistance, oil/seed/nutritional quality, drought tolerance related traits, and yield/yield components. PMID:25140620
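The squared allele-frequency correlation used as the LD measure above can be sketched for the simplest case of two biallelic loci with phased haplotypes, where r² = D² / (pA(1 − pA) pB(1 − pB)) and D = pAB − pA·pB. This is a simplification offered as an assumption: the study itself used multi-allelic SSR and DArT markers, and the function name and toy haplotypes below are hypothetical.

```python
def ld_r2(haplotypes: list[tuple[int, int]]) -> float:
    """Squared allele-frequency correlation between two biallelic loci.

    Each haplotype is (allele at locus A, allele at locus B), coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # coefficient of disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Hypothetical toy haplotypes: perfectly coupled loci versus independent loci.
complete_ld = ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)])
no_ld = ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)])
```

Plotting such pairwise values against genetic distance and fitting a trend line is one common way to read off an LD decay distance like the 15–20 cM reported.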
The genome architecture of the Collaborative Cross mouse genetic reference population.

PubMed

2012-02-01

The Collaborative Cross Consortium reports here on the development of a unique genetic resource population. The Collaborative Cross (CC) is a multiparental recombinant inbred panel derived from eight laboratory mouse inbred strains. Breeding of the CC lines was initiated at multiple international sites using mice from The Jackson Laboratory. Currently, this innovative project is breeding independent CC lines at the University of North Carolina (UNC), at Tel Aviv University (TAU), and at Geniad in Western Australia (GND). These institutions aim to make publicly available the completed CC lines and their genotypes and sequence information. We genotyped, and report here, results from 458 extant lines from UNC, TAU, and GND using a custom genotyping array with 7500 SNPs designed to be maximally informative in the CC and used a novel algorithm to infer inherited haplotypes directly from hybridization intensity patterns. We identified lines with breeding errors and cousin lines generated by splitting incipient lines into two or more cousin lines at early generations of inbreeding. We then characterized the genome architecture of 350 genetically independent CC lines. Results showed that founder haplotypes are inherited at the expected frequency, although we also consistently observed highly significant transmission ratio distortion at specific loci across all three populations. On chromosome 2, there is significant overrepresentation of WSB/EiJ alleles, and on chromosome X, there is a large deficit of CC lines with CAST/EiJ alleles. Linkage disequilibrium decays as expected and we saw no evidence of gametic disequilibrium in the CC population as a whole or in random subsets of the population. Gametic equilibrium in the CC population is in marked contrast to the gametic disequilibrium present in a large panel of classical inbred strains. Finally, we discuss access to the CC population and to the associated raw data describing the genetic structure of individual lines. Integration of rich phenotypic and genomic data over time and across a wide variety of fields will be vital to delivering on one of the key attributes of the CC, a common genetic reference platform for identifying causative variants and genetic networks determining traits in mammals.
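One generic way to check whether founder haplotypes at a locus depart from the expected frequency is a chi-square goodness-of-fit test against equal contributions of 1/8 from each of the eight founders. The sketch below is an assumption, not the consortium's reported procedure: the counts are invented, each line is treated as carrying a single founder haplotype at the locus, and autosomal expectations are used (the X chromosome would need different expected proportions).

```python
def chi_square_stat(observed: list[int], expected_probs: list[float]) -> float:
    """Pearson chi-square statistic for observed counts versus expected proportions."""
    total = sum(observed)
    return sum(
        (obs - total * p) ** 2 / (total * p)
        for obs, p in zip(observed, expected_probs)
    )


# Hypothetical founder haplotype counts at one locus across 350 lines.
counts = [80, 40, 40, 40, 40, 40, 35, 35]
stat = chi_square_stat(counts, [1 / 8] * 8)

# Critical value for 7 degrees of freedom at alpha = 0.05 is about 14.07.
distorted = stat > 14.07
```

A genome-wide scan would repeat this at many loci and therefore needs a multiple-testing correction, which is omitted here.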
High throughput platforms for structural genomics of integral membrane proteins.

PubMed

Mancia, Filippo; Love, James

2011-08-01

Structural genomics approaches on integral membrane proteins have been postulated for over a decade, yet specific efforts are lagging years behind their soluble counterparts. Indeed, high throughput methodologies for production and characterization of prokaryotic integral membrane proteins are only now emerging, while large-scale efforts for eukaryotic ones are still in their infancy. Presented here is a review of recent literature on ongoing structural genomics initiatives for membrane proteins, with a focus on those implementing techniques aimed at increasing our rate of success for this class of macromolecules. Copyright © 2011 Elsevier Ltd. All rights reserved.
In situ structures of the genome and genome-delivery apparatus in a single-stranded RNA virus.

PubMed

Dai, Xinghong; Li, Zhihai; Lai, Mason; Shu, Sara; Du, Yushen; Zhou, Z Hong; Sun, Ren

2017-01-05

Packaging of the genome into a protein capsid and its subsequent delivery into a host cell are two fundamental processes in the life cycle of a virus. Unlike double-stranded DNA viruses, which pump their genome into a preformed capsid, single-stranded RNA (ssRNA) viruses, such as bacteriophage MS2, co-assemble their capsid with the genome; however, the structural basis of this co-assembly is poorly understood. MS2 infects Escherichia coli via the host 'sex pilus' (F-pilus); it was the first fully sequenced organism and is a model system for studies of translational gene regulation, RNA-protein interactions, and RNA virus assembly. Its positive-sense ssRNA genome of 3,569 bases is enclosed in a capsid with one maturation protein monomer and 89 coat protein dimers arranged in a T = 3 icosahedral lattice. The maturation protein is responsible for attaching the virus to an F-pilus and delivering the viral genome into the host during infection, but how the genome is organized and delivered is not known. Here we describe the MS2 structure at 3.6 Å resolution, determined by electron-counting cryo-electron microscopy (cryoEM) and asymmetric reconstruction. We traced approximately 80% of the backbone of the viral genome, built atomic models for 16 RNA stem-loops, and identified three conserved motifs of RNA-coat protein interactions among 15 of these stem-loops with diverse sequences. The stem-loop at the 3' end of the genome interacts extensively with the maturation protein, which, with just a six-helix bundle and a six-stranded β-sheet, forms a genome-delivery apparatus and joins 89 coat protein dimers to form a capsid. This atomic description of genome-capsid interactions in a spherical ssRNA virus provides insight into genome delivery via the host sex pilus and mechanisms underlying ssRNA-capsid co-assembly, and inspires speculation about the links between nucleoprotein complexes and the origins of viruses.
Comprehensive molecular characterization of human colon and rectal cancer.

PubMed

2012-07-18

To characterize somatic alterations in colorectal carcinoma, we conducted a genome-scale analysis of 276 samples, analysing exome sequence, DNA copy number, promoter methylation and messenger RNA and microRNA expression. A subset of these samples (97) underwent low-depth-of-coverage whole-genome sequencing. In total, 16% of colorectal carcinomas were found to be hypermutated: three-quarters of these had the expected high microsatellite instability, usually with hypermethylation and MLH1 silencing, and one-quarter had somatic mismatch-repair gene and polymerase ε (POLE) mutations. Excluding the hypermutated cancers, colon and rectum cancers were found to have considerably similar patterns of genomic alteration. Twenty-four genes were significantly mutated, and in addition to the expected APC, TP53, SMAD4, PIK3CA and KRAS mutations, we found frequent mutations in ARID1A, SOX9 and FAM123B. Recurrent copy-number alterations include potentially drug-targetable amplifications of ERBB2 and newly discovered amplification of IGF2. Recurrent chromosomal translocations include the fusion of NAV2 and WNT pathway member TCF7L1. Integrative analyses suggest new markers for aggressive colorectal carcinoma and an important role for MYC-directed transcriptional activation and repression.
A fully decompressed synthetic bacteriophage øX174 genome assembled and archived in yeast.

PubMed

Jaschke, Paul R; Lieberman, Erica K; Rodriguez, Jon; Sierra, Adrian; Endy, Drew

2012-12-20

The 5386 nucleotide bacteriophage øX174 genome has a complicated architecture that encodes 11 gene products via overlapping protein coding sequences spanning multiple reading frames. We designed a 6302 nucleotide synthetic surrogate, øX174.1, that fully separates all primary phage protein coding sequences along with cognate translation control elements. To specify øX174.1f, a decompressed genome the same length as wild type, we truncated the gene F coding sequence. We synthesized DNA encoding fragments of øX174.1f and used a combination of in vitro- and yeast-based assembly to produce yeast vectors encoding natural or designer bacteriophage genomes. We isolated clonal preparations of yeast plasmid DNA and transfected E. coli C strains. We recovered viable øX174 particles containing the øX174.1f genome from E. coli C strains that independently express full-length gene F. We expect that yeast can serve as a genomic 'drydock' within which to maintain and manipulate clonal lineages of other obligate lytic phage. Copyright © 2012 Elsevier Inc. All rights reserved.
Whole genome sequencing data and de novo draft assemblies for 66 teleost species

PubMed Central

Malmstrøm, Martin; Matschiner, Michael; Tørresen, Ole K.; Jakobsen, Kjetill S.; Jentoft, Sissel

2017-01-01

Teleost fishes comprise more than half of all vertebrate species, yet genomic data are only available for 0.2% of their diversity. Here, we present whole genome sequencing data for 66 new species of teleosts, vastly expanding the availability of genomic data for this important vertebrate group. We report on de novo assemblies based on low-coverage (9–39×) sequencing and present detailed methodology for all analyses. To facilitate further utilization of this data set, we present statistical analyses of the gene space completeness and verify the expected phylogenetic position of the sequenced genomes in a large mitogenomic context. We further present a nuclear marker set used for phylogenetic inference and evaluate each gene tree in relation to the species tree to test for homogeneity in the phylogenetic signal. Collectively, these analyses illustrate the robustness of this highly diverse data set and enable extensive reuse of the selected phylogenetic markers and the genomic data in general. This data set covers all major teleost lineages and provides unprecedented opportunities for comparative studies of teleosts. PMID:28094797
Extensive structural variations between mitochondrial genomes of CMS and normal peppers (Capsicum annuum L.) revealed by complete nucleotide sequencing.

PubMed

Jo, Yeong Deuk; Choi, Yoomi; Kim, Dong-Hwan; Kim, Byung-Dong; Kang, Byoung-Cheorl

2014-07-04

Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearrangements caused by recombination. However, the mitochondrial genome structure and the DNA rearrangements that may be related to CMS have not been characterized in Capsicum spp. We obtained the complete mitochondrial genome sequences of the pepper CMS line FS4401 (507,452 bp) and the fertile line Jeju (511,530 bp). Comparative analysis between mitochondrial genomes of peppers and tobacco that are included in Solanaceae revealed extensive DNA rearrangements and poor conservation in non-coding DNA. In comparison between pepper lines, FS4401 and Jeju mitochondrial DNAs contained the same complement of protein coding genes except for one additional copy of an atp6 gene (ψatp6-2) in FS4401. In terms of genome structure, we found eighteen syntenic blocks in the two mitochondrial genomes, which have been rearranged in each genome. By contrast, sequences between syntenic blocks, which were specific to each line, accounted for 30,380 and 17,847 bp in FS4401 and Jeju, respectively. The previously reported CMS candidate genes, orf507 and ψatp6-2, were located on the edges of the largest sequence segments that were specific to FS4401. In this region, a large number of small sequence segments that were absent from, or found at different locations in, the Jeju mitochondrial genome were combined. The incorporation of repeats and the overlapping of connected sequence segments by a few nucleotides implied that extensive rearrangements by homologous recombination might be involved in the evolution of this region. Further analysis using mtDNA pairs from other plant species revealed common features of DNA regions around CMS-associated genes.
Although a large portion of the sequence context was shared by the mitochondrial genomes of CMS and male-fertile pepper lines, extensive genome rearrangements were detected. CMS candidate genes were located on the edges of highly rearranged CMS-specific DNA regions and near repeat sequences. These characteristics were also detected among CMS-associated genes in other species, implying that a common mechanism might be involved in the evolution of CMS-associated genes.
Identification of 15 candidate structured noncoding RNA motifs in fungi by comparative genomics.

PubMed

Li, Sanshu; Breaker, Ronald R

2017-10-13

With the development of rapid and inexpensive DNA sequencing, the genome sequences of more than 100 fungal species have been made available. This dataset provides an excellent resource for comparative genomics analyses, which can be used to discover genetic elements, including noncoding RNAs (ncRNAs). Bioinformatics tools similar to those used to uncover novel ncRNAs in bacteria, likewise, should be useful for searching fungal genomic sequences, and the relative ease of genetic experiments with some model fungal species could facilitate experimental validation studies. We have adapted a bioinformatics pipeline for discovering bacterial ncRNAs to systematically analyze many fungal genomes. This comparative genomics pipeline integrates information on conserved RNA sequence and structural features with alternative splicing information to reveal fungal RNA motifs that are candidate regulatory domains, or that might have other possible functions. A total of 15 prominent classes of structured ncRNA candidates were identified, including variant HDV self-cleaving ribozyme representatives, atypical snoRNA candidates, and possible structured antisense RNA motifs. Candidate regulatory motifs were also found associated with genes for ribosomal proteins, S-adenosylmethionine decarboxylase (SDC), amidase, and HexA protein involved in Woronin body formation. We experimentally confirm that the variant HDV ribozymes undergo rapid self-cleavage, and we demonstrate that the SDC RNA motif reduces the expression of SAM decarboxylase by translational repression. Furthermore, we provide evidence that several other motifs discovered in this study are likely to be functional ncRNA elements. Systematic screening of fungal genomes using a computational discovery pipeline has revealed the existence of a variety of novel structured ncRNAs. 
Genome contexts and similarities to known ncRNA motifs provide strong evidence for the biological and biochemical functions of some newly found ncRNA motifs. Although initial examinations of several motifs provide evidence for their likely functions, other motifs will require more in-depth analysis to reveal their functions.
Structure of RNA polymerase complex and genome within a dsRNA virus provides insights into the mechanisms of transcription and assembly.

PubMed

Wang, Xurong; Zhang, Fuxian; Su, Rui; Li, Xiaowu; Chen, Wenyuan; Chen, Qingxiu; Yang, Tao; Wang, Jiawei; Liu, Hongrong; Fang, Qin; Cheng, Lingpeng

2018-06-25

Most double-stranded RNA (dsRNA) viruses transcribe RNA plus strands within a common innermost capsid shell. This process requires coordinated efforts by RNA-dependent RNA polymerase (RdRp) together with other capsid proteins and genomic RNA. Here we report the near-atomic resolution structure of the RdRp protein VP2 in complex with its cofactor protein VP4 and genomic RNA within an aquareovirus capsid using 200-kV cryoelectron microscopy and symmetry-mismatch reconstruction. The structure of these capsid proteins enabled us to observe the elaborate nonicosahedral structure within the double-layered icosahedral capsid. Our structure shows that the RdRp complex is anchored at the inner surface of the capsid shell and interacts with genomic dsRNA and four of the five asymmetrically arranged N termini of the capsid shell proteins under the fivefold axis, implying roles for these N termini in virus assembly. The binding site of the RNA end at VP2 is different from the RNA cap binding site identified in the crystal structure of orthoreovirus RdRp λ3, although the structures of VP2 and λ3 are almost identical. A loop, which was thought to separate the RNA template and transcript, interacts with an apical domain of the capsid shell protein, suggesting a mechanism for regulating RdRp replication and transcription. A conserved nucleoside triphosphate binding site was localized in our RdRp cofactor protein VP4 structure, and interactions between the VP4 and the genomic RNA were identified.
Whole Genome Sequencing of Greater Amberjack (Seriola dumerili) for SNP Identification on Aligned Scaffolds and Genome Structural Variation Analysis Using Parallel Resequencing

PubMed Central

Aokic, Jun-ya; Kawase, Junya; Hamada, Kazuhisa; Fujimoto, Hiroshi; Yamamoto, Ikki; Usuki, Hironori

2018-01-01

Greater amberjack (Seriola dumerili) is distributed in tropical and temperate waters worldwide and is an important aquaculture fish. We carried out de novo sequencing of the greater amberjack genome to construct a reference genome sequence to identify single nucleotide polymorphisms (SNPs) for breeding amberjack by marker-assisted or gene-assisted selection as well as to identify functional genes for biological traits. We obtained 200 times coverage and constructed a high-quality genome assembly using next generation sequencing technology. The assembled sequences were aligned onto a yellowtail (Seriola quinqueradiata) radiation hybrid (RH) physical map by sequence homology. A total of 215 of the longest amberjack sequences, with a total length of 622.8 Mbp (92% of the total length of the genome scaffolds), were lined up on the yellowtail RH map. We resequenced the whole genomes of 20 greater amberjacks and mapped the resulting sequences onto the reference genome sequence. About 186,000 nonredundant SNPs were successfully ordered on the reference genome. Further, we found differences in the genome structural variations between two greater amberjack populations using BreakDancer. We also analyzed the greater amberjack transcriptome and mapped the annotated sequences onto the reference genome sequence. PMID:29785397
Finding the missing honey bee genes: lessons learned from a genome upgrade.

PubMed

Elsik, Christine G; Worley, Kim C; Bennett, Anna K; Beye, Martin; Camara, Francisco; Childers, Christopher P; de Graaf, Dirk C; Debyser, Griet; Deng, Jixin; Devreese, Bart; Elhaik, Eran; Evans, Jay D; Foster, Leonard J; Graur, Dan; Guigo, Roderic; Hoff, Katharina Jasmin; Holder, Michael E; Hudson, Matthew E; Hunt, Greg J; Jiang, Huaiyang; Joshi, Vandita; Khetani, Radhika S; Kosarev, Peter; Kovar, Christie L; Ma, Jian; Maleszka, Ryszard; Moritz, Robin F A; Munoz-Torres, Monica C; Murphy, Terence D; Muzny, Donna M; Newsham, Irene F; Reese, Justin T; Robertson, Hugh M; Robinson, Gene E; Rueppell, Olav; Solovyev, Victor; Stanke, Mario; Stolle, Eckart; Tsuruda, Jennifer M; Vaerenbergh, Matthias Van; Waterhouse, Robert M; Weaver, Daniel B; Whitfield, Charles W; Wu, Yuanqing; Zdobnov, Evgeny M; Zhang, Lan; Zhu, Dianhui; Gibbs, Richard A

2014-01-30

The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.
Finding the missing honey bee genes: lessons learned from a genome upgrade

PubMed Central

2014-01-01

Background: The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Results: Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Conclusions: Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination. PMID:24479613
Collaborative Genomics Study Advances Precision Oncology

Cancer.gov

A collaborative study conducted by two Office of Cancer Genomics (OCG) initiatives highlights the importance of integrating structural and functional genomics programs to improve cancer therapies, and more specifically, contribute to precision oncology treatments for children.
Genomic sequencing of Pleistocene cave bears

DOE Office of Scientific and Technical Information (OSTI.GOV)

Noonan, James P.; Hofreiter, Michael; Smith, Doug

2005-04-01

Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of approximately 1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.
Biological invasions, climate change and genomics

PubMed Central

Chown, Steven L; Hodgins, Kathryn A; Griffin, Philippa C; Oakeshott, John G; Byrne, Margaret; Hoffmann, Ary A

2015-01-01

The rate of biological invasions is expected to increase as the effects of climate change on biological communities become widespread. Climate change enhances habitat disturbance which facilitates the establishment of invasive species, which in turn provides opportunities for hybridization and introgression. These effects influence local biodiversity that can be tracked through genetic and genomic approaches. Metabarcoding and metagenomic approaches provide a way of monitoring some types of communities under climate change for the appearance of invasives. Introgression and hybridization can be followed by the analysis of entire genomes so that rapidly changing areas of the genome are identified and instances of genetic pollution monitored. Genomic markers enable accurate tracking of invasive species’ geographic origin well beyond what was previously possible. New genomic tools are promoting fresh insights into classic questions about invading organisms under climate change, such as the role of genetic variation, local adaptation and climate pre-adaptation in successful invasions. These tools are providing managers with often more effective means to identify potential threats, improve surveillance and assess impacts on communities. We provide a framework for the application of genomic techniques within a management context and also indicate some important limitations in what can be achieved. PMID:25667601
Complete Sequence and Analysis of Coconut Palm (Cocos nucifera) Mitochondrial Genome.

PubMed

Aljohi, Hasan Awad; Liu, Wanfei; Lin, Qiang; Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O; Alawad, Abdullah O; Al-Sadi, Abdullah M; Hu, Songnian; Yu, Jun

2016-01-01

Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in the tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653 bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower than that of the nuclear genome and the neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provide the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants.
The proteome: structure, function and evolution

PubMed Central

Fleming, Keiran; Kelley, Lawrence A; Islam, Suhail A; MacCallum, Robert M; Muller, Arne; Pazos, Florencio; Sternberg, Michael J.E

2006-01-01

This paper reports two studies to model the inter-relationships between protein sequence, structure and function. First, an automated pipeline to provide a structural annotation of proteomes in the major genomes is described. The results are stored in a database at Imperial College, London (3D-GENOMICS) that can be accessed at www.sbg.bio.ic.ac.uk. Analysis of the assignments to structural superfamilies provides evolutionary insights. 3D-GENOMICS is being integrated with related proteome annotation data at University College London and the European Bioinformatics Institute in a project known as e-protein (http://www.e-protein.org/). The second topic is motivated by the developments in structural genomics projects in which the structure of a protein is determined prior to knowledge of its function. We have developed a new approach PHUNCTIONER that uses the gene ontology (GO) classification to supervise the extraction of the sequence signal responsible for protein function from a structure-based sequence alignment. Using GO we can obtain profiles for a range of specificities described in the ontology. In the region of low sequence similarity (around 15%), our method is more accurate than assignment from the closest structural homologue. The method is also able to identify the specific residues associated with the function of the protein family. PMID:16524832
Plastid genome structure and loss of photosynthetic ability in the parasitic genus Cuscuta.

PubMed

Revill, Meredith J W; Stanley, Susan; Hibberd, Julian M

2005-09-01

The genus Cuscuta (dodder) is composed of parasitic plants, some species of which appear to be losing the ability to photosynthesize. A molecular phylogeny was constructed using 15 species of Cuscuta in order to assess whether changes in photosynthetic ability and alterations in structure of the plastid genome relate to phylogenetic position within the genus. The molecular phylogeny provides evidence for four major clades within Cuscuta. Although DNA blot analysis showed that Cuscuta species have smaller plastid genomes than tobacco, and that plastome size varied significantly even within one Cuscuta clade, dot blot analysis indicated that the dodders possess homologous sequence to 101 genes from the tobacco plastome. Evidence is provided for significant rates of DNA transfer from plastid to nucleus in Cuscuta. Size and structure of Cuscuta plastid genomes, as well as photosynthetic ability, appear to vary independently of position within the phylogeny, thus supporting the hypothesis that within Cuscuta photosynthetic ability and organization of the plastid genome are changing in an unco-ordinated manner.
