prototype genome variation: Topics by Science.gov

Sample records for prototype genome variation

Full genome sequence of Rocio virus reveal substantial variations from the prototype Rocio virus SPH 34675 sequence.

PubMed

Setoh, Yin Xiang; Amarilla, Alberto A; Peng, Nias Y; Slonchak, Andrii; Periasamy, Parthiban; Figueiredo, Luiz T M; Aquino, Victor H; Khromykh, Alexander A

2018-01-01

Rocio virus (ROCV) is an arbovirus belonging to the genus Flavivirus, family Flaviviridae. We present an updated sequence of ROCV strain SPH 34675 (GenBank: AY632542.4), the only available full genome sequence prior to this study. Using next-generation sequencing of the entire genome, we reveal substantial sequence variation from the prototype sequence, with 30 nucleotide differences amounting to 14 amino acid changes, as well as significant changes to predicted 3'UTR RNA structures. Our results present an updated and corrected sequence of a potential emerging human-virulent flavivirus uniquely indigenous to Brazil (GenBank: MF461639).
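The relationship between nucleotide differences and amino acid changes reported above (30 versus 14) can be illustrated with a minimal sketch. It assumes two already-aligned, equal-length coding sequences in the same reading frame, written in uppercase A/C/G/T with no indels; the function name and the toy sequences are hypothetical and are not taken from the study, whose actual comparison pipeline is not described in the abstract.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def count_differences(ref, new):
    """Return (nucleotide differences, codons whose amino acid changed)."""
    if len(ref) != len(new):
        raise ValueError("sequences must be aligned to the same length")
    nt_diffs = sum(a != b for a, b in zip(ref, new))
    aa_changes = 0
    # Walk complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(ref) - len(ref) % 3, 3):
        if CODON_TABLE[ref[i:i + 3]] != CODON_TABLE[new[i:i + 3]]:
            aa_changes += 1
    return nt_diffs, aa_changes


# Toy example: GCT -> GCC is synonymous (Ala), AAA -> AGA is Lys -> Arg.
print(count_differences("ATGGCTAAA", "ATGGCCAGA"))  # (2, 1)
```

Because synonymous substitutions leave the encoded protein unchanged, the amino acid count is typically lower than the nucleotide count, as in the reported figures.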
The Second Report on the State of the World's Animal Genetic Resources for Food and Agriculture, Part 4, The State of the Art: Box 4A4: A digital enumeration method for collecting phenotypic data for genome association

USDA-ARS's Scientific Manuscript database

Consistent data across animal populations are required to inform genomic science aimed at finding important adaptive genetic variations. The ADAPTMap Digital Phenotype Collection- Prototype Method will yield a new procedure to provide consistent phenotypic data by digital enumeration of categorical ...
Molecular Insights Into the Evolutionary Pathway of Vibrio cholerae O1 Atypical El Tor Variants

PubMed Central

Kim, Eun Jin; Lee, Dokyung; Moon, Se Hoon; Lee, Chan Hee; Kim, Sang Jun; Lee, Jae Hyun; Kim, Jae Ouk; Song, Manki; Das, Bhabatosh; Clemens, John D.; Pape, Jean William; Nair, G. Balakrish; Kim, Dong Wook

2014-01-01

Pandemic V. cholerae strains in the O1 serogroup have 2 biotypes: classical and El Tor. The classical biotype strains of the sixth pandemic, which encode the classical type cholera toxin (CT), have been replaced by El Tor biotype strains of the seventh pandemic. The prototype El Tor strains that produce biotype-specific cholera toxin are being replaced by atypical El Tor variants that harbor classical cholera toxin. Atypical El Tor strains are categorized into 2 groups, Wave 2 and Wave 3 strains, based on genomic variations and the CTX phage that they harbor. Whole-genome analysis of V. cholerae strains in the seventh cholera pandemic has demonstrated gradual changes in the genome of prototype and atypical El Tor strains, indicating that atypical strains arose from the prototype strains by replacing the CTX phages. We examined the molecular mechanisms that effected the emergence of El Tor strains with classical cholera toxin-carrying phage. We isolated an intermediary V. cholerae strain that carried two different CTX phages that encode El Tor and classical cholera toxin, respectively. We show here that the intermediary strain can be converted into various Wave 2 strains and can act as the source of the novel mosaic CTX phages. These results imply that the Wave 2 and Wave 3 strains may have been generated from such intermediary strains in nature. Prototype El Tor strains can become Wave 3 strains by excision of CTX-1 and re-equipping with the new CTX phages. Our data suggest that inter-chromosomal recombination between 2 types of CTX phages is possible when a host bacterial cell is infected by multiple CTX phages. Our study also provides molecular insights into population changes in V. cholerae in the absence of significant changes to the genome but by replacement of the CTX prophage that they harbor. PMID:25233006
Genomic and bioinformatics analyses of HAdV-4vac and HAdV-7vac, two human adenovirus (HAdV) strains that constituted original prophylaxis against HAdV-related acute respiratory disease, a reemerging epidemic disease.

PubMed

Purkayastha, Anjan; Su, Jing; McGraw, John; Ditty, Susan E; Hadfield, Ted L; Seto, Jason; Russell, Kevin L; Tibbetts, Clark; Seto, Donald

2005-07-01

Vaccine strains of human adenovirus serotypes 4 and 7 (HAdV-4vac and HAdV-7vac) have been used successfully to prevent adenovirus-related acute respiratory disease outbreaks. The genomes of these two vaccine strains have been sequenced, annotated, and compared with their prototype equivalents with the goals of understanding their genomes for molecular diagnostics applications, vaccine redevelopment, and HAdV pathoepidemiology. These reference genomes are archived in GenBank as HAdV-4vac (35,994 bp; AY594254) and HAdV-7vac (35,240 bp; AY594256). Bioinformatics and comparative whole-genome analyses with their recently reported and archived prototype genomes reveal six mismatches and four insertions-deletions (indels) between the HAdV-4 prototype and vaccine strains, in contrast to the 611 mismatches and 130 indels between the HAdV-7 prototype and vaccine strains. Annotation reveals that the HAdV-4vac and HAdV-7vac genomes contain 51 and 50 coding units, respectively. Neither vaccine strain appears to be attenuated for virulence based on bioinformatics analyses. There is evidence of genome recombination, as the inverted terminal repeat of HAdV-4vac is initially identical to that of species C whereas the prototype is identical to species B1. These vaccine reference sequences yield unique genome signatures for molecular diagnostics. As a molecular forensics application, these references identify the circulating and problematic 1950s era field strains as the original HAdV-4 prototype and the Greider prototype, from which the vaccines are derived. Thus, they are useful for genomic comparisons to current epidemic and reemerging field strains, as well as leading to an understanding of pathoepidemiology among the human adenoviruses.
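Mismatch and indel counts like those quoted above can be tallied from a gapped pairwise alignment. The sketch below is illustrative only: it assumes the two genomes have already been aligned by some external tool, that "-" marks a gap, and that each contiguous run of gaps in one sequence counts as a single indel. The abstract does not state which counting convention the authors used, and the function name is hypothetical.

```python
def tally_alignment(aln_a, aln_b, gap="-"):
    """Return (mismatches, indel events) for two aligned, equal-length rows."""
    if len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must have the same length")
    mismatches = 0
    indels = 0
    prev_gap_row = None  # which row carried a gap in the previous column
    for x, y in zip(aln_a, aln_b):
        if x == gap and y == gap:
            continue  # uninformative column, skip it
        gap_row = "a" if x == gap else "b" if y == gap else None
        if gap_row is None:
            if x != y:
                mismatches += 1
        elif gap_row != prev_gap_row:
            indels += 1  # a new gap run starts here
        prev_gap_row = gap_row
    return mismatches, indels


# Toy alignment: one substitution, one 2-base gap run, one 1-base gap run.
print(tally_alignment("ACGT--ACGTA", "ACCTGGAC-TA"))  # (1, 2)
```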
Genomic and Bioinformatics Analyses of HAdV-4vac and HAdV-7vac, Two Human Adenovirus (HAdV) Strains That Constituted Original Prophylaxis against HAdV-Related Acute Respiratory Disease, a Reemerging Epidemic Disease

PubMed Central

Purkayastha, Anjan; Su, Jing; McGraw, John; Ditty, Susan E.; Hadfield, Ted L.; Seto, Jason; Russell, Kevin L.; Tibbetts, Clark; Seto, Donald

2005-01-01

Vaccine strains of human adenovirus serotypes 4 and 7 (HAdV-4vac and HAdV-7vac) have been used successfully to prevent adenovirus-related acute respiratory disease outbreaks. The genomes of these two vaccine strains have been sequenced, annotated, and compared with their prototype equivalents with the goals of understanding their genomes for molecular diagnostics applications, vaccine redevelopment, and HAdV pathoepidemiology. These reference genomes are archived in GenBank as HAdV-4vac (35,994 bp; AY594254) and HAdV-7vac (35,240 bp; AY594256). Bioinformatics and comparative whole-genome analyses with their recently reported and archived prototype genomes reveal six mismatches and four insertions-deletions (indels) between the HAdV-4 prototype and vaccine strains, in contrast to the 611 mismatches and 130 indels between the HAdV-7 prototype and vaccine strains. Annotation reveals that the HAdV-4vac and HAdV-7vac genomes contain 51 and 50 coding units, respectively. Neither vaccine strain appears to be attenuated for virulence based on bioinformatics analyses. There is evidence of genome recombination, as the inverted terminal repeat of HAdV-4vac is initially identical to that of species C whereas the prototype is identical to species B1. These vaccine reference sequences yield unique genome signatures for molecular diagnostics. As a molecular forensics application, these references identify the circulating and problematic 1950s era field strains as the original HAdV-4 prototype and the Greider prototype, from which the vaccines are derived. Thus, they are useful for genomic comparisons to current epidemic and reemerging field strains, as well as leading to an understanding of pathoepidemiology among the human adenoviruses. PMID:16000418
Abstracts of papers presented at the 8th workshop of the Virology Section of the Deutsche Gesellschaft für Hygiene und Mikrobiologie, Würzburg, March 17-19, 1983.

PubMed

1983-09-01

17 adenovirus strains were found to be antigenically related to prototype Ad 15 by neutralization. No relationship to Ad 15, but to Ad 9 could be detected by hemagglutination-inhibition; we therefore named them Ad 15/H9 intermediate strains. After analysis of the genome by five different restriction enzymes, the fragment patterns obtained deviated widely from the prototype Ad 15, but only slightly from Ad 9. Differences could also be observed among the variants. After digestion by five restriction enzymes, altogether six genome types could be established among the 17 intermediate strains. To map the variations on the genome of the 15/H9 strains, two methods were employed: the double digestion of the DNA and DNA fragments together with the determination of the terminal fragments made it possible to construct a physical map. The second method depends on a particularity of adenoviruses: the DNA is covalently linked with a 55 kD protein at the 5' terminus. After digestion of the DNA, which does contain this protein, the terminal DNA fragments do not migrate into the agarose gel; after an additional digestion with pronase B, they do migrate into the gel. Thus the terminal fragments were determined by comparing the fragment patterns with and without previous pronase B treatment.
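The idea of comparing restriction fragment patterns can be sketched in silico. The example assumes a linear DNA molecule and uses the EcoRI recognition site (GAATTC, cut after the first G) purely for illustration; the abstract does not name the five enzymes used, and the function name and toy sequences are hypothetical.

```python
def digest(seq, site, cut_offset):
    """Return fragment lengths after cutting linear DNA at every site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


prototype = "AAAGAATTCCCCCGAATTCTT"
variant = "AAAGAATTCCCCCGAGTTCTT"  # second site lost by a point mutation

print(digest(prototype, "GAATTC", 1))  # [4, 10, 7]
print(digest(variant, "GAATTC", 1))    # [4, 17]
```

In this toy case a single base change removes one site and merges two fragments, which is the kind of pattern difference used to define genome types. The first and last entries of each list correspond to the terminal fragments discussed in the abstract.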
GEM System: automatic prototyping of cell-wide metabolic pathway models from genomes.

PubMed

Arakawa, Kazuharu; Yamada, Yohei; Shinoda, Kosaku; Nakayama, Yoichi; Tomita, Masaru

2006-03-23

Successful realization of a "systems biology" approach to analyzing cells is a grand challenge for our understanding of life. However, current modeling approaches to cell simulation are labor-intensive, manual affairs, and therefore constitute a major bottleneck in the evolution of computational cell biology. We developed the Genome-based Modeling (GEM) System for the purpose of automatically prototyping simulation models of cell-wide metabolic pathways from genome sequences and other public biological information. Models generated by the GEM System include an entire Escherichia coli metabolism model comprising 968 reactions of 1195 metabolites, achieving 100% coverage when compared with the KEGG database, 92.38% with the EcoCyc database, and 95.06% with iJR904 genome-scale model. The GEM System prototypes qualitative models to reduce the labor-intensive tasks required for systems biology research. Models of over 90 bacterial genomes are available at our web site.
Clinical decision support for whole genome sequence information leveraging a service-oriented architecture: a prototype.

PubMed

Welch, Brandon M; Rodriguez-Loya, Salvador; Eilbeck, Karen; Kawamoto, Kensaku

2014-01-01

Whole genome sequence (WGS) information could soon be routinely available to clinicians to support the personalized care of their patients. At such time, clinical decision support (CDS) integrated into the clinical workflow will likely be necessary to support genome-guided clinical care. Nevertheless, developing CDS capabilities for WGS information presents many unique challenges that need to be overcome for such approaches to be effective. In this manuscript, we describe the development of a prototype CDS system that is capable of providing genome-guided CDS at the point of care and within the clinical workflow. To demonstrate the functionality of this prototype, we implemented a clinical scenario of a hypothetical patient at high risk for Lynch Syndrome based on his genomic information. We demonstrate that this system can effectively use service-oriented architecture principles and standards-based components to deliver point of care CDS for WGS information in real-time.
Design and implementation of a CORBA-based genome mapping system prototype.

PubMed

Hu, J; Mungall, C; Nicholson, D; Archibald, A L

1998-01-01

CORBA (Common Object Request Broker Architecture), as an open standard, is considered to be a good solution for the development and deployment of applications in distributed heterogeneous environments. This technology can be applied in the bioinformatics area to enhance utilization, management and interoperation between biological resources. This paper investigates issues in developing CORBA applications for genome mapping information systems in the Internet environment with emphasis on database connectivity and graphical user interfaces. The design and implementation of a CORBA prototype for an animal genome mapping database are described. The prototype demonstration is available via: http://www.ri.bbsrc.ac.uk/ark_corba/. jian.hu@bbsrc.ac.uk
Genome Sequences of Human Adenovirus 14 Isolates from Mild Respiratory Cases and a Fatal Pneumonia, Isolated during 2006-2007 Epidemics in North America

DTIC Science & Technology

2010-01-01

We also compare the genome sequences of the recent isolates with those of the prototype HAdV-14 that circulated in Eurasia 30 years ago and the...closely related sequence of HAdV-11a, which has been circulating in southeast Asia. Conclusions: The data suggest that the currently circulating strain of...both mild and severe outbreaks.
SMART precision cancer medicine: a FHIR-based app to provide genomic information at the point of care.

PubMed

Warner, Jeremy L; Rioth, Matthew J; Mandl, Kenneth D; Mandel, Joshua C; Kreda, David A; Kohane, Isaac S; Carbone, Daniel; Oreto, Ross; Wang, Lucy; Zhu, Shilin; Yao, Heming; Alterovitz, Gil

2016-07-01

Precision cancer medicine (PCM) will require ready access to genomic data within the clinical workflow and tools to assist clinical interpretation and enable decisions. Since most electronic health record (EHR) systems do not yet provide such functionality, we developed an EHR-agnostic, clinico-genomic mobile app to demonstrate several features that will be needed for point-of-care conversations. Our prototype, called Substitutable Medical Applications and Reusable Technology (SMART)® PCM, visualizes genomic information in real time, comparing a patient's diagnosis-specific somatic gene mutations detected by PCR-based hotspot testing to a population-level set of comparable data. The initial prototype works for patient specimens with 0 or 1 detected mutation. Genomics extensions were created for the Health Level Seven® Fast Healthcare Interoperability Resources (FHIR)® standard; otherwise, the prototype is a normal SMART on FHIR app. The PCM prototype can rapidly present a visualization that compares a patient's somatic genomic alterations against a distribution built from more than 3000 patients, along with context-specific links to external knowledge bases. Initial evaluation by oncologists provided important feedback about the prototype's strengths and weaknesses. We added several requested enhancements and successfully demonstrated the app at the inaugural American Society of Clinical Oncology Interoperability Demonstration; we have also begun to expand visualization capabilities to include cancer specimens with multiple mutations. PCM is open-source software for clinicians to present the individual patient within the population-level spectrum of cancer somatic mutations. The app can be implemented on any SMART on FHIR-enabled EHRs, and future versions of PCM should be able to evolve in parallel with external knowledge bases. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. 
In silico method for modelling metabolism and gene product expression at genome scale

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lerman, Joshua A.; Hyduke, Daniel R.; Latif, Haythem

2012-07-03

Transcription and translation use raw materials and energy generated metabolically to create the macromolecular machinery responsible for all cellular functions, including metabolism. A biochemically accurate model of molecular biology and metabolism will facilitate comprehensive and quantitative computations of an organism's molecular constitution as a function of genetic and environmental parameters. Here we formulate a model of metabolism and macromolecular expression. Prototyping it using the simple microorganism Thermotoga maritima, we show our model accurately simulates variations in cellular composition and gene expression. Moreover, through in silico comparative transcriptomics, the model allows the discovery of new regulons and improving the genome and transcription unit annotations. Our method presents a framework for investigating molecular biology and cellular physiology in silico and may allow quantitative interpretation of multi-omics data sets in the context of an integrated biochemical description of an organism.
A New Framework and Prototype Solution for Clinical Decision Support and Research in Genomics and Other Data-intensive Fields of Medicine.

PubMed

Evans, James P; Wilhelmsen, Kirk C; Berg, Jonathan; Schmitt, Charles P; Krishnamurthy, Ashok; Fecho, Karamarie; Ahalt, Stanley C

2016-01-01

In genomics and other fields, it is now possible to capture and store large amounts of data in electronic medical records (EMRs). However, it is not clear if the routine accumulation of massive amounts of (largely uninterpretable) data will yield any health benefits to patients. Nevertheless, the use of large-scale medical data is likely to grow. To meet emerging challenges and facilitate optimal use of genomic data, our institution initiated a comprehensive planning process that addresses the needs of all stakeholders (e.g., patients, families, healthcare providers, researchers, technical staff, administrators). Our experience with this process and a key genomics research project contributed to the proposed framework. We propose a two-pronged Genomic Clinical Decision Support System (CDSS) that encompasses the concept of the "Clinical Mendeliome" as a patient-centric list of genomic variants that are clinically actionable and introduces the concept of the "Archival Value Criterion" as a decision-making formalism that approximates the cost-effectiveness of capturing, storing, and curating genome-scale sequencing data. We describe a prototype Genomic CDSS that we developed as a first step toward implementation of the framework. The proposed framework and prototype solution are designed to address the perspectives of stakeholders, stimulate effective clinical use of genomic data, drive genomic research, and meet current and future needs. The framework also can be broadly applied to additional fields, including other '-omics' fields. We advocate for the creation of a Task Force on the Clinical Mendeliome, charged with defining Clinical Mendeliomes and drafting clinical guidelines for their use.
Rapid Prototyping of Microbial Cell Factories via Genome-scale Engineering

PubMed Central

Si, Tong; Xiao, Han; Zhao, Huimin

2014-01-01

Advances in reading, writing and editing genetic materials have greatly expanded our ability to reprogram biological systems at the resolution of a single nucleotide and on the scale of a whole genome. Such capacity has greatly accelerated the cycles of design, build and test to engineer microbes for efficient synthesis of fuels, chemicals and drugs. In this review, we summarize the emerging technologies that have been applied, or are potentially useful for genome-scale engineering in microbial systems. We will focus on the development of high-throughput methodologies, which may accelerate the prototyping of microbial cell factories. PMID:25450192
A New Framework and Prototype Solution for Clinical Decision Support and Research in Genomics and Other Data-intensive Fields of Medicine

PubMed Central

Evans, James P.; Wilhelmsen, Kirk C.; Berg, Jonathan; Schmitt, Charles P.; Krishnamurthy, Ashok; Fecho, Karamarie; Ahalt, Stanley C.

2016-01-01

Introduction: In genomics and other fields, it is now possible to capture and store large amounts of data in electronic medical records (EMRs). However, it is not clear if the routine accumulation of massive amounts of (largely uninterpretable) data will yield any health benefits to patients. Nevertheless, the use of large-scale medical data is likely to grow. To meet emerging challenges and facilitate optimal use of genomic data, our institution initiated a comprehensive planning process that addresses the needs of all stakeholders (e.g., patients, families, healthcare providers, researchers, technical staff, administrators). Our experience with this process and a key genomics research project contributed to the proposed framework. Framework: We propose a two-pronged Genomic Clinical Decision Support System (CDSS) that encompasses the concept of the “Clinical Mendeliome” as a patient-centric list of genomic variants that are clinically actionable and introduces the concept of the “Archival Value Criterion” as a decision-making formalism that approximates the cost-effectiveness of capturing, storing, and curating genome-scale sequencing data. We describe a prototype Genomic CDSS that we developed as a first step toward implementation of the framework. Conclusion: The proposed framework and prototype solution are designed to address the perspectives of stakeholders, stimulate effective clinical use of genomic data, drive genomic research, and meet current and future needs. The framework also can be broadly applied to additional fields, including other ‘-omics’ fields. We advocate for the creation of a Task Force on the Clinical Mendeliome, charged with defining Clinical Mendeliomes and drafting clinical guidelines for their use. PMID:27195307
Rapid prototyping of microbial cell factories via genome-scale engineering.

PubMed

Si, Tong; Xiao, Han; Zhao, Huimin

2015-11-15

Advances in reading, writing and editing genetic materials have greatly expanded our ability to reprogram biological systems at the resolution of a single nucleotide and on the scale of a whole genome. Such capacity has greatly accelerated the cycles of design, build and test to engineer microbes for efficient synthesis of fuels, chemicals and drugs. In this review, we summarize the emerging technologies that have been applied, or are potentially useful for genome-scale engineering in microbial systems. We will focus on the development of high-throughput methodologies, which may accelerate the prototyping of microbial cell factories. Copyright © 2014 Elsevier Inc. All rights reserved.
Intrapopulation Genome Size Variation in D. melanogaster Reflects Life History Variation and Plasticity

PubMed Central

Ellis, Lisa L.; Huang, Wen; Quinn, Andrew M.; Ahuja, Astha; Alfrejd, Ben; Gomez, Francisco E.; Hjelmen, Carl E.; Moore, Kristi L.; Mackay, Trudy F. C.; Johnston, J. Spencer; Tarone, Aaron M.

2014-01-01

We determined female genome sizes using flow cytometry for 211 Drosophila melanogaster sequenced inbred strains from the Drosophila Genetic Reference Panel, and found significant conspecific and intrapopulation variation in genome size. We also compared several life history traits for 25 lines with large and 25 lines with small genomes in three thermal environments, and found that genome size as well as genome size by temperature interactions significantly correlated with survival to pupation and adulthood, time to pupation, female pupal mass, and female eclosion rates. Genome size accounted for up to 23% of the variation in developmental phenotypes, but the contribution of genome size to variation in life history traits was plastic and varied according to the thermal environment. Expression data implicate differences in metabolism that correspond to genome size variation. These results indicate that significant genome size variation exists within D. melanogaster and this variation may impact the evolutionary ecology of the species. Genome size variation accounts for a significant portion of life history variation in an environmentally dependent manner, suggesting that potential fitness effects associated with genome size variation also depend on environmental conditions. PMID:25057905
Complete Genome Analysis of an Enterovirus EV-B83 Isolated in China.

PubMed

Tang, Jingjing; Li, Qiongfen; Tian, Bingjun; Zhang, Jie; Li, Kai; Ding, Zhengrong; Lu, Lin

2016-07-12

Enterovirus B83 (EV-B83) is a recently identified member of enterovirus species B. It is a rarely reported serotype and, to date, only the complete genome sequence of the prototype strain from the United States is available. In this study, we describe the complete genomic characterization of an EV-B83 strain, 246/YN/CHN/08HC, isolated from a healthy child living in a border region of Yunnan Province, China, in 2008. Compared with the prototype strain, it had 79.6% similarity in the complete genome and 78.9% similarity in the VP1 coding region, reflecting the great genetic divergence between them. VP1 coding region alignment revealed that it had 77.2-91.3% similarity with other EV-B83 sequences available in GenBank. Similarity plot analysis revealed that it had higher identity with several other EV-B serotypes than with the EV-B83 prototype strain in the P2 and P3 coding regions, suggesting that multiple recombination events might have occurred. The great genetic divergence from previously isolated strains and the extremely rare isolation suggest that this serotype has circulated at a low epidemic strength for many years. This is the first report of a complete genome of EV-B83 in China.
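Whole-genome similarity figures and similarity plots of the kind described above can be sketched as percent identity computed over the full alignment and over sliding windows. This is a simplified illustration, assuming two pre-aligned, equal-length sequences without gaps; the function names, window size, and toy sequences are hypothetical, and the software the authors actually used is not named in the abstract.

```python
def percent_identity(a, b):
    """Percentage of aligned positions at which the two sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def sliding_identity(a, b, window, step):
    """Return (window start, percent identity) pairs along the alignment."""
    return [
        (start, percent_identity(a[start:start + window], b[start:start + window]))
        for start in range(0, len(a) - window + 1, step)
    ]


query = "ACGTACGTAC"
prototype = "ACGTACGTTT"
print(percent_identity(query, prototype))        # 80.0
print(sliding_identity(query, prototype, 5, 5))  # [(0, 100.0), (5, 60.0)]
```

A region where identity to the prototype drops while identity to another serotype rises is the signal that a similarity plot uses to suggest recombination.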
Complete genome sequence of community-associated methicillin-resistant Staphylococcus aureus (strain USA400-0051), a prototype of the USA400 clone

PubMed Central

Côrtes, Marina Farrel; Costa, Maiana OC; Lima, Nicholas CB; Souza, Rangel C; Almeida, Luiz GP; Guedes, Luciane Prioli Ciapina; Vasconcelos, Ana TR; Nicolás, Marisa F; Figueiredo, Agnes MS

2017-01-01

Staphylococcus aureus subsp. aureus, commonly referred to as S. aureus, is an important bacterial pathogen frequently involved in hospital- and community-acquired infections in humans, ranging from skin infections to more severe diseases such as pneumonia, bacteraemia, endocarditis, osteomyelitis, and disseminated infections. Here, we report the complete closed genome sequence of a community-acquired methicillin-resistant S. aureus strain, USA400-0051, which is a prototype of the USA400 clone. PMID:29091141
GenomicTools: a computational platform for developing high-throughput analytics in genomics.

PubMed

Tsirigos, Aristotelis; Haiminen, Niina; Bilal, Erhan; Utro, Filippo

2012-01-15

Recent advances in sequencing technology have resulted in the dramatic increase of sequencing data, which, in turn, requires efficient management of computational resources, such as computing time, memory requirements as well as prototyping of computational pipelines. We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and a C++ API, for the analysis and manipulation of high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools implements a variety of mathematical operations between sets of genomic regions thereby enabling the prototyping of computational pipelines that can address a wide spectrum of tasks ranging from pre-processing and quality control to meta-analyses. Additionally, the GenomicTools platform is designed to analyze large datasets of any size by minimizing memory requirements. In practical applications, where comparable, GenomicTools outperforms existing tools in terms of both time and memory usage. The GenomicTools platform (version 2.0.0) was implemented in C++. The source code, documentation, user manual, example datasets and scripts are available online at http://code.google.com/p/ibm-cbc-genomic-tools.
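An operation between two sets of genomic regions, such as intersection, can be sketched in a few lines. This is plain Python written for illustration and does not use or reproduce the GenomicTools command-line tools or C++ API; the function name and the half-open (chromosome, start, end) tuple convention are assumptions.

```python
def intersect(regions_a, regions_b):
    """Return the overlaps between two lists of (chrom, start, end) regions.

    Coordinates are treated as half-open intervals [start, end).
    """
    overlaps = []
    for chrom_a, start_a, end_a in regions_a:
        for chrom_b, start_b, end_b in regions_b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # keep non-empty overlaps only
                overlaps.append((chrom_a, start, end))
    return sorted(overlaps)


peaks = [("chr1", 100, 200), ("chr1", 300, 400)]
genes = [("chr1", 150, 350), ("chr2", 100, 200)]
print(intersect(peaks, genes))  # [('chr1', 150, 200), ('chr1', 300, 350)]
```

The nested loop is quadratic and holds both sets in memory; tools built for large datasets generally stream sorted inputs instead, which is in line with the memory-minimizing design goal stated in the abstract.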

Gene Fusion Markup Language: a prototype for exchanging gene fusion data.

PubMed

Kalyana-Sundaram, Shanker; Shanmugam, Achiraman; Chinnaiyan, Arul M

2012-10-16

An avalanche of next generation sequencing (NGS) studies has generated an unprecedented amount of genomic structural variation data. These studies have also identified many novel gene fusion candidates with more detailed resolution than previously achieved. However, in the excitement and necessity of publishing the observations from this recently developed cutting-edge technology, no community standardization approach has arisen to organize and represent the data with the essential attributes in an interchangeable manner. As transcriptome studies have been widely used for gene fusion discoveries, the current non-standard mode of data representation could potentially impede data accessibility, critical analyses, and further discoveries in the near future. Here we propose a prototype, Gene Fusion Markup Language (GFML) as an initiative to provide a standard format for organizing and representing the significant features of gene fusion data. GFML will offer the advantage of representing the data in a machine-readable format to enable data exchange, automated analysis interpretation, and independent verification. As this database-independent exchange initiative evolves it will further facilitate the formation of related databases, repositories, and analysis tools. The GFML prototype is made available at http://code.google.com/p/gfml-prototype/. The Gene Fusion Markup Language (GFML) presented here could facilitate the development of a standard format for organizing, integrating and representing the significant features of gene fusion data in an inter-operable and query-able fashion that will enable biologically intuitive access to gene fusion findings and expedite functional characterization. A similar model is envisaged for other NGS data analyses.
Variation block-based genomics method for crop plants.

PubMed

Kim, Yul Ho; Park, Hyang Mi; Hwang, Tae-Young; Lee, Seuk Ki; Choi, Man Soo; Jho, Sungwoong; Hwang, Seungwoo; Kim, Hak-Min; Lee, Dongwoo; Kim, Byoung-Chul; Hong, Chang Pyo; Cho, Yun Sung; Kim, Hyunmin; Jeong, Kwang Ho; Seo, Min Jung; Yun, Hong Tai; Kim, Sun Lim; Kwon, Young-Up; Kim, Wook Han; Chun, Hye Kyung; Lim, Sang Jong; Shin, Young-Ah; Choi, Ik-Young; Kim, Young Sun; Yoon, Ho-Sung; Lee, Suk-Ha; Lee, Sunghoon

2014-06-15

In contrast with wild species, cultivated crop genomes consist of reshuffled recombination blocks, which occurred by crossing and selection processes. Accordingly, recombination block-based genomics analysis can be an effective approach for the screening of target loci for agricultural traits. We propose the variation block method, which is a three-step process for recombination block detection and comparison. The first step is to detect variations by comparing the short-read DNA sequences of the cultivar to the reference genome of the target crop. Next, sequence blocks with variation patterns are examined and defined. The boundaries between the variation-containing sequence blocks are regarded as recombination sites. All the assumed recombination sites in the cultivar set are used to split the genomes, and the resulting sequence regions are termed variation blocks. Finally, the genomes are compared using the variation blocks. The variation block method identified recurring recombination blocks accurately and successfully represented block-level diversities in the publicly available genomes of 31 soybean and 23 rice accessions. The practicality of this approach was demonstrated by the identification of a putative locus determining soybean hilum color. We suggest that the variation block method is an efficient genomics method for the recombination block-level comparison of crop genomes. We expect that this method will facilitate the development of crop genomics by bringing genomics technologies to the field of crop breeding.
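The three steps described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: variants are given as 0-based positions per cultivar, a fixed-size window is called variation-dense when it holds at least `min_variants` variants, and a boundary is placed wherever adjacent windows differ in that label. The window size, threshold, and function names are all hypothetical.

```python
def block_boundaries(variant_positions, genome_length, window, min_variants):
    """Return positions where variation-dense and sparse windows meet."""
    n_windows = -(-genome_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in variant_positions:
        counts[pos // window] += 1
    dense = [c >= min_variants for c in counts]
    return {i * window for i in range(1, n_windows) if dense[i] != dense[i - 1]}


def variation_blocks(per_cultivar_variants, genome_length, window, min_variants):
    """Split the genome at the union of boundaries found in all cultivars."""
    sites = set()
    for positions in per_cultivar_variants.values():
        sites |= block_boundaries(positions, genome_length, window, min_variants)
    bounds = [0] + sorted(sites) + [genome_length]
    return list(zip(bounds, bounds[1:]))


variants = {
    "cultivar_A": [21, 25, 33, 38],  # dense between 20 and 40
    "cultivar_B": [55, 57, 58],      # dense between 50 and 60
}
print(variation_blocks(variants, genome_length=100, window=10, min_variants=2))
# [(0, 20), (20, 40), (40, 50), (50, 60), (60, 100)]
```

Once every genome is cut at the same shared boundaries, cultivars can be compared block by block rather than variant by variant.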
Whole-genome sequence of Escherichia coli serotype O157:H7 strain EDL932 (ATCC 43894)

USDA-ARS's Scientific Manuscript database

Escherichia coli serotype O157:H7 EDL 933 is a ground beef isolate associated with a 1983 hemorrhagic colitis outbreak. Considered the prototype O157:H7 strain, its derived genome sequence is a standard reference strain for comparative genomic studies of Shiga toxin-producing E. coli (STEC). Here we...
Gene Fusion Markup Language: a prototype for exchanging gene fusion data

PubMed Central

2012-01-01

Background: An avalanche of next generation sequencing (NGS) studies has generated an unprecedented amount of genomic structural variation data. These studies have also identified many novel gene fusion candidates with more detailed resolution than previously achieved. However, in the excitement and necessity of publishing the observations from this recently developed cutting-edge technology, no community standardization approach has arisen to organize and represent the data with the essential attributes in an interchangeable manner. As transcriptome studies have been widely used for gene fusion discoveries, the current non-standard mode of data representation could potentially impede data accessibility, critical analyses, and further discoveries in the near future. Results: Here we propose a prototype, Gene Fusion Markup Language (GFML) as an initiative to provide a standard format for organizing and representing the significant features of gene fusion data. GFML will offer the advantage of representing the data in a machine-readable format to enable data exchange, automated analysis interpretation, and independent verification. As this database-independent exchange initiative evolves it will further facilitate the formation of related databases, repositories, and analysis tools. The GFML prototype is made available at http://code.google.com/p/gfml-prototype/. Conclusion: The Gene Fusion Markup Language (GFML) presented here could facilitate the development of a standard format for organizing, integrating and representing the significant features of gene fusion data in an inter-operable and query-able fashion that will enable biologically intuitive access to gene fusion findings and expedite functional characterization. A similar model is envisaged for other NGS data analyses. PMID:23072312
A survey of copy number variation in the porcine genome detected from whole-genome sequence

USDA-ARS's Scientific Manuscript database

An important challenge to post-genomic biology is relating observed phenotypic variation to the underlying genotypic variation. Genome-wide association studies (GWAS) have made thousands of connections between single nucleotide polymorphisms (SNPs) and phenotypes, implicating regions of the genome t...
Genome Variation Map: a data repository of genome variations in BIG Data Center

PubMed Central

Tian, Dongmei; Li, Cuiping; Tang, Bixia; Dong, Lili; Xiao, Jingfa; Bao, Yiming; Zhao, Wenming; He, Hang

2018-01-01

The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations. As a core resource in the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, GVM is dedicated to collecting, integrating and visualizing genome variations for a wide range of species, accepts submissions of different types of genome variations from all over the world and provides free open access to all publicly available data in support of worldwide research activities. Unlike existing related databases, GVM features integration of a large number of genome variations for a broad diversity of species including human, cultivated plants and domesticated animals. Specifically, the current implementation of GVM not only houses a total of ∼4.9 billion variants for 19 species including chicken, dog, goat, human, poplar, rice and tomato, but also incorporates 8669 individual genotypes and 13,262 manually curated high-quality genotype-to-phenotype associations for non-human species. In addition, GVM provides friendly, intuitive web interfaces for data submission, browsing, search and visualization. Collectively, GVM serves as an important resource for archiving genomic variation data, helpful for better understanding population genetic diversity and deciphering complex mechanisms associated with different phenotypes. PMID:29069473
Hidden genetic variation in the germline genome of Tetrahymena thermophila.

PubMed

Dimond, K L; Zufall, R A

2016-06-01

Genome architecture varies greatly among eukaryotes. This diversity may profoundly affect the origin and maintenance of genetic variation within a population. Ciliates are microbial eukaryotes with unusual genome features, such as the separation of germline and somatic genomes within a single cell and amitotic division. These features have previously been proposed to increase the rate of molecular evolution in these species. Here, we assessed the fitness effects of genetic variation in the two genomes of natural isolates of the ciliate Tetrahymena thermophila. We find more extensive genetic variation in fitness in the transcriptionally silent germline genome than in the expressed somatic genome. Surprisingly, this variation is not primarily deleterious, but has both beneficial and deleterious effects. We conclude that Tetrahymena genome architecture allows for the maintenance of genetic variation that would otherwise be eliminated by selection. We consider the effect of selection on the two genomes and the impacts of reproductive strategies and the mechanism of sex determination on the structure of this variation. Journal of Evolutionary Biology © 2016 European Society For Evolutionary Biology.
77 FR 67381 - Government-Owned Inventions; Availability for Licensing

Federal Register 2010, 2011, 2012, 2013, 2014

2012-11-09

.... ``Computational and Experimental RNA Nanoparticle Design,'' in Automation in Genomics and Proteomics: An... and Experimental RNA Nanoparticle Design,'' in Automation in Genomics and Proteomics: An Engineering... Development Stage: Prototype Pre-clinical In vitro data available Inventors: Robert J. Crouch and Yutaka...
Variation in the Nucleotide Sequence of Cottontail Rabbit Papillomavirus a and b Subtypes Affects Wart Regression and Malignant Transformation and Level of Viral Replication in Domestic Rabbits

PubMed Central

Salmon, Jérôme; Nonnenmacher, Mathieu; Cazé, Sandrine; Flamant, Patricia; Croissant, Odile; Orth, Gérard; Breitburd, Françoise

2000-01-01

We previously reported the partial characterization of two cottontail rabbit papillomavirus (CRPV) subtypes with strikingly divergent E6 and E7 oncoproteins. We report now the complete nucleotide sequences of these subtypes, referred to as CRPVa4 (7,868 nucleotides) and CRPVb (7,867 nucleotides). The CRPVa4 and CRPVb genomes differed at 238 (3%) nucleotide positions, whereas CRPVa4 and the prototype CRPV differed by only 5 nucleotides. The most variable region (7% nucleotide divergence) included the long regulatory region (LRR) and the E6 and E7 genes. A mutation in the stop codon resulted in an 8-amino-acid-longer CRPVb E4 protein, and a nucleotide deletion reduced the coding capacity of the E5 gene from 101 to 25 amino acids. In domestic rabbits homozygous for a specific haplotype of the DRA and DQA genes of the major histocompatibility complex, warts induced by CRPVb DNA or a chimeric genome containing the CRPVb LRR/E6/E7 region showed an early regression, whereas warts induced by CRPVa4 or a chimeric genome containing the CRPVa4 LRR/E6/E7 region persisted and evolved into carcinomas. In contrast, most CRPVa, CRPVb, and chimeric CRPV DNA-induced warts showed no early regression in rabbits homozygous for another DRA-DQA haplotype. Little, if any, viral replication is usually observed in domestic rabbit warts. When warts induced by CRPVa and CRPVb virions and DNA were compared, the number of cells positive for viral DNA or capsid antigens was found to be greater by 1 order of magnitude for specimens induced by CRPVb. Thus, both sequence variation in the LRR/E6/E7 region and the genetic constitution of the host influence the expression of the oncogenic potential of CRPV. Furthermore, intratype variation may overcome to some extent the host restriction of CRPV replication in domestic rabbits. PMID:11044121
Genome-wide copy number variation in the bovine genome detected using low coverage sequence of popular beef breeds

USDA-ARS's Scientific Manuscript database

Genomic structural variations are an important source of genetic diversity. Copy number variations (CNVs), gains and losses of large regions of genomic sequence between individuals of a species, are known to be associated with both diseases and phenotypic traits. Deeply sequenced genomes are often u...
Genome Variation Map: a data repository of genome variations in BIG Data Center.

PubMed

Song, Shuhui; Tian, Dongmei; Li, Cuiping; Tang, Bixia; Dong, Lili; Xiao, Jingfa; Bao, Yiming; Zhao, Wenming; He, Hang; Zhang, Zhang

2018-01-04

The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations. As a core resource in the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, GVM is dedicated to collecting, integrating and visualizing genome variations for a wide range of species, accepts submissions of different types of genome variations from all over the world and provides free open access to all publicly available data in support of worldwide research activities. Unlike existing related databases, GVM features integration of a large number of genome variations for a broad diversity of species including human, cultivated plants and domesticated animals. Specifically, the current implementation of GVM not only houses a total of ∼4.9 billion variants for 19 species including chicken, dog, goat, human, poplar, rice and tomato, but also incorporates 8669 individual genotypes and 13,262 manually curated high-quality genotype-to-phenotype associations for non-human species. In addition, GVM provides friendly, intuitive web interfaces for data submission, browsing, search and visualization. Collectively, GVM serves as an important resource for archiving genomic variation data, helpful for better understanding population genetic diversity and deciphering complex mechanisms associated with different phenotypes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Sesavirus: prototype of a new parvovirus genus in feces of a sea lion.

PubMed

Phan, Tung Gia; Gulland, Frances; Simeone, Claire; Deng, Xutao; Delwart, Eric

2015-02-01

We describe the nearly complete genome of a highly divergent parvovirus, which we tentatively name Sesavirus, from the feces of a California sea lion pup (Zalophus californianus) suffering from malnutrition and pneumonia. The 5,049-base-long genome contained two major ORFs encoding a 553-aa nonstructural protein and a 965-aa structural protein, which shared closest amino acid identities of 25% and 28%, respectively, with members of the copiparvovirus genus known to infect pigs and cows. Given the low degree of similarity, Sesavirus might be considered the prototype for a new genus, with the proposed name Marinoparvovirus, in the subfamily Parvovirinae.
Analysis of copy number variations among cattle breeds

USDA-ARS's Scientific Manuscript database

Genomic structural variation is an important and abundant source of genetic and phenotypic variation. Here we describe the first systematic and genome-wide analysis of copy number variations (CNVs) in the modern domesticated cattle using array comparative genomic hybridization (array CGH) and quanti...
Patterns of genome size variation in snapping shrimp.

PubMed

Jeffery, Nicholas W; Hultgren, Kristin; Chak, Solomon Tin Chi; Gregory, T Ryan; Rubenstein, Dustin R

2016-06-01

Although crustaceans vary extensively in genome size, little is known about how genome size may affect the ecology and evolution of species in this diverse group, in part due to the lack of large genome size datasets. Here we investigate interspecific, intraspecific, and intracolony variation in genome size in 39 species of Synalpheus shrimps, representing one of the largest genome size datasets for a single genus within crustaceans. We find that genome size ranges approximately 4-fold across Synalpheus with little phylogenetic signal, and is not related to body size. In a subset of these species, genome size is related to chromosome size, but not to chromosome number, suggesting that despite large genomes, these species are not polyploid. Interestingly, there appears to be 35% intraspecific genome size variation in Synalpheus idios among geographic regions, and up to 30% variation in Synalpheus duffyi genome size within the same colony.
Analysis of copy number variations reveals differences among cattle breeds

USDA-ARS's Scientific Manuscript database

Genomic structural variation is an important and abundant source of genetic and phenotypic variation. Here we describe the first systematic and genome-wide analysis of copy number variations (CNVs) in the modern domesticated cattle using array comparative genomic hybridization (array CGH) and quanti...
Studies on cattle genomic structural variation provide insights into ruminant speciation and adaptation

USDA-ARS's Scientific Manuscript database

Genomic structural variations, including segmental duplications (SD) and copy number variations (CNV), contribute significantly to individual health and disease in primates and rodents. As a part of the bovine genome annotation effort, we performed the first genome-wide analysis of SD in cattle usin...
Genome-Wide Structural Variation Detection by Genome Mapping on Nanochannel Arrays.

PubMed

Mak, Angel C Y; Lai, Yvonne Y Y; Lam, Ernest T; Kwok, Tsz-Piu; Leung, Alden K Y; Poon, Annie; Mostovoy, Yulia; Hastie, Alex R; Stedman, William; Anantharaman, Thomas; Andrews, Warren; Zhou, Xiang; Pang, Andy W C; Dai, Heng; Chu, Catherine; Lin, Chin; Wu, Jacob J K; Li, Catherine M L; Li, Jing-Woei; Yim, Aldrin K Y; Chan, Saki; Sibert, Justin; Džakula, Željko; Cao, Han; Yiu, Siu-Ming; Chan, Ting-Fung; Yip, Kevin Y; Xiao, Ming; Kwok, Pui-Yan

2016-01-01

Comprehensive whole-genome structural variation detection is challenging with current approaches. With diploid cells as DNA source and the presence of numerous repetitive elements, short-read DNA sequencing cannot be used to detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. While whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, as molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against that derived from the reference human genome, and identified structural variations that are >5 kb in size. We find that these individuals have many more structural variants than those published, including some with the potential of disrupting gene function or regulation. Copyright © 2016 by the Genetics Society of America.
Epigenetic Inheritance across the Landscape.

PubMed

Whipple, Amy V; Holeski, Liza M

2016-01-01

The study of epigenomic variation at the landscape-level in plants may add important insight to studies of adaptive variation. A major goal of landscape genomic studies is to identify genomic regions contributing to adaptive variation across the landscape. Heritable variation in epigenetic marks, resulting in transgenerational plasticity, can influence fitness-related traits. Epigenetic marks are influenced by the genome, the environment, and their interaction, and can be inherited independently of the genome. Thus, epigenomic variation likely influences the heritability of many adaptive traits, but the extent of this influence remains largely unknown. Here, we summarize the relevance of epigenetic inheritance to ecological and evolutionary processes, and review the literature on landscape-level patterns of epigenetic variation. Landscape-level patterns of epigenomic variation in plants generally show greater levels of isolation by distance and isolation by environment than is found for the genome, but the causes of these patterns are not yet clear. Linkage between the environment and epigenomic variation has been clearly shown within a single generation, but demonstrating transgenerational inheritance requires more complex breeding and/or experimental designs. Transgenerational epigenetic variation may alter the interpretation of landscape genomic studies that rely upon phenotypic analyses, but should have less influence on landscape genomic approaches that rely upon outlier analyses or genome-environment associations. We suggest that multi-generation common garden experiments conducted across multiple environments will allow researchers to understand which parts of the epigenome are inherited, as well as to parse out the relative contribution of heritable epigenetic variation to the phenotype.
Genetics and Genomics of Single-Gene Cardiovascular Diseases: Common Hereditary Cardiomyopathies as Prototypes of Single-Gene Disorders

PubMed Central

Marian, Ali J.; van Rooij, Eva; Roberts, Robert

2016-01-01

This is the first of 2 review papers on genetics and genomics appearing as part of the series on “omics.” Genomics pertains to all components of an organism’s genes, whereas genetics involves analysis of a specific gene(s) in the context of heredity. The paper provides introductory comments, describes the basis of human genetic diversity, and addresses the phenotypic consequences of genetic variants. Rare variants with large effect sizes are responsible for single-gene disorders, whereas complex polygenic diseases are typically due to multiple genetic variants, each exerting a modest effect size. To illustrate the clinical implications of genetic variants with large effect sizes, 3 common forms of hereditary cardiomyopathies are discussed as prototypic examples of single-gene disorders, including their genetics, clinical manifestations, pathogenesis, and treatment. The genetic basis of complex traits is discussed in a separate paper. PMID:28007145
No evidence that sex and transposable elements drive genome size variation in evening primroses.

PubMed

Ågren, J Arvid; Greiner, Stephan; Johnson, Marc T J; Wright, Stephen I

2015-04-01

Genome size varies dramatically across species, but despite an abundance of attention there is little agreement on the relative contributions of selective and neutral processes in governing this variation. The rate of sex can potentially play an important role in genome size evolution because of its effect on the efficacy of selection and transmission of transposable elements (TEs). Here, we used a phylogenetic comparative approach and whole genome sequencing to investigate the contribution of sex and TE content to genome size variation in the evening primrose (Oenothera) genus. We determined genome size using flow cytometry for 30 species that vary in genetic system and find that variation in sexual/asexual reproduction cannot explain the almost twofold variation in genome size. Moreover, using whole genome sequences of three species of varying genome sizes and reproductive system, we found that genome size was not associated with TE abundance; instead the larger genomes had a higher abundance of simple sequence repeats. Although it has long been clear that sexual reproduction may affect various aspects of genome evolution in general and TE evolution in particular, it does not appear to have played a major role in genome size evolution in the evening primroses. © 2015 The Author(s).

A Perfect Match Genomic Landscape Provides a Unified Framework for the Precise Detection of Variation in Natural and Synthetic Haploid Genomes

PubMed Central

Palacios-Flores, Kim; García-Sotelo, Jair; Castillo, Alejandra; Uribe, Carina; Aguilar, Luis; Morales, Lucía; Gómez-Romero, Laura; Reyes, José; Garciarubio, Alejandro; Boege, Margareta; Dávila, Guillermo

2018-01-01

We present a conceptually simple, sensitive, precise, and essentially nonstatistical solution for the analysis of genome variation in haploid organisms. The generation of a Perfect Match Genomic Landscape (PMGL), which computes intergenome identity with single nucleotide resolution, reveals signatures of variation wherever a query genome differs from a reference genome. Such signatures encode the precise location of different types of variants, including single nucleotide variants, deletions, insertions, and amplifications, effectively introducing the concept of a general signature of variation. The precise nature of variants is then resolved through the generation of targeted alignments between specific sets of sequence reads and known regions of the reference genome. Thus, the perfect match logic decouples the identification of the location of variants from the characterization of their nature, providing a unified framework for the detection of genome variation. We assessed the performance of the PMGL strategy via simulation experiments. We determined the variation profiles of natural genomes and of a synthetic chromosome, both in the context of haploid yeast strains. Our approach uncovered variants that have previously escaped detection. Moreover, our strategy is ideally suited for further refining high-quality reference genomes. The source codes for the automated PMGL pipeline have been deposited in a public repository. PMID:29367403
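
The core idea of a perfect-match landscape can be illustrated with a small Python toy. This is a sketch under stated assumptions rather than the published pipeline: the word length `k`, the use of plain k-mer counting, and all function names are hypothetical choices made here. For each reference position, the code counts how many read k-mers match the reference k-mer starting there exactly; a run of zeros is the "signature of variation" that localizes a difference without yet saying what kind of variant it is.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in the query reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def perfect_match_landscape(reference, reads, k):
    """For each reference position, the number of exact read matches
    to the reference k-mer that starts at that position."""
    counts = kmer_counts(reads, k)
    return [counts[reference[i:i + k]] for i in range(len(reference) - k + 1)]


def zero_runs(landscape):
    """Inclusive (start, end) index pairs of consecutive zero-count positions."""
    runs, start = [], None
    for i, count in enumerate(landscape):
        if count == 0 and start is None:
            start = i
        elif count != 0 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(landscape) - 1))
    return runs


# Toy data: the query differs from the reference by one substitution at index 8.
reference = "ACGTTGCAAGCTCCGA"
query_reads = ["ACGTTGCATGCTCCGA"]
landscape = perfect_match_landscape(reference, query_reads, k=4)
print(zero_runs(landscape))  # [(5, 8)]
```

In this toy case the single substitution removes the k = 4 reference words that overlap it, so the zero run ends at the variant position. Resolving the nature of the variant would then require a targeted alignment of the reads around that run, as the abstract describes.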
Minimal Absent Words in Four Human Genome Assemblies

PubMed Central

Garcia, Sara P.; Pinho, Armando J.

2011-01-01

Minimal absent words have been computed in genomes of organisms from all domains of life. Here, we aim to contribute to the catalogue of human genomic variation by investigating the variation in number and content of minimal absent words within a species, using four human genome assemblies. We compare the reference human genome GRCh37 assembly, the HuRef assembly of the genome of Craig Venter, the NA12878 assembly from cell line GM12878, and the YH assembly of the genome of a Han Chinese individual. We find the variation in number and content of minimal absent words between assemblies more significant for large and very large minimal absent words, where the biases of sequencing and assembly methodologies become more pronounced. Moreover, we find generally greater similarity between the human genome assemblies sequenced with capillary-based technologies (GRCh37 and HuRef) than between the human genome assemblies sequenced with massively parallel technologies (NA12878 and YH). Finally, as expected, we find the overall variation in number and content of minimal absent words within a species to be generally smaller than the variation between species. PMID:22220210
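
For readers unfamiliar with the term, a minimal absent word of a sequence is a word that does not occur in the sequence although its longest proper prefix and longest proper suffix both do. The brute-force Python sketch below illustrates the definition on a toy string; it is not the algorithm used in the study (genome-scale computation relies on suffix-based index structures), and the function names and the `max_len` cutoff are assumptions made for illustration.

```python
def substrings(seq, length):
    """All distinct substrings of the given length."""
    return {seq[i:i + length] for i in range(len(seq) - length + 1)}


def minimal_absent_words(seq, max_len, alphabet="ACGT"):
    """Words of length 2..max_len that are absent from seq while both their
    longest proper prefix and longest proper suffix are present."""
    maws = set()
    for length in range(2, max_len + 1):
        shorter = substrings(seq, length - 1)
        present = substrings(seq, length)
        for prefix in shorter:              # the prefix is present by construction
            for letter in alphabet:
                word = prefix + letter
                if word not in present and word[1:] in shorter:
                    maws.add(word)
    return maws


print(sorted(minimal_absent_words("ACAC", max_len=4)))  # ['AA', 'CACA', 'CC']
```

Comparing the sets returned for two assemblies (their sizes and their symmetric difference) corresponds to the "number and content" comparison described in the abstract.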
A Perfect Match Genomic Landscape Provides a Unified Framework for the Precise Detection of Variation in Natural and Synthetic Haploid Genomes.

PubMed

Palacios-Flores, Kim; García-Sotelo, Jair; Castillo, Alejandra; Uribe, Carina; Aguilar, Luis; Morales, Lucía; Gómez-Romero, Laura; Reyes, José; Garciarubio, Alejandro; Boege, Margareta; Dávila, Guillermo

2018-04-01

We present a conceptually simple, sensitive, precise, and essentially nonstatistical solution for the analysis of genome variation in haploid organisms. The generation of a Perfect Match Genomic Landscape (PMGL), which computes intergenome identity with single nucleotide resolution, reveals signatures of variation wherever a query genome differs from a reference genome. Such signatures encode the precise location of different types of variants, including single nucleotide variants, deletions, insertions, and amplifications, effectively introducing the concept of a general signature of variation. The precise nature of variants is then resolved through the generation of targeted alignments between specific sets of sequence reads and known regions of the reference genome. Thus, the perfect match logic decouples the identification of the location of variants from the characterization of their nature, providing a unified framework for the detection of genome variation. We assessed the performance of the PMGL strategy via simulation experiments. We determined the variation profiles of natural genomes and of a synthetic chromosome, both in the context of haploid yeast strains. Our approach uncovered variants that have previously escaped detection. Moreover, our strategy is ideally suited for further refining high-quality reference genomes. The source codes for the automated PMGL pipeline have been deposited in a public repository. Copyright © 2018 by the Genetics Society of America.
Global assessment of genomic variation in cattle by genome resequencing and high-throughput genotyping

PubMed Central

2011-01-01

Background: Integration of genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires an accurate identification of the different types of variation in individual genomes. Results: We report the integration of the whole genome sequence of a single Holstein Friesian bull with data from single nucleotide polymorphism (SNP) and comparative genomic hybridization (CGH) array technologies to determine a comprehensive spectrum of genomic variation. The performance of resequencing SNP detection was assessed by combining SNPs that were identified to be either in identity by descent (IBD) or in copy number variation (CNV) with results from SNP array genotyping. Coding insertions and deletions (indels) were found to be enriched for size in multiples of 3 and were located near the N- and C-termini of proteins. For larger indels, a combination of split-read and read-pair approaches proved to be complementary in finding different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays. Conclusions: Our results provide high resolution mapping of diverse classes of genomic variation in an individual bovine genome and demonstrate that structural variation surpasses sequence variation as the main component of genomic variability. Better accuracy of SNP detection was achieved with little loss of sensitivity when algorithms that implemented mapping quality were used. IBD regions were found to be instrumental for calculating resequencing SNP accuracy, while SNP detection within CNVs tended to be less reliable. CNV discovery was affected dramatically by platform resolution and coverage biases. The combined data for this study showed that at a moderate level of sequencing coverage, an ensemble of platforms and tools can be applied together to maximize the accurate detection of sequence and structural variants. PMID:22082336
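
The observation that coding indels are enriched for sizes in multiples of 3 reflects the triplet genetic code: an indel whose length is divisible by 3 adds or removes whole codons and leaves the downstream reading frame intact. A minimal Python sketch of how such an enrichment could be tallied is given below. The input format (pairs of reference and alternate alleles, as in a VCF record) and the function names are assumptions made here, not the study's pipeline.

```python
def indel_size(ref_allele, alt_allele):
    """Net number of inserted or deleted bases."""
    return abs(len(alt_allele) - len(ref_allele))


def in_frame_fraction(indels):
    """Fraction of indels whose size is a multiple of 3 (frame-preserving)."""
    sizes = [indel_size(ref, alt) for ref, alt in indels]
    sizes = [s for s in sizes if s > 0]     # ignore same-length substitutions
    if not sizes:
        return 0.0
    return sum(s % 3 == 0 for s in sizes) / len(sizes)


# Hypothetical coding indels with sizes 3, 3, 1 and 6 bases.
example = [("A", "ACGT"), ("ACGT", "A"), ("A", "AC"), ("ACGTACG", "A")]
print(in_frame_fraction(example))  # 0.75
```

If indel sizes were spread evenly, roughly one third would be expected to be frame-preserving, so a much higher fraction within coding sequence is consistent with selection against frameshifts.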
RNA-Seq analysis of isolate- and growth phase-specific differences in the global transcriptomes of enteropathogenic Escherichia coli prototype isolates

PubMed Central

Hazen, Tracy H.; Daugherty, Sean C.; Shetty, Amol; Mahurkar, Anup A.; White, Owen; Kaper, James B.; Rasko, David A.

2015-01-01

Enteropathogenic Escherichia coli (EPEC) are a leading cause of diarrheal illness among infants in developing countries. E. coli isolates classified as typical EPEC are identified by the presence of the locus of enterocyte effacement (LEE) and the bundle-forming pilus (BFP), and absence of the Shiga-toxin genes, while the atypical EPEC also encode LEE but do not encode BFP or Shiga-toxin. Comparative genomic analyses have demonstrated that EPEC isolates belong to diverse evolutionary lineages and possess lineage- and isolate-specific genomic content. To investigate whether this genomic diversity results in significant differences in global gene expression, we used an RNA sequencing (RNA-Seq) approach to characterize the global transcriptomes of the prototype typical EPEC isolates E2348/69, B171, C581-05, and the prototype atypical EPEC isolate E110019. The global transcriptomes were characterized during laboratory growth in two different media and three different growth phases, as well as during adherence of the EPEC isolates to human cells using in vitro tissue culture assays. Comparison of the global transcriptomes during these conditions was used to identify isolate- and growth phase-specific differences in EPEC gene expression. These analyses resulted in the identification of genes that encode proteins involved in survival and metabolism that were coordinately expressed with virulence factors. These findings demonstrate there are isolate- and growth phase-specific differences in the global transcriptomes of EPEC prototype isolates, and highlight the utility of comparative transcriptomics for identifying additional factors that are directly or indirectly involved in EPEC pathogenesis. PMID:26124752
Epigenetic Inheritance across the Landscape

PubMed Central

Whipple, Amy V.; Holeski, Liza M.

2016-01-01

The study of epigenomic variation at the landscape-level in plants may add important insight to studies of adaptive variation. A major goal of landscape genomic studies is to identify genomic regions contributing to adaptive variation across the landscape. Heritable variation in epigenetic marks, resulting in transgenerational plasticity, can influence fitness-related traits. Epigenetic marks are influenced by the genome, the environment, and their interaction, and can be inherited independently of the genome. Thus, epigenomic variation likely influences the heritability of many adaptive traits, but the extent of this influence remains largely unknown. Here, we summarize the relevance of epigenetic inheritance to ecological and evolutionary processes, and review the literature on landscape-level patterns of epigenetic variation. Landscape-level patterns of epigenomic variation in plants generally show greater levels of isolation by distance and isolation by environment than is found for the genome, but the causes of these patterns are not yet clear. Linkage between the environment and epigenomic variation has been clearly shown within a single generation, but demonstrating transgenerational inheritance requires more complex breeding and/or experimental designs. Transgenerational epigenetic variation may alter the interpretation of landscape genomic studies that rely upon phenotypic analyses, but should have less influence on landscape genomic approaches that rely upon outlier analyses or genome–environment associations. We suggest that multi-generation common garden experiments conducted across multiple environments will allow researchers to understand which parts of the epigenome are inherited, as well as to parse out the relative contribution of heritable epigenetic variation to the phenotype. PMID:27826318
Genomic analysis reveals major determinants of cis-regulatory variation in Capsella grandiflora

PubMed Central

Steige, Kim A.; Laenen, Benjamin; Reimegård, Johan; Slotte, Tanja

2017-01-01

Understanding the causes of cis-regulatory variation is a long-standing aim in evolutionary biology. Although cis-regulatory variation has long been considered important for adaptation, we still have a limited understanding of the selective importance and genomic determinants of standing cis-regulatory variation. To address these questions, we studied the prevalence, genomic determinants, and selective forces shaping cis-regulatory variation in the outcrossing plant Capsella grandiflora. We first identified a set of 1,010 genes with common cis-regulatory variation using analyses of allele-specific expression (ASE). Population genomic analyses of whole-genome sequences from 32 individuals showed that genes with common cis-regulatory variation (i) are under weaker purifying selection and (ii) undergo less frequent positive selection than other genes. We further identified genomic determinants of cis-regulatory variation. Gene body methylation (gbM) was a major factor constraining cis-regulatory variation, whereas presence of nearby transposable elements (TEs) and tissue specificity of expression increased the odds of ASE. Our results suggest that most common cis-regulatory variation in C. grandiflora is under weak purifying selection, and that gene-specific functional constraints are more important for the maintenance of cis-regulatory variation than genome-scale variation in the intensity of selection. Our results agree with previous findings that suggest TE silencing affects nearby gene expression, and provide evidence for a link between gbM and cis-regulatory constraint, possibly reflecting greater dosage sensitivity of body-methylated genes. Given the extensive conservation of gbM in flowering plants, this suggests that gbM could be an important predictor of cis-regulatory variation in a wide range of plant species. PMID:28096395
Human Genome Sequencing in Health and Disease

PubMed Central

Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

2013-01-01

Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320
Transposable element distribution, abundance and role in genome size variation in the genus Oryza.

PubMed

Zuccolo, Andrea; Sebastian, Aswathy; Talag, Jayson; Yu, Yeisoo; Kim, HyeRan; Collura, Kristi; Kudrna, Dave; Wing, Rod A

2007-08-29

The genus Oryza is composed of 10 distinct genome types, 6 diploid and 4 polyploid, and includes the world's most important food crop - rice (Oryza sativa [AA]). Genome size variation in the Oryza is more than 3-fold and ranges from 357 Mbp in Oryza glaberrima [AA] to 1283 Mbp in the polyploid Oryza ridleyi [HHJJ]. Because repetitive elements are known to play a significant role in genome size variation, we constructed random sheared small insert genomic libraries from 12 representative Oryza species and conducted a comprehensive study of the repetitive element composition, distribution and phylogeny in this genus. Particular attention was paid to the role played by the most important classes of transposable elements (Long Terminal Repeat Retrotransposons, Long Interspersed Nuclear Elements, helitrons, DNA transposable elements) in shaping these genomes and in contributing to genome size variation. We identified the elements primarily responsible for the most striking genome size variation in Oryza. We demonstrated how Long Terminal Repeat retrotransposons belonging to the same families have proliferated to very different extents in various species. We also showed that the pool of Long Terminal Repeat Retrotransposons is substantially conserved and ubiquitous throughout the Oryza and so its origin is ancient and its existence predates the speciation events that originated the genus. Finally we described the peculiar behavior of repeats in the species Oryza coarctata [HHKK] whose placement in the Oryza genus is controversial. Long Terminal Repeat retrotransposons are the major component of the Oryza genomes analyzed and, along with polyploidization, are the most important contributors to the genome size variation across the Oryza genus. Two families of Ty3-gypsy elements (RIRE2 and Atlantys) account for a significant portion of the genome size variations present in the Oryza genus.
Ensembl variation resources

PubMed Central

2010-01-01

Background: The maturing field of genomics is rapidly increasing the number of sequenced genomes and producing more information from those previously sequenced. Much of this additional information is variation data derived from sampling multiple individuals of a given species with the goal of discovering new variants and characterising the population frequencies of the variants that are already known. These data have immense value for many studies, including those designed to understand evolution and connect genotype to phenotype. Maximising the utility of the data requires that it be stored in an accessible manner that facilitates the integration of variation data with other genome resources such as gene annotation and comparative genomics. Description: The Ensembl project provides comprehensive and integrated variation resources for a wide variety of chordate genomes. This paper provides a detailed description of the sources of data and the methods for creating the Ensembl variation databases. It also explores the utility of the information by explaining the range of query options available, from using interactive web displays, to online data mining tools and connecting directly to the data servers programmatically. It gives a good overview of the variation resources and future plans for expanding the variation data within Ensembl. Conclusions: Variation data is an important key to understanding the functional and phenotypic differences between individuals. The development of new sequencing and genotyping technologies is greatly increasing the amount of variation data known for almost all genomes. The Ensembl variation resources are integrated into the Ensembl genome browser and provide a comprehensive way to access this data in the context of a widely used genome bioinformatics system. All Ensembl data is freely available at http://www.ensembl.org and from the public MySQL database server at ensembldb.ensembl.org. PMID:20459805
Genetic Variation in Cardiomyopathy and Cardiovascular Disorders.

PubMed

McNally, Elizabeth M; Puckelwartz, Megan J

2015-01-01

With the wider deployment of massively-parallel, next-generation sequencing, it is now possible to survey human genome data for research and clinical purposes. The reduced cost of producing short-read sequencing has now shifted the burden to data analysis. Analysis of genome sequencing remains challenged by the complexity of the human genome, including redundancy and the repetitive nature of genome elements and the large amount of variation in individual genomes. Public databases of human genome sequences greatly facilitate interpretation of common and rare genetic variation, although linking database sequence information to detailed clinical information is limited by privacy and practical issues. Genetic variation is a rich source of knowledge for cardiovascular disease because many, if not all, cardiovascular disorders are highly heritable. The role of rare genetic variation in predicting risk and complications of cardiovascular diseases has been well established for hypertrophic and dilated cardiomyopathy, where the number of genes that are linked to these disorders is growing. Bolstered by family data, where genetic variants segregate with disease, rare variation can be linked to specific genetic variation that offers profound diagnostic information. Understanding genetic variation in cardiomyopathy is likely to help stratify forms of heart failure and guide therapy. Ultimately, genetic variation may be amenable to gene correction and gene editing strategies.
Genome size evolution at the speciation level: the cryptic species complex Brachionus plicatilis (Rotifera).

PubMed

Stelzer, Claus-Peter; Riss, Simone; Stadler, Peter

2011-04-07

Studies on genome size variation in animals are rarely done at lower taxonomic levels, e.g., slightly above/below the species level. Yet, such variation might provide important clues on the tempo and mode of genome size evolution. In this study we used the flow-cytometry method to study the evolution of genome size in the rotifer Brachionus plicatilis, a cryptic species complex consisting of at least 14 closely related species. We found an unexpectedly high variation in this species complex, with genome sizes ranging approximately seven-fold (haploid '1C' genome sizes: 0.056-0.416 pg). Most of this variation (67%) could be ascribed to the major clades of the species complex, i.e. clades that are well separated according to most species definitions. However, we also found substantial variation (32%) at lower taxonomic levels--within and among genealogical species--and, interestingly, among species pairs that are not completely reproductively isolated. In one genealogical species, called B. 'Austria', we found greatly enlarged genome sizes that could roughly be approximated as multiples of the genomes of its closest relatives, which suggests that whole-genome duplications have occurred early during separation of this lineage. Overall, genome size was significantly correlated to egg size and body size, even though the latter became non-significant after controlling for phylogenetic non-independence. Our study suggests that substantial genome size variation can build up early during speciation, potentially even among isolated populations. An alternative, but not mutually exclusive interpretation might be that reproductive isolation tends to build up unusually slow in this species complex.
Genome size evolution at the speciation level: The cryptic species complex Brachionus plicatilis (Rotifera)

PubMed Central

2011-01-01

Background: Studies on genome size variation in animals are rarely done at lower taxonomic levels, e.g., slightly above/below the species level. Yet, such variation might provide important clues on the tempo and mode of genome size evolution. In this study we used the flow-cytometry method to study the evolution of genome size in the rotifer Brachionus plicatilis, a cryptic species complex consisting of at least 14 closely related species. Results: We found an unexpectedly high variation in this species complex, with genome sizes ranging approximately seven-fold (haploid '1C' genome sizes: 0.056-0.416 pg). Most of this variation (67%) could be ascribed to the major clades of the species complex, i.e. clades that are well separated according to most species definitions. However, we also found substantial variation (32%) at lower taxonomic levels - within and among genealogical species - and, interestingly, among species pairs that are not completely reproductively isolated. In one genealogical species, called B. 'Austria', we found greatly enlarged genome sizes that could roughly be approximated as multiples of the genomes of its closest relatives, which suggests that whole-genome duplications have occurred early during separation of this lineage. Overall, genome size was significantly correlated to egg size and body size, even though the latter became non-significant after controlling for phylogenetic non-independence. Conclusions: Our study suggests that substantial genome size variation can build up early during speciation, potentially even among isolated populations. An alternative, but not mutually exclusive interpretation might be that reproductive isolation tends to build up unusually slow in this species complex. PMID:21473744
The Brassica oleracea genome reveals the asymmetrical evolution of polyploid genomes

PubMed Central

Liu, Shengyi; Liu, Yumei; Yang, Xinhua; Tong, Chaobo; Edwards, David; Parkin, Isobel A. P.; Zhao, Meixia; Ma, Jianxin; Yu, Jingyin; Huang, Shunmou; Wang, Xiyin; Wang, Junyi; Lu, Kun; Fang, Zhiyuan; Bancroft, Ian; Yang, Tae-Jin; Hu, Qiong; Wang, Xinfa; Yue, Zhen; Li, Haojie; Yang, Linfeng; Wu, Jian; Zhou, Qing; Wang, Wanxin; King, Graham J; Pires, J. Chris; Lu, Changxin; Wu, Zhangyan; Sampath, Perumal; Wang, Zhuo; Guo, Hui; Pan, Shengkai; Yang, Limei; Min, Jiumeng; Zhang, Dong; Jin, Dianchuan; Li, Wanshun; Belcram, Harry; Tu, Jinxing; Guan, Mei; Qi, Cunkou; Du, Dezhi; Li, Jiana; Jiang, Liangcai; Batley, Jacqueline; Sharpe, Andrew G; Park, Beom-Seok; Ruperao, Pradeep; Cheng, Feng; Waminal, Nomar Espinosa; Huang, Yin; Dong, Caihua; Wang, Li; Li, Jingping; Hu, Zhiyong; Zhuang, Mu; Huang, Yi; Huang, Junyan; Shi, Jiaqin; Mei, Desheng; Liu, Jing; Lee, Tae-Ho; Wang, Jinpeng; Jin, Huizhe; Li, Zaiyun; Li, Xun; Zhang, Jiefu; Xiao, Lu; Zhou, Yongming; Liu, Zhongsong; Liu, Xuequn; Qin, Rui; Tang, Xu; Liu, Wenbin; Wang, Yupeng; Zhang, Yangyong; Lee, Jonghoon; Kim, Hyun Hee; Denoeud, France; Xu, Xun; Liang, Xinming; Hua, Wei; Wang, Xiaowu; Wang, Jun; Chalhoub, Boulos; Paterson, Andrew H

2014-01-01

Polyploidization has provided much genetic variation for plant adaptive evolution, but the mechanisms by which the molecular evolution of polyploid genomes establishes genetic architecture underlying species differentiation are unclear. Brassica is an ideal model to increase knowledge of polyploid evolution. Here we describe a draft genome sequence of Brassica oleracea, comparing it with that of its sister species B. rapa to reveal numerous chromosome rearrangements and asymmetrical gene loss in duplicated genomic blocks, asymmetrical amplification of transposable elements, differential gene co-retention for specific pathways and variation in gene expression, including alternative splicing, among a large number of paralogous and orthologous genes. Genes related to the production of anticancer phytochemicals and morphological variations illustrate consequences of genome duplication and gene divergence, imparting biochemical and morphological variation to B. oleracea. This study provides insights into Brassica genome evolution and will underpin research into the many important crops in this genus. PMID:24852848
Norwalk virus: how infectious is it?

PubMed

Teunis, Peter F M; Moe, Christine L; Liu, Pengbo; Miller, Sara E; Lindesmith, Lisa; Baric, Ralph S; Le Pendu, Jacques; Calderon, Rebecca L

2008-08-01

Noroviruses are major agents of viral gastroenteritis worldwide. The infectivity of Norwalk virus, the prototype norovirus, has been studied in susceptible human volunteers. A new variant of the hit theory model of microbial infection was developed to estimate the variation in Norwalk virus infectivity, as well as the degree of virus aggregation, consistent with independent (electron microscopic) observations. Explicit modeling of viral aggregation allows us to express virus infectivity per single infectious unit (particle). Comparison of a primary and a secondary inoculum showed that passage through a human host does not change Norwalk virus infectivity. We estimate the average probability of infection for a single Norwalk virus particle to be close to 0.5, exceeding that reported for any other virus studied to date. Infected subjects had a dose-dependent probability of becoming ill, ranging from 0.1 (at a dose of 10^3 NV genomes) to 0.7 (at 10^8 virus genomes). A norovirus dose response model is important for understanding its transmission and essential for development of a quantitative risk model. Norwalk virus is a valuable model system to study virulence because genetic factors are known for both complete and partial protection; the latter can be quantitatively described as heterogeneity in dose response models.
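To make the idea of a hit-theory dose-response curve concrete, here is a deliberately simplified sketch. It is not the model fitted in the study, which additionally accounts for variation in infectivity and for particle aggregation. It assumes a Poisson-distributed number of independent, non-aggregated particles, each with the same fixed probability of starting an infection; the function names are hypothetical, and the default of 0.5 is the average single-particle estimate quoted in the abstract.

```python
from math import exp, log


def p_infection(mean_dose: float, p_single: float = 0.5) -> float:
    """Single-hit (exponential) dose-response: P(inf) = 1 - exp(-p * dose).

    Assumptions: the ingested particle count is Poisson with mean `mean_dose`,
    particles act independently, and every particle has the same chance
    `p_single` of establishing infection.
    """
    return 1.0 - exp(-p_single * mean_dose)


def id50(p_single: float = 0.5) -> float:
    """Mean dose at which half of exposed subjects are expected to be infected."""
    return log(2.0) / p_single


# Infection risk at a mean dose of one particle, and the median infectious dose.
print(p_infection(1.0), id50())
```

Under these assumptions a per-particle probability near 0.5 puts the median infectious dose at only one to two particles, which illustrates why the reported infectivity is considered so high.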
Genomic Sequence Variation Markup Language (GSVML).

PubMed

Nakaya, Jun; Kimura, Michio; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Tanaka, Hiroshi

2010-02-01

With the aim of making good use of internationally accumulated genomic sequence variation data, which is increasing rapidly due to the explosive amount of genomic research at present, the development of an interoperable data exchange format and its international standardization are necessary. Genomic Sequence Variation Markup Language (GSVML) will focus on genomic sequence variation data and human health applications, such as gene based medicine or pharmacogenomics. We developed GSVML through eight steps, based on case analysis and domain investigations. By focusing on the design scope to human health applications and genomic sequence variation, we attempted to eliminate ambiguity and to ensure practicability. We intended to satisfy the requirements derived from the use case analysis of human-based clinical genomic applications. Based on database investigations, we attempted to minimize the redundancy of the data format, while maximizing the data covering range. We also attempted to ensure communication and interface ability with other Markup Languages, for exchange of omics data among various omics researchers or facilities. The interface ability with developing clinical standards, such as the Health Level Seven Genotype Information model, was analyzed. We developed the human health-oriented GSVML comprising variation data, direct annotation, and indirect annotation categories; the variation data category is required, while the direct and indirect annotation categories are optional. The annotation categories contain omics and clinical information, and have internal relationships. For designing, we examined 6 cases for three criteria as human health application and 15 data elements for three criteria as data formats for genomic sequence variation data exchange. The data format of five international SNP databases and six Markup Languages and the interface ability to the Health Level Seven Genotype Model in terms of 317 items were investigated. 
GSVML was developed as a potential exchange format for genomic sequence variation data, focusing on human health applications. The international standardization of GSVML is necessary, and is currently underway. GSVML can be applied to enhance the utilization of genomic sequence variation data worldwide by providing a communicable platform between clinical and research applications. Copyright 2009 Elsevier Ireland Ltd. All rights reserved.
In Depth Characterization of Repetitive DNA in 23 Plant Genomes Reveals Sources of Genome Size Variation in the Legume Tribe Fabeae.

PubMed

Macas, Jiří; Novák, Petr; Pellicer, Jaume; Čížková, Jana; Koblížková, Andrea; Neumann, Pavel; Fuková, Iva; Doležel, Jaroslav; Kelly, Laura J; Leitch, Ilia J

2015-01-01

The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.
Chapter 9 - Vegetation succession modeling for the LANDFIRE Prototype Project

Treesearch

Donald Long; B. John (Jack) Losensky; Donald Bedunah

2006-01-01

One of the main objectives of the Landscape Fire and Resource Management Planning Tools Prototype Project, or LANDFIRE Prototype Project, was to determine departure of current vegetation conditions from the range and variation of conditions that existed during the historical era identified in the LANDFIRE guidelines as 1600-1900 A.D. (Keane and Rollins, Ch. 3). In...
Pervasive gene content variation and copy number variation in maize and its undomesticated progenitor

USDA-ARS's Scientific Manuscript database

Different individuals of the same species are generally thought to have very similar genomes. However, there is growing evidence that structural variation in the form of copy number variation (CNV) and presence-absence variation (PAV) can lead to variation in the genome content of individuals withi...
A new approach to configurable primary data collection.

PubMed

Stanek, J; Babkin, E; Zubov, M

2016-09-01

The formats, semantics and operational rules of data processing tasks in genomics (and health in general) are highly divergent and can rapidly change. In such an environment, the problem of consistent transformation and loading of heterogeneous input data to various target repositories becomes a critical success factor. The objective of the project was to design a new conceptual approach to configurable data transformation, de-identification, and submission of health and genomic data sets. The main motivation was to facilitate automated or human-driven data uploading, as well as consolidation of heterogeneous sources in large genomic or health projects. Modern methods of on-demand specialization of generic software components were applied. For specification of input-output data and required data collection activities, we propose a simple data model of flat tables as well as a domain-oriented graphical interface and portable representation of transformations in XML. Using such methods, the prototype of the Configurable Data Collection System (CDCS) was implemented in the Java programming language with Swing graphical interfaces. The core logic of transformations was implemented as a library of reusable plugins. The solution is implemented as a software prototype for a configurable service-oriented system for semi-automatic data collection, transformation, sanitization and safe uploading to heterogeneous data repositories (CDCS). To address the dynamic nature of data schemas and data collection processes, the CDCS prototype facilitates interactive, user-driven configuration of the data collection process and extends basic functionality with a wide range of third-party plugins. Notably, our solution also allows for the reduction of manual data entry for data originally missing in the output data sets.
First experiments and feedback from domain experts confirm the prototype is flexible, configurable and extensible; runs well on data owner's systems; and is not dependent on vendor's standards. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

Genomic Sequences of Australian Bluetongue Virus Prototype Serotypes Reveal Global Relationships and Possible Routes of Entry into Australia

PubMed Central

Bulach, Dieter M.; Amos-Ritchie, Rachel; Adams, Mathew M.; Walker, Peter J.; Weir, Richard

2012-01-01

Bluetongue virus (BTV) is transmitted by biting midges (Culicoides spp.). It causes disease mainly in sheep and occasionally in cattle and other species. BTV has spread into northern Europe, causing disease in sheep and cattle. The introduction of new serotypes, changes in vector species, and climate change have contributed to these changes. Ten BTV serotypes have been isolated in Australia without apparent associated disease. Simplified methods for preferential isolation of double-stranded RNA (dsRNA) and template preparation enabled high-throughput sequencing of the 10 genome segments of all Australian BTV prototype serotypes. Phylogenetic analysis reinforced the Western and Eastern topotypes previously characterized but revealed unique features of several Australian BTVs. Many of the Australian BTV genome segments (Seg-) were closely related, clustering together within the Eastern topotypes. A novel Australian topotype for Seg-5 (NS1) was identified, with taxa spread across several serotypes and over time. Seg-1, -2, -3, -4, -6, -7, -9, and -10 of BTV_2_AUS_2008 were most closely related to the cognate segments of viruses from Taiwan and Asia and not other Australian viruses, supporting the conclusion that BTV_2 entered Australia recently. The Australian BTV_15_AUS_1982 prototype was revealed to be unusual among the Australian BTV isolates, with Seg-3 and -8 distantly related to other BTV sequences from all serotypes. PMID:22514341
Overview of the creative genome: effects of genome structure and sequence on the generation of variation and evolution.

PubMed

Caporale, Lynn Helena

2012-09-01

This overview of a special issue of Annals of the New York Academy of Sciences discusses uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed is the value of non text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics. © 2012 New York Academy of Sciences.
Child Development and Structural Variation in the Human Genome

ERIC Educational Resources Information Center

Zhang, Ying; Haraksingh, Rajini; Grubert, Fabian; Abyzov, Alexej; Gerstein, Mark; Weissman, Sherman; Urban, Alexander E.

2013-01-01

Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects…
Differential contribution of genomic regions to marked genetic variation and prediction of quantitative traits in broiler chickens.

PubMed

Abdollahi-Arpanahi, Rostam; Morota, Gota; Valente, Bruno D; Kranis, Andreas; Rosa, Guilherme J M; Gianola, Daniel

2016-02-03

Genome-wide association studies in humans have found enrichment of trait-associated single nucleotide polymorphisms (SNPs) in coding regions of the genome and depletion of these in intergenic regions. However, a recent release of the ENCyclopedia of DNA elements showed that ~80 % of the human genome has a biochemical function. Similar studies on the chicken genome are lacking, thus assessing the relative contribution of its genic and non-genic regions to variation is relevant for biological studies and genetic improvement of chicken populations. A dataset including 1351 birds that were genotyped with the 600K Affymetrix platform was used. We partitioned SNPs according to genome annotation data into six classes to characterize the relative contribution of genic and non-genic regions to genetic variation as well as their predictive power using all available quality-filtered SNPs. Target traits were body weight, ultrasound measurement of breast muscle and hen house egg production in broiler chickens. Six genomic regions were considered: intergenic regions, introns, missense, synonymous, 5' and 3' untranslated regions, and regions that are located 5 kb upstream and downstream of coding genes. Genomic relationship matrices were constructed for each genomic region and fitted in the models, separately or simultaneously. Kernel-based ridge regression was used to estimate variance components and assess predictive ability. Contribution of each class of genomic regions to dominance variance was also considered. Variance component estimates indicated that all genomic regions contributed to marked additive genetic variation and that the class of synonymous regions tended to have the greatest contribution. The marked dominance genetic variation explained by each class of genomic regions was similar and negligible (~0.05). In terms of prediction mean-square error, the whole-genome approach showed the best predictive ability. 
All genic and non-genic regions contributed to phenotypic variation for the three traits studied. Overall, the contribution of additive genetic variance to the total genetic variance was much greater than that of dominance variance. Our results show that all genomic regions are important for the prediction of the targeted traits, and the whole-genome approach was reaffirmed as the best tool for genome-enabled prediction of quantitative traits.
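The abstract does not give the formula used for the genomic relationship matrices. As an illustration only, assuming the widely used VanRaden-style construction G = ZZ' / (2 Σ p_j(1 - p_j)) with genotypes coded as 0/1/2 allele counts, a per-class relationship matrix could be sketched as follows. The function names, the toy genotypes and the class labels are hypothetical.

```python
def genomic_relationship(genotypes):
    """Relationship matrix from 0/1/2 allele counts (rows = individuals).

    Assumed formula: G = Z Z' / (2 * sum_j p_j (1 - p_j)), where Z holds the
    genotypes centred by twice the allele frequency p_j of each SNP.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; G is undefined")
    centred = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    return [
        [sum(a * b for a, b in zip(centred[i], centred[k])) / scale
         for k in range(n)]
        for i in range(n)
    ]


def relationship_by_class(genotypes, snp_classes):
    """Build one relationship matrix per annotation class of SNPs."""
    matrices = {}
    for cls in sorted(set(snp_classes)):
        cols = [j for j, c in enumerate(snp_classes) if c == cls]
        subset = [[ind[j] for j in cols] for ind in genotypes]
        matrices[cls] = genomic_relationship(subset)
    return matrices


# Toy data: two birds, two SNPs, one SNP per (hypothetical) annotation class.
toy = [[0, 2], [2, 0]]
print(relationship_by_class(toy, ["intron", "intergenic"]))
```

Each class-specific matrix can then enter a model on its own or alongside the others, which mirrors the "separately or simultaneously" fitting described in the abstract.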
Parallel altitudinal clines reveal trends in adaptive evolution of genome size in Zea mays

PubMed Central

Berg, Jeremy J.; Birchler, James A.; Grote, Mark N.; Lorant, Anne; Quezada, Juvenal

2018-01-01

While the vast majority of genome size variation in plants is due to differences in repetitive sequence, we know little about how selection acts on repeat content in natural populations. Here we investigate parallel changes in intraspecific genome size and repeat content of domesticated maize (Zea mays) landraces and their wild relative teosinte across altitudinal gradients in Mesoamerica and South America. We combine genotyping, low coverage whole-genome sequence data, and flow cytometry to test for evidence of selection on genome size and individual repeat abundance. We find that population structure alone cannot explain the observed variation, implying that clinal patterns of genome size are maintained by natural selection. Our modeling additionally provides evidence of selection on individual heterochromatic knob repeats, likely due to their large individual contribution to genome size. To better understand the phenotypes driving selection on genome size, we conducted a growth chamber experiment using a population of highland teosinte exhibiting extensive variation in genome size. We find weak support for a positive correlation between genome size and cell size, but stronger support for a negative correlation between genome size and the rate of cell production. Reanalyzing published data of cell counts in maize shoot apical meristems, we then identify a negative correlation between cell production rate and flowering time. Together, our data suggest a model in which variation in genome size is driven by natural selection on flowering time across altitudinal clines, connecting intraspecific variation in repetitive sequence to important differences in adaptive phenotypes. PMID:29746459
On the molecular mechanism of GC content variation among eubacterial genomes.

PubMed

Wu, Hao; Zhang, Zhang; Hu, Songnian; Yu, Jun

2012-01-10

As a key parameter of genome sequence variation, the GC content of bacterial genomes has been investigated for over half a century, and many hypotheses have been put forward to explain this GC content variation and its relationship to other fundamental processes. Previously, we classified eubacteria into dnaE-based groups (the dimeric combination of DNA polymerase III alpha subunits), according to a hypothesis where GC content variation is essentially governed by genome replication and DNA repair mechanisms. Further investigation led to the discovery that two major mutator genes, polC and dnaE2, may be responsible for genomic GC content variation. Consequently, an in-depth analysis was conducted to evaluate various potential intrinsic and extrinsic factors in association with GC content variation among eubacterial genomes. Mutator genes, especially those with dominant effects on the mutation spectra, are biased towards either GC or AT richness, and they alter genomic GC content in the two opposite directions. Increased bacterial genome size (or gene number) appears to rely on increased genomic GC content; however, it is unclear whether the changes are directly related to certain environmental pressures. Certain environmental and bacteriological features are related to GC content variation, but their trends are more obvious when analyzed under the dnaE-based grouping scheme. Most terrestrial, plant-associated, and nitrogen-fixing bacteria are members of the dnaE1|dnaE2 group, whereas most pathogenic or symbiotic bacteria in insects, and those dwelling in aquatic environments, are largely members of the dnaE1|polV group. Our studies provide several lines of evidence indicating that DNA polymerase III α subunit and its isoforms participating in either replication (such as polC) or SOS mutagenesis/translesion synthesis (such as dnaE2), play dominant roles in determining GC variability. Other environmental or bacteriological factors, such as genome size, temperature, oxygen requirement, and habitat, either play subsidiary roles or rely indirectly on different mutator genes to fine-tune the GC content. These results provide a comprehensive insight into mechanisms of GC content variation and the robustness of eubacterial genomes in adapting to their ever-changing environments over billions of years.
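For readers unfamiliar with the quantity discussed in this record, GC content is the fraction of G and C bases among the bases of a sequence. The following is a minimal sketch and not the authors' analysis pipeline; the function name gc_content and the choice to ignore ambiguous bases such as N are assumptions made for illustration.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among the unambiguous bases A, C, G, T.

    Ambiguous symbols (for example N) are ignored; this is an illustrative
    choice, not something stated in the abstract.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total
```

Calling gc_content on a whole-genome sequence string would give the genome-level value that studies of this kind compare across species.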
Effects of Caffeine and Chlorogenic Acid on Propidium Iodide Accessibility to DNA: Consequences on Genome Size Evaluation in Coffee Tree

PubMed Central

NOIROT, M.; BARRE, P.; DUPERRAY, C.; LOUARN, J.; HAMON, S.

2003-01-01

Estimates of genome size using flow cytometry can be biased by the presence of cytosolic compounds, leading to pseudo‐intraspecific variation in genome size. Two important compounds present in coffee trees—caffeine and chlorogenic acid—modify accessibility of the dye propidium iodide to Petunia DNA, a species used as internal standard in our genome size evaluation. These compounds could be responsible for intraspecific variation in genome size since their contents vary between trees. They could also be implicated in environmental variations in genome size, such as those revealed when comparing the results of evaluations carried out on different dates on several genotypes. PMID:12876189
A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction

PubMed Central

Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R.; Buenrostro-Mariscal, Raymundo

2017-01-01

There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. PMID:28391241
A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction.

PubMed

Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R; Buenrostro-Mariscal, Raymundo

2017-06-07

There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. Copyright © 2017 Montesinos-López et al.
Functional genomics of physiological plasticity and local adaptation in killifish.

PubMed

Whitehead, Andrew; Galvez, Fernando; Zhang, Shujun; Williams, Larissa M; Oleksiak, Marjorie F

2011-01-01

Evolutionary solutions to the physiological challenges of life in highly variable habitats can span the continuum from evolution of a cosmopolitan plastic phenotype to the evolution of locally adapted phenotypes. Killifish (Fundulus sp.) have evolved both highly plastic and locally adapted phenotypes within different selective contexts, providing a comparative system in which to explore the genomic underpinnings of physiological plasticity and adaptive variation. Importantly, extensive variation exists among populations and species for tolerance to a variety of stressors, and we exploit this variation in comparative studies to yield insights into the genomic basis of evolved phenotypic variation. Notably, species of Fundulus occupy the continuum of osmotic habitats from freshwater to marine and populations within Fundulus heteroclitus span far greater variation in pollution tolerance than across all species of fish. Here, we explore how transcriptome regulation underpins extreme physiological plasticity on osmotic shock and how genomic and transcriptomic variation is associated with locally evolved pollution tolerance. We show that F. heteroclitus quickly acclimate to extreme osmotic shock by mounting a dramatic rapid transcriptomic response including an early crisis control phase followed by a tissue remodeling phase involving many regulatory pathways. We also show that convergent evolution of locally adapted pollution tolerance involves complex patterns of gene expression and genome sequence variation, which is confounded with body-weight dependence for some genes. Similarly, exploiting the natural phenotypic variation associated with other established and emerging model organisms is likely to greatly accelerate the pace of discovery of the genomic basis of phenotypic variation.
Functional Genomics of Physiological Plasticity and Local Adaptation in Killifish

PubMed Central

Galvez, Fernando; Zhang, Shujun; Williams, Larissa M.; Oleksiak, Marjorie F.

2011-01-01

Evolutionary solutions to the physiological challenges of life in highly variable habitats can span the continuum from evolution of a cosmopolitan plastic phenotype to the evolution of locally adapted phenotypes. Killifish (Fundulus sp.) have evolved both highly plastic and locally adapted phenotypes within different selective contexts, providing a comparative system in which to explore the genomic underpinnings of physiological plasticity and adaptive variation. Importantly, extensive variation exists among populations and species for tolerance to a variety of stressors, and we exploit this variation in comparative studies to yield insights into the genomic basis of evolved phenotypic variation. Notably, species of Fundulus occupy the continuum of osmotic habitats from freshwater to marine and populations within Fundulus heteroclitus span far greater variation in pollution tolerance than across all species of fish. Here, we explore how transcriptome regulation underpins extreme physiological plasticity on osmotic shock and how genomic and transcriptomic variation is associated with locally evolved pollution tolerance. We show that F. heteroclitus quickly acclimate to extreme osmotic shock by mounting a dramatic rapid transcriptomic response including an early crisis control phase followed by a tissue remodeling phase involving many regulatory pathways. We also show that convergent evolution of locally adapted pollution tolerance involves complex patterns of gene expression and genome sequence variation, which is confounded with body-weight dependence for some genes. Similarly, exploiting the natural phenotypic variation associated with other established and emerging model organisms is likely to greatly accelerate the pace of discovery of the genomic basis of phenotypic variation. PMID:20581107
A High-Definition View of Functional Genetic Variation from Natural Yeast Genomes

PubMed Central

Bergström, Anders; Simpson, Jared T.; Salinas, Francisco; Barré, Benjamin; Parts, Leopold; Zia, Amin; Nguyen Ba, Alex N.; Moses, Alan M.; Louis, Edward J.; Mustonen, Ville; Warringer, Jonas; Durbin, Richard; Liti, Gianni

2014-01-01

The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, metal transport, and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies. PMID:24425782
Multiple capacitors for natural genetic variation in Drosophila melanogaster.

PubMed

Takahashi, Kazuo H

2013-03-01

Cryptic genetic variation (CGV) or a standing genetic variation that is not ordinarily expressed as a phenotype is released when the robustness of organisms is impaired under environmental or genetic perturbations. Evolutionary capacitors modulate the amount of genetic variation exposed to natural selection and hidden cryptically; they have a fundamental effect on the evolvability of traits on evolutionary timescales. In this study, I have demonstrated the effects of multiple genomic regions of Drosophila melanogaster on CGV in wing shape. I examined the effects of 61 genomic deficiencies on quantitative and qualitative natural genetic variation in the wing shape of D. melanogaster. I have identified 10 genomic deficiencies that do not encompass a known candidate evolutionary capacitor, Hsp90, exposing natural CGV differently depending on the location of the deficiencies in the genome. Furthermore, five genomic deficiencies uncovered qualitative CGV in wing morphology. These findings suggest that CGV in wing shape of wild-type D. melanogaster is regulated by multiple capacitors with divergent functions. Future analysis of genes encompassed by these genomic regions would help elucidate novel capacitor genes and better understand the general features of capacitors regarding natural genetic variation. © 2012 Blackwell Publishing Ltd.
Exploring variation-aware contig graphs for (comparative) metagenomics using MaryGold

PubMed Central

Nijkamp, Jurgen F.; Pop, Mihai; Reinders, Marcel J. T.; de Ridder, Dick

2013-01-01

Motivation: Although many tools are available to study variation and its impact in single genomes, there is a lack of algorithms for finding such variation in metagenomes. This hampers the interpretation of metagenomics sequencing datasets, which are increasingly acquired in research on the (human) microbiome, in environmental studies and in the study of processes in the production of foods and beverages. Existing algorithms often depend on the use of reference genomes, which pose a problem when a metagenome of a priori unknown strain composition is studied. In this article, we develop a method to perform reference-free detection and visual exploration of genomic variation, both within a single metagenome and between metagenomes. Results: We present the MaryGold algorithm and its implementation, which efficiently detects bubble structures in contig graphs using graph decomposition. These bubbles represent variable genomic regions in closely related strains in metagenomic samples. The variation found is presented in a condensed Circos-based visualization, which allows for easy exploration and interpretation of the found variation. We validated the algorithm on two simulated datasets containing three and seven Escherichia coli genomes, respectively, and showed that finding allelic variation in these genomes improves assemblies. Additionally, we applied MaryGold to publicly available real metagenomic datasets, enabling us to find within-sample genomic variation in the metagenomes of a kimchi fermentation process, the microbiome of a premature infant and in microbial communities living on acid mine drainage. Moreover, we used MaryGold for between-sample variation detection and exploration by comparing sequencing data sampled at different time points for both of these datasets. Availability: MaryGold has been written in C++ and Python and can be downloaded from http://bioinformatics.tudelft.nl/software Contact: d.deridder@tudelft.nl PMID:24058058
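The idea of a bubble in a contig graph can be illustrated with a toy example. The sketch below is not the MaryGold algorithm, which the abstract says relies on graph decomposition; it only finds the simplest case, in which a source contig fans out into two or more branch contigs that each lead directly to the same sink contig. The graph representation (a dictionary of successor lists) and the function name find_simple_bubbles are assumptions made for illustration.

```python
def find_simple_bubbles(graph: dict[str, list[str]]) -> list[tuple[str, list[str], str]]:
    """Find simple bubbles in a directed graph given as {node: [successors]}.

    A simple bubble here is a source with at least two distinct successors,
    where every such branch has the source as its only predecessor and
    exactly one successor, and all branches share that successor (the sink).
    Returns a list of (source, sorted branches, sink) tuples.
    """
    # Collect the predecessors of every node.
    preds: dict[str, set[str]] = {}
    for node, successors in graph.items():
        preds.setdefault(node, set())
        for succ in successors:
            preds.setdefault(succ, set()).add(node)

    bubbles = []
    for source, successors in graph.items():
        branches = sorted(set(successors))
        if len(branches) < 2:
            continue
        sinks = set()
        is_bubble = True
        for branch in branches:
            out = graph.get(branch, [])
            # Each branch must be a plain path node between source and sink.
            if len(out) != 1 or preds[branch] != {source}:
                is_bubble = False
                break
            sinks.add(out[0])
        if is_bubble and len(sinks) == 1:
            bubbles.append((source, branches, sinks.pop()))
    return bubbles
```

In this picture the branches between a source and a sink correspond to alternative sequences for the same genomic region, for example alleles from closely related strains.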
Evidence-based design and evaluation of a whole genome sequencing clinical report for the reference microbiology laboratory.

PubMed

Crisan, Anamaria; McKee, Geoffrey; Munzner, Tamara; Gardy, Jennifer L

2018-01-01

Microbial genome sequencing is now being routinely used in many clinical and public health laboratories. Understanding how to report complex genomic test results to stakeholders who may have varying familiarity with genomics (including clinicians, laboratorians, epidemiologists, and researchers) is critical to the successful and sustainable implementation of this new technology; however, there are no evidence-based guidelines for designing such a report in the pathogen genomics domain. Here, we describe an iterative, human-centered approach to creating a report template for communicating tuberculosis (TB) genomic test results. We used Design Study Methodology (a human-centered approach drawn from the information visualization domain) to redesign an existing clinical report. We used expert consults and an online questionnaire to discover various stakeholders' needs around the types of data and tasks related to TB that they encounter in their daily workflow. We also evaluated their perceptions of and familiarity with genomic data, as well as its utility at various clinical decision points. These data shaped the design of multiple prototype reports that were compared against the existing report through a second online survey, with the resulting qualitative and quantitative data informing the final, redesigned, report. We recruited 78 participants, 65 of whom were clinicians, nurses, laboratorians, researchers, and epidemiologists involved in TB diagnosis, treatment, and/or surveillance. Our first survey indicated that participants were largely enthusiastic about genomic data, with the majority agreeing on its utility for certain TB diagnosis and treatment tasks and many reporting some confidence in their ability to interpret this type of data (between 58.8% and 94.1%, depending on the specific data type). When we compared our four prototype reports against the existing design, we found that for the majority (86.7%) of design comparisons, participants preferred the alternative prototype designs over the existing version, and that both clinicians and non-clinicians expressed similar design preferences. Participants showed clearer design preferences when asked to compare individual design elements versus entire reports. Both the quantitative and qualitative data informed the design of a revised report, available online as a LaTeX template. We show how a human-centered design approach integrating quantitative and qualitative feedback can be used to design an alternative report for representing complex microbial genomic data. We suggest experimental and design guidelines to inform future design studies in the bioinformatics and microbial genomics domains, and suggest that this type of mixed-methods study is important to facilitate the successful translation of pathogen genomics in the clinic, not only for clinical reports but also more complex bioinformatics data visualization software.
GEMINI: Integrative Exploration of Genetic Variation and Genome Annotations

PubMed Central

Paila, Umadevi; Chapman, Brad A.; Kirchner, Rory; Quinlan, Aaron R.

2013-01-01

Modern DNA sequencing technologies enable geneticists to rapidly identify genetic variation among many human genomes. However, isolating the minority of variants underlying disease remains an important, yet formidable challenge for medical genetics. We have developed GEMINI (GEnome MINIng), a flexible software package for exploring all forms of human genetic variation. Unlike existing tools, GEMINI integrates genetic variation with a diverse and adaptable set of genome annotations (e.g., dbSNP, ENCODE, UCSC, ClinVar, KEGG) into a unified database to facilitate interpretation and data exploration. Whereas other methods provide an inflexible set of variant filters or prioritization methods, GEMINI allows researchers to compose complex queries based on sample genotypes, inheritance patterns, and both pre-installed and custom genome annotations. GEMINI also provides methods for ad hoc queries and data exploration, a simple programming interface for custom analyses that leverage the underlying database, and both command line and graphical tools for common analyses. We demonstrate GEMINI's utility for exploring variation in personal genomes and family based genetic studies, and illustrate its ability to scale to studies involving thousands of human samples. GEMINI is designed for reproducibility and flexibility and our goal is to provide researchers with a standard framework for medical genomics. PMID:23874191
Draft Genome Sequence of Escherichia coli MS499, Isolated from the Infected Uterus of a Postpartum Cow with Metritis

PubMed Central

Goldstone, Robert J.; Talbot, Richard; Schuberth, Hans-Joachim; Sandra, Olivier; Sheldon, I. Martin

2014-01-01

Specific Escherichia coli strains associated with bovine postpartum uterine infection have recently been described. Many recognized virulence factors are absent in these strains; therefore, to define a prototypic strain, we report here the genome sequence of E. coli isolate MS499 from a cow with the postpartum disease metritis. PMID:24994791
The population genomics of rhesus macaques (Macaca mulatta) based on whole-genome sequences

PubMed Central

Xue, Cheng; Raveendran, Muthuswamy; Harris, R. Alan; Fawcett, Gloria L.; Liu, Xiaoming; White, Simon; Dahdouli, Mahmoud; Rio Deiros, David; Below, Jennifer E.; Salerno, William; Cox, Laura; Fan, Guoping; Ferguson, Betsy; Horvath, Julie; Johnson, Zach; Kanthaswamy, Sree; Kubisch, H. Michael; Liu, Dahai; Platt, Michael; Smith, David G.; Sun, Binghua; Vallender, Eric J.; Wang, Feng; Wiseman, Roger W.; Chen, Rui; Muzny, Donna M.; Gibbs, Richard A.; Yu, Fuli; Rogers, Jeffrey

2016-01-01

Rhesus macaques (Macaca mulatta) are the most widely used nonhuman primate in biomedical research, have the largest natural geographic distribution of any nonhuman primate, and have been the focus of much evolutionary and behavioral investigation. Consequently, rhesus macaques are one of the most thoroughly studied nonhuman primate species. However, little is known about genome-wide genetic variation in this species. A detailed understanding of extant genomic variation among rhesus macaques has implications for the use of this species as a model for studies of human health and disease, as well as for evolutionary population genomics. Whole-genome sequencing analysis of 133 rhesus macaques revealed more than 43.7 million single-nucleotide variants, including thousands predicted to alter protein sequences, transcript splicing, and transcription factor binding sites. Rhesus macaques exhibit 2.5-fold higher overall nucleotide diversity and slightly elevated putative functional variation compared with humans. This functional variation in macaques provides opportunities for analyses of coding and noncoding variation, and its cellular consequences. Despite modestly higher levels of nonsynonymous variation in the macaques, the estimated distribution of fitness effects and the ratio of nonsynonymous to synonymous variants suggest that purifying selection has had stronger effects in rhesus macaques than in humans. Demographic reconstructions indicate this species has experienced a consistently large but fluctuating population size. Overall, the results presented here provide new insights into the population genomics of nonhuman primates and expand genomic information directly relevant to primate models of human disease. PMID:27934697
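Nucleotide diversity, the statistic compared between macaques and humans in this record, is commonly defined as the average number of differences per site between two sequences drawn from the sample. The sketch below illustrates that definition on a small set of aligned sequences; it is not the authors' pipeline, and the function name nucleotide_diversity and the toy input are assumptions.

```python
from itertools import combinations


def nucleotide_diversity(aligned: list[str]) -> float:
    """Average pairwise differences per site over all pairs of aligned sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if length == 0 or any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be non-empty and of equal length")

    pairs = list(combinations(aligned, 2))
    # Count mismatching positions for every pair of sequences.
    total_differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return total_differences / (len(pairs) * length)
```

For the three toy sequences ACGT, ACGA and ACTA the pairwise differences are 1, 2 and 1, so the statistic is 4 / (3 pairs × 4 sites) = 1/3. A fold difference such as the one reported in the abstract is then simply the ratio of two such values.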
Fast-Track Building.

ERIC Educational Resources Information Center

Dolan, Thomas G.

2002-01-01

Describes Clark County, Nevada's use of prototype school designs to respond to its rapidly growing school population. The purpose of the prototypes is to simplify designs so that schools can be built quickly and minimize the time and expense that comes with variations. (EV)
Genome-Wide Association Mapping and Genomic Prediction Elucidate the Genetic Architecture of Morphological Traits in Arabidopsis.

PubMed

Kooke, Rik; Kruijer, Willem; Bours, Ralph; Becker, Frank; Kuhn, André; van de Geest, Henri; Buntjer, Jaap; Doeswijk, Timo; Guerra, José; Bouwmeester, Harro; Vreugdenhil, Dick; Keurentjes, Joost J B

2016-04-01

Quantitative traits in plants are controlled by a large number of genes and their interaction with the environment. To disentangle the genetic architecture of such traits, natural variation within species can be explored by studying genotype-phenotype relationships. Genome-wide association studies that link phenotypes to thousands of single nucleotide polymorphism markers are nowadays common practice for such analyses. In many cases, however, the identified individual loci cannot fully explain the heritability estimates, suggesting missing heritability. We analyzed 349 Arabidopsis accessions and found extensive variation and high heritabilities for different morphological traits. The number of significant genome-wide associations was, however, very low. The application of genomic prediction models that take into account the effects of all individual loci may greatly enhance the elucidation of the genetic architecture of quantitative traits in plants. Here, genomic prediction models revealed different genetic architectures for the morphological traits. Integrating genomic prediction and association mapping enabled the assignment of many plausible candidate genes explaining the observed variation. These genes were analyzed for functional and sequence diversity, and good indications that natural allelic variation in many of these genes contributes to phenotypic variation were obtained. For ACS11, an ethylene biosynthesis gene, haplotype differences explaining variation in the ratio of petiole and leaf length could be identified. © 2016 American Society of Plant Biologists. All Rights Reserved.

Maintenance of genetic diversity through plant-herbivore interactions

PubMed Central

Gloss, Andrew D.; Dittrich, Anna C. Nelson; Goldman-Huertas, Benjamin; Whiteman, Noah K.

2013-01-01

Identifying the factors governing the maintenance of genetic variation is a central challenge in evolutionary biology. New genomic data, methods and conceptual advances provide increasing evidence that balancing selection, mediated by antagonistic species interactions, maintains functionally-important genetic variation within species and natural populations. Because diverse interactions between plants and herbivorous insects dominate terrestrial communities, they provide excellent systems to address this hypothesis. Population genomic studies of Arabidopsis thaliana and its relatives suggest spatial variation in herbivory maintains adaptive genetic variation controlling defense phenotypes, both within and among populations. Conversely, inter-species variation in plant defenses promotes adaptive genetic variation in herbivores. Emerging genomic model herbivores of Arabidopsis could illuminate how genetic variation in herbivores and plants interact simultaneously. PMID:23834766
Global mapping of transposon location.

PubMed

Gabriel, Abram; Dapprich, Johannes; Kunkel, Mark; Gresham, David; Pratt, Stephen C; Dunham, Maitreya J

2006-12-15

Transposable genetic elements are ubiquitous, yet their presence or absence at any given position within a genome can vary between individual cells, tissues, or strains. Transposable elements have profound impacts on host genomes by altering gene expression, assisting in genomic rearrangements, causing insertional mutations, and serving as sources of phenotypic variation. Characterizing a genome's full complement of transposons requires whole genome sequencing, precluding simple studies of the impact of transposition on interindividual variation. Here, we describe a global mapping approach for identifying transposon locations in any genome, using a combination of transposon-specific DNA extraction and microarray-based comparative hybridization analysis. We use this approach to map the repertoire of endogenous transposons in different laboratory strains of Saccharomyces cerevisiae and demonstrate that transposons are a source of extensive genomic variation. We also apply this method to mapping bacterial transposon insertion sites in a yeast genomic library. This unique whole genome view of transposon location will facilitate our exploration of transposon dynamics, as well as defining bases for individual differences and adaptive potential.
ACTG: novel peptide mapping onto gene models.

PubMed

Choi, Seunghyuk; Kim, Hyunwoo; Paek, Eunok

2017-04-15

In many proteogenomic applications, mapping peptide sequences onto genome sequences can be very useful, because it allows us to understand origins of the gene products. Existing software tools either take the genomic position of a peptide start site as an input or assume that the peptide sequence exactly matches the coding sequence of a given gene model. In case of novel peptides resulting from genomic variations, especially structural variations such as alternative splicing, these existing tools cannot be directly applied unless users supply information about the variant, either its genomic position or its transcription model. Mapping potentially novel peptides to genome sequences, while allowing certain genomic variations, requires introducing novel gene models when aligning peptide sequences to gene structures. We have developed a new tool called ACTG (Amino aCids To Genome), which maps peptides to genome, assuming all possible single exon skipping, junction variation allowing three edit distances from the original splice sites, exon extension and frame shift. In addition, it can also consider SNVs (single nucleotide variations) during mapping phase if a user provides the VCF (variant call format) file as an input. Availability: http://prix.hanyang.ac.kr/ACTG/search.jsp Contact: eunokpaek@hanyang.ac.kr Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
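The baseline task that ACTG extends, locating a peptide in a nucleotide sequence, can be sketched as translating the sequence in each reading frame and searching for the peptide. The example below is a deliberately simplified illustration and not ACTG itself: it handles exact matches on the forward strand only, with no exon skipping, junction variation, frame shifts or SNVs. The function names translate and map_peptide are assumptions; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate complete codons of a DNA string; unknown codons become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def map_peptide(peptide: str, dna: str) -> list[tuple[int, int]]:
    """Return (frame, 0-based nucleotide start) for each exact forward-strand hit."""
    hits = []
    for frame in range(3):
        protein = translate(dna[frame:])
        start = protein.find(peptide)
        while start != -1:
            hits.append((frame, frame + 3 * start))
            start = protein.find(peptide, start + 1)
    return hits
```

A peptide that spans a splice junction or carries a variant would not be found by this exact search, which is the gap the abstract says ACTG addresses by considering alternative gene models.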
On the molecular mechanism of GC content variation among eubacterial genomes

PubMed Central

2012-01-01

Background: As a key parameter of genome sequence variation, the GC content of bacterial genomes has been investigated for over half a century, and many hypotheses have been put forward to explain this GC content variation and its relationship to other fundamental processes. Previously, we classified eubacteria into dnaE-based groups (the dimeric combination of DNA polymerase III alpha subunits), according to a hypothesis where GC content variation is essentially governed by genome replication and DNA repair mechanisms. Further investigation led to the discovery that two major mutator genes, polC and dnaE2, may be responsible for genomic GC content variation. Consequently, an in-depth analysis was conducted to evaluate various potential intrinsic and extrinsic factors in association with GC content variation among eubacterial genomes. Results: Mutator genes, especially those with dominant effects on the mutation spectra, are biased towards either GC or AT richness, and they alter genomic GC content in the two opposite directions. Increased bacterial genome size (or gene number) appears to rely on increased genomic GC content; however, it is unclear whether the changes are directly related to certain environmental pressures. Certain environmental and bacteriological features are related to GC content variation, but their trends are more obvious when analyzed under the dnaE-based grouping scheme. Most terrestrial, plant-associated, and nitrogen-fixing bacteria are members of the dnaE1|dnaE2 group, whereas most pathogenic or symbiotic bacteria in insects, and those dwelling in aquatic environments, are largely members of the dnaE1|polV group. Conclusion: Our studies provide several lines of evidence indicating that DNA polymerase III α subunit and its isoforms participating in either replication (such as polC) or SOS mutagenesis/translesion synthesis (such as dnaE2), play dominant roles in determining GC variability. Other environmental or bacteriological factors, such as genome size, temperature, oxygen requirement, and habitat, either play subsidiary roles or rely indirectly on different mutator genes to fine-tune the GC content. These results provide a comprehensive insight into mechanisms of GC content variation and the robustness of eubacterial genomes in adapting to their ever-changing environments over billions of years. Reviewers: This paper was reviewed by Nicolas Galtier, Adam Eyre-Walker, and Eugene Koonin. PMID:22230424
Identification of structural variation in mouse genomes.

PubMed

Keane, Thomas M; Wong, Kim; Adams, David J; Flint, Jonathan; Reymond, Alexandre; Yalcin, Binnaz

2014-01-01

Structural variation is variation in the structure of DNA regions that affects DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with the recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variants and their association with human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation on transposable elements. We conclude with future directions on the study of structural variation in mouse genomes that will increase our understanding of molecular architecture and functional consequences of structural variation.
Evidence-based design and evaluation of a whole genome sequencing clinical report for the reference microbiology laboratory

PubMed Central

Crisan, Anamaria; McKee, Geoffrey; Munzner, Tamara

2018-01-01

Background Microbial genome sequencing is now being routinely used in many clinical and public health laboratories. Understanding how to report complex genomic test results to stakeholders who may have varying familiarity with genomics—including clinicians, laboratorians, epidemiologists, and researchers—is critical to the successful and sustainable implementation of this new technology; however, there are no evidence-based guidelines for designing such a report in the pathogen genomics domain. Here, we describe an iterative, human-centered approach to creating a report template for communicating tuberculosis (TB) genomic test results. Methods We used Design Study Methodology—a human centered approach drawn from the information visualization domain—to redesign an existing clinical report. We used expert consults and an online questionnaire to discover various stakeholders’ needs around the types of data and tasks related to TB that they encounter in their daily workflow. We also evaluated their perceptions of and familiarity with genomic data, as well as its utility at various clinical decision points. These data shaped the design of multiple prototype reports that were compared against the existing report through a second online survey, with the resulting qualitative and quantitative data informing the final, redesigned, report. Results We recruited 78 participants, 65 of whom were clinicians, nurses, laboratorians, researchers, and epidemiologists involved in TB diagnosis, treatment, and/or surveillance. Our first survey indicated that participants were largely enthusiastic about genomic data, with the majority agreeing on its utility for certain TB diagnosis and treatment tasks and many reporting some confidence in their ability to interpret this type of data (between 58.8% and 94.1%, depending on the specific data type). 
When we compared our four prototype reports against the existing design, we found that for the majority (86.7%) of design comparisons, participants preferred the alternative prototype designs over the existing version, and that both clinicians and non-clinicians expressed similar design preferences. Participants showed clearer design preferences when asked to compare individual design elements versus entire reports. Both the quantitative and qualitative data informed the design of a revised report, available online as a LaTeX template. Conclusions We show how a human-centered design approach integrating quantitative and qualitative feedback can be used to design an alternative report for representing complex microbial genomic data. We suggest experimental and design guidelines to inform future design studies in the bioinformatics and microbial genomics domains, and suggest that this type of mixed-methods study is important to facilitate the successful translation of pathogen genomics in the clinic, not only for clinical reports but also more complex bioinformatics data visualization software. PMID:29340235
Using genomics to characterize evolutionary potential for conservation of wild populations

PubMed Central

Harrisson, Katherine A; Pavlova, Alexandra; Telonis-Scott, Marina; Sunnucks, Paul

2014-01-01

Genomics promises exciting advances towards the important conservation goal of maximizing evolutionary potential, notwithstanding associated challenges. Here, we explore some of the complexity of adaptation genetics and discuss the strengths and limitations of genomics as a tool for characterizing evolutionary potential in the context of conservation management. Many traits are polygenic and can be strongly influenced by minor differences in regulatory networks and by epigenetic variation not visible in DNA sequence. Much of this critical complexity is difficult to detect using methods commonly used to identify adaptive variation, and this needs appropriate consideration when planning genomic screens, and when basing management decisions on genomic data. When the genomic basis of adaptation and future threats are well understood, it may be appropriate to focus management on particular adaptive traits. For more typical conservation scenarios, we argue that screening genome-wide variation should be a sensible approach that may provide a generalized measure of evolutionary potential that accounts for the contributions of small-effect loci and cryptic variation and is robust to uncertainty about future change and required adaptive response(s). The best conservation outcomes should be achieved when genomic estimates of evolutionary potential are used within an adaptive management framework. PMID:25553064
The international Genome sample resource (IGSR): A worldwide collection of genome variation incorporating the 1000 Genomes Project data

PubMed Central

Clarke, Laura; Fairley, Susan; Zheng-Bradley, Xiangqun; Streeter, Ian; Perry, Emily; Lowy, Ernesto; Tassé, Anne-Marie; Flicek, Paul

2017-01-01

The International Genome Sample Resource (IGSR; http://www.internationalgenome.org) expands in data type and population diversity the resources from the 1000 Genomes Project. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which has been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence to the newest human reference (GRCh38), and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is extending coverage with new populations sequenced by collaborating groups. Here, we present the new data and analysis that IGSR has made available. We have also introduced a new data portal that increases discoverability of our data—previously only browseable through our FTP site—by focusing on particular samples, populations or data sets of interest. PMID:27638885
Jumping genes: Genomic ballast or powerhouse of biological diversification.

PubMed

Choudhury, Rimjhim Roy; Parisod, Christian

2017-09-01

Studying hybridization has the potential to elucidate challenging questions in evolutionary biology such as the nature of adaptive genetic variation and reproductive isolation. A growing body of work highlights that the merging of divergent genomes goes beyond the reshuffling of standing variation from related species and promotes mutations (Abbott et al., ). However, to what extent such genome instability generates evolutionary significant variation remains largely elusive. In this issue of Molecular Ecology, Dennenmoser et al. () report considerable dynamics of transposable elements (TEs) in a recent invasive fish species of hybrid origin (Cottus; Figure ). It adds to the recent examples from plants to support TE-specific genome variation following hybridization. Insights from early, as well as established, hybrids are largely coherent with increased TE activity, and this fish system thus represents an inspiring opportunity to further address the possible association between genome dynamics and "rapid evolution of hybrid species." This work based on genome (re)sequencing contrasts with prior transcriptomics or PCR-based studies of TEs and illustrates how unprecedented amount of information promises a better understanding of the multiple patterns of variation across eukaryotic genomes; provided that we get the better of methodological advances. As discussed here, unbiased assessment of TE variation from genome surveys indeed remains a challenge precluding firm conclusions to be reached about the evolutionary significance of TEs. Despite methodological and conceptual developments that appear necessary to unambiguously uncover the unexplored iceberg below the known tip, the role of coding genes vs. TEs in promoting adaptation and speciation might be clarified in a not so remote future. © 2017 John Wiley & Sons Ltd.
Genomic regions showing copy number variations associate with resistance or susceptibility to gastrointestinal nematodes in Angus cattle

USDA-ARS's Scientific Manuscript database

Genomic structural variation is an important and abundant source of genetic and phenotypic variation. We previously reported an initial analysis of copy number variations (CNVs) in Angus cattle selected for resistance or susceptibility to gastrointestinal nematodes. In this study, we performed a lar...
Genomic regions showing copy number variations associate with resistance or susceptibility to gastrointestinal nematodes in Angus Cattle

USDA-ARS's Scientific Manuscript database

Genomic structural variation is an important and abundant source of genetic and phenotypic variation. We previously reported an initial analysis of copy number variations (CNVs) in Angus cattle selected for resistance or susceptibility to intestinal nematodes. In this study, we performed a large sca...
Insights into structural variations and genome rearrangements in prokaryotic genomes.

PubMed

Periwal, Vinita; Scaria, Vinod

2015-01-01

Structural variations (SVs) are genomic rearrangements that affect fairly large fragments of DNA. Most of the SVs such as inversions, deletions and translocations have been largely studied in context of genetic diseases in eukaryotes. However, recent studies demonstrate that genome rearrangements can also have profound impact on prokaryotic genomes, leading to altered cell phenotype. In contrast to single-nucleotide variations, SVs provide a much deeper insight into organization of bacterial genomes at a much better resolution. SVs can confer change in gene copy number, creation of new genes, altered gene expression and many other functional consequences. High-throughput technologies have now made it possible to explore SVs at a much refined resolution in bacterial genomes. Through this review, we aim to highlight the importance of the less explored field of SVs in prokaryotic genomes and their impact. We also discuss its potential applicability in the emerging fields of synthetic biology and genome engineering where targeted SVs could serve to create sophisticated and accurate genome editing. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Genome-wide investigation of genetic changes during modern breeding of Brassica napus.

PubMed

Wang, Nian; Li, Feng; Chen, Biyun; Xu, Kun; Yan, Guixin; Qiao, Jiangwei; Li, Jun; Gao, Guizhen; Bancroft, Ian; Meng, Jingling; King, Graham J; Wu, Xiaoming

2014-08-01

Considerable genome variation has been incorporated within rapeseed breeding programs over past decades, and extensive breeding effort has substantially changed the phenotypic properties of rapeseed. Uncovering the underlying patterns of allelic variation in the context of genome organisation would provide knowledge to guide future genetic improvement. We assessed genome-wide genetic changes, including population structure, genetic relatedness, the extent of linkage disequilibrium, nucleotide diversity and genetic differentiation based on FST outlier detection, for a panel of 472 Brassica napus inbred accessions using a 60 k Brassica Infinium® SNP array. We found that genetic diversity varied among sub-groups. Moreover, the genetic diversity increased from 1950 to 1980 and then remained at a similar level in China and Europe. We also found that ~6-10 % of genomic regions showed high FST values. Some QTLs previously associated with important agronomic traits overlapped with these regions. Overall, the B. napus C genome was found to have more high FST signals than the A genome, and we concluded that the C genome may contribute more valuable alleles to generate elite traits. The results of this study indicate that considerable genome variation has been incorporated within rapeseed breeding programs over past decades. These results also contribute to understanding the impact of rapeseed improvement on available genome variation and the potential for dissecting complex agronomic traits.
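To make the differentiation statistic in this record concrete, the sketch below computes Wright's fixation index for one biallelic SNP from allele frequencies in two sub-populations, then flags loci above a cutoff. It is only an illustration: the abstract does not say which estimator or outlier test was used, and the equal-weight assumption, the locus names, the frequencies, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, List, Tuple


def wright_fst(p1: float, p2: float) -> float:
    """Fixation index (H_T - H_S) / H_T for two equally weighted sub-populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity in the pooled population
    if h_total == 0:
        return 0.0  # locus is monomorphic overall, so there is no differentiation
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-group value
    return (h_total - h_within) / h_total


def high_fst_loci(freqs: Dict[str, Tuple[float, float]], threshold: float) -> List[str]:
    """Return the names of loci whose differentiation exceeds the threshold."""
    return [name for name, (p1, p2) in freqs.items() if wright_fst(p1, p2) > threshold]


# Hypothetical allele frequencies for three SNPs in two sub-groups.
example = {"snp_a": (0.1, 0.9), "snp_b": (0.5, 0.5), "snp_c": (0.4, 0.6)}
print(high_fst_loci(example, threshold=0.5))
```

A fixed cutoff is used here only for brevity; outlier scans in practice usually compare each locus against a genome-wide or simulated null distribution.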
Pan-Genome Analysis Links the Hereditary Variation of Leptospirillum ferriphilum With Its Evolutionary Adaptation

PubMed Central

Zhang, Xian; Liu, Xueduan; Yang, Fei; Chen, Lv

2018-01-01

Niche adaptation has long been recognized to drive intra-species differentiation and speciation, yet knowledge about its relatedness with hereditary variation of microbial genomes is relatively limited. Using Leptospirillum ferriphilum species as a case study, we present a detailed analysis of genomic features of five recognized strains. Genome-to-genome distance calculation preliminarily determined the roles of spatial distance and environmental heterogeneity that potentially contribute to intra-species variation within L. ferriphilum species at the genome level. Mathematical models were further constructed to extrapolate the expansion of L. ferriphilum genomes (an ‘open’ pan-genome), indicating the emergence of novel genes with new sequenced genomes. The identification of diverse mobile genetic elements (MGEs) (such as transposases, integrases, and phage-associated genes) revealed the prevalence of horizontal gene transfer events, which is an important evolutionary mechanism that provides avenues for the recruitment of novel functionalities and further for the genetic divergence of microbial genomes. Comprehensive analysis also demonstrated that the genome reduction by gene loss in a broad sense might contribute to the observed diversification. We thus inferred a plausible explanation to address this observation: the community-dependent adaptation that potentially economizes the limiting resources of the entire community. Now that the introduction of new genes is accompanied by a parallel abandonment of some other ones, our results provide snapshots on the biological fitness cost of environmental adaptation within the L. ferriphilum genomes. In short, our genome-wide analyses bridge the relation between genetic variation of L. ferriphilum with its evolutionary adaptation. PMID:29636744
Construction of a large collection of small genome variations in French dairy and beef breeds using whole-genome sequences.

PubMed

Boussaha, Mekki; Michot, Pauline; Letaief, Rabia; Hozé, Chris; Fritz, Sébastien; Grohs, Cécile; Esquerré, Diane; Duchesne, Amandine; Philippe, Romain; Blanquet, Véronique; Phocas, Florence; Floriot, Sandrine; Rocha, Dominique; Klopp, Christophe; Capitan, Aurélien; Boichard, Didier

2016-11-15

In recent years, several bovine genome sequencing projects were carried out with the aim of developing genomic tools to improve dairy and beef production efficiency and sustainability. In this study, we describe the first French cattle genome variation dataset obtained by sequencing 274 whole genomes representing several major dairy and beef breeds. This dataset contains over 28 million single nucleotide polymorphisms (SNPs) and small insertions and deletions. Comparisons between sequencing results and SNP array genotypes revealed a very high genotype concordance rate, which indicates the good quality of our data. To our knowledge, this is the first large-scale catalog of small genomic variations in French dairy and beef cattle. This resource will contribute to the study of gene functions and population structure and also help to improve traits through genotype-guided selection.
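The quality check mentioned in this record, concordance between sequence-derived and array genotypes, can be sketched as the share of jointly called sites where the two genotypes agree. This is an assumed definition for illustration, since the abstract does not give the exact formula; genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, and the data are hypothetical.

```python
from typing import Optional, Sequence


def concordance_rate(seq_calls: Sequence[Optional[int]],
                     array_calls: Sequence[Optional[int]]) -> float:
    """Fraction of sites called by both platforms where the genotypes match."""
    if len(seq_calls) != len(array_calls):
        raise ValueError("both call sets must cover the same sites")
    compared = 0
    matched = 0
    for seq_gt, array_gt in zip(seq_calls, array_calls):
        if seq_gt is None or array_gt is None:
            continue  # skip sites missing on either platform
        compared += 1
        if seq_gt == array_gt:
            matched += 1
    if compared == 0:
        raise ValueError("no site was called on both platforms")
    return matched / compared


# Hypothetical genotypes at six sites for one animal.
sequencing = [0, 1, 2, 1, None, 2]
snp_array = [0, 1, 2, 2, 1, 2]
print(concordance_rate(sequencing, snp_array))  # 4 matches out of 5 compared sites
```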
VCGDB: a dynamic genome database of the Chinese population

PubMed Central

2014-01-01

Background The data released by the 1000 Genomes Project contain an increasing number of genome sequences from different nations and populations with a large number of genetic variations. As a result, the focus of human genome studies is changing from single and static to complex and dynamic. The currently available human reference genome (GRCh37) is based on sequencing data from 13 anonymous Caucasian volunteers, which might limit the scope of genomics, transcriptomics, epigenetics, and genome wide association studies. Description We used the massive amount of sequencing data published by the 1000 Genomes Project Consortium to construct the Virtual Chinese Genome Database (VCGDB), a dynamic genome database of the Chinese population based on the whole genome sequencing data of 194 individuals. VCGDB provides dynamic genomic information, which contains 35 million single nucleotide variations (SNVs), 0.5 million insertions/deletions (indels), and 29 million rare variations, together with genomic annotation information. VCGDB also provides a highly interactive user-friendly virtual Chinese genome browser (VCGBrowser) with functions like seamless zooming and real-time searching. In addition, we have established three population-specific consensus Chinese reference genomes that are compatible with mainstream alignment software. Conclusions VCGDB offers a feasible strategy for processing big data to keep pace with the biological data explosion by providing a robust resource for genomics studies; in particular, studies aimed at finding regions of the genome associated with diseases. PMID:24708222
A genomic perspective on the generation and maintenance of genetic diversity in herbivorous insects

PubMed Central

Gloss, Andrew D.; Groen, Simon C.; Whiteman, Noah K.

2017-01-01

Understanding the processes that generate and maintain genetic variation within populations is a central goal in evolutionary biology. Theory predicts that some of this variation is maintained as a consequence of adapting to variable habitats. Studies in herbivorous insects have played a key role in confirming this prediction. Here, we highlight theoretical and conceptual models for the maintenance of genetic diversity in herbivorous insects, empirical genomic studies testing these models, and pressing questions within the realm of evolutionary and functional genomic studies. To address key gaps, we propose an integrative approach combining population genomic scans for adaptation, genome-wide characterization of targets of selection through experimental manipulations, mapping the genetic architecture of traits influencing fitness, and functional studies. We also stress the importance of studying the maintenance of genetic variation across biological scales—from variation within populations to divergence among populations—to form a comprehensive view of adaptation in herbivorous insects. PMID:28736510
A high-resolution cattle CNV map by population-scale genome sequencing

USDA-ARS's Scientific Manuscript database

Copy Number Variations (CNVs) are common genomic structural variations that have been linked to human diseases and phenotypic traits. Prior studies in cattle have produced low-resolution CNV maps. We constructed a draft, high-resolution map of cattle CNVs based on whole genome sequencing data from 7...
Maize HapMap2 identifies extant variation from a genome in flux

USDA-ARS's Scientific Manuscript database

The maize genome is the largest, most diverse and complex plant genome sequenced to date. Using high-throughput sequencing to access genetic variation and a population genetics model to score the polymorphisms, we characterize and unite the diversity of the world’s key breeding germplasm, wild rela...
Creation and genomic analysis of irradiation hybrids in Populus

Treesearch

Matthew S. Zinkgraf; K. Haiby; M.C. Lieberman; L. Comai; I.M. Henry; Andrew Groover

2016-01-01

Establishing efficient functional genomic systems for creating and characterizing genetic variation in forest trees is challenging. Here we describe protocols for creating novel gene-dosage variation in Populus through gamma-irradiation of pollen, followed by genomic analysis to identify chromosomal regions that have been deleted or inserted in...

Extensive Copy Number Variation in Fermentation-Related Genes Among Saccharomyces cerevisiae Wine Strains.

PubMed

Steenwyk, Jacob; Rokas, Antonis

2017-05-05

Due to the importance of Saccharomyces cerevisiae in wine-making, the genomic variation of wine yeast strains has been extensively studied. One of the major insights stemming from these studies is that wine yeast strains harbor low levels of genetic diversity in the form of single nucleotide polymorphisms (SNPs). Genomic structural variants, such as copy number (CN) variants, are another major type of variation segregating in natural populations. To test whether genetic diversity in CN variation is also low across wine yeast strains, we examined genome-wide levels of CN variation in 132 whole-genome sequences of S. cerevisiae wine strains. We found an average of 97.8 CN variable regions (CNVRs) affecting ∼4% of the genome per strain. Using two different measures of CN diversity, we found that gene families involved in fermentation-related processes such as copper resistance (CUP), flocculation (FLO), and glucose metabolism (HXT), as well as the SNO gene family whose members are expressed before or during the diauxic shift, showed substantial CN diversity across the 132 strains examined. Importantly, these same gene families have been shown, through comparative transcriptomic and functional assays, to be associated with adaptation to the wine fermentation environment. Our results suggest that CN variation is a substantial contributor to the genomic diversity of wine yeast strains, and identify several candidate loci whose levels of CN variation may affect the adaptation and performance of wine yeast strains during fermentation. Copyright © 2017 Steenwyk and Rokas.
AGAPE (Automated Genome Analysis PipelinE) for Pan-Genome Analysis of Saccharomyces cerevisiae

PubMed Central

Song, Giltae; Dickins, Benjamin J. A.; Demeter, Janos; Engel, Stacia; Dunn, Barbara; Cherry, J. Michael

2015-01-01

The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community. PMID:25781462
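The presence/absence classification described in this record can be sketched as a count of how many strains carry each gene. This is not the AGAPE implementation; the labels "core", "accessory", and "strain-specific" are common pan-genome terms assumed here, and the strain and gene identifiers are hypothetical.

```python
from typing import Dict, Set


def classify_genes(presence: Dict[str, Set[str]]) -> Dict[str, str]:
    """Label each gene by the number of strains in which it is present."""
    n_strains = len(presence)
    counts: Dict[str, int] = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1

    labels: Dict[str, str] = {}
    for gene, count in counts.items():
        if count == n_strains:
            labels[gene] = "core"             # found in every strain
        elif count == 1:
            labels[gene] = "strain-specific"  # found in exactly one strain
        else:
            labels[gene] = "accessory"        # found in some but not all strains
    return labels


# Hypothetical gene content of three strains.
strains = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2"},
    "strainC": {"g1", "g4"},
}
for gene, label in sorted(classify_genes(strains).items()):
    print(gene, label)
```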
Intra-isolate genome variation in arbuscular mycorrhizal fungi persists in the transcriptome.

PubMed

Boon, E; Zimmerman, E; Lang, B F; Hijri, M

2010-07-01

Arbuscular mycorrhizal fungi (AMF) are heterokaryotes with an unusual genetic makeup. Substantial genetic variation occurs among nuclei within a single mycelium or isolate. AMF reproduce through spores that contain varying fractions of this heterogeneous population of nuclei. It is not clear whether this genetic variation on the genome level actually contributes to the AMF phenotype. To investigate the extent to which polymorphisms in nuclear genes are transcribed, we analysed the intra-isolate genomic and cDNA sequence variation of two genes, the large subunit ribosomal RNA (LSU rDNA) of Glomus sp. DAOM-197198 (previously known as G. intraradices) and the POL1-like sequence (PLS) of Glomus etunicatum. For both genes, we find high sequence variation at the genome and transcriptome level. Reconstruction of LSU rDNA secondary structure shows that all variants are functional. Patterns of PLS sequence polymorphism indicate that there is one functional gene copy, PLS2, which is preferentially transcribed, and one gene copy, PLS1, which is a pseudogene. This is the first study that investigates AMF intra-isolate variation at the transcriptome level. In conclusion, it is possible that, in AMF, multiple nuclear genomes contribute to a single phenotype.
Intra-specific variation in genome size in maize: cytological and phenotypic correlates

PubMed Central

Realini, María Florencia; Poggio, Lidia; Cámara-Hernández, Julián; González, Graciela Esther

2016-01-01

Genome size variation accompanies the diversification and evolution of many plant species. Relationships between DNA amount and phenotypic and cytological characteristics form the basis of most hypotheses that ascribe a biological role to genome size. The goal of the present research was to investigate the intra-specific variation in the DNA content in maize populations from Northeastern Argentina and further explore the relationship between genome size and the phenotypic traits seed weight and length of the vegetative cycle. Moreover, cytological parameters such as the percentage of heterochromatin as well as the number, position and sequence composition of knobs were analysed and their relationships with 2C DNA values were explored. The populations analysed presented significant differences in 2C DNA amount, from 4.62 to 6.29 pg, representing 36.15 % of the inter-populational variation. Moreover, intra-populational genome size variation was found, varying from 1.08 to 1.63-fold. Variation in the percentage of knob heterochromatin as well as in the number, chromosome position and sequence composition of the knobs was detected among and within the populations. Although a positive relationship between genome size and the percentage of heterochromatin was observed, a significant correlation was not found. This confirms that other non-coding repetitive DNA sequences are contributing to the genome size variation. Although a positive relationship between DNA amount and seed weight has been reported in a large number of species, this relationship was not found in the populations studied here. The length of the vegetative cycle showed a positive correlation with the percentage of heterochromatin. This result suggests an adaptive effect of heterochromatin, since the length of this cycle would be optimized via selection for an appropriate percentage of heterochromatin. PMID:26644343
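The 36.15 % figure in this record appears consistent with the relative difference between the smallest and largest 2C values reported (4.62 and 6.29 pg). The short check below works through that arithmetic; reading the percentage this way is an assumption, and the function name is hypothetical.

```python
def relative_range_percent(min_value: float, max_value: float) -> float:
    """Difference between the extremes, expressed as a percentage of the minimum."""
    if min_value <= 0:
        raise ValueError("minimum must be positive")
    return (max_value - min_value) / min_value * 100.0


low_pg, high_pg = 4.62, 6.29  # 2C DNA amounts taken from the abstract
print(round(relative_range_percent(low_pg, high_pg), 2))  # 36.15
print(round(high_pg / low_pg, 2))                         # 1.36, the same range as a fold change
```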
Estimation and Partitioning of Heritability in Human Populations using Whole Genome Analysis Methods

PubMed Central

Vinkhuyzen, Anna AE; Wray, Naomi R; Yang, Jian; Goddard, Michael E; Visscher, Peter M

2014-01-01

Understanding genetic variation of complex traits in human populations has moved from the quantification of the resemblance between close relatives to the dissection of genetic variation into the contributions of individual genomic loci. But major questions remain unanswered: how much phenotypic variation is genetic, how much of the genetic variation is additive and what is the joint distribution of effect size and allele frequency at causal variants? We review and compare three whole-genome analysis methods that use mixed linear models (MLM) to estimate genetic variation, using the relationship between close or distant relatives based on pedigree or SNPs. We discuss theory, estimation procedures, bias and precision of each method and review recent advances in the dissection of additive genetic variation of complex traits in human populations that are based upon the application of MLM. Using genome wide data, SNPs account for far more of the genetic variation than the highly significant SNPs associated with a trait, but they do not account for all of the genetic variance estimated by pedigree based methods. We explain possible reasons for this ‘missing’ heritability. PMID:23988118
The international Genome sample resource (IGSR): A worldwide collection of genome variation incorporating the 1000 Genomes Project data.

PubMed

Clarke, Laura; Fairley, Susan; Zheng-Bradley, Xiangqun; Streeter, Ian; Perry, Emily; Lowy, Ernesto; Tassé, Anne-Marie; Flicek, Paul

2017-01-04

The International Genome Sample Resource (IGSR; http://www.internationalgenome.org) expands in data type and population diversity the resources from the 1000 Genomes Project. IGSR represents the largest open collection of human variation data and provides easy access to these resources. IGSR was established in 2015 to maintain and extend the 1000 Genomes Project data, which has been widely used as a reference set of human variation and by researchers developing analysis methods. IGSR has mapped all of the 1000 Genomes sequence to the newest human reference (GRCh38), and will release updated variant calls to ensure maximal usefulness of the existing data. IGSR is collecting new structural variation data on the 1000 Genomes samples from long read sequencing and other technologies, and will collect relevant functional data into a single comprehensive resource. IGSR is extending coverage with new populations sequenced by collaborating groups. Here, we present the new data and analysis that IGSR has made available. We have also introduced a new data portal that increases discoverability of our data (previously only browseable through our FTP site) by focusing on particular samples, populations or data sets of interest. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genomic Analysis of Hepatitis B Virus Reveals Antigen State and Genotype as Sources of Evolutionary Rate Variation

PubMed Central

Harrison, Abby; Lemey, Philippe; Hurles, Matthew; Moyes, Chris; Horn, Susanne; Pryor, Jan; Malani, Joji; Supuri, Mathias; Masta, Andrew; Teriboriki, Burentau; Toatu, Tebuka; Penny, David; Rambaut, Andrew; Shapiro, Beth

2011-01-01

Hepatitis B virus (HBV) genomes are small, semi-double-stranded DNA circular genomes that contain alternating overlapping reading frames and replicate through an RNA intermediary phase. This complex biology has presented a challenge to estimating an evolutionary rate for HBV, leading to difficulties resolving the evolutionary and epidemiological history of the virus. Here, we re-examine rates of HBV evolution using a novel data set of 112 within-host, transmission history (pedigree) and among-host genomes isolated over 20 years from the indigenous peoples of the South Pacific, combined with 313 previously published HBV genomes. We employ Bayesian phylogenetic approaches to examine several potential causes and consequences of evolutionary rate variation in HBV. Our results reveal rate variation both between genotypes and across the genome, as well as strikingly slower rates when genomes are sampled in the Hepatitis B e antigen positive state, compared to the e antigen negative state. This Hepatitis B e antigen rate variation was found to be largely attributable to changes during the course of infection in the preCore and Core genes and their regulatory elements. PMID:21765983
A high-resolution cattle CNV map by population-scale genome sequencing

USDA-ARS's Scientific Manuscript database

Copy Number Variations (CNVs) are common genomic structural variations that have been linked to human diseases and phenotypic traits. CNVs represent an important type of genetic variation among cattle breeds and even individual animals; however, only low-resolution maps of cattle CNVs currently exis...
New Regions of the Human Genome Linked to Skin Color Variation in Some African Populations

Cancer.gov

In the first study of its kind, an international team of genomics researchers has identified new regions of the human genome that are associated with skin color variation in some African populations, opening new avenues for research on skin diseases and cancer in all populations.
The African Genome Variation Project shapes medical genetics in Africa

NASA Astrophysics Data System (ADS)

Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O.; Choudhury, Ananyo; Ritchie, Graham R. S.; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N.; Young, Elizabeth H.; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P.; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A.; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S.

2015-01-01

Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.
Singapore Genome Variation Project: a haplotype map of three Southeast Asian populations.

PubMed

Teo, Yik-Ying; Sim, Xueling; Ong, Rick T H; Tan, Adrian K S; Chen, Jieming; Tantoso, Erwin; Small, Kerrin S; Ku, Chee-Seng; Lee, Edmund J D; Seielstad, Mark; Chia, Kee-Seng

2009-11-01

The Singapore Genome Variation Project (SGVP) provides a publicly available resource of 1.6 million single nucleotide polymorphisms (SNPs) genotyped in 268 individuals from the Chinese, Malay, and Indian population groups in Southeast Asia. This online database catalogs information and summaries on genotype and phased haplotype data, including allele frequencies, assessment of linkage disequilibrium (LD), and recombination rates in a format similar to the International HapMap Project. Here, we introduce this resource and describe the analysis of human genomic variation upon agglomerating data from the HapMap and the Human Genome Diversity Project, providing useful insights into the population structure of the three major population groups in Asia. In addition, this resource also surveyed across the genome for variation in regional patterns of LD between the HapMap and SGVP populations, and for signatures of positive natural selection using two well-established metrics: iHS and XP-EHH. The raw and processed genetic data, together with all population genetic summaries, are publicly available for download and browsing through a web browser modeled with the Generic Genome Browser.
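The linkage disequilibrium assessment mentioned above can be illustrated with the standard r² statistic between two biallelic SNPs. This is a minimal sketch that assumes phased haplotypes coded as 0/1 pairs; the function name and the toy inputs are hypothetical and do not reflect the SGVP pipeline itself.

```python
def ld_r2(haplotypes):
    """Compute r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: list of (a, b) tuples, each allele coded 0 or 1.
    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), where D = pAB - pA * pB.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Toy data: alleles always travel together, so the SNPs are in complete LD.
    print(ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)]))
    # Toy data: all four haplotypes equally frequent, so there is no LD.
    print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))
```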
The African Genome Variation Project shapes medical genetics in Africa.

PubMed

Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O; Choudhury, Ananyo; Ritchie, Graham R S; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N; Young, Elizabeth H; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S

2015-01-15

Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.
Singapore Genome Variation Project: A haplotype map of three Southeast Asian populations

PubMed Central

Teo, Yik-Ying; Sim, Xueling; Ong, Rick T.H.; Tan, Adrian K.S.; Chen, Jieming; Tantoso, Erwin; Small, Kerrin S.; Ku, Chee-Seng; Lee, Edmund J.D.; Seielstad, Mark; Chia, Kee-Seng

2009-01-01

The Singapore Genome Variation Project (SGVP) provides a publicly available resource of 1.6 million single nucleotide polymorphisms (SNPs) genotyped in 268 individuals from the Chinese, Malay, and Indian population groups in Southeast Asia. This online database catalogs information and summaries on genotype and phased haplotype data, including allele frequencies, assessment of linkage disequilibrium (LD), and recombination rates in a format similar to the International HapMap Project. Here, we introduce this resource and describe the analysis of human genomic variation upon agglomerating data from the HapMap and the Human Genome Diversity Project, providing useful insights into the population structure of the three major population groups in Asia. In addition, this resource also surveyed across the genome for variation in regional patterns of LD between the HapMap and SGVP populations, and for signatures of positive natural selection using two well-established metrics: iHS and XP-EHH. The raw and processed genetic data, together with all population genetic summaries, are publicly available for download and browsing through a web browser modeled with the Generic Genome Browser. PMID:19700652
Genome Variation Within Triticale in Comparison to its Wheat and Rye Progenitors

USDA-ARS's Scientific Manuscript database

Genome variation in the intergeneric wheat-rye hybrid triticale (X Triticosecale Wittmack) has been a puzzle to scientists and plant breeders since the first triticale was synthesized. The existence of unexplained genetic variation in triticale as compared to the parents has been a hindrance to bre...
Sex-linked genomic variation and its relationship to avian plumage dichromatism and sexual selection.

PubMed

Huang, Huateng; Rabosky, Daniel L

2015-09-16

Sexual dichromatism is the tendency for sexes to differ in color pattern and represents a striking form of within-species morphological variation. Conspicuous intersexual differences in avian plumage are generally thought to result from Darwinian sexual selection, to the extent that dichromatism is often treated as a surrogate for the intensity of sexual selection in phylogenetic comparative studies. Intense sexual selection is predicted to leave a footprint on genetic evolution by reducing genetic diversity on the sex chromosomes relative to that on the autosomes. In this study, we test the association between plumage dichromatism and sex-linked genetic diversity using eight species pairs with contrasting levels of dichromatism. We estimated Z-linked and autosomal genetic diversity for these non-model avian species using restriction-site associated (RAD) loci that covered ~3% of the genome. We find that monochromatic birds consistently have reduced sex-linked genomic variation relative to phylogenetically-paired dichromatic species and this pattern is robust to mutational biases. Our results are consistent with several interpretations. If present-day sexual selection is stronger in dichromatic birds, our results suggest that its impact on sex-linked genomic variation is offset by other processes that lead to proportionately lower Z-linked variation in monochromatic species. We discuss possible factors that may contribute to this discrepancy between phenotypes and genomic variation. Conversely, it is possible that present-day sexual selection (as measured by the variance in male reproductive success) is stronger in the set of monochromatic taxa we have examined, potentially reflecting the importance of song, behavior and other non-plumage associated traits as targets of sexual selection. This counterintuitive finding suggests that the relationship between genomic variation and sexual selection is complex and highlights the need for a more comprehensive survey of genomic variation in avian taxa that vary markedly in social and genetic mating systems.
ENGINES: exploring single nucleotide variation in entire human genomes.

PubMed

Amigo, Jorge; Salas, Antonio; Phillips, Christopher

2011-04-19

Next generation ultra-sequencing technologies are starting to produce extensive quantities of data from entire human genome or exome sequences, and therefore new software is needed to present and analyse this vast amount of information. The 1000 Genomes project has recently released raw data for 629 complete genomes representing several human populations through their Phase I interim analysis and, although there are certain public tools available that allow exploration of these genomes, to date there is no tool that permits comprehensive population analysis of the variation catalogued by such data. We have developed a genetic variant site explorer able to retrieve data for Single Nucleotide Variations (SNVs), population by population, from entire genomes without compromising future scalability and agility. ENGINES (ENtire Genome INterface for Exploring SNVs) uses data from the 1000 Genomes Phase I to demonstrate its capacity to handle large amounts of genetic variation (>7.3 billion genotypes and 28 million SNVs), as well as deriving summary statistics of interest for medical and population genetics applications. The whole dataset is pre-processed and summarized into a data mart accessible through a web interface. The query system allows the combination and comparison of each available population sample, while searching by rs-number list, chromosome region, or genes of interest. Frequency and FST filters are available to further refine queries, while results can be visually compared with other large-scale Single Nucleotide Polymorphism (SNP) repositories such as HapMap or Perlegen. ENGINES is capable of accessing large-scale variation data repositories in a fast and comprehensive manner. It allows quick browsing of whole genome variation, while providing statistical information for each variant site such as allele frequency, heterozygosity or FST values for genetic differentiation. The data mart generating scripts and the web interface are available at http://spsmart.cesga.es/engines.php. © 2011 Amigo et al; licensee BioMed Central Ltd.
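The per-site statistics named in this abstract (allele frequency, heterozygosity and FST) can be sketched as follows. This is only an illustration under stated assumptions: genotypes are coded as alternate-allele counts per diploid individual, heterozygosity is the expected value 2p(1-p), and FST is Wright's (H_T - H_S)/H_T for two equally weighted populations. The abstract does not say which estimators ENGINES uses, and all function names here are hypothetical.

```python
def allele_frequency(genotypes):
    """Alternate-allele frequency from diploid genotypes coded 0, 1 or 2."""
    if not genotypes:
        raise ValueError("need at least one genotype")
    return sum(genotypes) / (2 * len(genotypes))


def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic site."""
    return 2 * p * (1 - p)


def fst_two_populations(p1, p2):
    """Wright's FST = (H_T - H_S) / H_T for two equally weighted populations."""
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2
    h_t = expected_heterozygosity((p1 + p2) / 2)
    if h_t == 0:
        return 0.0  # site is monomorphic overall; no differentiation to measure
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Toy genotypes for two hypothetical population samples.
    pop_a = [0, 0, 1, 0, 1]
    pop_b = [2, 1, 2, 2, 1]
    p_a, p_b = allele_frequency(pop_a), allele_frequency(pop_b)
    print(p_a, p_b, fst_two_populations(p_a, p_b))
```

A frequency or FST filter of the kind the abstract describes would then amount to keeping only the sites whose computed values fall inside a user-chosen range.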
A comprehensive profile of DNA copy number variations in a Korean population: identification of copy number invariant regions among Koreans.

PubMed

Jeon, Jae Pil; Shim, Sung Mi; Jung, Jong Sun; Nam, Hye Young; Lee, Hye Jin; Oh, Berm Seok; Kim, Kuchan; Kim, Hyung Lae; Han, Bok Ghee

2009-09-30

To examine copy number variations among the Korean population, we compared individual genomes with the Korean reference genome assembly using the publicly available Korean HapMap SNP 50K chip data from 90 individuals. Korean individuals exhibited 123 copy number variation regions (CNVRs) covering 27.2 Mb, equivalent to 1.0% of the genome in the copy number variation (CNV) analysis using the combined criteria of P value (P < 0.01) and standard deviation of copy numbers (SD ≥ 0.25) among study subjects. In contrast, when compared to the Affymetrix reference genome assembly from multiple ethnic groups, considerably more CNVRs (n = 643) were detected in larger proportions (5.0%) of the genome covering 135.1 Mb even by more stringent criteria (P < 0.001 and SD ≥ 0.25), reflecting ethnic diversity of structural variations between Korean and other populations. Some CNVRs were validated by the quantitative multiplex PCR of short fluorescent fragment (QMPSF) method, and then copy number invariant regions were detected among the study subjects. These copy number invariant regions could serve as good internal controls for further CNV studies. Lastly, we demonstrated that the CNV information could stratify even a single ethnic population with a proper reference genome assembly from multiple heterogeneous populations.
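The combined calling criteria described above (a P-value cutoff together with a minimum standard deviation of copy number) can be sketched as a simple filter. The `Region` class, its field names and the example values are hypothetical; only the thresholds and the reported coverage figures come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    p_value: float          # significance of the copy number difference
    copy_number_sd: float   # SD of copy number across study subjects


def is_cnvr(region, p_cutoff=0.01, sd_cutoff=0.25):
    """Apply both criteria: P below the cutoff and SD at or above the cutoff."""
    return region.p_value < p_cutoff and region.copy_number_sd >= sd_cutoff


def implied_genome_size_mb(covered_mb, fraction_of_genome):
    """Genome size implied by a covered length and its reported genome share."""
    return covered_mb / fraction_of_genome


if __name__ == "__main__":
    regions = [
        Region("r1", p_value=0.004, copy_number_sd=0.31),   # passes both tests
        Region("r2", p_value=0.004, copy_number_sd=0.10),   # SD too small
        Region("r3", p_value=0.0004, copy_number_sd=0.40),  # passes even at P < 0.001
    ]
    print([r.name for r in regions if is_cnvr(r)])
    print([r.name for r in regions if is_cnvr(r, p_cutoff=0.001)])
    # Both reported coverage figures imply a genome of roughly 2.7 Gb.
    print(implied_genome_size_mb(27.2, 0.010), implied_genome_size_mb(135.1, 0.050))
```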
Saving the spandrels? Adaptive genomic variation in conservation and fisheries management.

PubMed

Pearse, D E

2016-12-01

As highlighted by many of the papers in this issue, research on the genomic basis of adaptive phenotypic variation in natural populations has made spectacular progress in the past few years, largely due to the advances in sequencing technology and analysis. Without question, the resulting genomic data will improve the understanding of regions of the genome under selection and extend knowledge of the genetic basis of adaptive evolution. What is far less clear, but has been the focus of active discussion, is how such information can or should transfer into conservation practice to complement more typical conservation applications of genetic data. Before such applications can be realized, the evolutionary importance of specific targets of selection relative to the genome-wide diversity of the species as a whole must be evaluated. The key issues for the incorporation of adaptive genomic variation in conservation and management are discussed here, using published examples of adaptive genomic variation associated with specific phenotypes in salmonids and other taxa to highlight practical considerations for incorporating such information into conservation programmes. Scenarios are described in which adaptive genomic data could be used in conservation or restoration, constraints on its utility and the importance of validating inferences drawn from new genomic data before applying them in conservation practice. Finally, it is argued that an excessive focus on preserving the adaptive variation that can be measured, while ignoring the vast unknown majority that cannot, is a modern twist on the adaptationist programme that Gould and Lewontin critiqued almost 40 years ago. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
Segmental Duplications and Copy-Number Variation in the Human Genome

PubMed Central

Sharp, Andrew J.; Locke, Devin P.; McGrath, Sean D.; Cheng, Ze; Bailey, Jeffrey A.; Vallente, Rhea U.; Pertz, Lisa M.; Clark, Royden A.; Schwartz, Stuart; Segraves, Rick; Oseroff, Vanessa V.; Albertson, Donna G.; Pinkel, Daniel; Eichler, Evan E.

2005-01-01

The human genome contains numerous blocks of highly homologous duplicated sequence. This higher-order architecture provides a substrate for recombination and recurrent chromosomal rearrangement associated with genomic disease. However, an assessment of the role of segmental duplications in normal variation has not yet been made. On the basis of the duplication architecture of the human genome, we defined a set of 130 potential rearrangement hotspots and constructed a targeted bacterial artificial chromosome (BAC) microarray (with 2,194 BACs) to assess copy-number variation in these regions by array comparative genomic hybridization. Using our segmental duplication BAC microarray, we screened a panel of 47 normal individuals, who represented populations from four continents, and we identified 119 regions of copy-number polymorphism (CNP), 73 of which were previously unreported. We observed an equal frequency of duplications and deletions, as well as a 4-fold enrichment of CNPs within hotspot regions, compared with control BACs (P < .000001), which suggests that segmental duplications are a major catalyst of large-scale variation in the human genome. Importantly, segmental duplications themselves were also significantly enriched >4-fold within regions of CNP. Almost without exception, CNPs were not confined to a single population, suggesting that these either are recurrent events, having occurred independently in multiple founders, or were present in early human populations. Our study demonstrates that segmental duplications define hotspots of chromosomal rearrangement, likely acting as mediators of normal variation as well as genomic disease, and it suggests that the consideration of genomic architecture can significantly improve the ascertainment of large-scale rearrangements. Our specialized segmental duplication BAC microarray and associated database of structural polymorphisms will provide an important resource for the future characterization of human genomic disorders. PMID:15918152
Insights From Genomics Into Spatial and Temporal Variation in Batrachochytrium dendrobatidis.

PubMed

Byrne, A Q; Voyles, J; Rios-Sotelo, G; Rosenblum, E B

2016-01-01

Advances in genetics and genomics have provided new tools for the study of emerging infectious diseases. Researchers can now move quickly from simple hypotheses to complex explanations for pathogen origin, spread, and mechanisms of virulence. Here we focus on the application of genomics to understanding the biology of the fungal pathogen Batrachochytrium dendrobatidis (Bd), a novel and deadly pathogen of amphibians. We provide a brief history of the system, then focus on key insights into Bd variation garnered from genomics approaches, and finally, highlight new frontiers for future discoveries. Genomic tools have revealed unexpected complexity and variation in the Bd system suggesting that the history and biology of emerging pathogens may not be as simple as they initially seem. Copyright © 2016 Elsevier Inc. All rights reserved.

Genomic Variation, Host Range, and Infection Kinetics of Closely Related Cyanopodoviruses from New England Coastal Waters

NASA Astrophysics Data System (ADS)

Veglia, A. J.; Milford, C. R.; Marston, M.

2016-02-01

Viruses infecting marine Synechococcus are abundant in coastal marine environments and influence the community composition and abundance of their cyanobacterial hosts. In this study, we focused on the cyanopodoviruses which have smaller genomes and narrower host ranges relative to cyanomyoviruses. While previous studies have compared the genomes of diverse podoviruses, here we analyzed the genomic variation, host ranges, and infection kinetics of podoviruses within the same OTU. The genomes of fifty-five podoviral isolates from the coastal waters of New England were fully sequenced. Based on DNA polymerase gene sequences, these isolates fall into five discrete OTUs (termed RIP, for Rhode Island Podovirus). Although all the isolates belonging to the same RIP have very similar DNA polymerase gene sequences (>98% sequence identity), differences in genome content, particularly in regions associated with tail fiber genes, were observed among isolates in the same RIP. Host range tests reveal variation both across and within RIPs. Notably within RIP1, isolates that had similar tail fiber regions also had similar host ranges. Isolates belonging to RIP4 do not contain the host-derived psbA photosynthesis gene, while isolates in the other four RIPs do possess a psbA gene. Nevertheless, infection kinetic experiments suggest that the latent period and burst size for RIP4 isolates are similar to RIP1 isolates. We are continuing to investigate the correlations among genome content, host range, and infection kinetics of isolates belonging to the same OTU. Our results to date suggest that there is substantial genomic variation within an OTU and that this variation likely influences cyanopodoviral-host interactions.
Molecular genetic characterization of the RD-114 gene family of endogenous feline retroviral sequences.

PubMed Central

Reeves, R H; O'Brien, S J

1984-01-01

RD-114 is a replication-competent, xenotropic retrovirus which is homologous to a family of moderately repetitive DNA sequences present at ca. 20 copies in the normal cellular genome of domestic cats. To examine the extent and character of genomic divergence of the RD-114 gene family as well as to assess their positional association within the cat genome, we have prepared a series of molecular clones of endogenous RD-114 DNA segments from a genomic library of cat cellular DNA. Their restriction endonuclease maps were compared with each other as well as to that of the prototype-inducible RD-114 which was molecularly cloned from a chronically infected human cell line. The endogenous sequences analyzed were similar to each other in that they were colinear with RD-114 proviral DNA, were bounded by long terminal redundancies, and conserved many restriction sites in the gag and pol regions. However, the env regions of many of the sequences examined were substantially deleted. Several of the endogenous RD-114 genomes contained a novel envelope sequence which was unrelated to the env gene of the prototype RD-114 but which, like RD-114 and endogenous feline leukemia virus provirus, was found only in species of the genus Felis, and not in other closely related Felidae genera. The endogenous RD-114 sequences each had a distinct cellular flank which indicates that these sequences are not tandem but dispersed nonspecifically throughout the genome. Southern analysis of cat cellular DNA confirmed the conclusions about conserved restriction sites in endogenous sequences and indicated that a single locus may be responsible for the production of the major inducible form of RD-114. PMID:6090693
Genomic Characterization of a Newly Discovered Coronavirus Associated with Acute Respiratory Distress Syndrome in Humans

PubMed Central

van Boheemen, Sander; de Graaf, Miranda; Lauber, Chris; Bestebroer, Theo M.; Raj, V. Stalin; Zaki, Ali Moh; Osterhaus, Albert D. M. E.; Haagmans, Bart L.; Gorbalenya, Alexander E.; Snijder, Eric J.; Fouchier, Ron A. M.

2012-01-01

A novel human coronavirus (HCoV-EMC/2012) was isolated from a man with acute pneumonia and renal failure in June 2012. This report describes the complete genome sequence, genome organization, and expression strategy of HCoV-EMC/2012 and its relation with known coronaviruses. The genome comprises 30,119 nucleotides and contains at least 10 predicted open reading frames, 9 of which are predicted to be expressed from a nested set of seven subgenomic mRNAs. Phylogenetic analysis of the replicase gene of coronaviruses with completely sequenced genomes showed that HCoV-EMC/2012 is most closely related to Tylonycteris bat coronavirus HKU4 (BtCoV-HKU4) and Pipistrellus bat coronavirus HKU5 (BtCoV-HKU5), which prototype two species in lineage C of the genus Betacoronavirus. In accordance with the guidelines of the International Committee on Taxonomy of Viruses, and in view of the 75% and 77% amino acid sequence identity in 7 conserved replicase domains with BtCoV-HKU4 and BtCoV-HKU5, respectively, we propose that HCoV-EMC/2012 prototypes a novel species in the genus Betacoronavirus. HCoV-EMC/2012 may be most closely related to a coronavirus detected in Pipistrellus pipistrellus in The Netherlands, but because only a short sequence from the most conserved part of the RNA-dependent RNA polymerase-encoding region of the genome was reported for this bat virus, its genetic distance from HCoV-EMC remains uncertain. HCoV-EMC/2012 is the sixth coronavirus known to infect humans and the first human virus within betacoronavirus lineage C. PMID:23170002
Genome-wide copy number variation in the bovine genome detected using low coverage sequence of popular beef breeds

USDA-ARS's Scientific Manuscript database

Copy number variations (CNVs) are large insertions, deletions or duplications in the genome that vary between members of a species and are known to affect a wide variety of phenotypic traits. In this study, we identified CNVs in a population of bulls using low coverage next-generation sequence data....
A global reference for human genetic variation

PubMed Central

2016-01-01

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. PMID:26432245
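The variant totals quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the rounded counts given in the abstract; the dictionary keys and variable names are illustrative.

```python
# Rounded per-class variant counts as reported in the abstract.
variant_counts = {
    "SNPs": 84_700_000,
    "short indels": 3_600_000,
    "structural variants": 60_000,
}

total = sum(variant_counts.values())
snp_share = variant_counts["SNPs"] / total

# The sum of the three classes should agree with "over 88 million variants".
print(f"total variants: {total:,}")
print(f"SNP share of all variants: {snp_share:.1%}")
```

The three classes sum to slightly more than 88 million, which agrees with the stated total, and SNPs make up the large majority of the catalogued variants.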
Maternal and paternal genomes differentially affect myofibre characteristics and muscle weights of bovine fetuses at midgestation.

PubMed

Xiang, Ruidong; Ghanipoor-Samami, Mani; Johns, William H; Eindorf, Tanja; Rutley, David L; Kruk, Zbigniew A; Fitzsimmons, Carolyn J; Thomsen, Dana A; Roberts, Claire T; Burns, Brian M; Anderson, Gail I; Greenwood, Paul L; Hiendleder, Stefan

2013-01-01

Postnatal myofibre characteristics and muscle mass are largely determined during fetal development and may be significantly affected by epigenetic parent-of-origin effects. However, data on such effects in prenatal muscle development that could help understand unexplained variation in postnatal muscle traits are lacking. In a bovine model we studied effects of distinct maternal and paternal genomes, fetal sex, and non-genetic maternal effects on fetal myofibre characteristics and muscle mass. Data from 73 fetuses (Day 153, 54% term) of four genetic groups with purebred and reciprocal cross Angus and Brahman genetics were analyzed using general linear models. Parental genomes explained the greatest proportion of variation in myofibre size of Musculus semitendinosus (80-96%) and in absolute and relative weights of M. supraspinatus, M. longissimus dorsi, M. quadriceps femoris and M. semimembranosus (82-89% and 56-93%, respectively). Paternal genome in interaction with maternal genome (P<0.05) explained most genetic variation in cross sectional area (CSA) of fast myotubes (68%), while maternal genome alone explained most genetic variation in CSA of fast myofibres (93%, P<0.01). Furthermore, maternal genome independently (M. semimembranosus, 88%, P<0.0001) or in combination (M. supraspinatus, 82%; M. longissimus dorsi, 93%; M. quadriceps femoris, 86%) with nested maternal weight effect (5-6%, P<0.05), was the predominant source of variation for absolute muscle weights. Effects of paternal genome on muscle mass decreased from thoracic to pelvic limb and accounted for all (M. supraspinatus, 97%, P<0.0001) or most (M. longissimus dorsi, 69%, P<0.0001; M. quadriceps femoris, 54%, P<0.001) genetic variation in relative weights. An interaction between maternal and paternal genomes (P<0.01) and effects of maternal weight (P<0.05) on expression of H19, a master regulator of an imprinted gene network, and negative correlations between H19 expression and fetal muscle mass (P<0.001), suggested imprinted genes and miRNA interference as mechanisms for differential effects of maternal and paternal genomes on fetal muscle.
Maternal and Paternal Genomes Differentially Affect Myofibre Characteristics and Muscle Weights of Bovine Fetuses at Midgestation

PubMed Central

Xiang, Ruidong; Ghanipoor-Samami, Mani; Johns, William H.; Eindorf, Tanja; Rutley, David L.; Kruk, Zbigniew A.; Fitzsimmons, Carolyn J.; Thomsen, Dana A.; Roberts, Claire T.; Burns, Brian M.; Anderson, Gail I.; Greenwood, Paul L.; Hiendleder, Stefan

2013-01-01

Postnatal myofibre characteristics and muscle mass are largely determined during fetal development and may be significantly affected by epigenetic parent-of-origin effects. However, data on such effects in prenatal muscle development that could help understand unexplained variation in postnatal muscle traits are lacking. In a bovine model we studied effects of distinct maternal and paternal genomes, fetal sex, and non-genetic maternal effects on fetal myofibre characteristics and muscle mass. Data from 73 fetuses (Day 153, 54% term) of four genetic groups with purebred and reciprocal cross Angus and Brahman genetics were analyzed using general linear models. Parental genomes explained the greatest proportion of variation in myofibre size of Musculus semitendinosus (80–96%) and in absolute and relative weights of M. supraspinatus, M. longissimus dorsi, M. quadriceps femoris and M. semimembranosus (82–89% and 56–93%, respectively). Paternal genome in interaction with maternal genome (P<0.05) explained most genetic variation in cross sectional area (CSA) of fast myotubes (68%), while maternal genome alone explained most genetic variation in CSA of fast myofibres (93%, P<0.01). Furthermore, maternal genome independently (M. semimembranosus, 88%, P<0.0001) or in combination (M. supraspinatus, 82%; M. longissimus dorsi, 93%; M. quadriceps femoris, 86%) with nested maternal weight effect (5–6%, P<0.05), was the predominant source of variation for absolute muscle weights. Effects of paternal genome on muscle mass decreased from thoracic to pelvic limb and accounted for all (M. supraspinatus, 97%, P<0.0001) or most (M. longissimus dorsi, 69%, P<0.0001; M. quadriceps femoris, 54%, P<0.001) genetic variation in relative weights. An interaction between maternal and paternal genomes (P<0.01) and effects of maternal weight (P<0.05) on expression of H19, a master regulator of an imprinted gene network, and negative correlations between H19 expression and fetal muscle mass (P<0.001), suggested imprinted genes and miRNA interference as mechanisms for differential effects of maternal and paternal genomes on fetal muscle. PMID:23341941
Whole-genome analyses of Korean native and Holstein cattle breeds by massively parallel sequencing.

PubMed

Choi, Jung-Woo; Liao, Xiaoping; Stothard, Paul; Chung, Won-Hyong; Jeon, Heoyn-Jeong; Miller, Stephen P; Choi, So-Young; Lee, Jeong-Koo; Yang, Bokyoung; Lee, Kyung-Tai; Han, Kwang-Jin; Kim, Hyeong-Cheol; Jeong, Dongkee; Oh, Jae-Don; Kim, Namshin; Kim, Tae-Hun; Lee, Hak-Kyo; Lee, Sung-Jin

2014-01-01

A main goal of cattle genomics is to identify DNA differences that account for variations in economically important traits. In this study, we performed whole-genome analyses of three important cattle breeds in Korea--Hanwoo, Jeju Heugu, and Korean Holstein--using the Illumina HiSeq 2000 sequencing platform. We achieved 25.5-, 29.6-, and 29.5-fold coverage of the Hanwoo, Jeju Heugu, and Korean Holstein genomes, respectively, and identified a total of 10.4 million single nucleotide polymorphisms (SNPs), of which 54.12% were found to be novel. We also detected 1,063,267 insertions-deletions (InDels) across the genomes (78.92% novel). Annotations of the datasets identified a total of 31,503 nonsynonymous SNPs and 859 frameshift InDels that could affect phenotypic variations in traits of interest. Furthermore, genome-wide copy number variation regions (CNVRs) were detected by comparing the Hanwoo, Jeju Heugu, and previously published Chikso genomes against that of Korean Holstein. A total of 992, 284, and 1881 CNVRs, respectively, were detected throughout the genome. Moreover, 53, 65, 45, and 82 putative regions of homozygosity (ROH) were identified in Hanwoo, Jeju Heugu, Chikso, and Korean Holstein respectively. The results of this study provide a valuable foundation for further investigations to dissect the molecular mechanisms underlying variation in economically important traits in cattle and to develop genetic markers for use in cattle breeding.
Whole-Genome Analyses of Korean Native and Holstein Cattle Breeds by Massively Parallel Sequencing

PubMed Central

Choi, Jung-Woo; Liao, Xiaoping; Stothard, Paul; Chung, Won-Hyong; Jeon, Heoyn-Jeong; Miller, Stephen P.; Choi, So-Young; Lee, Jeong-Koo; Yang, Bokyoung; Lee, Kyung-Tai; Han, Kwang-Jin; Kim, Hyeong-Cheol; Jeong, Dongkee; Oh, Jae-Don; Kim, Namshin; Kim, Tae-Hun; Lee, Hak-Kyo; Lee, Sung-Jin

2014-01-01

A main goal of cattle genomics is to identify DNA differences that account for variations in economically important traits. In this study, we performed whole-genome analyses of three important cattle breeds in Korea—Hanwoo, Jeju Heugu, and Korean Holstein—using the Illumina HiSeq 2000 sequencing platform. We achieved 25.5-, 29.6-, and 29.5-fold coverage of the Hanwoo, Jeju Heugu, and Korean Holstein genomes, respectively, and identified a total of 10.4 million single nucleotide polymorphisms (SNPs), of which 54.12% were found to be novel. We also detected 1,063,267 insertions–deletions (InDels) across the genomes (78.92% novel). Annotations of the datasets identified a total of 31,503 nonsynonymous SNPs and 859 frameshift InDels that could affect phenotypic variations in traits of interest. Furthermore, genome-wide copy number variation regions (CNVRs) were detected by comparing the Hanwoo, Jeju Heugu, and previously published Chikso genomes against that of Korean Holstein. A total of 992, 284, and 1881 CNVRs, respectively, were detected throughout the genome. Moreover, 53, 65, 45, and 82 putative regions of homozygosity (ROH) were identified in Hanwoo, Jeju Heugu, Chikso, and Korean Holstein respectively. The results of this study provide a valuable foundation for further investigations to dissect the molecular mechanisms underlying variation in economically important traits in cattle and to develop genetic markers for use in cattle breeding. PMID:24992012
Molecular spectrum of somaclonal variation in regenerated rice revealed by whole-genome sequencing.

PubMed

Miyao, Akio; Nakagome, Mariko; Ohnuma, Takako; Yamagata, Harumi; Kanamori, Hiroyuki; Katayose, Yuichi; Takahashi, Akira; Matsumoto, Takashi; Hirochika, Hirohiko

2012-01-01

Somaclonal variation is a phenomenon that results in the phenotypic variation of plants regenerated from cell culture. One of the causes of somaclonal variation in rice is the transposition of retrotransposons. However, many aspects of the mechanisms that result in somaclonal variation remain undefined. To detect genome-wide changes in regenerated rice, we analyzed the whole-genome sequences of three plants independently regenerated from cultured cells originating from a single seed stock. Many single-nucleotide polymorphisms (SNPs) and insertions and deletions (indels) were detected in the genomes of the regenerated plants. The transposition of only Tos17 among 43 transposons examined was detected in the regenerated plants. Therefore, the SNPs and indels contribute to the somaclonal variation in regenerated rice in addition to the transposition of Tos17. The observed molecular spectrum was similar to that of the spontaneous mutations in Arabidopsis thaliana. However, the base change ratio was estimated to be 1.74 × 10⁻⁶ base substitutions per site per regeneration, which is 248-fold greater than the spontaneous mutation rate of A. thaliana.
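The two figures quoted in this abstract imply a reference rate for A. thaliana that the abstract itself does not state. The following is a minimal arithmetic sketch in Python, using only the numbers above; the variable names are illustrative, and the derived value is an inference from the stated ratio rather than a figure reported in the record.

```python
# Figures quoted in the abstract above.
rice_rate = 1.74e-6    # base substitutions per site per regeneration
fold_difference = 248  # stated ratio to the A. thaliana spontaneous rate

# Implied reference rate. This is NOT given in the abstract; it is
# obtained here simply by dividing the rice rate by the stated ratio.
implied_reference_rate = rice_rate / fold_difference

print(f"Implied A. thaliana rate: {implied_reference_rate:.2e} per site")
```

Under these inputs the implied rate is on the order of 10⁻⁹ substitutions per site, which shows the scale of the gap the abstract describes.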
GFVO: the Genomic Feature and Variation Ontology.

PubMed

Baran, Joachim; Durgahee, Bibi Sehnaaz Begum; Eilbeck, Karen; Antezana, Erick; Hoehndorf, Robert; Dumontier, Michel

2015-01-01

Falling costs in genomic laboratory experiments have led to a steady increase of genomic feature and variation data. Multiple genomic data formats exist for sharing these data, and whilst they are similar, they address slightly different data viewpoints and are consequently not fully compatible with each other. The fragmentation of data format specifications makes it hard to integrate and interpret data for further analysis with information from multiple data providers. As a solution, a new ontology is presented here for annotating and representing genomic feature and variation dataset contents. The Genomic Feature and Variation Ontology (GFVO) specifically addresses genomic data as it is regularly shared using the GFF3 (incl. FASTA), GTF, GVF and VCF file formats. GFVO simplifies data integration and enables linking of genomic annotations across datasets through common semantics of genomic types and relations. Availability and implementation. The latest stable release of the ontology is available via its base URI; previous and development versions are available at the ontology's GitHub repository: https://github.com/BioInterchange/Ontologies; versions of the ontology are indexed through BioPortal (without external class-/property-equivalences due to BioPortal release 4.10 limitations); examples and reference documentation are provided on a separate web-page: http://www.biointerchange.org/ontologies.html. GFVO version 1.0.2 is licensed under the CC0 1.0 Universal license (https://creativecommons.org/publicdomain/zero/1.0) and therefore de facto within the public domain; the ontology can be appropriated without attribution for commercial and non-commercial use.
Genome size variation in deep-sea amphipods

PubMed Central

Jamieson, A. J.; Piertney, S. B.

2017-01-01

Genome size varies considerably across taxa, and extensive research effort has gone into understanding whether variation can be explained by differences in key ecological and life-history traits among species. The extreme environmental conditions that characterize the deep sea have been hypothesized to promote large genome sizes in eukaryotes. Here we test this supposition by examining genome sizes among 13 species of deep-sea amphipods from the Mariana, Kermadec and New Hebrides trenches. Genome sizes were estimated using flow cytometry and found to vary nine-fold, ranging from 4.06 pg (4.04 Gb) in Paralicella caperesca to 34.79 pg (34.02 Gb) in Alicella gigantea. Phylogenetic independent contrast analysis identified a relationship between genome size and maximum body size, though this was largely driven by those species that display size gigantism. There was a distinct shift in the genome size trait diversification rate in the supergiant amphipod A. gigantea relative to the rest of the group. The variation in genome size observed is striking and argues against genome size being driven by a common evolutionary history, ecological niche and life-history strategy in deep-sea amphipods. PMID:28989783
Population genetic inference from personal genome data: impact of ancestry and admixture on human genomic variation.

PubMed

Kidd, Jeffrey M; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F; Peckham, Heather E; Omberg, Larsson; Bormann Chung, Christina A; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G; Russell, Archie; Reynolds, Andy; Clark, Andrew G; Reese, Martin G; Lincoln, Stephen E; Butte, Atul J; De La Vega, Francisco M; Bustamante, Carlos D

2012-10-05

Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas: 70% of the European ancestry in today's African Americans dates back to European gene flow happening only 7-8 generations ago. Copyright © 2012 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
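The 1.75-fold range quoted in this abstract follows directly from the two heterozygosity extremes it reports. A short Python check, offered as an illustrative sketch with hypothetical variable names and no data beyond the abstract's own figures:

```python
# Heterozygosity extremes quoted in the abstract (sites per kilobase).
high_het_per_kb = 1.0   # west African genomes
low_het_per_kb = 0.57   # segments with diploid Native American ancestry

# Ratio of the highest to the lowest heterozygosity.
fold_range = high_het_per_kb / low_het_per_kb

print(f"Fold range in heterozygosity: {fold_range:.2f}")
```

The ratio rounds to 1.75, matching the fold range stated in the abstract.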
Population Genetic Inference from Personal Genome Data: Impact of Ancestry and Admixture on Human Genomic Variation

PubMed Central

Kidd, Jeffrey M.; Gravel, Simon; Byrnes, Jake; Moreno-Estrada, Andres; Musharoff, Shaila; Bryc, Katarzyna; Degenhardt, Jeremiah D.; Brisbin, Abra; Sheth, Vrunda; Chen, Rong; McLaughlin, Stephen F.; Peckham, Heather E.; Omberg, Larsson; Bormann Chung, Christina A.; Stanley, Sarah; Pearlstein, Kevin; Levandowsky, Elizabeth; Acevedo-Acevedo, Suehelay; Auton, Adam; Keinan, Alon; Acuña-Alonzo, Victor; Barquera-Lozano, Rodrigo; Canizales-Quinteros, Samuel; Eng, Celeste; Burchard, Esteban G.; Russell, Archie; Reynolds, Andy; Clark, Andrew G.; Reese, Martin G.; Lincoln, Stephen E.; Butte, Atul J.; De La Vega, Francisco M.; Bustamante, Carlos D.

2012-01-01

Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas—70% of the European ancestry in today’s African Americans dates back to European gene flow happening only 7–8 generations ago. PMID:23040495
α satellite DNA variation and function of the human centromere

PubMed Central

Sullivan, Lori L.; Chew, Kimberline

2017-01-01

Genomic variation is a source of functional diversity that is typically studied in genic and non-coding regulatory regions. However, the extent of variation within noncoding portions of the human genome, particularly highly repetitive regions, and the functional consequences are not well understood. Satellite DNA, including α satellite DNA found at human centromeres, comprises up to 10% of the genome, but is difficult to study because its repetitive nature hinders contiguous sequence assemblies. We recently described variation within α satellite DNA that affects centromere function. On human chromosome 17 (HSA17), we showed that size and sequence polymorphisms within primary array D17Z1 are associated with chromosome aneuploidy and defective centromere architecture. However, HSA17 can counteract this instability by assembling the centromere at a second, “backup” array lacking variation. Here, we discuss our findings in a broader context of human centromere assembly, and highlight areas of future study to uncover links between genomic and epigenetic features of human centromeres. PMID:28406740
Design of the AFGL Prototype Long Baseline Tiltmeter.

DTIC Science & Technology

1985-08-29

Design of the AFGL Prototype Long Baseline Tiltmeter. Kenneth O. Pohlig, 1Lt, USAF; Scharine Kirchoff. 29 August 1985. Approved for... Results of ... are described. The prototype long baseline tiltmeter shows a dependence upon temperature variation. This must be eliminated if the tiltmeter
A Rapidly Prototyped Vegetation Dryness Index Evaluated for Wildfire Risk Assessment at Stennis Space Center

NASA Technical Reports Server (NTRS)

Ross, Kenton; Graham, William; Prados, Don; Spruce, Joseph

2007-01-01

MVDI, which effectively involves the differencing of NDMI and NDVI, appears to display increased noise that is consistent with a differencing technique. This effect masks finer variations in vegetation moisture, preventing MVDI from fulfilling the requirement of giving decision makers insight into spatial variation of fire risk. MVDI shows dependencies on land cover and phenology which also argue against its use as a fire risk proxy in an area of diverse and fragmented land covers. The conclusion of the rapid prototyping effort is that MVDI should not be implemented for SSC decision support.
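The abstract attributes the extra noise in MVDI to the differencing of two indices. The Python sketch below illustrates that general point under stated assumptions: NDVI and NDMI use their standard normalized-difference definitions, but the sign convention (NDMI minus NDVI), the noise level, and the treatment of the two noise sources as independent are assumptions made for illustration. This is not the MVDI formula or processing chain used in the report.

```python
import random
import statistics


def normalized_difference(a: float, b: float) -> float:
    """Generic normalized difference index: (a - b) / (a + b)."""
    return (a - b) / (a + b)


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return normalized_difference(nir, red)


def ndmi(nir: float, swir: float) -> float:
    """Normalized Difference Moisture Index from NIR and SWIR reflectance."""
    return normalized_difference(nir, swir)


def index_difference(nir: float, red: float, swir: float) -> float:
    """Hypothetical differenced index; the sign convention is an assumption."""
    return ndmi(nir, swir) - ndvi(nir, red)


# Simulate independent zero-mean noise on each index (an assumption; real
# band noise is partly shared because both indices use the NIR band).
rng = random.Random(0)
sigma = 0.02  # assumed per-index noise standard deviation
ndvi_noise = [rng.gauss(0.0, sigma) for _ in range(10_000)]
ndmi_noise = [rng.gauss(0.0, sigma) for _ in range(10_000)]
diff_noise = [m - v for m, v in zip(ndmi_noise, ndvi_noise)]

var_ndvi = statistics.pvariance(ndvi_noise)
var_ndmi = statistics.pvariance(ndmi_noise)
var_diff = statistics.pvariance(diff_noise)

# For independent noise, the variance of a difference is the sum of variances.
print(f"NDVI noise variance:       {var_ndvi:.2e}")
print(f"NDMI noise variance:       {var_ndmi:.2e}")
print(f"Difference noise variance: {var_diff:.2e}")
```

Under the independence assumption, the differenced series carries roughly twice the noise variance of either input, which is consistent with the masking of finer moisture variations that the abstract describes.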
Implicit face prototype learning from geometric information.

PubMed

Or, Charles C-F; Wilson, Hugh R

2013-04-19

There is evidence that humans implicitly learn an average or prototype of previously studied faces, as the unseen face prototype is falsely recognized as having been learned (Solso & McCarthy, 1981). Here we investigated the extent and nature of face prototype formation where observers' memory was tested after they studied synthetic faces defined purely in geometric terms in a multidimensional face space. We found a strong prototype effect: The basic results showed that the unseen prototype averaged from the studied faces was falsely identified as learned at a rate of 86.3%, whereas individual studied faces were identified correctly 66.3% of the time and the distractors were incorrectly identified as having been learned only 32.4% of the time. This prototype learning lasted at least 1 week. Face prototype learning occurred even when the studied faces were further from the unseen prototype than the median variation in the population. Prototype memory formation was evident in addition to memory formation of studied face exemplars as demonstrated in our models. Additional studies showed that the prototype effect can be generalized across viewpoints, and head shape and internal features separately contribute to prototype formation. Thus, implicit face prototype extraction in a multidimensional space is a very general aspect of geometric face learning. Copyright © 2013 Elsevier Ltd. All rights reserved.
An argument for mechanism-based statistical inference in cancer

PubMed Central

Ochs, Michael; Price, Nathan D.; Tomasetti, Cristian; Younes, Laurent

2015-01-01

Cancer is perhaps the prototypical systems disease, and as such has been the focus of extensive study in quantitative systems biology. However, translating these programs into personalized clinical care remains elusive and incomplete. In this perspective, we argue that realizing this agenda—in particular, predicting disease phenotypes, progression and treatment response for individuals—requires going well beyond standard computational and bioinformatics tools and algorithms. It entails designing global mathematical models over network-scale configurations of genomic states and molecular concentrations, and learning the model parameters from limited available samples of high-dimensional and integrative omics data. As such, any plausible design should accommodate: biological mechanism, necessary for both feasible learning and interpretable decision making; stochasticity, to deal with uncertainty and observed variation at many scales; and a capacity for statistical inference at the patient level. This program, which requires a close, sustained collaboration between mathematicians and biologists, is illustrated in several contexts, including learning bio-markers, metabolism, cell signaling, network inference and tumorigenesis. PMID:25381197
Targeted mutation analysis of endometrial clear cell carcinoma.

PubMed

Hoang, Lien N; McConechy, Melissa K; Meng, Bo; McIntyre, John B; Ewanowich, Carol; Gilks, Cyril Blake; Huntsman, David G; Köbel, Martin; Lee, Cheng-Han

2015-04-01

Endometrial clear cell carcinomas (CCC) constitute fewer than 5% of all carcinomas of the endometrium. Currently, little is known regarding the genetic basis of endometrial CCC. We performed genomic and immunohistochemical analyses on 14 rigorously reviewed pure endometrial CCC. The genomic analysis consisted of sequencing the coding regions of 26 genes implicated previously in endometrial carcinoma. Twelve of 14 tumours displayed a prototypical CCC immunophenotype [napsin A+, hepatocyte nuclear factor-1β+ (HNF1β+) and oestrogen receptor−] and all showed intact mismatch repair protein expression. We detected mutations in 11 of 14 tumours, and there was a predominance of mutations involving genes that are mutated more frequently in endometrial serous carcinomas than in endometrioid carcinomas. Two tumours displayed a prototypical serous carcinoma mutation profile (concurrent TP53 and PPP2R1A mutations, without PTEN, CTNNB1 or ARID1A mutation). No mutations in PTEN, CTNNB1 or POLE were identified. The overall mutation profile of this cohort of endometrial CCC appears to be more serous-like than endometrioid-like, with a minor subset in the TP53-mutated CCC showing serous carcinoma profile. These findings provide new insights into the molecular features of morphologically prototypical endometrial CCC, and underscore the need for further investigations into the oncogenesis of endometrial CCC. © 2014 John Wiley & Sons Ltd.

Beyond genomic variation--comparison and functional annotation of three Brassica rapa genomes: a turnip, a rapid cycling and a Chinese cabbage.

PubMed

Lin, Ke; Zhang, Ningwen; Severing, Edouard I; Nijveen, Harm; Cheng, Feng; Visser, Richard G F; Wang, Xiaowu; de Ridder, Dick; Bonnema, Guusje

2014-03-31

Brassica rapa is an economically important crop species. During its long breeding history, a large number of morphotypes have been generated, including leafy vegetables such as Chinese cabbage and pakchoi, turnip tuber crops and oil crops. To investigate the genetic variation underlying this morphological variation, we re-sequenced, assembled and annotated the genomes of two B. rapa subspecies, turnip crops (turnip) and a rapid cycling. We then analysed the two resulting genomes together with the Chinese cabbage Chiifu reference genome to obtain an impression of the B. rapa pan-genome. The number of genes with protein-coding changes between the three genotypes was lower than that among different accessions of Arabidopsis thaliana, which can be explained by the smaller effective population size of B. rapa due to its domestication. Based on orthology to a number of non-brassica species, we estimated the date of divergence among the three B. rapa morphotypes at approximately 250,000 YA, far predating Brassica domestication (5,000-10,000 YA). By analysing genes unique to turnip we found evidence for copy number differences in peroxidases, pointing to a role for the phenylpropanoid biosynthesis pathway in the generation of morphological variation. The estimated date of divergence among three B. rapa morphotypes implies that prior to domestication there was already considerable divergence among B. rapa genotypes. Our study thus provides two new B. rapa reference genomes, delivers a set of computer tools to analyse the resulting pan-genome and uses these to shed light on genetic drivers behind the rich morphological variation found in B. rapa.
Individual epigenetic variation: When, why, and so what?

USDA-ARS's Scientific Manuscript database

Epigenetics provides a potential explanation for how environmental factors modify the risk for common diseases among individuals. Interindividual variation in DNA methylation and epigenetic regulation has been reported at specific genomic regions including transposable elements, genomically imprinte...
Genome typing of nonhuman primate models: implications for biomedical research.

PubMed

Haus, Tanja; Ferguson, Betsy; Rogers, Jeffrey; Doxiadis, Gaby; Certa, Ulrich; Rose, Nicola J; Teepe, Robert; Weinbauer, Gerhard F; Roos, Christian

2014-11-01

The success of personalized medicine rests on understanding the genetic variation between individuals. Thus, as medical practice evolves and variation among individuals becomes a fundamental aspect of clinical medicine, a thorough consideration of the genetic and genomic information concerning the animals used as models in biomedical research also becomes critical. In particular, nonhuman primates (NHPs) offer great promise as models for many aspects of human health and disease. These are outbred species exhibiting substantial levels of genetic variation; however, understanding of the contribution of this variation to phenotypes is lagging behind in NHP species. Thus, there is a pivotal need to address this gap and define strategies for characterizing both genomic content and variability within primate models of human disease. Here, we discuss the current state of genomics of NHP models and offer guidelines for future work to ensure continued improvement and utility of this line of biomedical research. Copyright © 2014 Elsevier Ltd. All rights reserved.
Genome-wide identification of copy number variations between two chicken lines that differ in genetic resistance to Marek’s disease

USDA-ARS's Scientific Manuscript database

Background: Copy number variation (CNV) is a major source of genome polymorphism that directly contributes to phenotypic variation such as resistance to infectious diseases. Lines 63 and 72 are two highly inbred experimental chicken lines that differ greatly in susceptibility to Marek’s disease (MD)...
Genomic variation at the tips of the adaptive radiation of Darwin's finches.

PubMed

Chaves, Jaime A; Cooper, Elizabeth A; Hendry, Andrew P; Podos, Jeffrey; De León, Luis F; Raeymaekers, Joost A M; MacMillan, W Owen; Uy, J Albert C

2016-11-01

Adaptive radiation unfolds as selection acts on the genetic variation underlying functional traits. The nature of this variation can be revealed by studying the tips of an ongoing adaptive radiation. We studied genomic variation at the tips of the Darwin's finch radiation; specifically focusing on polymorphism within, and variation among, three sympatric species of the genus Geospiza. Using restriction site-associated DNA (RAD-seq), we characterized 32 569 single-nucleotide polymorphisms (SNPs), from which 11 outlier SNPs for beak and body size were uncovered by a genomewide association study (GWAS). Principal component analysis revealed that these 11 SNPs formed four statistically linked groups. Stepwise regression then revealed that the first PC score, which included 6 of the 11 top SNPs, explained over 80% of the variation in beak size, suggesting that selection on these traits influences multiple correlated loci. The two SNPs most strongly associated with beak size were near genes associated with beak morphology across deeper branches of the radiation: delta-like 1 homologue (DLK1) and high-mobility group AT-hook 2 (HMGA2). Our results suggest that (i) key adaptive traits are associated with a small fraction of the genome (11 of 32 569 SNPs), (ii) SNPs linked to the candidate genes are dispersed throughout the genome (on several chromosomes), and (iii) micro- and macro-evolutionary variation (roots and tips of the radiation) involve some shared and some unique genomic regions. © 2016 John Wiley & Sons Ltd.
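The "small fraction of the genome" in point (i) of this abstract can be made concrete with the counts it reports. A brief Python sketch using only those counts (variable names are illustrative):

```python
# Counts quoted in the abstract above.
total_snps = 32_569       # SNPs characterized by RAD-seq
outlier_snps = 11         # outlier SNPs for beak and body size
snps_on_first_pc = 6      # top SNPs included in the first PC score

# Share of all characterized SNPs flagged as outliers.
outlier_fraction = outlier_snps / total_snps

# Share of the outlier SNPs captured by the first principal component.
first_pc_share = snps_on_first_pc / outlier_snps

print(f"Outlier SNPs: {outlier_fraction:.3%} of all characterized SNPs")
print(f"First PC includes {first_pc_share:.0%} of the outlier SNPs")
```

On these figures, about 0.03% of the characterized SNPs are outliers, and just over half of those load on the first principal component.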
An ethnically relevant consensus Korean reference genome is a step towards personal reference genomes

PubMed Central

Cho, Yun Sung; Kim, Hyunho; Kim, Hak-Min; Jho, Sungwoong; Jun, JeHoon; Lee, Yong Joo; Chae, Kyun Shik; Kim, Chang Geun; Kim, Sangsoo; Eriksson, Anders; Edwards, Jeremy S.; Lee, Semin; Kim, Byung Chul; Manica, Andrea; Oh, Tae-Kwang; Church, George M.; Bhak, Jong

2016-01-01

Human genomes are routinely compared against a universal reference. However, this strategy could miss population-specific and personal genomic variations, which may be detected more efficiently using an ethnically relevant or personal reference. Here we report a hybrid assembly of a Korean reference genome (KOREF) for constructing personal and ethnic references by combining sequencing and mapping methods. We also build its consensus variome reference, providing information on millions of variants from 40 additional ethnically homogeneous genomes from the Korean Personal Genome Project. We find that the ethnically relevant consensus reference can be beneficial for efficient variant detection. Systematic comparison of human assemblies shows the importance of assembly quality, suggesting the necessity of new technologies to comprehensively map ethnic and personal genomic structure variations. In the era of large-scale population genome projects, the leveraging of ethnicity-specific genome assemblies as well as the human reference genome will accelerate mapping all human genome diversity. PMID:27882922
Genome-wide genetic variation and comparison of fruit-associated traits between kumquat (Citrus japonica) and Clementine mandarin (Citrus clementina).

PubMed

Liu, Tian-Jia; Li, Yong-Ping; Zhou, Jing-Jing; Hu, Chun-Gen; Zhang, Jin-Zhi

2018-03-01

The comprehensive genetic variation of two citrus species was analyzed at the genome and transcriptome levels. A total of 1090 differentially expressed genes were found during fruit development by RNA-sequencing. Fruit size (fruit equatorial diameter) and weight (fresh weight) are the two most important components determining yield and consumer acceptability for many horticultural crops. However, little is known about the genetic control of these traits. Here, we performed whole-genome resequencing to reveal the comprehensive genetic variation of the fruit development between kumquat (Citrus japonica) and Clementine mandarin (Citrus clementina). In total, 5,865,235 single-nucleotide polymorphisms (SNPs) and 414,447 insertions/deletions (InDels) were identified in the two citrus species. Based on integrative analysis of genome and transcriptome of fruit, 640,801 SNPs and 20,733 InDels were identified. The features, genomic distribution, functional effect, and other characteristics of these genetic variations were explored. RNA-sequencing identified 1090 differentially expressed genes (DEGs) during fruit development of kumquat and Clementine mandarin. Gene Ontology revealed that these genes were involved in various molecular functions and biological processes. In addition, the genetic variation of 939 DEGs and 74 multiple fruit development pathway genes from previous reports were also identified. A global survey identified 24,237 specific alternative splicing events in the two citrus species and showed that intron retention is the most prevalent pattern of alternative splicing. These genome variation data provide a foundation for further exploration of citrus diversity and gene-phenotype relationships and for future research on molecular breeding to improve kumquat, Clementine mandarin and related species.
Racial stereotypes and interracial attraction: phenotypic prototypicality and perceived attractiveness of Asians.

PubMed

Wilkins, Clara L; Chan, Joy F; Kaiser, Cheryl R

2011-10-01

What does it take to find a member of a different race attractive? In this research, we suggest that for Whites, attraction to Asians may be based, in part, on stereotypes and variations in Asians' racial appearance. Study 1 reveals that Asians are stereotyped as being more feminine and less masculine than other racial groups, characteristics considered appealing for women but not for men to possess. Study 2 examines how variation in racial appearance, phenotypic prototypicality (PP), shapes the degree to which Asians are gender stereotyped and how PP relates to perceptions of attractiveness. Higher PP Asian men are perceived as being less masculine and less physically attractive than lower PP Asian men. These findings inform theory on how within-group variation in racial appearance affects stereotyping and other social outcomes.
Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale.

PubMed

Liu, Siyang; Huang, Shujia; Rao, Junhua; Ye, Weijian; Krogh, Anders; Wang, Jun

2015-01-01

Comprehensive recognition of genomic variation in one individual is important for understanding disease and developing personalized medication and treatment. Many tools based on DNA re-sequencing exist for identification of single nucleotide polymorphisms, small insertions and deletions (indels) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction of population-scale pan-genomes. Our study also highlights the usefulness of the de novo assembly strategy for definition of genome structure.
Genetic Drift, Not Life History or RNAi, Determine Long-Term Evolution of Transposable Elements

PubMed Central

Szitenberg, Amir; Cha, Soyeon; Opperman, Charles H.; Bird, David M.; Blaxter, Mark L.; Lunt, David H.

2016-01-01

Transposable elements (TEs) are a major source of genome variation across the branches of life. Although TEs may play an adaptive role in their host’s genome, they are more often deleterious, and purifying selection is an important factor controlling their genomic loads. In contrast, life history, mating system, GC content, and RNAi pathways have been suggested to account for the disparity of TE loads in different species. Previous studies of fungal, plant, and animal genomes have reported conflicting results regarding the direction in which these genomic features drive TE evolution. Many of these studies have had limited power, however, because they studied taxonomically narrow systems, comparing only a limited number of phylogenetically independent contrasts, and did not address long-term effects on TE evolution. Here, we test the long-term determinants of TE evolution by comparing 42 nematode genomes spanning over 500 million years of diversification. This analysis includes numerous transitions between life history states, and RNAi pathways, and evaluates if these forces are sufficiently persistent to affect the long-term evolution of TE loads in eukaryotic genomes. Although we demonstrate statistical power to detect selection, we find no evidence that variation in these factors influences genomic TE loads across extended periods of time. In contrast, the effects of genetic drift appear to persist and control TE variation among species. We suggest that variation in the tested factors is largely inconsequential to the large differences in TE content observed between genomes, and only by these large-scale comparisons can we distinguish long-term and persistent effects from transient or random changes. PMID:27566762
The Mouse Genomes Project: a repository of inbred laboratory mouse strain genomes.

PubMed

Adams, David J; Doran, Anthony G; Lilue, Jingtao; Keane, Thomas M

2015-10-01

The Mouse Genomes Project was initiated in 2009 with the goal of using next-generation sequencing technologies to catalogue molecular variation in the common laboratory mouse strains, and a selected set of wild-derived inbred strains. The initial sequencing and survey of sequence variation in 17 inbred strains was completed in 2011 and included a comprehensive catalogue of single nucleotide polymorphisms, short insertion/deletions, larger structural variants including their fine scale architecture and landscape of transposable element variation, and genomic sites subject to post-transcriptional alteration of RNA. From this beginning, the resource has expanded significantly to include 36 fully sequenced inbred laboratory mouse strains, a refined and updated data processing pipeline, and new variation querying and data visualisation tools which are available on the project's website ( http://www.sanger.ac.uk/resources/mouse/genomes/ ). The focus of the project is now the completion of de novo assembled chromosome sequences and strain-specific gene structures for the core strains. We discuss how the assembled chromosomes will power comparative analysis, data access tools and future directions of mouse genetics.
Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq.

PubMed

Barrick, Jeffrey E; Colburn, Geoffrey; Deatherage, Daniel E; Traverse, Charles C; Strand, Matthew D; Borges, Jordan J; Knoester, David B; Reba, Aaron; Meyer, Austin G

2014-11-29

Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold). Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.
The Organelle Genomes of Hassawi Rice (Oryza sativa L.) and Its Hybrid in Saudi Arabia: Genome Variation, Rearrangement, and Origins

PubMed Central

Zhang, Tongwu; Hu, Songnian; Zhang, Guangyu; Pan, Linlin; Zhang, Xiaowei; Al-Mssallem, Ibrahim S.; Yu, Jun

2012-01-01

Hassawi rice (Oryza sativa L.) is a landrace adapted to the climate of Saudi Arabia, characterized by its strong resistance to soil salinity and drought. Using high quality sequencing reads extracted from raw data of a whole genome sequencing project, we assembled both chloroplast (cp) and mitochondrial (mt) genomes of the wild-type Hassawi rice (Hassawi-1) and its dwarf hybrid (Hassawi-2). We discovered 16 InDels (insertions and deletions) but no SNP (single nucleotide polymorphism) is present between the two Hassawi cp genomes. We identified 48 InDels and 26 SNPs in the two Hassawi mt genomes and a new type of sequence variation, termed reverse complementary variation (RCV) in the rice cp genomes. There are two and four RCVs identified in Hassawi-1 when compared to 93–11 (indica) and Nipponbare (japonica), respectively. Microsatellite sequence analysis showed there are more SSRs in the genic regions of both cp and mt genomes in the Hassawi rice than in the other rice varieties. There are also large repeats in the Hassawi mt genomes, with the longest length of 96,168 bp and 96,165 bp in Hassawi-1 and Hassawi-2, respectively. We believe that frequent DNA rearrangement in the Hassawi mt and cp genomes indicate ongoing dynamic processes to reach genetic stability under strong environmental pressures. Based on sequence variation analysis and the breeding history, we suggest that both Hassawi-1 and Hassawi-2 originated from the Indonesian variety Peta since genetic diversity between the two Hassawi cultivars is very low albeit an unknown historic origin of the wild-type Hassawi rice. PMID:22870184
Genome-wide association analysis of milk yield traits in Nordic Red Cattle using imputed whole genome sequence variants.

PubMed

Iso-Touru, T; Sahana, G; Guldbrandtsen, B; Lund, M S; Vilkki, J

2016-03-22

The Nordic Red Cattle consisting of three different populations from Finland, Sweden and Denmark are under a joint breeding value estimation system. The long history of recording of production and health traits offers a great opportunity to study production traits and identify causal variants behind them. In this study, we used whole genome sequence level data from 4280 progeny tested Nordic Red Cattle bulls to scan the genome for loci affecting milk, fat and protein yields. Using a genome-wise significance threshold, regions on Bos taurus chromosomes 5, 14, 23, 25 and 26 were associated with fat yield. Regions on chromosomes 5, 14, 16, 19, 20 and 25 were associated with milk yield and chromosomes 5, 14 and 25 had regions associated with protein yield. Significantly associated variations were found in 227 genes for fat yield, 72 genes for milk yield and 30 genes for protein yield. Ingenuity Pathway Analysis was used to identify networks connecting these genes displaying significant hits. When compared to previously mapped genomic regions associated with fertility, significantly associated variations were found in 5 genes common for fat yield and fertility, thus linking these two traits via biological networks. This is the first time when whole genome sequence data is utilized to study genomic regions affecting milk production in the Nordic Red Cattle population. Sequence level data offers the possibility to study quantitative traits in detail but still cannot unambiguously reveal which of the associated variations is causative. Linkage disequilibrium creates difficulties to pinpoint the causative genes and variations. One solution to overcome these difficulties is the identification of the functional gene networks and pathways to reveal important interacting genes as candidates for the observed effects. This information on target genomic regions may be exploited to improve genomic prediction.
Fast Dissemination of New HIV-1 CRF02/A1 Recombinants in Pakistan

PubMed Central

Chen, Yue; Hora, Bhavna; DeMarco, Todd; Shah, Sharaf Ali; Ahmed, Manzoor; Sanchez, Ana M.; Su, Chang; Carter, Meredith; Stone, Mars; Hasan, Rumina; Hasan, Zahra; Busch, Michael P.; Denny, Thomas N.; Gao, Feng

2016-01-01

A number of HIV-1 subtypes are identified in Pakistan by characterization of partial viral gene sequences. Little is known about whether new recombinants are generated and how they disseminate, since whole genome sequences for these viruses have not been characterized. Near full-length genome (NFLG) sequences were obtained by amplifying two overlapping half genomes or next generation sequencing from 34 HIV-1-infected individuals in Pakistan. Phylogenetic tree analysis showed that the newly characterized sequences were 16 subtype As, one subtype C, and 17 A/G recombinants. Further analysis showed that all 16 subtype A1 sequences (47%), together with the vast majority of sequences from Pakistan from other studies, formed a tight subcluster (A1a) within the subtype A1 clade, suggesting that they were derived from a single introduction. More in-depth analysis of 17 A/G NFLG sequences showed that five shared similar recombination breakpoints as in CRF02 (15%) but were phylogenetically distinct from the prototype CRF02 by forming a tight subcluster (CRF02a) while 12 (38%) were new recombinants between CRF02a and A1a or divergent A1b viruses. Unique recombination patterns among the majority of the newly characterized recombinants indicated ongoing recombination. Interestingly, recombination breakpoints in these CRF02/A1 recombinants were similar to those in prototype CRF02 viruses, indicating that recombination at these sites is more likely to generate variable recombinant viruses. The dominance and fast dissemination of new CRF02a/A1 recombinants over prototype CRF02 suggest that these recombinants are more adapted and may become major epidemic strains in Pakistan. PMID:27973597
A 1000 Arab genome project to study the Emirati population.

PubMed

Al-Ali, Mariam; Osman, Wael; Tay, Guan K; AlSafar, Habiba S

2018-04-01

Discoveries from the human genome, HapMap, and 1000 genome projects have collectively contributed toward the creation of a catalog of human genetic variations that has improved our understanding of human diversity. Despite the collegial nature of many of these genome study consortiums, which has led to the cataloging of genetic variations of different ethnic groups from around the world, genome data on the Arab population remains overwhelmingly underrepresented. The National Arab Genome project in the United Arab Emirates (UAE) aims to address this deficiency by using Next Generation Sequencing (NGS) technology to provide data to improve our understanding of the Arab genome and catalog variants that are unique to the Arab population of the UAE. The project was conceived to shed light on the similarities and differences between the Arab genome and those of the other ethnic groups.
PSSRdb: a relational database of polymorphic simple sequence repeats extracted from prokaryotic genomes.

PubMed

Kumar, Pankaj; Chaitanya, Pasumarthy S; Nagarajaram, Hampapathalu A

2011-01-01

PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are tandem repeats of nucleotide motifs 1-6 bp in size and are highly polymorphic. SSR mutations in and around coding regions affect transcription and translation of genes. Such changes underpin the phase variation and antigenic variation seen in some bacteria. Although SSR-mediated phase variation and antigenic variation have been well studied in some bacteria, many other species of prokaryotes have yet to be investigated for SSR-mediated adaptive and other evolutionary advantages. As a part of our ongoing studies on SSR polymorphism in prokaryotes, we compared the genome sequences of various strains and isolates available for 85 different species of prokaryotes, extracted a number of SSRs showing length variations, and created a relational database called PSSRdb. This database gives useful information such as the location of PSSRs in genomes, length variation across genomes, the regions harboring PSSRs, etc. The information provided in this database is very useful for further research and analysis of SSRs in prokaryotes.
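The definition of an SSR given in this record (a tandem repeat of a 1-6 bp motif) can be illustrated with a short sketch. This is not the PSSRdb extraction pipeline; the function name `find_ssrs`, the `min_repeats` threshold, and the toy sequences are assumptions made purely for illustration.

```python
import re

def find_ssrs(seq, min_repeats=3, max_motif=6):
    """Return (start, motif, repeat_count) for tandem repeats of short motifs.

    Hypothetical helper: the thresholds are illustrative, not PSSRdb's.
    """
    # Lazy motif match so the shortest repeating unit is reported,
    # followed by at least (min_repeats - 1) further copies of it.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits

# Two toy "strains" that differ only in the length of an AT repeat.
strain_a = "GGATATATATCC"
strain_b = "GGATATATATATCC"
print(find_ssrs(strain_a))
print(find_ssrs(strain_b))
```

Comparing the repeat counts reported for the same motif at the same position in different strains is one simple way to flag an SSR as polymorphic in length.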
Genomic regions controlling shape variation in the first upper molar of the house mouse

PubMed Central

Pantalacci, Sophie; Turner, Leslie M; Steingrimsson, Eirikur; Renaud, Sabrina

2017-01-01

Numerous loci of large effect have been shown to underlie phenotypic variation between species. However, loci with subtle effects are presumably more frequently involved in microevolutionary processes but have rarely been discovered. We explore the genetic basis of shape variation in the first upper molar of hybrid mice between Mus musculus musculus and M. m. domesticus. We performed the first genome-wide association study for molar shape and used 3D surface morphometrics to quantify subtle variation between individuals. We show that many loci of small effect underlie phenotypic variation, and identify five genomic regions associated with tooth shape; one region contained the gene microphthalmia-associated transcription factor Mitf that has previously been associated with tooth malformations. Using a panel of five mutant laboratory strains, we show the effect of the Mitf gene on tooth shape. This is the first report of a gene causing subtle but consistent variation in tooth shape resembling variation in nature. PMID:29091026
Identifying tagging SNPs for African specific genetic variation from the African Diaspora Genome

PubMed Central

Johnston, Henry Richard; Hu, Yi-Juan; Gao, Jingjing; O’Connor, Timothy D.; Abecasis, Gonçalo R.; Wojcik, Genevieve L; Gignoux, Christopher R.; Gourraud, Pierre-Antoine; Lizee, Antoine; Hansen, Mark; Genuario, Rob; Bullis, Dave; Lawley, Cindy; Kenny, Eimear E.; Bustamante, Carlos; Beaty, Terri H.; Mathias, Rasika A.; Barnes, Kathleen C.; Qin, Zhaohui S.; Preethi Boorgula, Meher; Campbell, Monica; Chavan, Sameer; Ford, Jean G.; Foster, Cassandra; Gao, Li; Hansel, Nadia N.; Horowitz, Edward; Huang, Lili; Ortiz, Romina; Potee, Joseph; Rafaels, Nicholas; Ruczinski, Ingo; Scott, Alan F.; Taub, Margaret A.; Vergara, Candelaria; Levin, Albert M.; Padhukasahasram, Badri; Williams, L. Keoki; Dunston, Georgia M.; Faruque, Mezbah U.; Gietzen, Kimberly; Deshpande, Aniket; Grus, Wendy E.; Locke, Devin P.; Foreman, Marilyn G.; Avila, Pedro C.; Grammer, Leslie; Kim, Kwang-Youn A.; Kumar, Rajesh; Schleimer, Robert; De La Vega, Francisco M.; Shringarpure, Suyash S.; Musharoff, Shaila; Burchard, Esteban G.; Eng, Celeste; Hernandez, Ryan D.; Pino-Yanes, Maria; Torgerson, Dara G.; Szpiech, Zachary A.; Torres, Raul; Nicolae, Dan L.; Ober, Carole; Olopade, Christopher O; Olopade, Olufunmilayo; Oluwole, Oluwafemi; Arinola, Ganiyu; Song, Wei; Correa, Adolfo; Musani, Solomon; Wilson, James G.; Lange, Leslie A.; Akey, Joshua; Bamshad, Michael; Chong, Jessica; Fu, Wenqing; Nickerson, Deborah; Reiner, Alexander; Hartert, Tina; Ware, Lorraine B.; Bleecker, Eugene; Meyers, Deborah; Ortega, Victor E.; Maul, Pissamai; Maul, Trevor; Watson, Harold; Ilma Araujo, Maria; Riccio Oliveira, Ricardo; Caraballo, Luis; Marrugo, Javier; Martinez, Beatriz; Meza, Catherine; Ayestas, Gerardo; Francisco Herrera-Paz, Edwin; Landaverde-Torres, Pamela; Erazo, Said Omar Leiva; Martinez, Rosella; Mayorga, Alvaro; Mayorga, Luis F.; Mejia-Mejia, Delmy-Aracely; Ramos, Hector; Saenz, Allan; Varela, Gloria; Marina Vasquez, Olga; Ferguson, Trevor; Knight-Madden, Jennifer; Samms-Vaughan, Maureen; Wilks, Rainford J.; Adegnika, Akim; Ateba-Ngoa, Ulysse; Yazdanbakhsh, Maria

2017-01-01

A primary goal of The Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) is to develop an ‘African Diaspora Power Chip’ (ADPC), a genotyping array consisting of tagging SNPs, useful in comprehensively identifying African specific genetic variation. This array is designed based on the novel variation identified in 642 CAAPA samples of African ancestry with high coverage whole genome sequence data (~30× depth). This novel variation extends the pattern of variation catalogued in the 1000 Genomes and Exome Sequencing Projects to a spectrum of populations representing the wide range of West African genomic diversity. These individuals from CAAPA also comprise a large swath of the African Diaspora population and incorporate historical genetic diversity covering nearly the entire Atlantic coast of the Americas. Here we show the results of designing and producing such a microchip array. This novel array covers African specific variation far better than other commercially available arrays, and will enable better GWAS analyses for researchers with individuals of African descent in their study populations. A recent study cataloging variation in continental African populations suggests this type of African-specific genotyping array is both necessary and valuable for facilitating large-scale GWAS in populations of African ancestry. PMID:28429804
[The human variome project and its progress].

PubMed

Gao, Shan; Zhang, Ning; Zhang, Lei; Duan, Guang-You; Zhang, Tao

2010-11-01

The main goal of post-genomics is to explain how the genome, the map of which was constructed in the Human Genome Project, affects the activities of life. This has generated multiple "omics": structural genomics, functional genomics, proteomics, metabonomics, etc. In June 2006, in Melbourne, Australia, the Human Genome Variation Society (HGVS) initiated the Human Variome Project (HVP) to collect sequence variation and polymorphism data worldwide. The HVP aims to search for and determine mutations related to human diseases through genome-scale association studies between genotype and phenotype, among other methods. The results will be translated into clinical application. Considering the potential effects of this project on human health, this paper introduces its origin and main content in detail and discusses its significance and prospects.

Comparative Genomics in Homo sapiens.

PubMed

Oti, Martin; Sammeth, Michael

2018-01-01

Genomes can be compared at different levels of divergence, either between species or within species. Within species genomes can be compared between different subpopulations, such as human subpopulations from different continents. Investigating the genomic differences between different human subpopulations is important when studying complex diseases that are affected by many genetic variants, as the variants involved can differ between populations. The 1000 Genomes Project collected genome-scale variation data for 2504 human individuals from 26 different populations, enabling a systematic comparison of variation between human subpopulations. In this chapter, we present step-by-step a basic protocol for the identification of population-specific variants employing the 1000 Genomes data. These variants are subsequently further investigated for those that affect the proteome or RNA splice sites, to investigate potentially biologically relevant differences between the populations.
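The idea of a population-specific variant mentioned in this record can be sketched as a simple frequency filter. This is not the chapter's protocol; the function name `population_specific`, the thresholds `min_in` and `max_out`, the variant identifiers, and the frequencies are all hypothetical and chosen only to illustrate the idea.

```python
def population_specific(freqs, min_in=0.05, max_out=0.0):
    """Map each variant to the single population in which it is observed.

    freqs: {variant_id: {population: alternate-allele frequency}}
    A variant qualifies when exactly one population has frequency >= min_in
    and every other population has frequency <= max_out.
    Both thresholds are illustrative assumptions.
    """
    result = {}
    for variant_id, by_pop in freqs.items():
        present = [pop for pop, f in by_pop.items() if f >= min_in]
        others_absent = all(
            f <= max_out for pop, f in by_pop.items() if pop not in present
        )
        if len(present) == 1 and others_absent:
            result[variant_id] = present[0]
    return result

# Toy data only; real frequencies would come from the 1000 Genomes call sets.
toy = {
    "v1": {"AFR": 0.20, "EUR": 0.00, "EAS": 0.00},  # seen in one population
    "v2": {"AFR": 0.30, "EUR": 0.25, "EAS": 0.00},  # shared by two populations
    "v3": {"AFR": 0.01, "EUR": 0.00, "EAS": 0.00},  # too rare to call
}
print(population_specific(toy))
```

Variants that pass such a filter would then be candidates for the downstream checks the record describes, such as effects on the proteome or on RNA splice sites.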
Population genomics reveals a candidate gene involved in bumble bee pigmentation.

PubMed

Pimsler, Meaghan L; Jackson, Jason M; Lozier, Jeffrey D

2017-05-01

Variation in bumble bee color patterns is well-documented within and between species. Identifying the genetic mechanisms underlying such variation may be useful in revealing evolutionary forces shaping rapid phenotypic diversification. The widespread North American species Bombus bifarius exhibits regional variation in abdominal color forms, ranging from red-banded to black-banded phenotypes and including geographically and phenotypically intermediate forms. Identifying genomic regions linked to this variation has been complicated by strong, near species level, genome-wide differentiation between red- and black-banded forms. Here, we instead focus on the closely related black-banded and intermediate forms that both belong to the subspecies B. bifarius nearcticus . We analyze an RNA sequencing (RNAseq) data set and identify a cluster of single nucleotide polymorphisms (SNPs) within one gene, Xanthine dehydrogenase/oxidase -like, that exhibit highly unusual differentiation compared to the rest of the sequenced genome. Homologs of this gene contribute to pigmentation in other insects, and results thus represent a strong candidate for investigating the genetic basis of pigment variation in B. bifarius and other bumble bee mimicry complexes.
Genome Editing of Structural Variations: Modeling and Gene Correction.

PubMed

Park, Chul-Yong; Sung, Jin Jea; Kim, Dong-Wook

2016-07-01

The analysis of chromosomal structural variations (SVs), such as inversions and translocations, was made possible by the completion of the human genome project and the development of genome-wide sequencing technologies. SVs contribute to genetic diversity and evolution, although some SVs can cause diseases such as hemophilia A in humans. Genome engineering technology using programmable nucleases (e.g., ZFNs, TALENs, and CRISPR/Cas9) has been rapidly developed, enabling precise and efficient genome editing for SV research. Here, we review advances in modeling and gene correction of SVs, focusing on inversion, translocation, and nucleotide repeat expansion. Copyright © 2016 Elsevier Ltd. All rights reserved.
Optimization of a yeast RNA interference system for controlling gene expression and enabling rapid metabolic engineering.

PubMed

Crook, Nathan C; Schmitz, Alexander C; Alper, Hal S

2014-05-16

Reduction of endogenous gene expression is a fundamental operation of metabolic engineering, yet current methods for gene knockdown (i.e., genome editing) remain laborious and slow, especially in yeast. In contrast, RNA interference allows facile and tunable gene knockdown via a simple plasmid transformation step, enabling metabolic engineers to rapidly prototype knockdown strategies in multiple strains before expending significant cost to undertake genome editing. Although RNAi is naturally present in a myriad of eukaryotes, it has only been recently implemented in Saccharomyces cerevisiae as a heterologous pathway and so has not yet been optimized as a metabolic engineering tool. In this study, we elucidate a set of design principles for the construction of hairpin RNA expression cassettes in yeast and implement RNA interference to quickly identify routes for improvement of itaconic acid production in this organism. The approach developed here enables rapid prototyping of knockdown strategies and thus accelerates and reduces the cost of the design-build-test cycle in yeast.
Muju Virus, Harbored by Myodes regulus in Korea, Might Represent a Genetic Variant of Puumala Virus, the Prototype Arvicolid Rodent-Borne Hantavirus

PubMed Central

Lee, Jin Goo; Gu, Se Hun; Baek, Luck Ju; Shin, Ok Sarah; Park, Kwang Sook; Kim, Heung-Chul; Klein, Terry A.; Yanagihara, Richard; Song, Jin-Won

2014-01-01

The genome of Muju virus (MUJV), identified originally in the royal vole (Myodes regulus) in Korea, was fully sequenced to ascertain its genetic and phylogenetic relationship with Puumala virus (PUUV), harbored by the bank vole (My. glareolus), and a PUUV-like virus, named Hokkaido virus (HOKV), in the grey red-backed vole (My. rufocanus) in Japan. Whole genome sequence analysis of the 6544-nucleotide large (L), 3652-nucleotide medium (M) and 1831-nucleotide small (S) segments of MUJV, as well as the amino acid sequences of their gene products, indicated that MUJV strains from different capture sites might represent genetic variants of PUUV, the prototype arvicolid rodent-borne hantavirus in Europe. Distinct geographic-specific clustering of MUJV was found in different provinces in Korea, and phylogenetic analyses revealed that MUJV and HOKV share a common ancestry with PUUV. A better understanding of the taxonomic classification and pathogenic potential of MUJV must await its isolation in cell culture. PMID:24736214
Natural Allelic Variations in Highly Polyploidy Saccharum Complex

DOE Office of Scientific and Technical Information (OSTI.GOV)

Song, Jian; Yang, Xiping; Resende, Jr., Marcio F. R.

Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with high polyploid and complex genomes. The Saccharum complex, comprised of Saccharum genus and a few related genera, are important genetic resources for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Though understanding their allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variations in Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variants and genotype calling was established. BWAmem and sorghum genome proved to be an acceptable aligner and reference for sugarcane target enrichment sequence analysis, respectively. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated as largely homozygotes. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. Furthermore, the target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes.
Natural Allelic Variations in Highly Polyploidy Saccharum Complex

DOE PAGES

Song, Jian; Yang, Xiping; Resende, Jr., Marcio F. R.; ...

2016-06-08

Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with high polyploid and complex genomes. The Saccharum complex, comprised of Saccharum genus and a few related genera, are important genetic resources for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Though understanding their allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variations in Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variants and genotype calling was established. BWAmem and sorghum genome proved to be an acceptable aligner and reference for sugarcane target enrichment sequence analysis, respectively. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated as largely homozygotes. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. Furthermore, the target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes.
Salmonella Typhi genomics: envisaging the future of typhoid eradication.

PubMed

Yap, Kien-Pong; Thong, Kwai Lin

2017-08-01

Next-generation whole-genome sequencing has revolutionised the study of infectious diseases in recent years. The availability of genome sequences and its understanding have transformed the field of molecular microbiology, epidemiology, infection treatments and vaccine developments. We review the key findings of the publicly accessible genomes of Salmonella enterica serovar Typhi since the first complete genome to the most recent release of thousands of Salmonella Typhi genomes, which remarkably shape the genomic research of S. Typhi and other pathogens. Important new insights acquired from the genome sequencing of S. Typhi, pertaining to genomic variations, evolution, population structure, antibiotic resistance, virulence, pathogenesis, disease surveillance/investigation and disease control are discussed. As the numbers of sequenced genomes are increasing at an unprecedented rate, fine variations in the gene pool of S. Typhi are captured in high resolution, allowing deeper understanding of the pathogen's evolutionary trends and its pathogenesis, paving the way to bringing us closer to eradication of typhoid through effective vaccine/treatment development. © 2017 John Wiley & Sons Ltd.
A dictionary based informational genome analysis

PubMed Central

2012-01-01

Background: In the post-genomic era, several methods of computational genomics are emerging to understand how information is structured within whole genomes. The literature of the last five years reports several alignment-free methods, which arose as alternative metrics for the dissimilarity of biological sequences. Among these, recent approaches are based on empirical frequencies of DNA k-mers in whole genomes. Results: Any set of words (factors) occurring in a genome provides a genomic dictionary. About sixty genomes were analyzed by means of informational indexes based on genomic dictionaries, where a systemic view replaces a local sequence analysis. A software prototype applying the methodology outlined here carried out computations on genomic data. We computed informational indexes and built genomic dictionaries of different sizes, along with frequency distributions. The software performed three main tasks: computation of informational indexes, storage of these in a database, and index analysis and visualization. The validation was done by investigating genomes of various organisms. A systematic analysis of genomic repeats of several lengths, which is of vivid interest in biology (for example to compute excessively represented functional sequences, such as promoters), was discussed, and suggested a method to define synthetic genetic networks. Conclusions: We introduced a methodology based on dictionaries, and an efficient motif-finding software application for comparative genomics. This approach could be extended along many investigation lines, namely exported in other contexts of computational genomics, as a basis for discrimination of genomic pathologies. PMID:22985068
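The idea of a genomic dictionary described in this record can be illustrated with a minimal sketch. It is not the authors' software: the function names `kmer_dictionary` and `empirical_entropy` are hypothetical, and Shannon entropy of the k-mer frequency distribution is used here only as one plausible example of an "informational index".

```python
from collections import Counter
from math import log2


def kmer_dictionary(sequence, k):
    """Count every length-k word (factor) that occurs in the sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def empirical_entropy(counts):
    """Shannon entropy (bits) of the empirical k-mer frequencies.

    Assumption: entropy stands in for the record's unspecified indexes.
    """
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def repeated_words(counts):
    """Words that occur more than once, i.e. candidate genomic repeats."""
    return {word: n for word, n in counts.items() if n > 1}
```

For a toy sequence such as `"ACGTACGT"` with `k = 2`, the dictionary holds four distinct words, three of which are repeats.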
Genomic correlates of recombination rate and its variability across eight recombination maps in the western honey bee (Apis mellifera L.).

PubMed

Ross, Caitlin R; DeFelice, Dominick S; Hunt, Greg J; Ihle, Kate E; Amdam, Gro V; Rueppell, Olav

2015-02-21

Meiotic recombination has traditionally been explained based on the structural requirement to stabilize homologous chromosome pairs to ensure their proper meiotic segregation. Competing hypotheses seek to explain the emerging findings of significant heterogeneity in recombination rates within and between genomes, but intraspecific comparisons of genome-wide recombination patterns are rare. The honey bee (Apis mellifera) exhibits the highest rate of genomic recombination among multicellular animals with about five cross-over events per chromatid. Here, we present a comparative analysis of recombination rates across eight genetic linkage maps of the honey bee genome to investigate which genomic sequence features are correlated with recombination rate and with its variation across the eight data sets, which range in average marker spacing from 1 Mbp to 120 kbp. Overall, we found that GC content best explained the variation in local recombination rate along chromosomes at the analyzed 100 kbp scale. In contrast, variation among the different maps was correlated to the abundance of microsatellites and several specific tri- and tetra-nucleotides. The combined evidence from eight medium-scale recombination maps of the honey bee genome suggests that recombination rate variation in this highly recombining genome might be due to the DNA configuration instead of distinct sequence motifs. However, more fine-scale analyses are needed. The empirical basis of eight differing genetic maps allowed for robust conclusions about the correlates of the local recombination rates and enabled the study of the relation between DNA features and variability in local recombination rates, which is particularly relevant in the honey bee genome with its exceptionally high recombination rate.
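A windowed comparison of GC content against recombination rate, as described in this record, might be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: the helper names are hypothetical, windows are non-overlapping, a trailing partial window is dropped, and a plain Pearson coefficient stands in for whatever statistical model the authors actually fitted.

```python
def gc_content_windows(sequence, window):
    """GC fraction in consecutive non-overlapping windows of fixed size."""
    seq = sequence.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        fractions.append((chunk.count("G") + chunk.count("C")) / window)
    return fractions


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5
```

In practice `window` would be 100,000 to match the 100 kbp scale named in the abstract, and the second argument to `pearson` would be the per-window recombination rate taken from a linkage map.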
Dissection of complex adult traits in a mouse synthetic population.

PubMed

Burke, David T; Kozloff, Kenneth M; Chen, Shu; West, Joshua L; Wilkowski, Jodi M; Goldstein, Steven A; Miller, Richard A; Galecki, Andrzej T

2012-08-01

Finding the causative genetic variations that underlie complex adult traits is a significant experimental challenge. The unbiased search strategy of genome-wide association (GWAS) has been used extensively in recent human population studies. These efforts, however, typically find only a minor fraction of the genetic loci that are predicted to affect variation. As an experimental model for the analysis of adult polygenic traits, we measured a mouse population for multiple phenotypes and conducted a genome-wide search for effector loci. Complex adult phenotypes, related to body size and bone structure, were measured as component phenotypes, and each subphenotype was associated with a genomic spectrum of candidate effector loci. The strategy successfully detected several loci for the phenotypes, at genome-wide significance, using a single, modest-sized population (N = 505). The effector loci each explain 2%-10% of the measured trait variation and, taken together, the loci can account for over 25% of a trait's total population variation. A replicate population (N = 378) was used to confirm initially observed loci for one trait (femur length), and, when the two groups were merged, the combined population demonstrated increased power to detect loci. In contrast to human population studies, our mouse genome-wide searches find loci that individually explain a larger fraction of the observed variation. Also, the additive effects of our detected mouse loci more closely match the predicted genetic component of variation. The genetic loci discovered are logical candidates for components of the genetic networks having evolutionary conservation with human biology.
Genetic Variation in the Acorn Barnacle from Allozymes to Population Genomics

PubMed Central

Flight, Patrick A.; Rand, David M.

2012-01-01

Understanding the patterns of genetic variation within and among populations is a central problem in population and evolutionary genetics. We examine this question in the acorn barnacle, Semibalanus balanoides, in which the allozyme loci Mpi and Gpi have been implicated in balancing selection due to varying selective pressures at different spatial scales. We review the patterns of genetic variation at the Mpi locus, compare this to levels of population differentiation at mtDNA and microsatellites, and place these data in the context of genome-wide variation from high-throughput sequencing of population samples spanning the North Atlantic. Despite considerable geographic variation in the patterns of selection at the Mpi allozyme, this locus shows rather low levels of population differentiation at ecological and trans-oceanic scales (FST ∼ 5%). Pooled population sequencing was performed on samples from Rhode Island (RI), Maine (ME), and Southwold, England (UK). Analysis of more than 650 million reads identified approximately 335,000 high-quality SNPs in 19 million base pairs of the S. balanoides genome. Much variation is shared across the Atlantic, but there are significant examples of strong population differentiation among samples from RI, ME, and UK. An FST outlier screen of more than 22,000 contigs provided a genome-wide context for interpretation of earlier studies on allozymes, mtDNA, and microsatellites. FST values for allozymes, mtDNA and microsatellites are close to the genome-wide average for random SNPs, with the exception of the trans-Atlantic FST for mtDNA. The majority of FST outliers were unique between individual pairs of populations, but some genes show shared patterns of excess differentiation. These data indicate that gene flow is high, that selection is strong on a subset of genes, and that a variety of genes are experiencing diversifying selection at large spatial scales. This survey of polymorphism in S. balanoides provides a number of genomic tools that promise to make this a powerful model for ecological genomics of the rocky intertidal. PMID:22767487
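The FST values quoted in this record measure how much allele frequencies differ between populations. The abstract does not say which estimator was used, so the sketch below assumes the textbook Wright definition, FST = (HT - HS) / HT, for a single biallelic SNP in two equally weighted populations, with no sample-size or pooling correction. The function name `wright_fst` is hypothetical.

```python
def wright_fst(p1, p2):
    """Wright's FST for one biallelic locus in two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    # Mean expected heterozygosity within the two populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled (total) population
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # locus is fixed for one allele everywhere: no differentiation
    return (h_t - h_s) / h_t
```

Under these assumptions, allele frequencies of 0.6 and 0.4 give an FST of about 0.04, the same order as the roughly 5% reported for the Mpi locus.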
Concept Acquisition in Children with Mild Intellectual Disability: Factors Affecting the Abstraction of Prototypical Information.

ERIC Educational Resources Information Center

Hayes, Brett K.; Conway, Robert N.

2000-01-01

A study investigated effects of variations in the number of instances comprising a category on concept acquisition by 31 children (ages 9-14) with mild intellectual disability and 19 controls. Intellectual disability had little effect on ability to abstract a category prototype but did reduce use of exemplar-specific information for recognition.…
Genome-wide single nucleotide polymorphisms reveal population history and adaptive divergence in wild guppies.

PubMed

Willing, Eva-Maria; Bentzen, Paul; van Oosterhout, Cock; Hoffmann, Margarete; Cable, Joanne; Breden, Felix; Weigel, Detlef; Dreyer, Christine

2010-03-01

Adaptation of guppies (Poecilia reticulata) to contrasting upland and lowland habitats has been extensively studied with respect to behaviour, morphology and life history traits. Yet population history has not been studied at the whole-genome level. Although single nucleotide polymorphisms (SNPs) are the most abundant form of variation in many genomes and consequently very informative for a genome-wide picture of standing natural variation in populations, genome-wide SNP data are rarely available for wild vertebrates. Here we use genetically mapped SNP markers to comprehensively survey genetic variation within and among naturally occurring guppy populations from a wide geographic range in Trinidad and Venezuela. Results from three different clustering methods, Neighbor-net, principal component analysis (PCA) and Bayesian analysis show that the population substructure agrees with geographic separation and largely with previously hypothesized patterns of historical colonization. Within major drainages (Caroni, Oropouche and Northern), populations are genetically similar, but those in different geographic regions are highly divergent from one another, with some indications of ancient shared polymorphisms. Clear genomic signatures of a previous introduction experiment were seen, and we detected additional potential admixture events. Headwater populations were significantly less heterozygous than downstream populations. Pairwise F(ST) values revealed marked differences in allele frequencies among populations from different regions, and also among populations within the same region. F(ST) outlier methods indicated some regions of the genome as being under directional selection. Overall, this study demonstrates the power of a genome-wide SNP data set to inform for studies on natural variation, adaptation and evolution of wild populations.
Genomic characteristics of cattle copy number variations

USDA-ARS's Scientific Manuscript database

We performed a systematic analysis of cattle copy number variations (CNVs) using the Bovine HapMap SNP genotyping data, including 539 animals of 21 modern cattle breeds and 6 outgroups. After correcting genomic waves and considering the trio information, we identified 682 candidate CNV regions (CNVR...
Genomic Copy Number Variation in Disorders of Cognitive Development

ERIC Educational Resources Information Center

Morrow, Eric M.

2010-01-01

Objective: To highlight recent discoveries in the area of genomic copy number variation in neuropsychiatric disorders including intellectual disability, autism, and schizophrenia. To emphasize new principles emerging from this area, involving the genetic architecture of disease, pathophysiology, and diagnosis. Method: Review of studies published…
Natural Selection and Recombination Rate Variation Shape Nucleotide Polymorphism Across the Genomes of Three Related Populus Species

PubMed Central

Wang, Jing; Street, Nathaniel R.; Scofield, Douglas G.; Ingvarsson, Pär K.

2016-01-01

A central aim of evolutionary genomics is to identify the relative roles that various evolutionary forces have played in generating and shaping genetic variation within and among species. Here we use whole-genome resequencing data to characterize and compare genome-wide patterns of nucleotide polymorphism, site frequency spectrum, and population-scaled recombination rates in three species of Populus: Populus tremula, P. tremuloides, and P. trichocarpa. We find that P. tremuloides has the highest level of genome-wide variation, skewed allele frequencies, and population-scaled recombination rates, whereas P. trichocarpa harbors the lowest. Our findings highlight multiple lines of evidence suggesting that natural selection, due to both purifying and positive selection, has widely shaped patterns of nucleotide polymorphism at linked neutral sites in all three species. Differences in effective population sizes and rates of recombination largely explain the disparate magnitudes and signatures of linked selection that we observe among species. The present work provides the first phylogenetic comparative study on a genome-wide scale in forest trees. This information will also improve our ability to understand how various evolutionary forces have interacted to influence genome evolution among related species. PMID:26721855
Natural Selection and Recombination Rate Variation Shape Nucleotide Polymorphism Across the Genomes of Three Related Populus Species.

PubMed

Wang, Jing; Street, Nathaniel R; Scofield, Douglas G; Ingvarsson, Pär K

2016-03-01

A central aim of evolutionary genomics is to identify the relative roles that various evolutionary forces have played in generating and shaping genetic variation within and among species. Here we use whole-genome resequencing data to characterize and compare genome-wide patterns of nucleotide polymorphism, site frequency spectrum, and population-scaled recombination rates in three species of Populus: Populus tremula, P. tremuloides, and P. trichocarpa. We find that P. tremuloides has the highest level of genome-wide variation, skewed allele frequencies, and population-scaled recombination rates, whereas P. trichocarpa harbors the lowest. Our findings highlight multiple lines of evidence suggesting that natural selection, due to both purifying and positive selection, has widely shaped patterns of nucleotide polymorphism at linked neutral sites in all three species. Differences in effective population sizes and rates of recombination largely explain the disparate magnitudes and signatures of linked selection that we observe among species. The present work provides the first phylogenetic comparative study on a genome-wide scale in forest trees. This information will also improve our ability to understand how various evolutionary forces have interacted to influence genome evolution among related species. Copyright © 2016 by the Genetics Society of America.
Whole-Genome Sequence Variation among Multiple Isolates of Pseudomonas aeruginosa

PubMed Central

Spencer, David H.; Kas, Arnold; Smith, Eric E.; Raymond, Christopher K.; Sims, Elizabeth H.; Hastings, Michele; Burns, Jane L.; Kaul, Rajinder; Olson, Maynard V.

2003-01-01

Whole-genome shotgun sequencing was used to study the sequence variation of three Pseudomonas aeruginosa isolates, two from clonal infections of cystic fibrosis patients and one from an aquatic environment, relative to the genomic sequence of reference strain PAO1. The majority of the PAO1 genome is represented in these strains; however, at least three prominent islands of PAO1-specific sequence are apparent. Conversely, ∼10% of the sequencing reads derived from each isolate fail to align with the PAO1 backbone. While average sequence variation among all strains is roughly 0.5%, regions of pronounced differences were evident in whole-genome scans of nucleotide diversity. We analyzed two such divergent loci, the pyoverdine and O-antigen biosynthesis regions, by complete resequencing. A thorough analysis of isolates collected over time from one of the cystic fibrosis patients revealed independent mutations resulting in the loss of O-antigen synthesis alternating with a mucoid phenotype. Overall, we conclude that most of the PAO1 genome represents a core P. aeruginosa backbone sequence while the strains addressed in this study possess additional genetic material that accounts for at least 10% of their genomes. Approximately half of these additional sequences are novel. PMID:12562802
Ecological genomics meets community-level modelling of biodiversity: mapping the genomic landscape of current and future environmental adaptation.

PubMed

Fitzpatrick, Matthew C; Keller, Stephen R

2015-01-01

Local adaptation is a central feature of most species occupying spatially heterogeneous environments, and may factor critically in responses to environmental change. However, most efforts to model the response of species to climate change ignore intraspecific variation due to local adaptation. Here, we present a new perspective on spatial modelling of organism-environment relationships that combines genomic data and community-level modelling to develop scenarios regarding the geographic distribution of genomic variation in response to environmental change. Rather than modelling species within communities, we use these techniques to model large numbers of loci across genomes. Using balsam poplar (Populus balsamifera) as a case study, we demonstrate how our framework can accommodate nonlinear responses of loci to environmental gradients. We identify a threshold response to temperature in the circadian clock gene GIGANTEA-5 (GI5), suggesting that this gene has experienced strong local adaptation to temperature. We also demonstrate how these methods can map ecological adaptation from genomic data, including the identification of predicted differences in the genetic composition of populations under current and future climates. Community-level modelling of genomic variation represents an important advance in landscape genomics and spatial modelling of biodiversity that moves beyond species-level assessments of climate change vulnerability. © 2014 John Wiley & Sons Ltd/CNRS.

Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications.

PubMed

Huang, Lei; Ma, Fei; Chapman, Alec; Lu, Sijia; Xie, Xiaoliang Sunney

2015-01-01

We present a survey of single-cell whole-genome amplification (WGA) methods, including degenerate oligonucleotide-primed polymerase chain reaction (DOP-PCR), multiple displacement amplification (MDA), and multiple annealing and looping-based amplification cycles (MALBAC). The key parameters to characterize the performance of these methods are defined, including genome coverage, uniformity, reproducibility, unmappable rates, chimera rates, allele dropout rates, false positive rates for calling single-nucleotide variations, and ability to call copy-number variations. Using these parameters, we compare five commercial WGA kits by performing deep sequencing of multiple single cells. We also discuss several major applications of single-cell genomics, including studies of whole-genome de novo mutation rates, the early evolution of cancer genomes, circulating tumor cells (CTCs), meiotic recombination of germ cells, preimplantation genetic diagnosis (PGD), and preimplantation genomic screening (PGS) for in vitro-fertilized embryos.
Using large-scale genome variation cohorts to decipher the molecular mechanism of cancer.

PubMed

Habermann, Nina; Mardin, Balca R; Yakneen, Sergei; Korbel, Jan O

2016-01-01

Characterizing genomic structural variations (SVs) in the human genome remains challenging, and there is a growing interest to understand somatic SVs occurring in cancer, a disease of the genome. A havoc-causing SV process known as chromothripsis scars the genome when localized chromosome shattering and repair occur in a one-off catastrophe. Recent efforts led to the development of a set of conceptual criteria for the inference of chromothripsis events in cancer genomes and to the development of experimental model systems for studying this striking DNA alteration process in vitro. We discuss these approaches, and additionally touch upon current "Big Data" efforts that employ hybrid cloud computing to enable studies of numerous cancer genomes in an effort to search for commonalities and differences in molecular DNA alteration processes in cancer. Copyright © 2016. Published by Elsevier SAS.
Consensus pan-genome assembly of the specialised wine bacterium Oenococcus oeni.

PubMed

Sternes, Peter R; Borneman, Anthony R

2016-04-27

Oenococcus oeni is a lactic acid bacterium that is specialised for growth in the ecological niche of wine, where it is noted for its ability to perform the secondary, malolactic fermentation that is often required for many types of wine. Expanding the understanding of strain-dependent genetic variations in its small and streamlined genome is important for realising its full potential in industrial fermentation processes. Whole genome comparison was performed on 191 strains of O. oeni; from this rich source of genomic information consensus pan-genome assemblies of the invariant (core) and variable (flexible) regions of this organism were established. Genetic variation in amino acid biosynthesis and sugar transport and utilisation was found to be common between strains. Furthermore, we characterised previously-unreported intra-specific genetic variations in the natural competence of this microbe. By assembling a consensus pan-genome from a large number of strains, this study provides a tool for researchers to readily compare protein-coding genes across strains and infer functional relationships between genes in conserved syntenic regions. This establishes a foundation for further genetic, and thus phenotypic, research of this industrially-important species.
PopHuman: the human population genomics browser

PubMed Central

Mulet, Roger; Villegas-Mirón, Pablo; Hervas, Sergi; Sanz, Esteve; Velasco, Daniel; Bertranpetit, Jaume; Laayouni, Hafid

2018-01-01

The 1000 Genomes Project (1000GP) represents the most comprehensive world-wide nucleotide variation data set so far in humans, providing the sequencing and analysis of 2504 genomes from 26 populations and reporting >84 million variants. The availability of this sequence data provides the human lineage with an invaluable resource for population genomics studies, allowing the testing of molecular population genetics hypotheses and eventually the understanding of the evolutionary dynamics of genetic variation in human populations. Here we present PopHuman, a new population genomics-oriented genome browser based on JBrowse that allows the interactive visualization and retrieval of an extensive inventory of population genetics metrics. Efficient and reliable parameter estimates have been computed using a novel pipeline that faces the unique features and limitations of the 1000GP data, and include a battery of nucleotide variation measures, divergence and linkage disequilibrium parameters, as well as different tests of neutrality, estimated in non-overlapping windows along the chromosomes and in annotated genes for all 26 populations of the 1000GP. PopHuman is open and freely available at http://pophuman.uab.cat. PMID:29059408
Copy Number Variations in Tilapia Genomes.

PubMed

Li, Bi Jun; Li, Hong Lian; Meng, Zining; Zhang, Yong; Lin, Haoran; Yue, Gen Hua; Xia, Jun Hong

2017-02-01

Discovering the nature and pattern of genome variation is fundamental in understanding phenotypic diversity among populations. Although several million single nucleotide polymorphisms (SNPs) have been discovered in tilapia, the genome-wide characterization of larger structural variants, such as copy number variation (CNV) regions, has not yet been carried out. We conducted a genome-wide scan for CNVs in 47 individuals from three tilapia populations. Based on 254 Gb of high-quality paired-end sequencing reads, we identified 4642 distinct high-confidence CNVs. These CNVs account for 1.9% (12.411 Mb) of the Nile tilapia reference genome used. A total of 1100 predicted CNVs were found overlapping with exon regions of protein genes. Further association analysis based on linear model regression found 85 CNVs ranging between 300 and 27,000 base pairs significantly associated with population types (R² > 0.9 and P > 0.001). Our study provides first insights into genome-wide CNVs in tilapia. These CNVs among and within tilapia populations may have functional effects on phenotypes and specific adaptation to particular environments.
Genome comparison of two Magnaporthe oryzae field isolates reveals genome variations and potential virulence effectors

PubMed Central

2013-01-01

Background: Rice blast caused by the fungus Magnaporthe oryzae is an important disease in virtually every rice growing region of the world, which leads to significant annual decreases of grain quality and yield. To prevent disease, resistance genes in rice have been cloned and introduced into susceptible cultivars. However, introduced resistance can often be broken within a few years of release, often due to mutation of cognate avirulence genes in fungal field populations. Results: To better understand the pattern of mutation of M. oryzae field isolates under natural selection forces, we used a next generation sequencing approach to analyze the genomes of two field isolates FJ81278 and HN19311, as well as the transcriptome of FJ81278. By comparing the de novo genome assemblies of the two isolates against the finished reference strain 70–15, we identified extensive polymorphisms including unique genes, SNPs (single nucleotide polymorphism) and indels, structural variations, copy number variations, and loci under strong positive selection. The 1.75 MB of isolate-specific genome content carrying 118 novel genes from FJ81278, and 0.83 MB from HN19311 were also identified. By analyzing secreted proteins carrying polymorphisms, in total 256 candidate virulence effectors were found and 6 were chosen for functional characterization. Conclusions: We provide results from genome comparison analysis showing extensive genome variation, and generated a list of M. oryzae candidate virulence effectors for functional characterization. PMID:24341723
Plasmodium copy number variation scan: gene copy numbers evaluation in haploid genomes.

PubMed

Beghain, Johann; Langlois, Anne-Claire; Legrand, Eric; Grange, Laura; Khim, Nimol; Witkowski, Benoit; Duru, Valentine; Ma, Laurence; Bouchier, Christiane; Ménard, Didier; Paul, Richard E; Ariey, Frédéric

2016-04-12

In eukaryotic genomes, deletion or amplification rates have been estimated to be a thousand times more frequent than single nucleotide variation. In Plasmodium falciparum, relatively few transcription factors have been identified, and the regulation of transcription is seemingly largely influenced by gene amplification events. Thus copy number variation (CNV) is a major mechanism enabling parasite genomes to adapt to new environmental changes. Currently, the detection of CNVs is based on quantitative PCR (qPCR), which is significantly limited by the relatively small number of genes that can be analysed at any one time. Technological advances that facilitate whole-genome sequencing, such as next generation sequencing (NGS), enable deeper analyses of the genomic variation to be performed. Because the characteristics of Plasmodium CNVs need special consideration in algorithms and strategies for which classical CNV detection programs are not suited, a dedicated algorithm to detect CNVs across the entire exome of P. falciparum was developed. This algorithm is based on a custom read-depth strategy applied to NGS data and is called PlasmoCNVScan. The analysis of CNV identification on three genes known to have different levels of amplification and which are located either in the nuclear, apicoplast or mitochondrial genomes is presented. The results are correlated with the qPCR experiments, usually used for identification of locus specific amplification/deletion. This tool will facilitate the study of P. falciparum genomic adaptation in response to ecological changes: drug pressure, decreased transmission, reduction of the parasite population size (transition to pre-elimination endemic area).
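The general principle behind a read-depth strategy can be shown with a small sketch. This is not PlasmoCNVScan's algorithm, whose details the abstract does not give. It assumes a haploid genome in which most genes are single copy, so the median per-gene depth serves as the one-copy baseline. The function name and the gene labels in the usage note are hypothetical.

```python
from statistics import median


def copy_number_estimates(gene_depths):
    """Estimate copy number per gene as mean depth divided by the median depth.

    gene_depths: mapping of gene name -> mean sequencing depth over that gene.
    Assumes a haploid genome where the typical (median) gene is single copy.
    """
    baseline = median(gene_depths.values())
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    return {gene: depth / baseline for gene, depth in gene_depths.items()}
```

For example, depths of 100, 100 and 300 for three hypothetical genes `geneA`, `geneB` and `geneC` give ratios of 1.0, 1.0 and 3.0, suggesting a roughly threefold amplification of `geneC`.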
Copy number variation identification and analysis of the chicken genome using a 60K SNP BeadChip.

PubMed

Rao, Y S; Li, J; Zhang, R; Lin, X R; Xu, J G; Xie, L; Xu, Z Q; Wang, L; Gan, J K; Xie, X J; He, J; Zhang, X Q

2016-08-01

Copy number variation (CNV) is an important source of genetic variation in organisms and a main factor that affects phenotypic variation. A comprehensive study of chicken CNV can provide valuable information on genetic diversity and facilitate future analyses of associations between CNV and economically important traits in chickens. In the present study, an F2 full-sib chicken population (554 individuals), established from a cross between Xinghua and White Recessive Rock chickens, was used to explore CNV in the chicken genome. Genotyping was performed using a chicken 60K SNP BeadChip. A total of 1,875 CNV were detected with the PennCNV algorithm, and the average number of CNV was 3.42 per individual. The CNV were distributed across 383 independent CNV regions (CNVR) and covered 41 megabases (3.97%) of the chicken genome. Seven CNVR in 108 individuals were validated by quantitative real-time PCR, and 81 of these individuals (75%) also were detected with the PennCNV algorithm. In total, 274 CNVR (71.54%) identified in the current study were previously reported. Of these, 147 (38.38%) were reported in at least 2 studies. Additionally, 109 of the CNVR (28.46%) discovered here are novel. A total of 709 genes within or overlapping with the CNVR was retrieved. Out of the 2,742 quantitative trait loci (QTL) collected in the chicken QTL database, 43 QTL had confidence intervals overlapping with the CNVR, and 32 CNVR encompassed one or more functional genes. The functional genes located in the CNVR are likely to be the QTG that are associated with underlying economic traits. This study considerably expands our insight into the structural variation in the genome of chickens and provides an important resource for genomic variation, especially for genomic structural variation related to economic traits in chickens. © 2016 Poultry Science Association Inc.
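The step from individual CNV calls to independent CNV regions (CNVR) is commonly done by merging calls that overlap on the same chromosome. The abstract does not state the exact merging rule, so the sketch below assumes half-open `(chromosome, start, end)` intervals and merges any that overlap or touch. The function names are hypothetical.

```python
def merge_cnvs(cnvs):
    """Merge overlapping or touching CNV calls into CNV regions.

    cnvs: iterable of (chromosome, start, end) with start < end.
    Returns a sorted list of merged (chromosome, start, end) regions.
    """
    regions = []
    for chrom, start, end in sorted(cnvs):
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            # Extend the previous region on the same chromosome
            prev_chrom, prev_start, prev_end = regions[-1]
            regions[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            regions.append((chrom, start, end))
    return regions


def covered_fraction(regions, genome_length):
    """Fraction of the genome covered by the merged regions."""
    return sum(end - start for _, start, end in regions) / genome_length
```

The percentages in the record follow from the region counts: 274 of 383 CNVR is 71.54% previously reported, and 109 of 383 is 28.46% novel.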
Sequencing of mitochondrial genomes of nine Aspergillus and Penicillium species identifies mobile introns and accessory genes as main sources of genome size variability.

PubMed

Joardar, Vinita; Abrams, Natalie F; Hostetler, Jessica; Paukstelis, Paul J; Pakala, Suchitra; Pakala, Suman B; Zafar, Nikhat; Abolude, Olukemi O; Payne, Gary; Andrianopoulos, Alex; Denning, David W; Nierman, William C

2012-12-12

The genera Aspergillus and Penicillium include some of the most beneficial as well as the most harmful fungal species, such as the penicillin-producer Penicillium chrysogenum and the human pathogen Aspergillus fumigatus, respectively. Their mitochondrial genomic sequences may hold vital clues to the mechanisms of their evolution, population genetics, and biology, yet only a handful of these genomes have been fully sequenced and annotated. Here we report the complete sequence and annotation of the mitochondrial genomes of six Aspergillus and three Penicillium species: A. fumigatus, A. clavatus, A. oryzae, A. flavus, Neosartorya fischeri (A. fischerianus), A. terreus, P. chrysogenum, P. marneffei, and Talaromyces stipitatus (P. stipitatum). The accompanying comparative analysis of these and related publicly available mitochondrial genomes reveals wide variation in size (25-36 kb) among these closely related fungi. The sources of genome expansion include group I introns and accessory genes encoding putative homing endonucleases, DNA and RNA polymerases (presumed to be of plasmid origin) and hypothetical proteins. The two smallest sequenced genomes (A. terreus and P. chrysogenum) do not contain introns in protein-coding genes, whereas the largest genome (T. stipitatus) contains a total of eleven introns. All of the sequenced genomes have a group I intron in the large ribosomal subunit RNA gene, suggesting that this intron is fixed in these species. Subsequent analysis of several A. fumigatus strains showed low intraspecies variation. This study also includes a phylogenetic analysis based on 14 concatenated core mitochondrial proteins. The phylogenetic tree has a different topology from published multilocus trees, highlighting the challenges still facing Aspergillus systematics.
The study expands the genomic resources available to fungal biologists by providing mitochondrial genomes with consistent annotations for future genetic, evolutionary and population studies. Despite the conservation of the core genes, the mitochondrial genomes of the Aspergillus and Penicillium species examined here exhibit a significant amount of interspecies variation. Most of this variation can be attributed to accessory genes and mobile introns, presumably acquired by horizontal gene transfer of mitochondrial plasmids and intron homing.
Landscape community genomics: understanding eco-evolutionary processes in complex environments

USGS Publications Warehouse

Hand, Brian K.; Lowe, Winsor H.; Kovach, Ryan P.; Muhlfeld, Clint C.; Luikart, Gordon

2015-01-01

Extrinsic factors influencing evolutionary processes are often categorically lumped into interactions that are environmentally (e.g., climate, landscape) or community-driven, with little consideration of the overlap or influence of one on the other. However, genomic variation is strongly influenced by complex and dynamic interactions between environmental and community effects. Failure to consider both effects on evolutionary dynamics simultaneously can lead to incomplete, spurious, or erroneous conclusions about the mechanisms driving genomic variation. We highlight the need for a landscape community genomics (LCG) framework to help to motivate and challenge scientists in diverse fields to consider a more holistic, interdisciplinary perspective on the genomic evolution of multi-species communities in complex environments.
Evolutionary and Taxonomic Implications of Variation in Nuclear Genome Size: Lesson from the Grass Genus Anthoxanthum (Poaceae)

PubMed Central

Chumová, Zuzana; Krejčíková, Jana; Mandáková, Terezie; Suda, Jan; Trávníček, Pavel

2015-01-01

The genus Anthoxanthum (sweet vernal grass, Poaceae) represents a taxonomically intricate polyploid complex with large phenotypic variation and evolutionary relationships that remain poorly resolved. In order to get insight into the geographic distribution of ploidy levels and assess the taxonomic value of genome size data, we determined C- and Cx-values in 628 plants representing all currently recognized European species collected from 197 populations in 29 European countries. The flow cytometric estimates were supplemented by conventional chromosome counts. In addition to diploids, we found two low (rare 3x and common 4x) and one high (~16x–18x) polyploid levels. Mean holoploid genome sizes ranged from 5.52 pg in diploid A. alpinum to 44.75 pg in highly polyploid A. amarum, while the size of monoploid genomes ranged from 2.75 pg in tetraploid A. alpinum to 9.19 pg in diploid A. gracile. In contrast to Central and Northern Europe, which harboured only limited cytological variation, a much more complex pattern of genome sizes was revealed in the Mediterranean, particularly in Corsica. Eight taxonomic groups that partly corresponded to traditionally recognized species were delimited based on genome size values and phenotypic variation. Whereas our data supported the merger of A. aristatum and A. ovatum, eastern Mediterranean populations traditionally referred to as diploid A. odoratum were shown to be cytologically distinct, and may represent a new taxon. Autopolyploid origin was suggested for 4x A. alpinum. In contrast, 4x A. odoratum seems to be an allopolyploid, based on the amounts of nuclear DNA. Intraspecific variation in genome size was observed in all recognized species, the most striking example being the A. aristatum/ovatum complex. Altogether, our study showed that genome size can be a useful taxonomic marker in Anthoxanthum to not only guide taxonomic decisions but also help resolve evolutionary relationships in this challenging grass genus. PMID:26207824
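The relationship between the holoploid and monoploid genome sizes quoted in this record can be illustrated with a minimal Python sketch. It assumes that the holoploid values are 2C-values and that the monoploid (Cx) value is the 2C-value divided by the ploidy level; the function name is hypothetical, and the tetraploid input below is a made-up number used only to show the calculation.

```python
def monoploid_genome_size(holoploid_2c_pg, ploidy):
    """Return the Cx-value (pg) as the 2C-value divided by the ploidy level.

    Assumption: the holoploid size is reported as a 2C-value.
    """
    if ploidy <= 0:
        raise ValueError("ploidy must be a positive number")
    return holoploid_2c_pg / ploidy


# Diploid with the 5.52 pg holoploid size quoted in the abstract.
print(round(monoploid_genome_size(5.52, 2), 2))

# Hypothetical tetraploid with an invented 2C-value of 11.0 pg.
print(round(monoploid_genome_size(11.0, 4), 2))
```

Under these assumptions, a diploid and a tetraploid of the same lineage can have nearly the same Cx-value even though their holoploid sizes differ about twofold, which is the kind of comparison the abstract uses to separate ploidy levels from true genome size differences.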
Evolutionary and Taxonomic Implications of Variation in Nuclear Genome Size: Lesson from the Grass Genus Anthoxanthum (Poaceae).

PubMed

Chumová, Zuzana; Krejčíková, Jana; Mandáková, Terezie; Suda, Jan; Trávníček, Pavel

2015-01-01

The genus Anthoxanthum (sweet vernal grass, Poaceae) represents a taxonomically intricate polyploid complex with large phenotypic variation and evolutionary relationships that remain poorly resolved. In order to get insight into the geographic distribution of ploidy levels and assess the taxonomic value of genome size data, we determined C- and Cx-values in 628 plants representing all currently recognized European species collected from 197 populations in 29 European countries. The flow cytometric estimates were supplemented by conventional chromosome counts. In addition to diploids, we found two low (rare 3x and common 4x) and one high (~16x-18x) polyploid levels. Mean holoploid genome sizes ranged from 5.52 pg in diploid A. alpinum to 44.75 pg in highly polyploid A. amarum, while the size of monoploid genomes ranged from 2.75 pg in tetraploid A. alpinum to 9.19 pg in diploid A. gracile. In contrast to Central and Northern Europe, which harboured only limited cytological variation, a much more complex pattern of genome sizes was revealed in the Mediterranean, particularly in Corsica. Eight taxonomic groups that partly corresponded to traditionally recognized species were delimited based on genome size values and phenotypic variation. Whereas our data supported the merger of A. aristatum and A. ovatum, eastern Mediterranean populations traditionally referred to as diploid A. odoratum were shown to be cytologically distinct, and may represent a new taxon. Autopolyploid origin was suggested for 4x A. alpinum. In contrast, 4x A. odoratum seems to be an allopolyploid, based on the amounts of nuclear DNA. Intraspecific variation in genome size was observed in all recognized species, the most striking example being the A. aristatum/ovatum complex. Altogether, our study showed that genome size can be a useful taxonomic marker in Anthoxanthum to not only guide taxonomic decisions but also help resolve evolutionary relationships in this challenging grass genus.
Relating Human Genetic Variation to Variation in Drug Responses

PubMed Central

Madian, Ashraf G.; Wheeler, Heather E.; Jones, Richard Baker; Dolan, M. Eileen

2012-01-01

Although sequencing a single human genome was a monumental effort a decade ago, more than one thousand genomes have now been sequenced. The task ahead lies in transforming this information into personalized treatment strategies that are tailored to the unique genetics of each individual. One important aspect of personalized medicine is patient-to-patient variation in drug response. Pharmacogenomics addresses this issue by seeking to identify genetic contributors to human variation in drug efficacy and toxicity. Here, we present a summary of the current status of this field, which has evolved from studies of single candidate genes to comprehensive genome-wide analyses. Additionally, we discuss the major challenges in translating this knowledge into a systems-level understanding of drug physiology with the ultimate goal of developing more effective personalized clinical treatment strategies. PMID:22840197
Genetical genomics of Populus leaf shape variation

DOE PAGES

Drost, Derek R.; Puranik, Swati; Novaes, Evandro; ...

2015-06-30

Leaf morphology varies extensively among plant species and is under strong genetic control. Mutagenic screens in model systems have identified genes and established molecular mechanisms regulating leaf initiation, development, and shape. However, it is not known whether this diversity across plant species is related to naturally occurring variation at these genes. Quantitative trait locus (QTL) analysis has revealed a polygenic control for leaf shape variation in different species, suggesting that loci discovered by mutagenesis may only explain part of the naturally occurring variation in leaf shape. Here we undertook a genetical genomics study in a poplar intersectional pseudo-backcross pedigree to identify genetic factors controlling leaf shape. The approach combined QTL discovery in a genetic linkage map anchored to the Populus trichocarpa reference genome sequence and transcriptome analysis.
Genomic and evolutionary characteristics of cattle copy number variations

USDA-ARS's Scientific Manuscript database

We performed a systematic analysis of cattle copy number variations (CNVs) using the Bovine HapMap SNP genotyping data, including 539 animals of 21 modern cattle breeds and 6 outgroups. After correcting genomic waves and considering the trio information, we identified 682 candidate CNV regions (CNVR...
Adaptive potential of genomic structural variation in human and mammalian evolution.

PubMed

Radke, David W; Lee, Charles

2015-09-01

Because phenotypic innovations must be genetically heritable for biological evolution to proceed, it is natural to consider new mutation events as well as standing genetic variation as sources for their birth. Previous research has identified a number of single-nucleotide polymorphisms that underlie a subset of adaptive traits in organisms. However, another well-known class of variation, genomic structural variation, could have even greater potential to produce adaptive phenotypes, due to the variety of possible types of alterations (deletions, insertions, duplications, among others) at different genomic positions and with variable lengths. It is from these dramatic genomic alterations, and selection on their phenotypic consequences, that adaptations leading to biological diversification could be derived. In this review, using studies in humans and other mammals, we highlight examples of how phenotypic variation from structural variants might become adaptive in populations and potentially enable biological diversification. Phenotypic change arising from structural variants will be described according to their immediate effect on organismal metabolic processes, immunological response and physical features. Study of population dynamics of segregating structural variation can therefore provide a window into understanding current and historical biological diversification. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Complexity of genetic mechanisms conferring nonuniformity of recombination in maize.

PubMed

Pan, Qingchun; Deng, Min; Yan, Jianbing; Li, Lin

2017-04-26

Recombinations occur nonuniformly across the maize genome. To dissect the genetic mechanisms underlying the nonuniformity of recombination, we performed quantitative trait locus (QTL) mapping using recombinant inbred line populations. A genome-wide QTL scan identified hundreds of QTLs with both cis-prone and trans- effects for recombination number variation. To provide detailed insights into cis- factors associated with recombination variation, we examined the genomic features around recombination hot regions, including density of genes, DNA transposons, retrotransposons, and some specific motifs. Compared to recombination variation in the whole genome, more QTLs were mapped for variations in recombination hot regions. The majority of QTLs for recombination hot regions are trans-QTLs and co-localized with genes from the recombination pathway. We also found that recombination variation was positively associated with the presence of genes and DNA transposons, but negatively related to the presence of long terminal repeat retrotransposons. Additionally, 41 recombination hot regions were fine-mapped. The high-resolution genotyping of five randomly selected regions in two F2 populations verified that they indeed have ultra-high recombination frequency, which is even higher than that of the well-known recombination hot regions sh1-bz and a1-sh2. Taken together, our results further our understanding of recombination variation in plants.
Parallel or convergent evolution in human population genomic data revealed by genotype networks.

PubMed

R Vahdati, Ali; Wagner, Andreas

2016-08-02

Genotype networks are representations of genetic variation data that are complementary to phylogenetic trees. A genotype network is a graph whose nodes are genotypes (DNA sequences) with the same broadly defined phenotype. Two nodes are connected if they differ in some minimal way, e.g., in a single nucleotide. We analyze human genome variation data from the 1000 Genomes Project, and construct haploid genotype (haplotype) networks for 12,235 protein-coding genes. The structure of these networks varies widely among genes, indicating different patterns of variation despite a shared evolutionary history. We focus on those genes whose genotype networks show many cycles, which can indicate homoplasy, i.e., parallel or convergent evolution, on the sequence level. For 42 genes, the observed number of cycles is so large that it cannot be explained by either chance homoplasy or recombination. When analyzing possible explanations, we discovered evidence for positive selection in 21 of these genes and, in addition, a potential role for constrained variation and purifying selection. Balancing selection plays at most a small role. The 42 genes with excess cycles are enriched in functions related to immunity and response to pathogens. Genotype networks are representations of genetic variation data that can help understand unusual patterns of genomic variation.
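The network construction described in this record can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes equal-length haplotype strings, connects two haplotypes when they differ at exactly one position, and uses the cycle rank (edges minus nodes plus connected components) as one simple stand-in for "number of cycles". All function names and the toy haplotypes are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def build_genotype_network(haplotypes):
    """Return (nodes, edges); an edge joins haplotypes differing at one site."""
    nodes = sorted(set(haplotypes))
    edges = [
        (a, b)
        for a, b in combinations(nodes, 2)
        if len(a) == len(b) and hamming(a, b) == 1
    ]
    return nodes, edges


def count_components(nodes, edges):
    """Count connected components with a small union-find structure."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(n) for n in nodes})


def cycle_rank(nodes, edges):
    """Number of independent cycles: E - V + C."""
    return len(edges) - len(nodes) + count_components(nodes, edges)


# Toy data: four haplotypes forming a square (AA-AC-CC-CA-AA).
nodes, edges = build_genotype_network(["AA", "AC", "CA", "CC", "AA"])
print(len(nodes), len(edges), cycle_rank(nodes, edges))
```

In the toy square, the double mutant CC can be reached from AA by two different mutational orders, which is the kind of closed path the abstract associates with possible homoplasy or recombination.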
Pan-Genomic Analysis Provides Insights into the Genomic Variation and Evolution of Salmonella Paratyphi A

PubMed Central

Chen, Chunxia; Cui, Xiaoying; Yu, Jun; Xiao, Jingfa; Kan, Biao

2012-01-01

Salmonella Paratyphi A (S. Paratyphi A) is a highly adapted, human-specific pathogen that causes paratyphoid fever. Cases of paratyphoid fever have recently been increasing, and the disease is becoming a major public health concern, especially in Eastern and Southern Asia. To investigate the genomic variation and evolution of S. Paratyphi A, a pan-genomic analysis was performed on five newly sequenced S. Paratyphi A strains and two other reference strains. A whole genome comparison revealed that the seven genomes are collinear and that their organization is highly conserved. The high rate of substitutions in part of the core genome indicates that there are frequent homologous recombination events. Based on the changes in the pan-genome size and cluster number (both in the core functional genes and core pseudogenes), it can be inferred that the sharply increasing number of pseudogene clusters may have a strong correlation with the inactivation of functional genes, indicating that the S. Paratyphi A genome is being degraded. PMID:23028950
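The pan-genome and core-genome notions used in this record can be illustrated with a short Python sketch. It assumes each strain has already been reduced to a set of gene-cluster identifiers; the function names, strain labels and cluster identifiers are hypothetical, and real pan-genome tools perform the clustering step that is skipped here.

```python
def pan_and_core(strain_genes):
    """Return (pan, core): the union and the intersection of all gene sets."""
    gene_sets = [set(genes) for genes in strain_genes.values()]
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return pan, core


def strain_specific(strain_genes):
    """Return, per strain, the clusters found in no other strain."""
    result = {}
    for name, genes in strain_genes.items():
        others = set().union(
            *(set(g) for n, g in strain_genes.items() if n != name)
        )
        result[name] = set(genes) - others
    return result


# Toy input: invented cluster identifiers for three invented strains.
toy = {
    "strain_1": {"c1", "c2", "c3"},
    "strain_2": {"c1", "c2", "c4"},
    "strain_3": {"c1", "c2", "c3", "c5"},
}
pan, core = pan_and_core(toy)
print(sorted(pan), sorted(core))
print({name: sorted(genes) for name, genes in strain_specific(toy).items()})
```

Recomputing the two sets as strains are added one at a time gives the kind of pan-genome and core-genome size curves on which the abstract bases its inference.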
Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schutzer S. E.; Dunn J.; Fraser-Liggett, C. M.

2011-02-01

Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain 31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

Single-cell copy number variation detection

PubMed Central

2011-01-01

Detection of chromosomal aberrations from a single cell by array comparative genomic hybridization (single-cell array CGH), instead of from a population of cells, is an emerging technique. However, such detection is challenging because of the genome artifacts and the DNA amplification process inherent to the single cell approach. Current normalization algorithms result in inaccurate aberration detection for single-cell data. We propose a normalization method based on channel, genome composition and recurrent genome artifact corrections. We demonstrate that the proposed channel clone normalization significantly improves the copy number variation detection in both simulated and real single-cell array CGH data. PMID:21854607
Clan Genomics and the Complex Architecture of Human Disease

PubMed Central

Belmont, John W.; Boerwinkle, Eric

2013-01-01

Human diseases are caused by alleles that encompass the full range of variant types, from single-nucleotide changes to copy-number variants, and these variations span a broad frequency spectrum, from the very rare to the common. The picture emerging from analysis of whole-genome sequences, the 1000 Genomes Project pilot studies, and targeted genomic sequencing derived from very large sample sizes reveals an abundance of rare and private variants. One implication of this realization is that recent mutation may have a greater influence on disease susceptibility or protection than is conferred by variations that arose in distant ancestors. PMID:21962505
Saccharomyces cerevisiae: gene annotation and genome variability, state of the art through comparative genomics.

PubMed

Louis, Ed

2011-01-01

In the early days of the yeast genome sequencing project, gene annotation was in its infancy and suffered the problem of many false positive annotations as well as missed genes. The lack of other sequences for comparison also prevented the annotation of conserved, functional sequences that were not coding. We are now in an era of comparative genomics where many closely related as well as more distantly related genomes are available for direct sequence and synteny comparisons allowing for more probable predictions of genes and other functional sequences due to conservation. We also have a plethora of functional genomics data which helps inform gene annotation for previously uncharacterised open reading frames (ORFs)/genes. For Saccharomyces cerevisiae this has resulted in a continuous updating of the gene and functional sequence annotations in the reference genome helping it retain its position as the best characterized eukaryotic organism's genome. A single reference genome for a species does not accurately describe the species and this is quite clear in the case of S. cerevisiae where the reference strain is not ideal for brewing or baking due to missing genes. Recent surveys of numerous isolates, from a variety of sources, using a variety of technologies have revealed a great deal of variation amongst isolates with genome sequence surveys providing information on novel genes, undetectable by other means. We now have a better understanding of the extant variation in S. cerevisiae as a species as well as some idea of how much we are missing from this understanding. As with gene annotation, comparative genomics enhances the discovery and description of genome variation and is providing us with the tools for understanding genome evolution, adaptation and selection, and underlying genetics of complex traits.
Within-Host Variations of Human Papillomavirus Reveal APOBEC Signature Mutagenesis in the Viral Genome.

PubMed

Hirose, Yusuke; Onuki, Mamiko; Tenjimbayashi, Yuri; Mori, Seiichiro; Ishii, Yoshiyuki; Takeuchi, Takamasa; Tasaka, Nobutaka; Satoh, Toyomi; Morisada, Tohru; Iwata, Takashi; Miyamoto, Shingo; Matsumoto, Koji; Sekizawa, Akihiko; Kukimoto, Iwao

2018-06-15

Persistent infection with oncogenic human papillomaviruses (HPVs) causes cervical cancer, accompanied by the accumulation of somatic mutations in the host genome. There are concomitant genetic changes in the HPV genome during viral infection; however, their relevance to cervical carcinogenesis is poorly understood. Here, we explored within-host genetic diversity of HPV by performing deep-sequencing analyses of viral whole-genome sequences in clinical specimens. The whole genomes of HPV types 16, 52, and 58 were amplified by type-specific PCR from total cellular DNA of cervical exfoliated cells collected from patients with cervical intraepithelial neoplasia (CIN) and invasive cervical cancer (ICC) and were deep sequenced. After constructing a reference viral genome sequence for each specimen, nucleotide positions showing changes with >0.5% frequencies compared to the reference sequence were determined for individual samples. In total, 1,052 positions of nucleotide variations were detected in HPV genomes from 151 samples (CIN1, n = 56; CIN2/3, n = 68; ICC, n = 27), with various numbers per sample. Overall, C-to-T and C-to-A substitutions were the dominant changes observed across all histological grades. While C-to-T transitions were predominantly detected in CIN1, their prevalence was decreased in CIN2/3 and fell below that of C-to-A transversions in ICC. Analysis of the trinucleotide context encompassing substituted bases revealed that TpCpN, a preferred target sequence for cellular APOBEC cytosine deaminases, was a primary site for C-to-T substitutions in the HPV genome. These results strongly imply that the APOBEC proteins are drivers of HPV genome mutation, particularly in CIN1 lesions.
IMPORTANCE HPVs exhibit surprisingly high levels of genetic diversity, including a large repertoire of minor genomic variants in each viral genotype. Here, by conducting deep-sequencing analyses, we show for the first time a comprehensive snapshot of the within-host genetic diversity of high-risk HPVs during cervical carcinogenesis. Quasispecies harboring minor nucleotide variations in viral whole-genome sequences were extensively observed across different grades of CIN and cervical cancer. Among the within-host variations, C-to-T transitions, a characteristic change mediated by cellular APOBEC cytosine deaminases, were predominantly detected throughout the whole viral genome, most strikingly in low-grade CIN lesions. The results strongly suggest that within-host variations of the HPV genome are primarily generated through the interaction with host cell DNA-editing enzymes and that such within-host variability is an evolutionary source of the genetic diversity of HPVs. Copyright © 2018 American Society for Microbiology.
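The trinucleotide-context analysis mentioned in this record can be sketched as follows. This is a simplified Python illustration under stated assumptions: variants are given as (0-based position, reference base, alternate base) tuples on the forward strand of a per-sample reference, and only forward-strand C-to-T changes are classified; the complementary G-to-A changes on the opposite strand, which a fuller analysis would also consider, are left out. The function names, the toy reference and the toy variants are all hypothetical.

```python
from collections import Counter


def trinucleotide_context(reference, position):
    """Return the 3-mer centred on a 0-based position, or None at the ends."""
    if position < 1 or position > len(reference) - 2:
        return None
    return reference[position - 1:position + 2].upper()


def is_tcn_context(context):
    """True when the substituted C is preceded by T (the TpCpN motif)."""
    return context is not None and context[0] == "T" and context[1] == "C"


def summarize_c_to_t(reference, variants):
    """Count forward-strand C-to-T variants inside and outside TpCpN."""
    counts = Counter()
    for position, ref_base, alt_base in variants:
        if ref_base.upper() != "C" or alt_base.upper() != "T":
            continue  # only C-to-T substitutions are classified here
        context = trinucleotide_context(reference, position)
        if context is None:
            continue  # no full trinucleotide at the sequence ends
        counts["TCN" if is_tcn_context(context) else "other"] += 1
    return counts


# Toy reference and variants, invented for illustration only.
toy_reference = "ATCGACCTCA"
toy_variants = [(2, "C", "T"), (5, "C", "T"), (8, "C", "T"), (6, "C", "A")]
print(dict(summarize_c_to_t(toy_reference, toy_variants)))
```

Comparing the share of C-to-T changes that fall in the TCN class across groups of samples is one simple way to look for the APOBEC-like signal the abstract describes.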
RSAT 2015: Regulatory Sequence Analysis Tools

PubMed Central

Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

2015-01-01

RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632
Mapping and phasing of structural variation in patient genomes using nanopore sequencing.

PubMed

Cretu Stancu, Mircea; van Roosmalen, Markus J; Renkens, Ivo; Nieboer, Marleen M; Middelkamp, Sjors; de Ligt, Joep; Pregno, Giulia; Giachino, Daniela; Mandrile, Giorgia; Espejo Valle-Inclan, Jose; Korzelius, Jerome; de Bruijn, Ewart; Cuppen, Edwin; Talkowski, Michael E; Marschall, Tobias; de Ridder, Jeroen; Kloosterman, Wigard P

2017-11-06

Despite improvements in genomics technology, the detection of structural variants (SVs) from short-read sequencing still poses challenges, particularly for complex variation. Here we analyse the genomes of two patients with congenital abnormalities using the MinION nanopore sequencer and a novel computational pipeline, NanoSV. We demonstrate that nanopore long reads are superior to short reads with regard to detection of de novo chromothripsis rearrangements. The long reads also enable efficient phasing of genetic variations, which we leveraged to determine the parental origin of all de novo chromothripsis breakpoints and to resolve the structure of these complex rearrangements. Additionally, genome-wide surveillance of inherited SVs reveals novel variants, missed in short-read data sets, a large proportion of which are retrotransposon insertions. We provide a first exploration of patient genome sequencing with a nanopore sequencer and demonstrate the value of long-read sequencing in mapping and phasing of SVs for both clinical and research applications.
Analysis of the genome-wide variations among multiple strains of the plant pathogenic bacterium Xylella fastidiosa

PubMed Central

Doddapaneni, Harshavardhan; Yao, Jiqiang; Lin, Hong; Walker, M Andrew; Civerolo, Edwin L

2006-01-01

Background The Gram-negative, xylem-limited phytopathogenic bacterium Xylella fastidiosa is responsible for causing economically important diseases in grapevine, citrus and many other plant species. Despite its economic impact, relatively little is known about the genomic variations among strains isolated from different hosts and their influence on the population genetics of this pathogen. With the availability of genome sequence information for four strains, it is now possible to perform genome-wide analyses to identify and categorize such DNA variations and to understand their influence on strain functional divergence. Results There are 1,579 genes and 194 non-coding homologous sequences present in the genomes of all four strains, representing a 76.2% conservation of the sequenced genome. About 60% of the X. fastidiosa unique sequences exist as tandem gene clusters of 6 or more genes. Multiple alignments identified 12,754 SNPs and 14,449 INDELs in the 1528 common genes and 20,779 SNPs and 10,075 INDELs in the 194 non-coding sequences. The average SNP frequency was 1.08 × 10^-2 per base pair of DNA and the average INDEL frequency was 2.06 × 10^-2 per base pair of DNA. On average, 60.33% of the SNPs were synonymous while 39.67% were non-synonymous. The mutation frequency, primarily in the form of external INDELs, was the main type of sequence variation. The relative similarity between the strains was discussed according to the INDEL and SNP differences. The numbers of genes unique to each strain were 60 (9a5c), 54 (Dixon), 83 (Ann1) and 9 (Temecula-1). A subset of the strain-specific genes showed significant differences in terms of their codon usage and GC composition from the native genes, suggesting their xenologous origin. Tandem repeat analysis of the genomic sequences of the four strains identified associations of repeat sequences with hypothetical and phage-related functions.
Conclusion INDELs and strain-specific genes have been identified as the main source of variations among strains, with individual strains showing different rates of genome evolution. Based on these genome comparisons, it appears that the Pierce's disease strain Temecula-1 genome represents the ancestral genome of X. fastidiosa. Results of this analysis are publicly available in the form of a web database. PMID:16948851
Interpretation of clinical relevance of X-chromosome copy number variations identified in a large cohort of individuals with cognitive disorders and/or congenital anomalies.

PubMed

Willemsen, Marjolein H; de Leeuw, Nicole; de Brouwer, Arjan P M; Pfundt, Rolph; Hehir-Kwa, Jayne Y; Yntema, Helger G; Nillesen, Willy M; de Vries, Bert B A; van Bokhoven, Hans; Kleefstra, Tjitske

2012-11-01

Genome-wide array studies are now routinely being used in the evaluation of patients with cognitive disorders (CD) and/or congenital anomalies (CA). Therefore, inevitably each clinician is confronted with the challenging task of the interpretation of copy number variations detected by genome-wide array platforms in a diagnostic setting. Clinical interpretation of autosomal copy number variations is already challenging, but assessment of the clinical relevance of copy number variations of the X-chromosome is even more complex. This study provides an overview of the X-Chromosome copy number variations that we have identified by genome-wide array analysis in a large cohort of 4407 male and female patients. We have made an interpretation of the clinical relevance of each of these copy number variations based on well-defined criteria and previous reports in literature and databases. The prevalence of X-chromosome copy number variations in this cohort was 57/4407 (∼1.3%), of which 15 (0.3%) were interpreted as (likely) pathogenic. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
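The percentages quoted at the end of this record follow directly from the counts. A small Python check, with a hypothetical helper name, reproduces them from the figures given in the abstract (57 and 15 carriers among 4407 patients).

```python
def prevalence_percent(cases, cohort_size):
    """Return cases as a percentage of the cohort size."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return 100.0 * cases / cohort_size


# Counts taken from the abstract above.
print(round(prevalence_percent(57, 4407), 1))  # all X-chromosome CNVs
print(round(prevalence_percent(15, 4407), 1))  # (likely) pathogenic CNVs
```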
Genomic profiling of plastid DNA variation in the Mediterranean olive tree

PubMed Central

2011-01-01

Background Characterisation of plastid genome (or cpDNA) polymorphisms is commonly used for phylogeographic, population genetic and forensic analyses in plants, but detecting cpDNA variation is sometimes challenging, limiting the applications of such an approach. In the present study, we screened cpDNA polymorphism in the olive tree (Olea europaea L.) by sequencing the complete plastid genome of trees with a distinct cpDNA lineage. Our objective was to develop new markers for rapid genomic profiling (by multiplex PCRs) of cpDNA haplotypes in the Mediterranean olive tree. Results Eight complete cpDNA genomes of Olea were sequenced de novo. The nucleotide divergence between olive cpDNA lineages was low and did not exceed 0.07%. Based on these sequences, markers were developed for studying two single nucleotide substitutions and length polymorphism of 62 regions (with variable microsatellite motifs or other indels). They were then used to genotype the cpDNA variation in cultivated and wild Mediterranean olive trees (315 individuals). Forty polymorphic loci were detected in this sample, allowing the distinction of 22 haplotypes belonging to the three Mediterranean cpDNA lineages known as E1, E2 and E3. The discriminating power of cpDNA variation was particularly low for the cultivated olive tree, with one predominating haplotype, but more diversity was detected in wild populations. Conclusions We propose a method for rapid characterisation of the Mediterranean olive germplasm. The low variation in the cultivated olive tree indicated that the utility of cpDNA variation for forensic analyses is limited to rare haplotypes. In contrast, the high cpDNA variation in wild populations demonstrated that our markers may be useful for phylogeographic and population genetic studies in O. europaea. PMID:21569271
An Approach for Integrating Toxicogenomic Data in Risk Assessment: The Dibutyl Phthalate Case Study

EPA Science Inventory

An approach for evaluating and integrating genomic data in chemical risk assessment was developed based on the lessons learned from performing a case study for the chemical dibutyl phthalate. A case study prototype approach was first developed in accordance with EPA guidance and ...
Large scale parallel pyrosequencing technology: PRRSV strain VR-2332 nsp2 deletion mutant stability in swine

USDA-ARS's Scientific Manuscript database

Genomes from fifteen porcine reproductive and respiratory syndrome virus (PRRSV) isolates were derived simultaneously using 454 pyrosequencing technology. The viral isolates sequenced were from a recent swine study, in which engineered Type 2 prototype PRRSV strain VR-2332 mutants, with 87, 184, 200...
Copy number variation of individual cattle genomes using next-generation sequencing

USDA-ARS's Scientific Manuscript database

Copy number variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next-generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one ...
Copy number variation of individual cattle genomes using next-generation sequencing

USDA-ARS's Scientific Manuscript database

Copy Number Variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often difficult to track. Using a read depth approach based on next generation sequencing, we examined genome-wide copy number differences among five taurine (three Angu...
Helicos BioSciences.

PubMed

Milos, Patrice

2008-04-01

Helicos BioSciences Corporation is a life sciences company developing revolutionary new single molecule sequencing technology to provide the path to the US$1000 genome. True Single Molecule Sequencing (tSMS) will drive advancements in pharmacogenomics that can enable a better understanding of an individual's susceptibility to disease, develop more effective disease diagnoses and differentiate response to disease therapies. During 2007, genome-wide disease-association studies, the Encyclopedia of DNA Elements (ENCODE) and the published genome sequence of two individuals have revealed human genome variation far more extensive than originally believed. These also demonstrated that common variations explain only a fraction of the genetic basis of disease. Therefore, the capability to understand an individual genome is critical in setting the foundation for the next great revolution in healthcare. Helicos is committed to this vision and will provide cost-effective genome sequencing and comprehensive analysis of the transcribed genome that can unlock the era of personalized healthcare.
The business value and cost-effectiveness of genomic medicine.

PubMed

Crawford, James M; Aspinall, Mara G

2012-05-01

Genomic medicine offers the promise of more effective diagnosis and treatment of human diseases. Genome sequencing early in the course of disease may enable more timely and informed intervention, with reduced healthcare costs and improved long-term outcomes. However, genomic medicine strains current models for demonstrating value, challenging efforts to achieve fair payment for services delivered, both for laboratory diagnostics and for use of molecular information in clinical management. Current models of healthcare reform stipulate that care must be delivered at equal or lower cost, with better patient and population outcomes. To achieve demonstrated value, genomic medicine must overcome many uncertainties: the clinical relevance of genomic variation; potential variation in technical performance and/or computational analysis; management of massive information sets; and must have available clinical interventions that can be informed by genomic analysis, so as to attain more favorable cost management of healthcare delivery and demonstrate improvements in cost-effectiveness.
Scanning the human genome at kilobase resolution.

PubMed

Chen, Jun; Kim, Yeong C; Jung, Yong-Chul; Xuan, Zhenyu; Dworkin, Geoff; Zhang, Yanming; Zhang, Michael Q; Wang, San Ming

2008-05-01

Normal genome variation and pathogenic genome alteration frequently affect small regions in the genome. Identifying those genomic changes remains a technical challenge. We report here the development of the DGS (Ditag Genome Scanning) technique for high-resolution analysis of genome structure. The basic features of DGS include (1) use of high-frequent restriction enzymes to fractionate the genome into small fragments; (2) collection of two tags from two ends of a given DNA fragment to form a ditag to represent the fragment; (3) application of the 454 sequencing system to reach a comprehensive ditag sequence collection; (4) determination of the genome origin of ditags by mapping to reference ditags from known genome sequences; (5) use of ditag sequences directly as the sense and antisense PCR primers to amplify the original DNA fragment. To study the relationship between ditags and genome structure, we performed a computational study by using the human genome reference sequences as a model, and analyzed the ditags experimentally collected from the well-characterized normal human DNA GM15510 and the leukemic human DNA of Kasumi-1 cells. Our studies show that DGS provides a kilobase resolution for studying genome structure with high specificity and high genome coverage. DGS can be applied to validate genome assembly, to compare genome similarity and variation in normal populations, and to identify genomic abnormality including insertion, inversion, deletion, translocation, and amplification in pathological genomes such as cancer genomes.
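The DGS steps listed above (fragmenting the genome, forming one ditag per fragment, and mapping ditags back to reference ditags) can be illustrated with a minimal in-silico sketch. This is not the authors' pipeline: the function names, the choice to cut immediately before each recognition site, the tag length, and the dictionary-based index are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple


def digest(genome: str, site: str) -> List[str]:
    """Split a sequence at every occurrence of a recognition site.

    Simplifying assumption: the cut falls immediately before the site.
    """
    # Zero-width lookahead finds every start position of the site.
    cuts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", genome)]
    bounds = [0] + [p for p in cuts if p > 0] + [len(genome)]
    return [genome[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def make_ditag(fragment: str, tag_len: int) -> Optional[str]:
    """Join the first and last tag_len bases of a fragment into one ditag.

    Fragments too short to yield two non-overlapping tags return None.
    """
    if len(fragment) < 2 * tag_len:
        return None
    return fragment[:tag_len] + fragment[-tag_len:]


def reference_ditags(
    genome: str, site: str, tag_len: int
) -> Dict[str, List[Tuple[int, int]]]:
    """Map each reference ditag to the (start, end) fragments it represents."""
    index: Dict[str, List[Tuple[int, int]]] = {}
    pos = 0
    for fragment in digest(genome, site):
        ditag = make_ditag(fragment, tag_len)
        if ditag is not None:
            index.setdefault(ditag, []).append((pos, pos + len(fragment)))
        pos += len(fragment)
    return index


if __name__ == "__main__":
    # Toy reference sequence; real use would read a genome assembly.
    toy_genome = "AAGATCCCCCGATCTTTTTGATCGG"
    table = reference_ditags(toy_genome, "GATC", tag_len=2)
    for ditag, locations in table.items():
        print(ditag, locations)
```

An experimentally collected ditag would then be looked up in the table returned by `reference_ditags`; a ditag mapping to more than one location is ambiguous, and a ditag absent from the table is a candidate for a structural difference from the reference.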
Genomic Variation of Inbreeding and Ancestry in the Remaining Two Isle Royale Wolves.

PubMed

Hedrick, Philip W; Kardos, Marty; Peterson, Rolf O; Vucetich, John A

2017-03-01

Inbreeding, relatedness, and ancestry have traditionally been estimated with pedigree information; however, molecular genomic data can provide more detailed examination of these properties. For example, pedigree information provides estimation of the expected value of these measures but molecular genomic data can estimate the realized values of these measures in individuals. Here, we generate the theoretical distribution of inbreeding, relatedness, and ancestry for the individuals in the pedigree of the Isle Royale wolves, the first examination of such variation in a wild population with a known pedigree. We use the 38 autosomes of the dog genome and their estimated map lengths in our genomic analysis. Although it is known that the remaining wolves are highly inbred, closely related, and descend from only 3 ancestors, our analyses suggest that there is significant variation in the realized inbreeding and relatedness around pedigree expectations. For example, the expected inbreeding in a hypothetical offspring from the 2 remaining wolves is 0.438 but the realized 95% genomic confidence interval is from 0.311 to 0.565. For individual chromosomes, a substantial proportion of the whole chromosomes are completely identical by descent. This examination provides a background to use when analyzing molecular genomic data for individual levels of inbreeding, relatedness, and ancestry. The level of variation in these measures is a function of the time to the common ancestor(s), the number of chromosomes, and the rate of recombination. In the Isle Royale wolf population, the few generations to a common ancestor result in the high variance in genomic inbreeding. © The American Genetic Association 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
The landscape of inherited and de novo copy number variants in a Plasmodium falciparum genetic cross

PubMed Central

2011-01-01

Background Copy number is a major source of genome variation with important evolutionary implications. Consequently, it is essential to determine copy number variant (CNV) behavior, distributions and frequencies across genomes to understand their origins in both evolutionary and generational time frames. We use comparative genomic hybridization (CGH) microarray and the resolution provided by a segregating population of cloned progeny lines of the malaria parasite, Plasmodium falciparum, to identify and analyze the inheritance of 170 genome-wide CNVs. Results We describe CNVs in progeny clones derived from both Mendelian (i.e. inherited) and non-Mendelian mechanisms. Forty-five CNVs were present in the parent lines and segregated in the progeny population. Furthermore, extensive variation that did not conform to strict Mendelian inheritance patterns was observed. 124 CNVs were called in one or more progeny but in neither parent: we observed CNVs in more than one progeny clone that were not identified in either parent, located more frequently in the telomeric-subtelomeric regions of chromosomes and singleton de novo CNVs distributed evenly throughout the genome. Linkage analysis of CNVs revealed dynamic copy number fluctuations and suggested mechanisms that could have generated them. Five of 12 previously identified expression quantitative trait loci (eQTL) hotspots coincide with CNVs, demonstrating the potential for broad influence of CNV on the transcriptional program and phenotypic variation. Conclusions CNVs are a significant source of segregating and de novo genome variation involving hundreds of genes. Examination of progeny genome segments provides a framework to assess the extent and possible origins of CNVs. This segregating genetic system reveals the breadth, distribution and dynamics of CNVs in a surprisingly plastic parasite genome, providing a new perspective on the sources of diversity in parasite populations. PMID:21936954
Genome re-sequencing reveals the history of apple and supports a two-stage model for fruit enlargement

USDA-ARS's Scientific Manuscript database

Human selection has reshaped crop genomes. Here we report an apple genome variation map generated through genome sequencing of 117 diverse accessions. A comprehensive model of apple speciation and domestication along the Silk Road was proposed based on evidence from diverse genomic analyses. Cultiva...
Pathogenesis comparison between the United States porcine epidemic diarrhoea virus prototype and S-INDEL-variant strains in conventional neonatal piglets.

PubMed

Chen, Qi; Gauger, Phillip C; Stafne, Molly R; Thomas, Joseph T; Madson, Darin M; Huang, Haiyan; Zheng, Ying; Li, Ganwu; Zhang, Jianqiang

2016-05-01

At least two genetically different porcine epidemic diarrhoea virus (PEDV) strains have been identified in the USA: US PEDV prototype and S-INDEL-variant strains. The objective of this study was to compare the pathogenicity differences of the US PEDV prototype and S-INDEL-variant strains in conventional neonatal piglets under experimental infections. Fifty PEDV-negative 5-day-old pigs were divided into five groups of ten pigs each and were inoculated orogastrically with three US PEDV prototype isolates (IN19338/2013, NC35140/2013 and NC49469/2013), an S-INDEL-variant isolate (IL20697/2014), and virus-negative culture medium, respectively, with virus titres of 10⁴ TCID50 ml⁻¹, 10 ml per pig. All three PEDV prototype isolates tested in this study, regardless of their phylogenetic clades, had similar pathogenicity and caused severe enteric disease in 5-day-old pigs as evidenced by clinical signs, faecal virus shedding, and gross and histopathological lesions. Compared with pigs inoculated with the three US PEDV prototype isolates, pigs inoculated with the S-INDEL-variant isolate had significantly diminished clinical signs, virus shedding in faeces, gross lesions in small intestines, caeca and colons, histopathological lesions in small intestines, and immunohistochemistry staining in ileum. However, the US PEDV prototype and the S-INDEL-variant strains induced similar viraemia levels in inoculated pigs. Whole genome sequences of the PEDV prototype and S-INDEL-variant strains were determined, but the molecular basis of virulence differences between these PEDV strains remains to be elucidated using a reverse genetics approach.

Reduced representation approaches to interrogate genome diversity in large repetitive plant genomes.

PubMed

Hirsch, Cory D; Evans, Joseph; Buell, C Robin; Hirsch, Candice N

2014-07-01

Technology and software improvements in the last decade now provide methodologies to access the genome sequence of not only a single accession, but also multiple accessions of plant species. This provides a means to interrogate species diversity at the genome level. Ample diversity among accessions in a collection of species can be found, including single-nucleotide polymorphisms, insertions and deletions, copy number variation and presence/absence variation. For species with small, non-repetitive rich genomes, re-sequencing of query accessions is robust, highly informative, and economically feasible. However, for species with moderate to large sized repetitive-rich genomes, technical and economic barriers prevent en masse genome re-sequencing of accessions. Multiple approaches to access a focused subset of loci in species with larger genomes have been developed, including reduced representation sequencing, exome capture and transcriptome sequencing. Collectively, these approaches have enabled interrogation of diversity on a genome scale for large plant genomes, including crop species important to worldwide food security. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Biogeography of the Sulfolobus islandicus pan-genome

PubMed Central

Reno, Michael L.; Held, Nicole L.; Fields, Christopher J.; Burke, Patricia V.; Whitaker, Rachel J.

2009-01-01

Variation in gene content has been hypothesized to be the primary mode of adaptive evolution in microorganisms; however, very little is known about the spatial and temporal distribution of variable genes. Through population-scale comparative genomics of 7 Sulfolobus islandicus genomes from 3 locations, we demonstrate the biogeographical structure of the pan-genome of this species, with no evidence of gene flow between geographically isolated populations. The evolutionary independence of each population allowed us to assess genome dynamics over very recent evolutionary time, beginning ≈910,000 years ago. On this time scale, genome variation largely consists of recent strain-specific integration of mobile elements. Localized sectors of parallel gene loss are identified; however, the balance between the gain and loss of genetic material suggests that S. islandicus genomes acquire material slowly over time, primarily from closely related Sulfolobus species. Examination of the genome dynamics through population genomics in S. islandicus exposes the process of allopatric speciation in thermophilic Archaea and brings us closer to a generalized framework for understanding microbial genome evolution in a spatial context. PMID:19435847
Dynamics of genome size evolution in birds and mammals.

PubMed

Kapusta, Aurélie; Suh, Alexander; Feschotte, Cédric

2017-02-21

Genome size in mammals and birds shows remarkably little interspecific variation compared with other taxa. However, genome sequencing has revealed that many mammal and bird lineages have experienced differential rates of transposable element (TE) accumulation, which would be predicted to cause substantial variation in genome size between species. Thus, we hypothesize that there has been covariation between the amount of DNA gained by transposition and lost by deletion during mammal and avian evolution, resulting in genome size equilibrium. To test this model, we develop computational methods to quantify the amount of DNA gained by TE expansion and lost by deletion over the last 100 My in the lineages of 10 species of eutherian mammals and 24 species of birds. The results reveal extensive variation in the amount of DNA gained via lineage-specific transposition, but that DNA loss counteracted this expansion to various extents across lineages. Our analysis of the rate and size spectrum of deletion events implies that DNA removal in both mammals and birds has proceeded mostly through large segmental deletions (>10 kb). These findings support a unified "accordion" model of genome size evolution in eukaryotes whereby DNA loss counteracting TE expansion is a major determinant of genome size. Furthermore, we propose that extensive DNA loss, and not necessarily a dearth of TE activity, has been the primary force maintaining the greater genomic compaction of flying birds and bats relative to their flightless relatives.
Global DNA cytosine methylation as an evolving trait: phylogenetic signal and correlated evolution with genome size in angiosperms

PubMed Central

Alonso, Conchita; Pérez, Ricardo; Bazaga, Pilar; Herrera, Carlos M.

2015-01-01

DNA cytosine methylation is a widespread epigenetic mechanism in eukaryotes, and plant genomes commonly are densely methylated. Genomic methylation can be associated with functional consequences such as mutational events, genomic instability or altered gene expression, but little is known about interspecific variation in global cytosine methylation in plants. In this paper, we compare global cytosine methylation estimates obtained by HPLC and use a phylogenetically informed analytical approach to test for significance of evolutionary signatures of this trait across 54 angiosperm species in 25 families. We evaluate whether interspecific variation in global cytosine methylation is statistically related to phylogenetic distance and also whether it is evolutionarily correlated with genome size (C-value). Global cytosine methylation varied widely between species, ranging between 5.3% (Arabidopsis) and 39.2% (Narcissus). Differences between species were related to their evolutionary trajectories, as denoted by the strong phylogenetic signal underlying interspecific variation. Global cytosine methylation and genome size were evolutionarily correlated, as revealed by the significant relationship between the corresponding phylogenetically independent contrasts. On average, a ten-fold increase in genome size entailed an increase of about 10% in global cytosine methylation. Results show that global cytosine methylation is an evolving trait in angiosperms whose evolutionary trajectory is significantly linked to changes in genome size, and suggest that the evolutionary implications of epigenetic mechanisms are likely to vary between plant lineages. PMID:25688257
Genomic structural variation contributes to phenotypic change of industrial bioethanol yeast Saccharomyces cerevisiae.

PubMed

Zhang, Ke; Zhang, Li-Jie; Fang, Ya-Hong; Jin, Xin-Na; Qi, Lei; Wu, Xue-Chang; Zheng, Dao-Qiong

2016-03-01

Genomic structural variation (GSV) is a ubiquitous phenomenon observed in the genomes of Saccharomyces cerevisiae strains with different genetic backgrounds; however, the physiological and phenotypic effects of GSV are not well understood. Here, we first revealed the genetic characteristics of a widely used industrial S. cerevisiae strain, ZTW1, by whole genome sequencing. ZTW1 was identified as an aneuploidy strain and a large-scale GSV was observed in the ZTW1 genome compared with the genome of a diploid strain YJS329. These GSV events led to copy number variations (CNVs) in many chromosomal segments as well as one whole chromosome in the ZTW1 genome. Changes in the DNA dosage of certain functional genes directly affected their expression levels and the resultant ZTW1 phenotypes. Moreover, CNVs of large chromosomal regions triggered an aneuploidy stress in ZTW1. This stress decreased the proliferation ability and tolerance of ZTW1 to various stresses, while aneuploidy response stress may also provide some benefits to the fermentation performance of the yeast, including increased fermentation rates and decreased byproduct generation. This work reveals genomic characters of the bioethanol S. cerevisiae strain ZTW1 and suggests that GSV is an important kind of mutation that changes the traits of industrial S. cerevisiae strains. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Blast2GO goes grid: developing a grid-enabled prototype for functional genomics analysis.

PubMed

Aparicio, G; Götz, S; Conesa, A; Segrelles, D; Blanquer, I; García, J M; Hernandez, V; Robles, M; Talon, M

2006-01-01

The vast amount and complexity of data generated in genomic research imply that new dedicated and powerful computational tools need to be developed to meet their analysis requirements. Blast2GO (B2G) is a bioinformatics tool for Gene Ontology-based DNA or protein sequence annotation and function-based data mining. The application has been developed with the aim of offering an easy-to-use tool for functional genomics research. Typical B2G users are middle-sized genomics labs carrying out sequencing, EST and microarray projects, handling datasets of up to several thousand sequences. In the current version of B2G, the power and analytical potential of both annotation and function data-mining are somewhat restricted by the computational power behind each particular installation. In order to offer enhanced computational capacity within this bioinformatics application, a Grid component is being developed. A prototype has been conceived for the particular problem of speeding up the Blast searches to obtain fast results for large datasets. Many efforts have been reported in the literature concerning the speeding up of Blast searches, but few of them deal with the use of large heterogeneous production Grid infrastructures. These are the infrastructures that could reach the largest number of resources and the best load balancing for data access. The Grid Service under development will analyse requests based on the number of sequences, splitting them according to the available resources. Lower-level computation will be performed through MPIBLAST. The software architecture is based on the WSRF standard.
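The request-splitting idea described in this abstract (dividing a batch of query sequences according to the available resources) can be sketched as follows. This is only an illustration under stated assumptions, not the Blast2GO Grid Service: the function name, the resource names, and the notion of a numeric "capacity" per resource are hypothetical, and the sketch does not invoke Blast or MPIBLAST.

```python
from typing import Dict, List


def split_by_capacity(
    sequences: List[str], capacities: Dict[str, float]
) -> Dict[str, List[str]]:
    """Partition sequences into contiguous chunks proportional to capacity.

    Uses the largest-remainder method so chunk sizes always sum to
    len(sequences). Capacities are hypothetical relative weights.
    """
    total = sum(capacities.values())
    if total <= 0:
        raise ValueError("total capacity must be positive")

    n = len(sequences)
    # Ideal (fractional) share of the batch for each resource.
    quotas = {name: n * cap / total for name, cap in capacities.items()}
    sizes = {name: int(q) for name, q in quotas.items()}

    # Hand out the sequences lost to rounding, largest remainder first.
    leftover = n - sum(sizes.values())
    by_remainder = sorted(
        quotas, key=lambda name: quotas[name] - sizes[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        sizes[name] += 1

    chunks: Dict[str, List[str]] = {}
    start = 0
    for name in capacities:
        chunks[name] = sequences[start:start + sizes[name]]
        start += sizes[name]
    return chunks


if __name__ == "__main__":
    batch = [f"seq{i}" for i in range(10)]
    plan = split_by_capacity(batch, {"site_a": 3, "site_b": 1, "site_c": 1})
    for resource, chunk in plan.items():
        print(resource, len(chunk))
```

Proportional splitting is one simple policy. A production scheduler would also have to account for sequence length, database size and queue state, none of which are modelled here.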
Rapid diversification of five Oryza AA genomes associated with rice adaptation.

PubMed

Zhang, Qun-Jie; Zhu, Ting; Xia, En-Hua; Shi, Chao; Liu, Yun-Long; Zhang, Yun; Liu, Yuan; Jiang, Wen-Kai; Zhao, You-Jie; Mao, Shu-Yan; Zhang, Li-Ping; Huang, Hui; Jiao, Jun-Ying; Xu, Ping-Zhen; Yao, Qiu-Yang; Zeng, Fan-Chun; Yang, Li-Li; Gao, Ju; Tao, Da-Yun; Wang, Yue-Ju; Bennetzen, Jeffrey L; Gao, Li-Zhi

2014-11-18

Comparative genomic analyses among closely related species can greatly enhance our understanding of plant gene and genome evolution. We report de novo-assembled AA-genome sequences for Oryza nivara, Oryza glaberrima, Oryza barthii, Oryza glumaepatula, and Oryza meridionalis. Our analyses reveal massive levels of genomic structural variation, including segmental duplication and rapid gene family turnover, with particularly high instability in defense-related genes. We show, on a genomic scale, how lineage-specific expansion or contraction of gene families has led to their morphological and reproductive diversification, thus enlightening the evolutionary process of speciation and adaptation. Despite strong purifying selective pressures on most Oryza genes, we documented a large number of positively selected genes, especially those genes involved in flower development, reproduction, and resistance-related processes. These diversifying genes are expected to have played key roles in adaptations to their ecological niches in Asia, South America, Africa and Australia. Extensive variation in noncoding RNA gene numbers, function enrichment, and rates of sequence divergence might also help account for the different genetic adaptations of these rice species. Collectively, these resources provide new opportunities for evolutionary genomics, numerous insights into recent speciation, a valuable database of functional variation for crop improvement, and tools for efficient conservation of wild rice germplasm.
Rapid diversification of five Oryza AA genomes associated with rice adaptation

PubMed Central

Zhang, Qun-Jie; Zhu, Ting; Xia, En-Hua; Shi, Chao; Liu, Yun-Long; Zhang, Yun; Liu, Yuan; Jiang, Wen-Kai; Zhao, You-Jie; Mao, Shu-Yan; Zhang, Li-Ping; Huang, Hui; Jiao, Jun-Ying; Xu, Ping-Zhen; Yao, Qiu-Yang; Zeng, Fan-Chun; Yang, Li-Li; Gao, Ju; Tao, Da-Yun; Wang, Yue-Ju; Bennetzen, Jeffrey L.; Gao, Li-Zhi

2014-01-01

Comparative genomic analyses among closely related species can greatly enhance our understanding of plant gene and genome evolution. We report de novo-assembled AA-genome sequences for Oryza nivara, Oryza glaberrima, Oryza barthii, Oryza glumaepatula, and Oryza meridionalis. Our analyses reveal massive levels of genomic structural variation, including segmental duplication and rapid gene family turnover, with particularly high instability in defense-related genes. We show, on a genomic scale, how lineage-specific expansion or contraction of gene families has led to their morphological and reproductive diversification, thus enlightening the evolutionary process of speciation and adaptation. Despite strong purifying selective pressures on most Oryza genes, we documented a large number of positively selected genes, especially those genes involved in flower development, reproduction, and resistance-related processes. These diversifying genes are expected to have played key roles in adaptations to their ecological niches in Asia, South America, Africa and Australia. Extensive variation in noncoding RNA gene numbers, function enrichment, and rates of sequence divergence might also help account for the different genetic adaptations of these rice species. Collectively, these resources provide new opportunities for evolutionary genomics, numerous insights into recent speciation, a valuable database of functional variation for crop improvement, and tools for efficient conservation of wild rice germplasm. PMID:25368197
Mutational landscape of yeast mutator strains.

PubMed

Serero, Alexandre; Jubin, Claire; Loeillet, Sophie; Legoix-Né, Patricia; Nicolas, Alain G

2014-02-04

The acquisition of mutations is relevant to every aspect of genetics, including cancer and the evolution of species by Darwinian selection. Genome variations arise from rare stochastic imperfections of cellular metabolism and deficiencies in maintenance genes. Here, we established the genome-wide spectrum of mutations that accumulate in a WT and in nine Saccharomyces cerevisiae mutator strains deficient for distinct genome maintenance processes: pol32Δ and rad27Δ (replication), msh2Δ (mismatch repair), tsa1Δ (oxidative stress), mre11Δ (recombination), mec1Δ tel1Δ (DNA damage/S-phase checkpoints), pif1Δ (maintenance of mitochondrial genome and telomere length), cac1Δ cac3Δ (nucleosome deposition), and clb5Δ (cell cycle progression). This study reveals the diversity, complexity, and ultimate unique nature of each mutational spectrum, composed of point mutations, chromosomal structural variations, and/or aneuploidies. The mutations produced in clb5Δ/CCNB1, mec1Δ/ATR, tel1Δ/ATM, and rad27Δ/FEN1 strains extensively reshape the genome, following a trajectory dependent on previous events. It comprises the transmission of unstable genomes that lead to colony mosaicisms. This comprehensive analytical approach of mutator defects provides a model to understand how genome variations might accumulate during clonal evolution of somatic cell populations, including tumor cells.
The diploid genome sequence of an Asian individual

PubMed Central

Wang, Jun; Wang, Wei; Li, Ruiqiang; Li, Yingrui; Tian, Geng; Goodman, Laurie; Fan, Wei; Zhang, Junqing; Li, Jun; Zhang, Juanbin; Guo, Yiran; Feng, Binxiao; Li, Heng; Lu, Yao; Fang, Xiaodong; Liang, Huiqing; Du, Zhenglin; Li, Dong; Zhao, Yiqing; Hu, Yujie; Yang, Zhenzhen; Zheng, Hancheng; Hellmann, Ines; Inouye, Michael; Pool, John; Yi, Xin; Zhao, Jing; Duan, Jinjie; Zhou, Yan; Qin, Junjie; Ma, Lijia; Li, Guoqing; Yang, Zhentao; Zhang, Guojie; Yang, Bin; Yu, Chang; Liang, Fang; Li, Wenjie; Li, Shaochuan; Li, Dawei; Ni, Peixiang; Ruan, Jue; Li, Qibin; Zhu, Hongmei; Liu, Dongyuan; Lu, Zhike; Li, Ning; Guo, Guangwu; Zhang, Jianguo; Ye, Jia; Fang, Lin; Hao, Qin; Chen, Quan; Liang, Yu; Su, Yeyang; san, A.; Ping, Cuo; Yang, Shuang; Chen, Fang; Li, Li; Zhou, Ke; Zheng, Hongkun; Ren, Yuanyuan; Yang, Ling; Gao, Yang; Yang, Guohua; Li, Zhuo; Feng, Xiaoli; Kristiansen, Karsten; Wong, Gane Ka-Shu; Nielsen, Rasmus; Durbin, Richard; Bolund, Lars; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian

2009-01-01

Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics. PMID:18987735
The African Genome Variation Project shapes medical genetics in Africa

PubMed Central

Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O.; Choudhury, Ananyo; Ritchie, Graham R. S.; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N.; Young, Elizabeth H.; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P.; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A.; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S.

2014-01-01

Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterisation of African genetic diversity is needed. The African Genome Variation Project (AGVP) provides a resource to help design, implement and interpret genomic studies in sub-Saharan Africa (SSA) and worldwide. The AGVP represents dense genotypes from 1,481 individuals and whole genome sequences (WGS) from 320 individuals across SSA. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across SSA. We identify new loci under selection, including for malaria and hypertension. We show that modern imputation panels can identify association signals at highly differentiated loci across populations in SSA. Using WGS, we show further improvement in imputation accuracy supporting efforts for large-scale sequencing of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa, showing for the first time that such designs are feasible. PMID:25470054
Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives.

PubMed

Zhao, Min; Wang, Qingguo; Wang, Quan; Jia, Peilin; Zhao, Zhongming

2013-01-01

Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) and genotyping arrays were the standard technologies for detecting large regions subject to copy number changes in genomes until, most recently, high-resolution sequence data became available for analysis through next-generation sequencing (NGS). During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development.
Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives

PubMed Central

2013-01-01

Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) and genotyping arrays were the standard technologies for detecting large regions subject to copy number changes in genomes until recently, when high-resolution sequence data became available through next-generation sequencing (NGS). During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development. PMID:24564169
Genomic analysis of local variation and recent evolution in Plasmodium vivax

PubMed Central

Pearson, Richard D; Miotto, Olivo; Almagro-Garcia, Jacob; Amaratunga, Chanaki; Suon, Seila; Mao, Sivanna; Noviyanti, Rintis; Trimarsanto, Hidayat; Marfurt, Jutta; Anstey, Nicholas M; William, Timothy; Boni, Maciej F; Dolecek, Christiane; Hien, Tinh Tran; White, Nicholas J; Michon, Pascal; Siba, Peter; Tavul, Livingstone; Harrison, Gabrielle; Barry, Alyssa; Mueller, Ivo; Ferreira, Marcelo U; Karunaweera, Nadira; Randrianarivelojosia, Milijaona; Gao, Qi; Hubbart, Christina; Hart, Lee; Jeffery, Ben; Drury, Eleanor; Mead, Daniel; Kekre, Mihir; Campino, Susana; Manske, Magnus; Cornelius, Victoria J; MacInnis, Bronwyn; Rockett, Kirk A; Miles, Alistair; Rayner, Julian C; Fairhurst, Rick M; Nosten, Francois; Price, Ric N; Kwiatkowski, Dominic P

2016-01-01

The widespread distribution and relapsing nature of Plasmodium vivax infection present major challenges for malaria elimination. To characterise the genetic diversity of this parasite within individual infections and across the population, we performed deep genome sequencing of >200 clinical samples collected across the Asia-Pacific region, and analysed data on >300,000 SNPs and 9 regions of the genome with large copy number variations. Individual infections showed complex patterns of genetic structure, with variation not only in the number of dominant clones but also in their level of relatedness and inbreeding. At the population level, we observed strong signals of recent evolutionary selection both in known drug resistance genes and at novel loci, and these varied markedly between geographical locations. These findings reveal a dynamic landscape of local evolutionary adaptation in P. vivax populations, and provide a foundation for genomic surveillance to guide effective strategies for control and elimination. PMID:27348299
Spatiotemporal Dynamics and Epistatic Interaction Sites in Dengue Virus Type 1: A Comprehensive Sequence-Based Analysis

PubMed Central

Chu, Pei-Yu; Ke, Guan-Ming; Chen, Po-Chih; Liu, Li-Teh; Tsai, Yen-Chun; Tsai, Jih-Jin

2013-01-01

The continuing threat of dengue fever necessitates a comprehensive characterisation of its epidemiological trends. Phylogenetic and recombination events were reconstructed based on 100 worldwide dengue virus (DENV) type 1 genome sequences with an outgroup (prototypes of DENV2-4). The phylodynamic characteristics and site-specific variation were then analysed using data without the outgroup. Five genotypes (GI-GV) and a ladder-like structure with short terminal branch topology were observed in this study. Apparently, the transmission of DENV1 was geographically random before gradually localising with human activity as GI-GIII in South Asia, GIV in the South Pacific, and GV in the Americas. Genotypes IV and V have recently shown higher population densities compared to older genotypes. All codon regions and all tree branches were skewed toward negative selection, which indicated that their variation was restricted by protein function. Notably, multi-epistatic interaction sites were found in both PrM 221 and NS3 1730. Recombination events accumulated in regions E, NS3-NS4A, and particularly in region NS5. The estimated coevolution pattern also highlights the need for further study of the biological role of protein PrM 221 and NS3 1730. The recent transmission of emergent GV sublineages into Central America and Europe mandates close monitoring of genotype interaction and succession. PMID:24040199
The nucleotide composition of microbial genomes indicates differential patterns of selection on core and accessory genomes.

PubMed

Bohlin, Jon; Eldholm, Vegard; Pettersson, John H O; Brynildsrud, Ola; Snipen, Lars

2017-02-10

The core genome consists of genes shared by the vast majority of a species and is therefore assumed to have been subjected to substantially stronger purifying selection than the more mobile elements of the genome, also known as the accessory genome. Here we examine intragenic base composition differences in core genomes and corresponding accessory genomes in 36 species, represented by the genomes of 731 bacterial strains, to assess the impact of selective forces on base composition in microbes. We also explore, in turn, how these results compare with findings for whole genome intragenic regions. We found that GC content in coding regions is significantly higher in core genomes than accessory genomes and whole genomes. Likewise, GC content variation within coding regions was significantly lower in core genomes than in accessory genomes and whole genomes. Relative entropy in coding regions, measured as the difference between observed and expected trinucleotide frequencies estimated from mononucleotide frequencies, was significantly higher in the core genomes than in accessory and whole genomes. Relative entropy was positively associated with coding region GC content within the accessory genomes, but not within the corresponding coding regions of core or whole genomes. The higher intragenic GC content and relative entropy, as well as the lower GC content variation, observed in the core genomes is most likely associated with selective constraints. It is unclear whether the positive association between GC content and relative entropy in the more mobile accessory genomes constitutes signatures of selection or selective neutral processes.
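The two base-composition measures named in the abstract above can be illustrated with a minimal sketch. It assumes that "relative entropy" is the Kullback-Leibler divergence (in bits) between observed trinucleotide frequencies and those expected from the product of mononucleotide frequencies, counted over overlapping windows; the abstract does not give the exact formula, so the function names and these choices are illustrative rather than the authors' implementation.

```python
import math
from collections import Counter

BASES = set("ACGT")


def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    called = [b for b in seq.upper() if b in BASES]
    return sum(b in "GC" for b in called) / len(called)


def trinucleotide_relative_entropy(seq):
    """KL divergence (bits) of observed trinucleotide frequencies from the
    frequencies expected if bases were independent (assumed definition)."""
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    total = sum(mono.values())
    p = {b: mono[b] / total for b in BASES}

    # Overlapping 3-mers; windows containing ambiguous bases are skipped.
    tri = Counter(
        seq[i:i + 3]
        for i in range(len(seq) - 2)
        if set(seq[i:i + 3]) <= BASES
    )
    n = sum(tri.values())

    entropy = 0.0
    for kmer, count in tri.items():
        observed = count / n
        expected = p[kmer[0]] * p[kmer[1]] * p[kmer[2]]
        entropy += observed * math.log2(observed / expected)
    return entropy
```

A homopolymer such as "AAAAAA" gives a relative entropy of zero under this definition, because the only observed trinucleotide is exactly what the mononucleotide frequencies predict; a strongly periodic sequence gives a large value.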
Comparative analyses across cattle breeds reveal the pitfalls caused by artificial and lineage-differential copy number variations

USDA-ARS's Scientific Manuscript database

Copy number variations (CNV) are well known genomic variants, which often complicate structural and functional genomics studies. Here, we integrated the CNV region (CNVR) result detected from 1,682 Nellore cattle with the equivalent result derived from the Bovine HapMap samples. Through comparing CN...
GENOMICS SYMPOSIUM: Using genomic approaches to uncover sources of variation in age at puberty and reproductive longevity in sows

USDA-ARS's Scientific Manuscript database

Genetic variants associated with traits such as age at puberty and litter size could provide insight into the underlying genetic sources of variation impacting sow reproductive longevity and productivity. Genomewide characterization and gene expression profiling were used using gilts from the Univer...
Genome-wide interactions with dairy intake for body mass index in adults of European descent

USDA-ARS's Scientific Manuscript database

Body weight responds variably to the intake of dairy foods. Genetic variation may contribute to inter-individual variability in associations between body weight and dairy consumption. We conducted a genome-wide interaction study to discover genetic variants that account for variation in BMI in the c...
Genome-wide copy number variant analysis reveals variants associated with 10 diverse production traits in Holstein cattle

USDA-ARS's Scientific Manuscript database

Copy number variation (CNV) is an important type of genetic variation contributing to phenotypic differences among mammals and may serve as an alternative molecular marker to single nucleotide polymorphism (SNP) for genome-wide association study (GWAS). Recently, GWAS analysis using CNV has been app...

Natural Variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions (2013 DOE JGI Genomics of Energy and Environment 8th Annual User Meeting)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gordon, Sean

2013-03-01

Sean Gordon of the USDA speaks on "Natural Variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions" at the 8th Annual Genomics of Energy and Environment Meeting on March 27, 2013, in Walnut Creek, CA.
PopHuman: the human population genomics browser.

PubMed

Casillas, Sònia; Mulet, Roger; Villegas-Mirón, Pablo; Hervas, Sergi; Sanz, Esteve; Velasco, Daniel; Bertranpetit, Jaume; Laayouni, Hafid; Barbadilla, Antonio

2018-01-04

The 1000 Genomes Project (1000GP) represents the most comprehensive world-wide nucleotide variation data set so far in humans, providing the sequencing and analysis of 2504 genomes from 26 populations and reporting >84 million variants. The availability of this sequence data provides the human lineage with an invaluable resource for population genomics studies, allowing the testing of molecular population genetics hypotheses and eventually the understanding of the evolutionary dynamics of genetic variation in human populations. Here we present PopHuman, a new population genomics-oriented genome browser based on JBrowse that allows the interactive visualization and retrieval of an extensive inventory of population genetics metrics. Efficient and reliable parameter estimates have been computed using a novel pipeline that faces the unique features and limitations of the 1000GP data, and include a battery of nucleotide variation measures, divergence and linkage disequilibrium parameters, as well as different tests of neutrality, estimated in non-overlapping windows along the chromosomes and in annotated genes for all 26 populations of the 1000GP. PopHuman is open and freely available at http://pophuman.uab.cat. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
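As an illustration of what a nucleotide variation measure "estimated in non-overlapping windows" involves, the sketch below computes nucleotide diversity (pi) per base pair in fixed windows from biallelic site counts. It is not PopHuman's pipeline: the input layout, the function names, and the assumption that every position in a window is callable are all simplifications made for the example.

```python
from collections import defaultdict


def site_pi(alt_count, n_chrom):
    """Expected number of pairwise differences at one biallelic site."""
    k = alt_count
    return 2.0 * k * (n_chrom - k) / (n_chrom * (n_chrom - 1))


def windowed_pi(sites, window_size):
    """Nucleotide diversity per base pair in non-overlapping windows.

    sites: iterable of (position, alt_allele_count, n_chromosomes),
           with 0-based positions on a single chromosome.
    Returns {window_start: pi_per_bp}; windows without variant sites
    are simply absent from the result.
    """
    totals = defaultdict(float)
    for position, alt_count, n_chrom in sites:
        start = (position // window_size) * window_size
        totals[start] += site_pi(alt_count, n_chrom)
    # Assumes all window_size positions were callable (a simplification).
    return {start: total / window_size for start, total in sorted(totals.items())}
```

For example, two heterozygous sites in a sample of two chromosomes falling in the same 100 bp window give a pi of 0.02 per base pair for that window.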
A genome-wide detection of copy number variation using SNP genotyping arrays in Beijing-You chickens.

PubMed

Zhou, Wei; Liu, Ranran; Zhang, Jingjing; Zheng, Maiqing; Li, Peng; Chang, Guobin; Wen, Jie; Zhao, Guiping

2014-10-01

Copy number variation (CNV) has recently been examined in many species and is recognized as a source of genetic variability, especially for disease-related phenotypes. In this study, PennCNV, a genome-wide CNV detection tool, was applied to 60 K SNP BeadChip data from 1,310 Beijing-You chickens (a Chinese local breed). After quality control, 137 high-confidence CNVRs were identified, covering 27.31 Mb, or 2.61% of the whole chicken genome. These regions contained 131 known genes or coding sequences. Q-PCR was applied to verify some of the genes related to disease development. Results showed that the copy number of genes such as phosphatidylinositol-5-phosphate 4-kinase II alpha, PHD finger protein 14, RHACD8 (a CD8α-like messenger RNA), MHC B-G, zinc finger protein, sarcosine dehydrogenase and ficolin 2 varied between individual chickens, which also supports the reliability of chip-based detection of CNVs. As one source of genomic variation, CNVs may provide new insight into the relationship between the genome and phenotypic characteristics.
Genomic signatures of positive selection in humans and the limits of outlier approaches.

PubMed

Kelley, Joanna L; Madeoy, Jennifer; Calhoun, John C; Swanson, Willie; Akey, Joshua M

2006-08-01

Identifying regions of the human genome that have been targets of positive selection will provide important insights into recent human evolutionary history and may facilitate the search for complex disease genes. However, the confounding effects of population demographic history and selection on patterns of genetic variation complicate inferences of selection when a small number of loci are studied. To this end, identifying outlier loci from empirical genome-wide distributions of genetic variation is a promising strategy to detect targets of selection. Here, we evaluate the power and efficiency of a simple outlier approach and describe a genome-wide scan for positive selection using a dense catalog of 1.58 million SNPs that were genotyped in three human populations. In total, we analyzed 14,589 genes, 385 of which possess patterns of genetic variation consistent with the hypothesis of positive selection. Furthermore, several extended genomic regions were found, spanning >500 kb, that contained multiple contiguous candidate selection genes. More generally, these data provide important practical insights into the limits of outlier approaches in genome-wide scans for selection, provide strong candidate selection genes to study in greater detail, and may have important implications for disease related research.
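The basic idea of an empirical outlier approach, as described in the abstract above, can be sketched in a few lines: rank every gene by a summary statistic and flag those in the extreme tail of the genome-wide distribution. The statistic itself (for instance a per-gene differentiation score), the tail fraction, and the function name are assumptions for illustration; the abstract does not specify the cut-off used, although 385 of 14,589 genes corresponds to roughly 2.6% of those analysed.

```python
def empirical_outliers(stats, tail_fraction=0.01):
    """Return gene names in the upper tail of an empirical distribution.

    stats: dict mapping gene name -> summary statistic (hypothetical input).
    tail_fraction: fraction of genes to flag, e.g. 0.01 for the top 1%.
    """
    if not 0 < tail_fraction < 1:
        raise ValueError("tail_fraction must be between 0 and 1")
    # Rank genes from most to least extreme.
    ranked = sorted(stats.items(), key=lambda item: item[1], reverse=True)
    n_flagged = max(1, int(len(ranked) * tail_fraction))
    return [gene for gene, _ in ranked[:n_flagged]]
```

Because the threshold is defined relative to the data, this procedure always flags the same fraction of genes whether or not any of them were actually under selection, which is one of the limits of outlier approaches that the abstract alludes to.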
Structural genomic variations and Parkinson's disease.

PubMed

Bandrés-Ciga, Sara; Ruz, Clara; Barrero, Francisco J; Escamilla-Sevilla, Francisco; Pelegrina, Javier; Vives, Francisco; Duran, Raquel

2017-10-01

Parkinson's disease (PD) is the second most common neurodegenerative disease, with a projected prevalence of between 8.7 and 9.3 million cases by 2030. Until about 20 years ago, PD was considered to be the textbook example of a "non-genetic" disorder. Nowadays, PD is generally considered a multifactorial disorder that arises from the combination and complex interaction of genes and environmental factors. To date, a total of 7 genes including SNCA, LRRK2, PARK2, DJ-1, PINK1, VPS35 and ATP13A2 have been shown unequivocally to cause Mendelian PD. Also, variants with incomplete penetrance in the genes LRRK2 and GBA are considered to be strong risk factors for PD worldwide. Although genetic studies have provided valuable insights into the pathogenic mechanisms underlying PD, the role of structural variation in PD has been understudied in comparison with other genomic variations. Structural genomic variations might substantially account for such genetic substrates yet to be discovered. The present review aims to provide an overview of the structural genomic variants implicated in the pathogenesis of PD.
Panoptes: web-based exploration of large scale genome variation data.

PubMed

Vauterin, Paul; Jeffery, Ben; Miles, Alistair; Amato, Roberto; Hart, Lee; Wright, Ian; Kwiatkowski, Dominic

2017-10-15

The size and complexity of modern large-scale genome variation studies demand novel approaches for exploring and sharing the data. In order to unlock the potential of these data for a broad audience of scientists with various areas of expertise, a unified exploration framework is required that is accessible, coherent and user-friendly. Panoptes is an open-source software framework for collaborative visual exploration of large-scale genome variation data and associated metadata in a web browser. It relies on technology choices that allow it to operate in near real-time on very large datasets. It can be used to browse rich, hybrid content in a coherent way, and offers interactive visual analytics approaches to assist the exploration. We illustrate its application using genome variation data of Anopheles gambiae, Plasmodium falciparum and Plasmodium vivax. Freely available at https://github.com/cggh/panoptes, under the GNU Affero General Public License. © The Author 2017. Published by Oxford University Press.
SIDR: simultaneous isolation and parallel sequencing of genomic DNA and total RNA from single cells.

PubMed

Han, Kyung Yeon; Kim, Kyu-Tae; Joung, Je-Gun; Son, Dae-Soon; Kim, Yeon Jeong; Jo, Areum; Jeon, Hyo-Jeong; Moon, Hui-Sung; Yoo, Chang Eun; Chung, Woosung; Eum, Hye Hyeon; Kim, Sangmin; Kim, Hong Kwan; Lee, Jeong Eon; Ahn, Myung-Ju; Lee, Hae-Ock; Park, Donghyun; Park, Woong-Yang

2018-01-01

Simultaneous sequencing of the genome and transcriptome at the single-cell level is a powerful tool for characterizing genomic and transcriptomic variation and revealing correlative relationships. However, it remains technically challenging to analyze both the genome and transcriptome in the same cell. Here, we report a novel method for simultaneous isolation of genomic DNA and total RNA (SIDR) from single cells, achieving high recovery rates with minimal cross-contamination, as is crucial for accurate description and integration of the single-cell genome and transcriptome. For reliable and efficient separation of genomic DNA and total RNA from single cells, the method uses hypotonic lysis to preserve nuclear lamina integrity and subsequently captures the cell lysate using antibody-conjugated magnetic microbeads. Evaluating the performance of this method using real-time PCR demonstrated that it efficiently recovered genomic DNA and total RNA. Thorough data quality assessments showed that DNA and RNA simultaneously fractionated by the SIDR method were suitable for genome and transcriptome sequencing analysis at the single-cell level. The integration of single-cell genome and transcriptome sequencing by SIDR (SIDR-seq) showed that genetic alterations, such as copy-number and single-nucleotide variations, were more accurately captured by single-cell SIDR-seq compared with conventional single-cell RNA-seq, although copy-number variations positively correlated with the corresponding gene expression levels. These results suggest that SIDR-seq is potentially a powerful tool to reveal genetic heterogeneity and phenotypic information inferred from gene expression patterns at the single-cell level. © 2018 Han et al.; Published by Cold Spring Harbor Laboratory Press.
SIDR: simultaneous isolation and parallel sequencing of genomic DNA and total RNA from single cells

PubMed Central

Han, Kyung Yeon; Kim, Kyu-Tae; Joung, Je-Gun; Son, Dae-Soon; Kim, Yeon Jeong; Jo, Areum; Jeon, Hyo-Jeong; Moon, Hui-Sung; Yoo, Chang Eun; Chung, Woosung; Eum, Hye Hyeon; Kim, Sangmin; Kim, Hong Kwan; Lee, Jeong Eon; Ahn, Myung-Ju; Lee, Hae-Ock; Park, Donghyun; Park, Woong-Yang

2018-01-01

Simultaneous sequencing of the genome and transcriptome at the single-cell level is a powerful tool for characterizing genomic and transcriptomic variation and revealing correlative relationships. However, it remains technically challenging to analyze both the genome and transcriptome in the same cell. Here, we report a novel method for simultaneous isolation of genomic DNA and total RNA (SIDR) from single cells, achieving high recovery rates with minimal cross-contamination, as is crucial for accurate description and integration of the single-cell genome and transcriptome. For reliable and efficient separation of genomic DNA and total RNA from single cells, the method uses hypotonic lysis to preserve nuclear lamina integrity and subsequently captures the cell lysate using antibody-conjugated magnetic microbeads. Evaluating the performance of this method using real-time PCR demonstrated that it efficiently recovered genomic DNA and total RNA. Thorough data quality assessments showed that DNA and RNA simultaneously fractionated by the SIDR method were suitable for genome and transcriptome sequencing analysis at the single-cell level. The integration of single-cell genome and transcriptome sequencing by SIDR (SIDR-seq) showed that genetic alterations, such as copy-number and single-nucleotide variations, were more accurately captured by single-cell SIDR-seq compared with conventional single-cell RNA-seq, although copy-number variations positively correlated with the corresponding gene expression levels. These results suggest that SIDR-seq is potentially a powerful tool to reveal genetic heterogeneity and phenotypic information inferred from gene expression patterns at the single-cell level. PMID:29208629
Meiotic gene-conversion rate and tract length variation in the human genome.

PubMed

Padhukasahasram, Badri; Rannala, Bruce

2013-02-27

Meiotic recombination occurs in the form of two different mechanisms called crossing-over and gene-conversion and both processes have an important role in shaping genetic variation in populations. Although variation in crossing-over rates has been studied extensively using sperm-typing experiments, pedigree studies and population genetic approaches, our knowledge of variation in gene-conversion parameters (i.e., rates and mean tract lengths) remains far from complete. To explore variability in population gene-conversion rates and its relationship to crossing-over rate variation patterns, we have developed and validated using coalescent simulations a comprehensive Bayesian full-likelihood method that can jointly infer crossing-over and gene-conversion rates as well as tract lengths from population genomic data under general variable rate models with recombination hotspots. Here, we apply this new method to SNP data from multiple human populations and attempt to characterize for the first time the fine-scale variation in gene-conversion parameters along the human genome. We find that the estimated ratio of gene-conversion to crossing-over rates varies considerably across genomic regions as well as between populations. However, there is a great degree of uncertainty associated with such estimates. We also find substantial evidence for variation in the mean conversion tract length. The estimated tract lengths did not show any negative relationship with the local heterozygosity levels in our analysis. European Journal of Human Genetics advance online publication, 27 February 2013; doi:10.1038/ejhg.2013.30.
Genome Reduction Uncovers a Large Dispensable Genome and Adaptive Role for Copy Number Variation in Asexually Propagated Solanum tuberosum

PubMed Central

Hardigan, Michael A.; Crisovan, Emily; Hamilton, John P.; Laimbeer, Parker; Leisner, Courtney P.; Manrique-Carpintero, Norma C.; Newton, Linsey; Pham, Gina M.; Vaillancourt, Brieanne; Zeng, Zixian; Jiang, Jiming

2016-01-01

Clonally reproducing plants have the potential to bear a significantly greater mutational load than sexually reproducing species. To investigate this possibility, we examined the breadth of genome-wide structural variation in a panel of monoploid/doubled monoploid clones generated from native populations of diploid potato (Solanum tuberosum), a highly heterozygous asexually propagated plant. As rare instances of purely homozygous clones, they provided an ideal set for determining the degree of structural variation tolerated by this species and deriving its minimal gene complement. Extensive copy number variation (CNV) was uncovered, impacting 219.8 Mb (30.2%) of the potato genome with nearly 30% of genes subject to at least partial duplication or deletion, revealing the highly heterogeneous nature of the potato genome. Dispensable genes (>7000) were associated with limited transcription and/or a recent evolutionary history, with lower deletion frequency observed in genes conserved across angiosperms. Association of CNV with plant adaptation was highlighted by enrichment in gene clusters encoding functions for environmental stress response, with gene duplication playing a part in species-specific expansions of stress-related gene families. This study revealed unique impacts of CNV in a species with asexual reproductive habits and how CNV may drive adaption through evolution of key stress pathways. PMID:26772996
Genomic Signatures of Selective Pressures and Introgression from Archaic Hominins at Human Innate Immunity Genes

PubMed Central

Deschamps, Matthieu; Laval, Guillaume; Fagny, Maud; Itan, Yuval; Abel, Laurent; Casanova, Jean-Laurent; Patin, Etienne; Quintana-Murci, Lluis

2016-01-01

Human genes governing innate immunity provide a valuable tool for the study of the selective pressure imposed by microorganisms on host genomes. A comprehensive, genome-wide study of how selective constraints and adaptations have driven the evolution of innate immunity genes is missing. Using full-genome sequence variation from the 1000 Genomes Project, we first show that innate immunity genes have globally evolved under stronger purifying selection than the remainder of protein-coding genes. We identify a gene set under the strongest selective constraints, mutations in which are likely to predispose individuals to life-threatening disease, as illustrated by STAT1 and TRAF3. We then evaluate the occurrence of local adaptation and detect 57 high-scoring signals of positive selection at innate immunity genes, variation in which has been associated with susceptibility to common infectious or autoimmune diseases. Furthermore, we show that most adaptations targeting coding variation have occurred in the last 6,000–13,000 years, the period at which populations shifted from hunting and gathering to farming. Finally, we show that innate immunity genes present higher Neandertal introgression than the remainder of the coding genome. Notably, among the genes presenting the highest Neandertal ancestry, we find the TLR6-TLR1-TLR10 cluster, which also contains functional adaptive variation in Europeans. This study identifies highly constrained genes that fulfill essential, non-redundant functions in host survival and reveals others that are more permissive to change—containing variation acquired from archaic hominins or adaptive variants in specific populations—improving our understanding of the relative biological importance of innate immunity pathways in natural conditions. PMID:26748513
Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

PubMed

Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup

2016-01-01

Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.
Genomic Differentiation during Speciation-with-Gene-Flow: Comparing Geographic and Host-Related Variation in Divergent Life History Adaptation in Rhagoletis pomonella

PubMed Central

Hood, Glen R.; Meyers, Peter J.; Powell, Thomas H. Q.; Lazorchak, Peter; Glover, Mary M.; Tait, Cheyenne; Hahn, Daniel A.; Berlocher, Stewart H.; Smith, James J.; Nosil, Patrik; Feder, Jeffrey L.

2018-01-01

A major goal of evolutionary biology is to understand how variation within populations gets partitioned into differences between reproductively isolated species. Here, we examine the degree to which diapause life history timing, a critical adaptation promoting population divergence, explains geographic and host-related genetic variation in ancestral hawthorn and recently derived apple-infesting races of Rhagoletis pomonella. Our strategy involved combining experiments on two different aspects of diapause (initial diapause intensity and adult eclosion time) with a geographic survey of genomic variation across four sites where apple and hawthorn flies co-occur from north to south in the Midwestern USA. The results demonstrated that the majority of the genome showing significant geographic and host-related variation can be accounted for by initial diapause intensity and eclosion time. Local genomic differences between sympatric apple and hawthorn flies were subsumed within broader geographic clines; allele frequency differences within the races across the Midwest were two to three-fold greater than those between the races in sympatry. As a result, sympatric apple and hawthorn populations displayed more limited genomic clustering compared to geographic populations within the races. The findings suggest that with reduced gene flow and increased selection on diapause equivalent to that seen between geographic sites, the host races may be recognized as different genotypic entities in sympatry, and perhaps species, a hypothesis requiring future genomic analysis of related sibling species to R. pomonella to test. 
Our findings concerning the way selection and geography interplay could be of broad significance for many cases of earlier stages of divergence-with-gene flow, including (1) where only modest increases in geographic isolation and the strength of selection may greatly impact genetic coupling and (2) the dynamics of how spatial and temporal standing variation is extracted by selection to generate differences between new and discrete units of biodiversity. PMID:29783692
Genomic Differentiation during Speciation-with-Gene-Flow: Comparing Geographic and Host-Related Variation in Divergent Life History Adaptation in Rhagoletis pomonella.

PubMed

Doellman, Meredith M; Ragland, Gregory J; Hood, Glen R; Meyers, Peter J; Egan, Scott P; Powell, Thomas H Q; Lazorchak, Peter; Glover, Mary M; Tait, Cheyenne; Schuler, Hannes; Hahn, Daniel A; Berlocher, Stewart H; Smith, James J; Nosil, Patrik; Feder, Jeffrey L

2018-05-18

A major goal of evolutionary biology is to understand how variation within populations gets partitioned into differences between reproductively isolated species. Here, we examine the degree to which diapause life history timing, a critical adaptation promoting population divergence, explains geographic and host-related genetic variation in ancestral hawthorn and recently derived apple-infesting races of Rhagoletis pomonella. Our strategy involved combining experiments on two different aspects of diapause (initial diapause intensity and adult eclosion time) with a geographic survey of genomic variation across four sites where apple and hawthorn flies co-occur from north to south in the Midwestern USA. The results demonstrated that the majority of the genome showing significant geographic and host-related variation can be accounted for by initial diapause intensity and eclosion time. Local genomic differences between sympatric apple and hawthorn flies were subsumed within broader geographic clines; allele frequency differences within the races across the Midwest were two to three-fold greater than those between the races in sympatry. As a result, sympatric apple and hawthorn populations displayed more limited genomic clustering compared to geographic populations within the races. The findings suggest that with reduced gene flow and increased selection on diapause equivalent to that seen between geographic sites, the host races may be recognized as different genotypic entities in sympatry, and perhaps species, a hypothesis requiring future genomic analysis of related sibling species to R. pomonella to test.
Our findings concerning the way selection and geography interplay could be of broad significance for many cases of earlier stages of divergence-with-gene flow, including (1) where only modest increases in geographic isolation and the strength of selection may greatly impact genetic coupling and (2) the dynamics of how spatial and temporal standing variation is extracted by selection to generate differences between new and discrete units of biodiversity.
Linking genotype to phenotype in a changing ocean: inferring the genomic architecture of a blue mussel stress response with genome-wide association.

PubMed

Kingston, S E; Martino, P; Melendy, M; Reed, F A; Carlon, D B

2018-03-01

A key component to understanding the evolutionary response to a changing climate is linking underlying genetic variation to phenotypic variation in stress response. Here, we use a genome-wide association approach (GWAS) to understand the genetic architecture of calcification rates under simulated climate stress. We take advantage of the genomic gradient across the blue mussel hybrid zone (Mytilus edulis and Mytilus trossulus) in the Gulf of Maine (GOM) to link genetic variation with variance in calcification rates in response to simulated climate change. Falling calcium carbonate saturation states are predicted to negatively impact many marine organisms that build calcium carbonate shells - like blue mussels. We sampled wild mussels and measured net calcification phenotypes after exposing mussels to a 'climate change' common garden, where we raised temperature by 3°C, decreased pH by 0.2 units and limited food supply by filtering out planktonic particles >5 μm, compared to ambient GOM conditions in the summer. This climate change exposure greatly increased phenotypic variation in net calcification rates compared to ambient conditions. We then used regression models to link the phenotypic variation with over 170 000 single nucleotide polymorphism loci (SNPs) generated by genotype by sequencing to identify genomic locations associated with calcification phenotype, and estimate heritability and architecture of the trait. We identified at least one of potentially 2-10 genomic regions responsible for 30% of the phenotypic variation in calcification rates that are potential targets of natural selection by climate change. Our simulations suggest a power of 13.7% with our study's average effective sample size of 118 individuals and rare alleles, but a power of >90% when effective sample size is 900. Journal of Evolutionary Biology © 2017 European Society For Evolutionary Biology.
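The regression step mentioned in this abstract can be illustrated with a single-marker model. The following is only a minimal sketch, assuming genotypes are coded as allele dosages (0, 1 or 2) and that one SNP is tested at a time by ordinary least squares; the function name `snp_regression` and the toy values are hypothetical, and the models actually used in the study are likely more elaborate (for example, they may account for relatedness or hybrid ancestry).

```python
def snp_regression(dosages, phenotypes):
    """Regress a phenotype on allele dosage for one SNP.

    Returns (slope, r_squared); r_squared is the share of phenotypic
    variance explained by this marker in the simple linear model.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_g = sum(dosages) / n
    mean_p = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((p - mean_p) ** 2 for p in phenotypes)
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or phenotype has no variation")
    return sxy / sxx, sxy ** 2 / (sxx * syy)

# Toy data (hypothetical, not from the mussel study).
slope, r2 = snp_regression([0, 1, 2, 0, 1, 2], [1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
print(slope, r2)
```

In a genome-wide scan this test would be repeated for every SNP, with a correction for multiple testing applied to the resulting statistics.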
Divergence with gene flow across a speciation continuum of Heliconius butterflies.

PubMed

Supple, Megan A; Papa, Riccardo; Hines, Heather M; McMillan, W Owen; Counterman, Brian A

2015-09-24

A key to understanding the origins of species is determining the evolutionary processes that drive the patterns of genomic divergence during speciation. New genomic technologies enable the study of high-resolution genomic patterns of divergence across natural speciation continua, where taxa pairs with different levels of reproductive isolation can be used as proxies for different stages of speciation. Empirical studies of these speciation continua can provide valuable insights into how genomes diverge during speciation. We examine variation across a handful of genomic regions in parapatric and allopatric populations of Heliconius butterflies with varying levels of reproductive isolation. Genome sequences were mapped to 2.2-Mb of the H. erato genome, including 1-Mb across the red color pattern locus and multiple regions unlinked to color pattern variation. Phylogenetic analyses reveal a speciation continuum of pairs of hybridizing races and incipient species in the Heliconius erato clade. Comparisons of hybridizing pairs of divergently colored races and incipient species reveal that genomic divergence increases with ecological and reproductive isolation, not only across the locus responsible for adaptive variation in red wing coloration, but also at genomic regions unlinked to color pattern. We observe high levels of divergence between the incipient species H. erato and H. himera, suggesting that divergence may accumulate early in the speciation process. Comparisons of genomic divergence between the incipient species and allopatric races suggest that limited gene flow cannot account for the observed high levels of divergence between the incipient species. Our results provide a reconstruction of the speciation continuum across the H. erato clade and provide insights into the processes that drive genomic divergence during speciation, establishing the H. erato clade as a powerful framework for the study of speciation.
Partial structure of the phylloxin gene from the giant monkey frog, Phyllomedusa bicolor: parallel cloning of precursor cDNA and genomic DNA from lyophilized skin secretion.

PubMed

Chen, Tianbao; Gagliardo, Ron; Walker, Brian; Zhou, Mei; Shaw, Chris

2005-12-01

Phylloxin is a novel prototype antimicrobial peptide from the skin of Phyllomedusa bicolor. Here, we describe parallel identification and sequencing of phylloxin precursor transcript (mRNA) and partial gene structure (genomic DNA) from the same sample of lyophilized skin secretion using our recently-described cloning technique. The open-reading frame of the phylloxin precursor was identical in nucleotide sequence to that previously reported and alignment with the nucleotide sequence derived from genomic DNA indicated the presence of a 175 bp intron located in a near identical position to that found in the dermaseptins. The highly-conserved structural organization of skin secretion peptide genes in P. bicolor can thus be extended to include that encoding phylloxin (plx). These data further reinforce our assertion that application of the described methodology can provide robust genomic/transcriptomic/peptidomic data without the need for specimen sacrifice.
Complete Genome Sequence and Comparative Metabolic Profiling of the Prototypical Enteroaggregative Escherichia coli Strain 042

PubMed Central

Chaudhuri, Roy R.; Sebaihia, Mohammed; Hobman, Jon L.; Webber, Mark A.; Leyton, Denisse L.; Goldberg, Martin D.; Cunningham, Adam F.; Scott-Tucker, Anthony; Ferguson, Paul R.; Thomas, Christopher M.; Frankel, Gad; Tang, Christoph M.; Dudley, Edward G.; Roberts, Ian S.; Rasko, David A.; Pallen, Mark J.; Parkhill, Julian; Nataro, James P.; Thomson, Nicholas R.; Henderson, Ian R.

2010-01-01

Background Escherichia coli can experience a multifaceted life, in some cases acting as a commensal while in other cases causing intestinal and/or extraintestinal disease. Several studies suggest enteroaggregative E. coli are the predominant cause of E. coli-mediated diarrhea in the developed world and are second only to Campylobacter sp. as a cause of bacterial-mediated diarrhea. Furthermore, enteroaggregative E. coli are a predominant cause of persistent diarrhea in the developing world where infection has been associated with malnourishment and growth retardation. Methods In this study we determined the complete genomic sequence of E. coli 042, the prototypical member of the enteroaggregative E. coli, which has been shown to cause disease in volunteer studies. We performed genomic and phylogenetic comparisons with other E. coli strains revealing previously uncharacterised virulence factors including a variety of secreted proteins and a capsular polysaccharide biosynthetic locus. In addition, by using Biolog™ Phenotype Microarrays we have provided a full metabolic profiling of E. coli 042 and the non-pathogenic lab strain E. coli K-12. We have highlighted the genetic basis for many of the metabolic differences between E. coli 042 and E. coli K-12. Conclusion This study provides a genetic context for the vast amount of experimental and epidemiological data published thus far and provides a template for future diagnostic and intervention strategies. PMID:20098708
Prototypic chromatin insulator cHS4 protects retroviral transgene from silencing in Schistosoma mansoni

PubMed Central

Suttiprapa, Sutas; Rinaldi, Gabriel; Brindley, Paul J.

2011-01-01

Vesicular stomatitis virus glycoprotein (VSVG) pseudotyped murine leukemia virus (MLV) virions can transduce schistosomes, leading to chromosomal integration of reporter transgenes. To develop VSVG-MLV for functional genomics in schistosomes, the influence of the chicken β-globin cHS4 element, a prototypic chromatin insulator, on transgene expression was examined. Plasmid pLNHX encoding the MLV 5′- and 3′-Long Terminal Repeats (LTRs) flanking the neomycin phosphotransferase gene (neo) was modified to include, within the U3 region of the 3′-LTR, active components of cHS4 insulator, the 250 bp core fused to the 400 bp 3′-region. Cultured larvae of Schistosoma mansoni were transduced with virions from producer cells transfected with control or cHS4-bearing plasmids. Schistosomules transduced with cHS4 virions expressed two to 20 times higher levels of neo than controls, while carrying comparable numbers of integrated proviral transgenes. The findings not only demonstrated that cHS4 was active in schistosomes but also they represent the first report of activity of cHS4 in any Lophotrochozoan species, which has significant implications for evolutionary conservation of heterochromatin regulation. The findings advance prospects for transgenesis in functional genomics of the schistosome genome to discover intervention targets because they provide the means to enhance and extend transgene activity including for vector based RNA interference. PMID:21918820
Advances in Genetical Genomics of Plants

PubMed Central

Joosen, R.V.L.; Ligterink, W.; Hilhorst, H.W.M.; Keurentjes, J.J.B.

2009-01-01

Natural variation provides a valuable resource to study the genetic regulation of quantitative traits. In quantitative trait locus (QTL) analyses this variation, captured in segregating mapping populations, is used to identify the genomic regions affecting these traits. The identification of the causal genes underlying QTLs is a major challenge for which the detection of gene expression differences is of major importance. By combining genetics with large scale expression profiling (i.e. genetical genomics), resulting in expression QTLs (eQTLs), great progress can be made in connecting phenotypic variation to genotypic diversity. In this review we discuss examples from human, mouse, Drosophila, yeast and plant research to illustrate the advances in genetical genomics, with a focus on understanding the regulatory mechanisms underlying natural variation. With their tolerance to inbreeding, short generation time and ease to generate large families, plants are ideal subjects to test new concepts in genetics. The comprehensive resources which are available for Arabidopsis make it a favorite model plant but genetical genomics also found its way to important crop species like rice, barley and wheat. We discuss eQTL profiling with respect to cis and trans regulation and show how combined studies with other ‘omics’ technologies, such as metabolomics and proteomics may further augment current information on transcriptional, translational and metabolomic signaling pathways and enable reconstruction of detailed regulatory networks. The fast developments in the ‘omics’ area will offer great potential for genetical genomics to elucidate the genotype-phenotype relationships for both fundamental and applied research. PMID:20514216
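The cis/trans distinction discussed in this review is usually drawn from the distance between an expression QTL marker and the gene whose expression it affects. A minimal sketch, assuming a simple distance rule; the `Locus` type, the `classify_eqtl` function and the 1 Mb window are illustrative assumptions and are not taken from the review.

```python
from dataclasses import dataclass

@dataclass
class Locus:
    chrom: str
    pos: int  # base-pair coordinate

def classify_eqtl(gene, marker, window=1_000_000):
    """Label an eQTL 'cis' if the marker lies near the gene, else 'trans'.

    The default 1 Mb window is an illustrative assumption.
    """
    if gene.chrom == marker.chrom and abs(gene.pos - marker.pos) <= window:
        return "cis"
    return "trans"

gene = Locus("chr1", 5_000_000)
print(classify_eqtl(gene, Locus("chr1", 5_400_000)))  # same chromosome, 400 kb away
print(classify_eqtl(gene, Locus("chr3", 5_400_000)))  # different chromosome
```

In practice the window would be chosen to suit the species and the mapping resolution of the population.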

Closing gaps between open software and public data in a hackathon setting: User-centered software prototyping.

PubMed

Busby, Ben; Lesko, Matthew; Federer, Lisa

2016-01-01

In genomics, bioinformatics and other areas of data science, gaps exist between extant public datasets and the open-source software tools built by the community to analyze similar data types. The purpose of biological data science hackathons is to assemble groups of genomics or bioinformatics professionals and software developers to rapidly prototype software to address these gaps. The only two rules for the NCBI-assisted hackathons run so far are that 1) data either must be housed in public data repositories or be deposited to such repositories shortly after the hackathon's conclusion, and 2) all software comprising the final pipeline must be open-source or open-use. Proposed topics, as well as suggested tools and approaches, are distributed to participants at the beginning of each hackathon and refined during the event. Software, scripts, and pipelines are developed and published on GitHub, a web service providing publicly available, free-usage tiers for collaborative software development. The code resulting from each hackathon is published at https://github.com/NCBI-Hackathons/ with separate directories or repositories for each team.
DNA sequence of the lymphotropic variant of minute virus of mice, MVM(i), and comparison with the DNA sequence of the fibrotropic prototype strain.

PubMed

Astell, C R; Gardiner, E M; Tattersall, P

1986-02-01

The sequence of molecular clones of the genome of MVM(i), a lymphotropic variant of minute virus of mice, was determined and compared with that of MVM(p), the fibrotropic prototype strain. At the nucleotide level there are 163 base changes: 129 transitions and 34 transversions. Most nucleotide changes are silent, with only 27 amino acid changes predicted, of which 22 are conservative. Notable differences between the MVM(i) and MVM(p) genomes which may account for the cell specificities of these viruses occur within the 3' nontranslated regions. The differences discussed include the absence of a 65-base-pair direct repeat in MVM(i), the presence of only two polyadenylation sites in MVM(i) compared with four in MVM(p), and sequences that bear a resemblance to enhancer sequences. Also included in this paper is an important correction to the MVM(p) sequence (C.R. Astell, M. Thomson, M. Merchlinsky, and D. C. Ward, Nucleic Acids Res. 11:999-1018, 1983).
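A transition/transversion tally of the kind reported in this abstract can be computed from any pair of aligned sequences. A minimal sketch, assuming two equal-length aligned strings in which gaps and ambiguity codes are skipped; the function name `count_substitutions` and the toy sequences are illustrative and are not MVM data.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def count_substitutions(seq_a, seq_b):
    """Return (transitions, transversions) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical site, or gap/ambiguity code: skip
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions

ts, tv = count_substitutions("ACGTACGT", "GCGTATGA")
print(ts, tv)  # 2 1
```

The two counts sum to the total number of substitutions, as with the 129 transitions and 34 transversions that make up the 163 base changes above.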
Genetic sex determination in Astatotilapia calliptera, a prototype species for the Lake Malawi cichlid radiation.

PubMed

Peterson, Erin N; Cline, Maggie E; Moore, Emily C; Roberts, Natalie B; Roberts, Reade B

2017-06-01

East African cichlids display extensive variation in sex determination systems. The species Astatotilapia calliptera is one of the few cichlids that reside both in Lake Malawi and in surrounding waterways. A. calliptera is of interest in evolutionary studies as a putative immediate outgroup species for the Lake Malawi species flock and possibly as a prototype ancestor-like species for the radiation. Here, we use linkage mapping to test association of sex in A. calliptera with loci that have been previously associated with genetic sex determination in East African cichlid species. We identify a male heterogametic XY system segregating at linkage group (LG) 7 in an A. calliptera line that originated from Lake Malawi, at a locus previously shown to act as an XY sex determination system in multiple species of Lake Malawi cichlids. Significant association of genetic markers and sex produce a broad genetic interval of approximately 26 megabases (Mb) using the Nile tilapia genome to orient markers; however, we note that the marker with the strongest association with sex is near a gene that acts as a master sex determiner in other fish species. We demonstrate that alleles of the marker are perfectly associated with sex in Metriaclima mbenjii, a species from the rock-dwelling clade of Lake Malawi. While we do not rule out the possibility of other sex determination loci in A. calliptera, this study provides a foundation for fine mapping of the cichlid sex determination gene on LG7 and evolutionary context regarding the origin and persistence of the LG7 XY across diverse, rapidly evolving lineages.
Genetic sex determination in Astatotilapia calliptera, a prototype species for the Lake Malawi cichlid radiation

NASA Astrophysics Data System (ADS)

Peterson, Erin N.; Cline, Maggie E.; Moore, Emily C.; Roberts, Natalie B.; Roberts, Reade B.

2017-06-01

East African cichlids display extensive variation in sex determination systems. The species Astatotilapia calliptera is one of the few cichlids that reside both in Lake Malawi and in surrounding waterways. A. calliptera is of interest in evolutionary studies as a putative immediate outgroup species for the Lake Malawi species flock and possibly as a prototype ancestor-like species for the radiation. Here, we use linkage mapping to test association of sex in A. calliptera with loci that have been previously associated with genetic sex determination in East African cichlid species. We identify a male heterogametic XY system segregating at linkage group (LG) 7 in an A. calliptera line that originated from Lake Malawi, at a locus previously shown to act as an XY sex determination system in multiple species of Lake Malawi cichlids. Significant association of genetic markers and sex produce a broad genetic interval of approximately 26 megabases (Mb) using the Nile tilapia genome to orient markers; however, we note that the marker with the strongest association with sex is near a gene that acts as a master sex determiner in other fish species. We demonstrate that alleles of the marker are perfectly associated with sex in Metriaclima mbenjii, a species from the rock-dwelling clade of Lake Malawi. While we do not rule out the possibility of other sex determination loci in A. calliptera, this study provides a foundation for fine mapping of the cichlid sex determination gene on LG7 and evolutionary context regarding the origin and persistence of the LG7 XY across diverse, rapidly evolving lineages.
Genomic Analysis of Natural Variation for Seed and Plant Size in Maize (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

ScienceCinema

Kaeppler, Shawn

2018-02-01

Shawn Kaeppler from the University of Wisconsin-Madison on "Genomic Analysis of Biofuel Traits in Maize and Switchgrass" at the 7th Annual Genomics of Energy & Environment Meeting on March 21, 2012 in Walnut Creek, CA.
Comparative ruminant genomics highlights segmental duplication and mobile element insertion diversity

USDA-ARS's Scientific Manuscript database

We have expanded upon a previously reported comparative genomics approach using a read-depth (JaRMs) and a hybrid read-pair, split-read (RAPTR-SV) copy number variation (CNV) detection method that uses read alignments to the cattle reference genome in order to identify species-specific genomic rearr...
RSAT 2015: Regulatory Sequence Analysis Tools.

PubMed

Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

2015-07-01

RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Whole genome re-sequencing of date palms yields insights into diversification of a fruit tree crop.

PubMed

Hazzouri, Khaled M; Flowers, Jonathan M; Visser, Hendrik J; Khierallah, Hussam S M; Rosas, Ulises; Pham, Gina M; Meyer, Rachel S; Johansen, Caryn K; Fresquez, Zoë A; Masmoudi, Khaled; Haider, Nadia; El Kadri, Nabila; Idaghdour, Youssef; Malek, Joel A; Thirkhill, Deborah; Markhand, Ghulam S; Krueger, Robert R; Zaid, Abdelouahhab; Purugganan, Michael D

2015-11-09

Date palms (Phoenix dactylifera) are the most significant perennial crop in arid regions of the Middle East and North Africa. Here, we present a comprehensive catalogue of approximately seven million single nucleotide polymorphisms in date palms based on whole genome re-sequencing of a collection of 62 cultivars. Population structure analysis indicates a major genetic divide between North Africa and the Middle East/South Asian date palms, with evidence of admixture in cultivars from Egypt and Sudan. Genome-wide scans for selection suggest at least 56 genomic regions associated with selective sweeps that may underlie geographic adaptation. We report candidate mutations for trait variation, including nonsense polymorphisms and presence/absence variation in gene content in pathways for key agronomic traits. We also identify a copia-like retrotransposon insertion polymorphism in the R2R3 myb-like orthologue of the oil palm virescens gene associated with fruit colour variation. This analysis documents patterns of post-domestication diversification and provides a genomic resource for this economically important perennial tree crop.
Whole genome re-sequencing of date palms yields insights into diversification of a fruit tree crop

PubMed Central

Hazzouri, Khaled M.; Flowers, Jonathan M.; Visser, Hendrik J.; Khierallah, Hussam S. M.; Rosas, Ulises; Pham, Gina M.; Meyer, Rachel S.; Johansen, Caryn K.; Fresquez, Zoë A.; Masmoudi, Khaled; Haider, Nadia; El Kadri, Nabila; Idaghdour, Youssef; Malek, Joel A.; Thirkhill, Deborah; Markhand, Ghulam S.; Krueger, Robert R.; Zaid, Abdelouahhab; Purugganan, Michael D.

2015-01-01

Date palms (Phoenix dactylifera) are the most significant perennial crop in arid regions of the Middle East and North Africa. Here, we present a comprehensive catalogue of approximately seven million single nucleotide polymorphisms in date palms based on whole genome re-sequencing of a collection of 62 cultivars. Population structure analysis indicates a major genetic divide between North Africa and the Middle East/South Asian date palms, with evidence of admixture in cultivars from Egypt and Sudan. Genome-wide scans for selection suggest at least 56 genomic regions associated with selective sweeps that may underlie geographic adaptation. We report candidate mutations for trait variation, including nonsense polymorphisms and presence/absence variation in gene content in pathways for key agronomic traits. We also identify a copia-like retrotransposon insertion polymorphism in the R2R3 myb-like orthologue of the oil palm virescens gene associated with fruit colour variation. This analysis documents patterns of post-domestication diversification and provides a genomic resource for this economically important perennial tree crop. PMID:26549859
Concerted copy number variation balances ribosomal DNA dosage in human and mouse genomes

PubMed Central

Gibbons, John G.; Branco, Alan T.; Godinho, Susana A.; Yu, Shoukai; Lemos, Bernardo

2015-01-01

Tandemly repeated ribosomal DNA (rDNA) arrays are among the most evolutionary dynamic loci of eukaryotic genomes. The loci code for essential cellular components, yet exhibit extensive copy number (CN) variation within and between species. CN might be partly determined by the requirement of dosage balance between the 5S and 45S rDNA arrays. The arrays are nonhomologous, physically unlinked in mammals, and encode functionally interdependent RNA components of the ribosome. Here we show that the 5S and 45S rDNA arrays exhibit concerted CN variation (cCNV). Despite 5S and 45S rDNA elements residing on different chromosomes and lacking sequence similarity, cCNV between these loci is strong, evolutionarily conserved in humans and mice, and manifested across individual genotypes in natural populations and pedigrees. Finally, we observe that bisphenol A induces rapid and parallel modulation of 5S and 45S rDNA CN. Our observations reveal a novel mode of genome variation, indicate that natural selection contributed to the evolution and conservation of cCNV, and support the hypothesis that 5S CN is partly determined by the requirement of dosage balance with the 45S rDNA array. We suggest that human disease variation might be traced to disrupted rDNA dosage balance in the genome. PMID:25583482
Overlap in genomic variation associated with milk fat composition in Holstein Friesian and Dutch native dual-purpose breeds.

PubMed

Maurice-Van Eijndhoven, M H T; Bovenhuis, H; Veerkamp, R F; Calus, M P L

2015-09-01

The aim of this study was to identify if genomic variations associated with fatty acid (FA) composition are similar between the Holstein-Friesian (HF) and native dual-purpose breeds used in the Dutch dairy industry. Phenotypic and genotypic information were available for the breeds Meuse-Rhine-Yssel (MRY), Dutch Friesian (DF), Groningen White Headed (GWH), and HF. First, the reliability of genomic breeding values of the native Dutch dual-purpose cattle breeds MRY, DF, and GWH was evaluated using single nucleotide polymorphism (SNP) effects estimated in HF, including all SNP or subsets with stronger associations in HF. Second, the genomic variation of the regions associated with FA composition in HF (regions on Bos taurus autosomes 5, 14, and 26) was studied in the different breeds. Finally, similarities in genotype and allele frequencies between MRY, DF, GWH, and HF breeds were assessed for specific regions associated with FA composition. On average across the traits, the highest reliabilities of genomic prediction were estimated for GWH (0.158) and DF (0.116) when the 8 to 22 SNP with the strongest association in HF were included. With the same set of SNP, GEBV for MRY were the least reliable (0.022). This indicates that on average only 2 (MRY) to 16% (GWH) of the genomic variation in HF is shared with the native Dutch dual-purpose breeds. The comparison of predicted variances of different regions associated with milk and milk fat composition showed that breeds clearly differed in genomic variation within these regions. Finally, the correlations of allele frequencies between breeds across the 8 to 22 SNP with the strongest association in HF were around 0.8 between the Dutch native dual-purpose breeds, whereas the correlations between the native breeds and HF were clearly lower and around 0.5. There was no consistent relationship between the reliabilities of genomic prediction for a specific breed and the correlation between the allele frequencies of this breed and HF. In conclusion, most of the genomic variation associated with FA composition in the Dutch dual-purpose breeds appears to be breed-specific. Furthermore, the minor allele frequencies of genes having an effect on the milk FA composition in HF were shown to be much smaller in the breeds MRY, DF, and GWH, especially for the MRY breed. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
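The between-breed comparison of allele frequencies described in this abstract comes down to a correlation over a shared set of SNP. A minimal sketch, assuming a plain Pearson correlation; the `pearson` helper and the frequency values are hypothetical and do not reproduce the study's data or its exact procedure.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / sqrt(var_x * var_y)

# Hypothetical allele frequencies at the same five SNP in two breeds.
breed_1 = [0.10, 0.35, 0.50, 0.72, 0.90]
breed_2 = [0.15, 0.30, 0.55, 0.60, 0.85]
print(round(pearson(breed_1, breed_2), 3))
```

A value near 1 would indicate that the two breeds carry the selected SNP at similar frequencies; lower values point to breed-specific frequency patterns.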
Personalized biochemistry and biophysics.

PubMed

Kroncke, Brett M; Vanoye, Carlos G; Meiler, Jens; George, Alfred L; Sanders, Charles R

2015-04-28

Whole human genome sequencing of individuals is becoming rapid and inexpensive, enabling new strategies for using personal genome information to help diagnose, treat, and even prevent human disorders for which genetic variations are causative or are known to be risk factors. Many of the exploding number of newly discovered genetic variations alter the structure, function, dynamics, stability, and/or interactions of specific proteins and RNA molecules. Accordingly, there are a host of opportunities for biochemists and biophysicists to participate in (1) developing tools to allow accurate and sometimes medically actionable assessment of the potential pathogenicity of individual variations and (2) establishing the mechanistic linkage between pathogenic variations and their physiological consequences, providing a rational basis for treatment or preventive care. In this review, we provide an overview of these opportunities and their associated challenges in light of the current status of genomic science and personalized medicine, the latter often termed precision medicine.
Personalized Biochemistry and Biophysics

PubMed Central

2016-01-01

Whole human genome sequencing of individuals is becoming rapid and inexpensive, enabling new strategies for using personal genome information to help diagnose, treat, and even prevent human disorders for which genetic variations are causative or are known to be risk factors. Many of the exploding number of newly discovered genetic variations alter the structure, function, dynamics, stability, and/or interactions of specific proteins and RNA molecules. Accordingly, there are a host of opportunities for biochemists and biophysicists to participate in (1) developing tools to allow accurate and sometimes medically actionable assessment of the potential pathogenicity of individual variations and (2) establishing the mechanistic linkage between pathogenic variations and their physiological consequences, providing a rational basis for treatment or preventive care. In this review, we provide an overview of these opportunities and their associated challenges in light of the current status of genomic science and personalized medicine, the latter often termed precision medicine. PMID:25856502
Does the central dogma still stand?

PubMed

Koonin, Eugene V

2012-08-23

Prions are agents of analog, protein conformation-based inheritance that can confer beneficial phenotypes to cells, especially under stress. Combined with genetic variation, prion-mediated inheritance can be channeled into prion-independent genomic inheritance. Latest screening shows that prions are common, at least in fungi. Thus, there is non-negligible flow of information from proteins to the genome in modern cells, in a direct violation of the Central Dogma of molecular biology. The prion-mediated heredity that violates the Central Dogma appears to be a specific, most radical manifestation of the widespread assimilation of protein (epigenetic) variation into genetic variation. The epigenetic variation precedes and facilitates genetic adaptation through a general 'look-ahead effect' of phenotypic mutations. This direction of the information flow is likely to be one of the important routes of environment-genome interaction and could substantially contribute to the evolution of complex adaptive traits.
Host Genetic Control of the Microbiome in Humans and Maize or Relating Host Genetic Variation to the Microbiome (2011 JGI User Meeting)

ScienceCinema

Ley, Ruth E. [Cornell Univ., Ithaca, NY (United States). Cornell Center for Comparative and Population Genomics, Dept. of Microbiology and Dept. of Molecular Biology and Genetics]

2018-06-27

The U.S. Department of Energy Joint Genome Institute (JGI) invited scientists interested in the application of genomics to bioenergy and environmental issues, as well as all current and prospective users and collaborators, to attend the annual DOE JGI Genomics of Energy and Environment Meeting held March 22-24, 2011 in Walnut Creek, Calif. The emphasis of this meeting was on the genomics of renewable energy strategies, carbon cycling, environmental gene discovery, and engineering of fuel-producing organisms. The meeting features presentations by leading scientists advancing these topics. Ruth Ley of Cornell University gives a presentation on "Relating Host Genetic Variation to the Microbiome" at the 6th annual Genomics of Energy and Environment Meeting on March 23, 2011.
Variation resources at UC Santa Cruz.

PubMed

Thomas, Daryl J; Trumbower, Heather; Kern, Andrew D; Rhead, Brooke L; Kuhn, Robert M; Haussler, David; Kent, W James

2007-01-01

The variation resources within the University of California Santa Cruz Genome Browser include polymorphism data drawn from public collections and analyses of these data, along with their display in the context of other genomic annotations. Primary data from dbSNP is included for many organisms, with added information including genomic alleles and orthologous alleles for closely related organisms. Display filtering and coloring is available by variant type, functional class or other annotations. Annotation of potential errors is highlighted and a genomic alignment of the variant's flanking sequence is displayed. HapMap allele frequencies and linkage disequilibrium (LD) are available for each HapMap population, along with non-human primate alleles. The browsing and analysis tools, downloadable data files and links to documentation and other information can be found at http://genome.ucsc.edu/.
Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gordon, Sean P.; Contreras-Moreira, Bruno; Woods, Daniel P.

While prokaryotic pan-genomes have been shown to contain many more genes than any individual organism, the prevalence and functional significance of differentially present genes in eukaryotes remains poorly understood. Whole-genome de novo assembly and annotation of 54 lines of the grass Brachypodium distachyon yield a pan-genome containing nearly twice the number of genes found in any individual genome. Genes present in all lines are enriched for essential biological functions, while genes present in only some lines are enriched for conditionally beneficial functions (e.g., defense and development), display faster evolutionary rates, lie closer to transposable elements and are less likely to be syntenic with orthologous genes in other grasses. Our data suggest that differentially present genes contribute substantially to phenotypic variation within a eukaryote species, these genes have a major influence in population genetics, and transposable elements play a key role in pan-genome evolution.
Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure.

PubMed

Gordon, Sean P; Contreras-Moreira, Bruno; Woods, Daniel P; Des Marais, David L; Burgess, Diane; Shu, Shengqiang; Stritt, Christoph; Roulin, Anne C; Schackwitz, Wendy; Tyler, Ludmila; Martin, Joel; Lipzen, Anna; Dochy, Niklas; Phillips, Jeremy; Barry, Kerrie; Geuten, Koen; Budak, Hikmet; Juenger, Thomas E; Amasino, Richard; Caicedo, Ana L; Goodstein, David; Davidson, Patrick; Mur, Luis A J; Figueroa, Melania; Freeling, Michael; Catalan, Pilar; Vogel, John P

2017-12-19

While prokaryotic pan-genomes have been shown to contain many more genes than any individual organism, the prevalence and functional significance of differentially present genes in eukaryotes remains poorly understood. Whole-genome de novo assembly and annotation of 54 lines of the grass Brachypodium distachyon yield a pan-genome containing nearly twice the number of genes found in any individual genome. Genes present in all lines are enriched for essential biological functions, while genes present in only some lines are enriched for conditionally beneficial functions (e.g., defense and development), display faster evolutionary rates, lie closer to transposable elements and are less likely to be syntenic with orthologous genes in other grasses. Our data suggest that differentially present genes contribute substantially to phenotypic variation within a eukaryote species, these genes have a major influence in population genetics, and transposable elements play a key role in pan-genome evolution.
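The distinction drawn above between genes present in all lines and genes present in only some lines can be illustrated with a minimal sketch. The line names, gene identifiers and the `classify_pan_genome` helper below are hypothetical and are not taken from the study; the snippet only shows how a presence/absence table splits into a core and a differentially present (accessory) set.

```python
from typing import Dict, Set, Tuple


def classify_pan_genome(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split genes into core (present in every line) and accessory (present in only some)."""
    if not presence:
        return set(), set()
    gene_sets = list(presence.values())
    pan = set().union(*gene_sets)            # every gene seen in at least one line
    core = set.intersection(*gene_sets)      # genes shared by all lines
    return core, pan - core


# Hypothetical toy data: three lines, five genes in total.
toy = {
    "line_A": {"g1", "g2", "g3"},
    "line_B": {"g1", "g2", "g4"},
    "line_C": {"g1", "g3", "g5"},
}
core, accessory = classify_pan_genome(toy)
pan_size = len(core) + len(accessory)
```

In this toy case the pan-genome (five genes) is larger than any single line's gene set (three genes), which is the same kind of comparison the abstract makes at genome scale.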
Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure

DOE PAGES

Gordon, Sean P.; Contreras-Moreira, Bruno; Woods, Daniel P.; ...

2017-12-19

While prokaryotic pan-genomes have been shown to contain many more genes than any individual organism, the prevalence and functional significance of differentially present genes in eukaryotes remains poorly understood. Whole-genome de novo assembly and annotation of 54 lines of the grass Brachypodium distachyon yield a pan-genome containing nearly twice the number of genes found in any individual genome. Genes present in all lines are enriched for essential biological functions, while genes present in only some lines are enriched for conditionally beneficial functions (e.g., defense and development), display faster evolutionary rates, lie closer to transposable elements and are less likely to be syntenic with orthologous genes in other grasses. Our data suggest that differentially present genes contribute substantially to phenotypic variation within a eukaryote species, these genes have a major influence in population genetics, and transposable elements play a key role in pan-genome evolution.
CircosVCF: circos visualization of whole-genome sequence variations stored in VCF files.

PubMed

Drori, E; Levy, D; Smirin-Yosef, P; Rahimi, O; Salmon-Divon, M

2017-05-01

Visualization of whole-genomic variations in a meaningful manner assists researchers in gaining new insights into the underlying data, especially in the context of whole-genome comparisons. CircosVCF is a web-based visualization tool for genome-wide variant data described in VCF files, using circos plots. The user-friendly interface of CircosVCF supports an interactive design of the circles in the plot, and the integration of additional information such as experimental data or annotations. The provided visualization capabilities give a broad overview of the genomic relationships between genomes, and allow identification of specific, meaningful SNP regions. CircosVCF was implemented in JavaScript and is available at http://www.ariel.ac.il/research/fbl/software. Contact: malisa@ariel.ac.il. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
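A density track in a circos-style plot needs variant counts per genomic window. The sketch below shows one way such counts could be derived from the tab-separated data lines of a VCF file. It is written in Python for illustration only and is not CircosVCF's implementation (which the abstract states is in JavaScript); the `variant_density` function, the window size and the toy records are assumptions.

```python
from collections import Counter
from typing import Iterable


def variant_density(vcf_lines: Iterable[str], window: int = 1_000_000) -> Counter:
    """Count variants per (chromosome, window index) from VCF text lines."""
    counts: Counter = Counter()
    for line in vcf_lines:
        # Skip blank lines, '##' meta-information lines and the '#CHROM' header.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])   # CHROM and 1-based POS columns
        counts[(chrom, (pos - 1) // window)] += 1
    return counts


# Hypothetical toy VCF content.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr1\t1500000\t.\tC\tT\t.\tPASS\t.",
    "chr2\t42\t.\tG\tA\t.\tPASS\t.",
]
density = variant_density(toy_vcf)
```

Each key in `density` identifies one window on one chromosome, so the counts could be passed to any plotting layer as the heights of a circular histogram.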

Ribosomal DNA sequence heterogeneity reflects intraspecies phylogenies and predicts genome structure in two contrasting yeast species.

PubMed

West, Claire; James, Stephen A; Davey, Robert P; Dicks, Jo; Roberts, Ian N

2014-07-01

The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses. 
We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of closely related organisms, and discuss how it could be extended to future studies of multilocus rDNA systems. [concerted evolution; genome hybridisation; phylogenetic analysis; ribosomal DNA; whole genome sequencing; yeast]. © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
Genome-wide variation in recombination rate in Eucalyptus.

PubMed

Gion, Jean-Marc; Hudson, Corey J; Lesur, Isabelle; Vaillancourt, René E; Potts, Brad M; Freeman, Jules S

2016-08-09

Meiotic recombination is a fundamental evolutionary process. It not only generates diversity, but influences the efficacy of natural selection and genome evolution. There can be significant heterogeneity in recombination rates within and between species, however this variation is not well understood outside of a few model taxa, particularly in forest trees. Eucalypts are forest trees of global economic importance, and dominate many Australian ecosystems. We studied recombination rate in Eucalyptus globulus using genetic linkage maps constructed in 10 unrelated individuals, and markers anchored to the Eucalyptus reference genome. This experimental design provided the replication to study whether recombination rate varied between individuals and chromosomes, and allowed us to study the genomic attributes and population genetic parameters correlated with this variation. Recombination rate varied significantly between individuals (range = 2.71 to 3.51 centimorgans/megabase [cM/Mb]), but was not significantly influenced by sex or cross type (F1 vs. F2). Significant differences in recombination rate between chromosomes were also evident (range = 1.98 to 3.81 cM/Mb), beyond those which were due to variation in chromosome size. Variation in chromosomal recombination rate was significantly correlated with gene density (r = 0.94), GC content (r = 0.90), and the number of tandem duplicated genes (r = -0.72) per chromosome. Notably, chromosome level recombination rate was also negatively correlated with the average genetic diversity across six species from an independent set of samples (r = -0.75). The correlations with genomic attributes are consistent with findings in other taxa, however, the direction of the correlation between diversity and recombination rate is opposite to that commonly observed. We argue this is likely to reflect the interaction of selection and specific genome architecture of Eucalyptus. 
Interestingly, the differences amongst chromosomes in recombination rates appear stable across Eucalyptus species. Together with the strong correlations between recombination rate and features of the Eucalyptus reference genome, we maintain these findings provide further evidence for a broad conservation of genome architecture across the globally significant lineages of Eucalyptus.
Development and evaluation of a high density genotyping 'Axiom_Arachis' array with 58K SNPs for accelerating genetics and breeding in groundnut

USDA-ARS's Scientific Manuscript database

Single nucleotide polymorphisms (SNPs) are the most abundant DNA sequence variation in the genomes which can be used to associate genotypic variation to the phenotype. Therefore, availability of a high-density SNP array with uniform genome coverage can advance genetic studies and breeding applicatio...
Differential analysis between somatic mutation and germline variation profiles reveals cancer-related genes.

PubMed

Przytycki, Pawel F; Singh, Mona

2017-08-25

A major aim of cancer genomics is to pinpoint which somatically mutated genes are involved in tumor initiation and progression. We introduce a new framework for uncovering cancer genes, differential mutation analysis, which compares the mutational profiles of genes across cancer genomes with their natural germline variation across healthy individuals. We present DiffMut, a fast and simple approach for differential mutational analysis, and demonstrate that it is more effective in discovering cancer genes than considerably more sophisticated approaches. We conclude that germline variation across healthy human genomes provides a powerful means for characterizing somatic mutation frequency and identifying cancer driver genes. DiffMut is available at https://github.com/Singh-Lab/Differential-Mutation-Analysis .
A high-density genetic map reveals variation in recombination rate across the genome of Daphnia magna.

PubMed

Dukić, Marinela; Berner, Daniel; Roesti, Marius; Haag, Christoph R; Ebert, Dieter

2016-10-13

Recombination rate is an essential parameter for many genetic analyses. Recombination rates are highly variable across species, populations, individuals and different genomic regions. Due to the profound influence that recombination can have on intraspecific diversity and interspecific divergence, characterization of recombination rate variation emerges as a key resource for population genomic studies and emphasises the importance of high-density genetic maps as tools for studying genome biology. Here we present such a high-density genetic map for Daphnia magna, and analyse patterns of recombination rate across the genome. An F2 intercross panel was genotyped by Restriction-site Associated DNA sequencing to construct the third-generation linkage map of D. magna. The resulting high-density map included 4037 markers covering 813 scaffolds and contigs that sum up to 77 % of the currently available genome draft sequence (v2.4) and 55 % of the estimated genome size (238 Mb). Total genetic length of the map presented here is 1614.5 cM and the genome-wide recombination rate is estimated at 6.78 cM/Mb. Merging genetic and physical information, we consistently found that recombination rate estimates are high towards the peripheral parts of the chromosomes, while chromosome centres, harbouring centromeres in D. magna, show very low recombination rate estimates. Due to its high density, the third-generation linkage map for D. magna can be coupled with the draft genome assembly, providing an essential tool for genome investigation in this model organism. Thus, our linkage map can be used for the on-going improvements of the genome assembly, but more importantly, it has enabled us to characterize variation in recombination rate across the genome of D. magna for the first time. These new insights can provide valuable assistance in future studies of genome evolution, mapping of quantitative traits and population genetic studies.
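The genome-wide figure quoted above appears consistent with dividing the total genetic length of the map by the estimated genome size. The abstract does not state the formula, so this is an assumed reading, and the `recombination_rate` helper is a hypothetical name used only for this check.

```python
def recombination_rate(genetic_length_cm: float, physical_length_mb: float) -> float:
    """Average recombination rate in cM/Mb: genetic map length over physical length."""
    if physical_length_mb <= 0:
        raise ValueError("physical length must be positive")
    return genetic_length_cm / physical_length_mb


# Values taken from the abstract: 1614.5 cM total map length, 238 Mb estimated genome size.
rate = recombination_rate(1614.5, 238.0)
```

Rounded to two decimals, `rate` matches the reported 6.78 cM/Mb. The same function could be applied per chromosome or per window to compare peripheral and central regions, given the corresponding lengths.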
[Phylogenetic relationships and intraspecific variation of D-genome Aegilops L. as revealed by RAPD analysis].

PubMed

Goriunova, S V; Kochieva, E Z; Chikida, N N; Pukhal'skiĭ, V A

2004-05-01

RAPD analysis was carried out to study the genetic variation and phylogenetic relationships of polyploid Aegilops species, which contain the D genome as a component of the alloploid genome, and diploid Aegilops tauschii, which is a putative donor of the D genome for common wheat. In total, 74 accessions of six D-genome Aegilops species were examined. The highest intraspecific variation (0.03-0.21) was observed for Ae. tauschii. Intraspecific distances between accessions ranged 0.007-0.067 in Ae. cylindrica, 0.017-0.047 in Ae. vavilovii, and 0.00-0.053 in Ae. juvenalis. Likewise, Ae. ventricosa and Ae. crassa showed low intraspecific polymorphism. The among-accession difference in alloploid Ae. ventricosa (genome DvNv) was similar to that of one parental species, Ae. uniaristata (N), and substantially lower than in the other parent, Ae. tauschii (D). The among-accession difference in Ae. cylindrica (CcDc) was considerably lower than in either parent, Ae. tauschii (D) or Ae. caudata (C). With the exception of Ae. cylindrica, all D-genome species--Ae. tauschii (D), Ae. ventricosa (DvNv), Ae. crassa (XcrDcrl and XcrDcrlDcr2), Ae. juvenalis (XjDjUj), and Ae. vavilovii (XvaDvaSva)--formed a single polymorphic cluster, which was distinct from clusters of other species. The only exception, Ae. cylindrica, did not group with the other D-genome species, but clustered with Ae. caudata (C), a donor of the C genome. The cluster of these two species was clearly distinct from the cluster of the other D-genome species and close to a cluster of Ae. umbellulata (genome U) and Ae. ovata (genome UgMg). Thus, RAPD analysis for the first time was used to estimate and to compare the interpopulation polymorphism and to establish the phylogenetic relationships of all diploid and alloploid D-genome Aegilops species.
A Novel Recombinant Enterovirus Type EV-A89 with Low Epidemic Strength in Xinjiang, China

PubMed Central

Fan, Qin; Zhang, Yong; Hu, Lan; Sun, Qiang; Cui, Hui; Yan, Dongmei; Sikandaner, Huerxidan; Tang, Haishu; Wang, Dongyan; Zhu, Zhen; Zhu, Shuangli; Xu, Wenbo

2015-01-01

Enterovirus A89 (EV-A89) is a novel member of the EV-A species. To date, only one full-length genome sequence (the prototype strain) has been published. Here, we report the molecular identification and genomic characterization of a Chinese EV-A89 strain, KSYPH-TRMH22F/XJ/CHN/2011, isolated in 2011 from a contact of an acute flaccid paralysis (AFP) patient during AFP case surveillance in Xinjiang, China. This was the first report of EV-A89 in China. The VP1 coding sequence of this strain demonstrated 93.2% nucleotide and 99.3% amino acid identity with the EV-A89 prototype strain. In the P2 and P3 regions, the Chinese EV-A89 strain demonstrated markedly higher identity than the prototype strains of EV-A76, EV-A90, and EV-A91, indicating that one or more recombination events between EV-A89 and these EV-A types might have occurred. Long-term evolution of these EV types, originating from the same ancestor, provides the spatial and temporal circumstances for recombination to occur. An antibody sero-prevalence survey against EV-A89 in two Xinjiang prefectures demonstrated low positive rates and low titres of EV-A89 neutralization antibody, suggesting limited range of transmission and exposure to the population. This study provides a solid foundation for further studies on the biological and pathogenic properties of EV-A89. PMID:26685900
A Novel Recombinant Enterovirus Type EV-A89 with Low Epidemic Strength in Xinjiang, China.

PubMed

Fan, Qin; Zhang, Yong; Hu, Lan; Sun, Qiang; Cui, Hui; Yan, Dongmei; Sikandaner, Huerxidan; Tang, Haishu; Wang, Dongyan; Zhu, Zhen; Zhu, Shuangli; Xu, Wenbo

2015-12-21

Enterovirus A89 (EV-A89) is a novel member of the EV-A species. To date, only one full-length genome sequence (the prototype strain) has been published. Here, we report the molecular identification and genomic characterization of a Chinese EV-A89 strain, KSYPH-TRMH22F/XJ/CHN/2011, isolated in 2011 from a contact of an acute flaccid paralysis (AFP) patient during AFP case surveillance in Xinjiang, China. This was the first report of EV-A89 in China. The VP1 coding sequence of this strain demonstrated 93.2% nucleotide and 99.3% amino acid identity with the EV-A89 prototype strain. In the P2 and P3 regions, the Chinese EV-A89 strain demonstrated markedly higher identity than the prototype strains of EV-A76, EV-A90, and EV-A91, indicating that one or more recombination events between EV-A89 and these EV-A types might have occurred. Long-term evolution of these EV types, originating from the same ancestor, provides the spatial and temporal circumstances for recombination to occur. An antibody sero-prevalence survey against EV-A89 in two Xinjiang prefectures demonstrated low positive rates and low titres of EV-A89 neutralization antibody, suggesting limited range of transmission and exposure to the population. This study provides a solid foundation for further studies on the biological and pathogenic properties of EV-A89.
Regulatory variation: an emerging vantage point for cancer biology.

PubMed

Li, Luolan; Lorzadeh, Alireza; Hirst, Martin

2014-01-01

Transcriptional regulation involves complex and interdependent interactions of noncoding and coding regions of the genome with proteins that interact and modify them. Genetic variation/mutation in coding and noncoding regions of the genome can drive aberrant transcription and disease. In spite of accounting for nearly 98% of the genome comparatively little is known about the contribution of noncoding DNA elements to disease. Genome-wide association studies of complex human diseases including cancer have revealed enrichment for variants in the noncoding genome. A striking finding of recent cancer genome re-sequencing efforts has been the previously underappreciated frequency of mutations in epigenetic modifiers across a wide range of cancer types. Taken together these results point to the importance of dysregulation in transcriptional regulatory control in genesis of cancer. Powered by recent technological advancements in functional genomic profiling, exploration of normal and transformed regulatory networks will provide novel insight into the initiation and progression of cancer and open new windows to future prognostic and diagnostic tools. © 2013 Wiley Periodicals, Inc.
A comparative genomic hybridization approach to study gene copy number variations among Chinese hamster cell lines.

PubMed

Vishwanathan, Nandita; Bandyopadhyay, Arpan; Fu, Hsu-Yuan; Johnson, Kathryn C; Springer, Nathan M; Hu, Wei-Shou

2017-08-01

Chinese Hamster Ovary (CHO) cells are aneuploid in nature. The genome of recombinant protein producing CHO cell lines continuously undergoes changes in its structure and organization. We analyzed nine cell lines, including parental cell lines, using a comparative genomic hybridization (CGH) array focused on gene-containing regions. The comparison of CGH with copy-number estimates from sequencing data showed good correlation. Hierarchical clustering of the gene copy number variation data from CGH data revealed the lineage relationships between the cell lines. On analyzing the clones of a clonal population, some regions with altered genomic copy number status were identified indicating genomic changes during passaging. A CGH array is thus an effective tool in quantifying genomic alterations in industrial cell lines and can provide insights into the changes in the genomic structure during cell line derivation and long term culture. Biotechnol. Bioeng. 2017;114: 1903-1908. © 2017 Wiley Periodicals, Inc.
Typing and comparative genome analysis of Brucella melitensis isolated from Lebanon.

PubMed

Abou Zaki, Natalia; Salloum, Tamara; Osman, Marwan; Rafei, Rayane; Hamze, Monzer; Tokajian, Sima

2017-10-16

Brucella melitensis is the main causative agent of the zoonotic disease brucellosis. This study aimed at typing and characterizing genetic variation in 33 Brucella isolates recovered from patients in Lebanon. Bruce-ladder multiplex PCR and PCR-RFLP of omp31, omp2a and omp2b were performed. Sixteen representative isolates were chosen for draft-genome sequencing and analyzed to determine variations in virulence, resistance, genomic islands, prophages and insertion sequences. Comparative whole-genome single nucleotide polymorphism analysis was also performed. The isolates were confirmed to be B. melitensis. Genome analysis revealed multiple virulence determinants and efflux pumps. Genome comparisons and single nucleotide polymorphisms divided the isolates based on geographical distribution but revealed high levels of similarity between the strains. Sequence divergence in B. melitensis was mainly due to lateral gene transfer of mobile elements. This is the first report of an in-depth genomic characterization of B. melitensis in Lebanon. © FEMS 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Association mapping in sunflower (Helianthus annuus L.) reveals independent control of apical vs. basal branching.

PubMed

Nambeesan, Savithri U; Mandel, Jennifer R; Bowers, John E; Marek, Laura F; Ebert, Daniel; Corbi, Jonathan; Rieseberg, Loren H; Knapp, Steven J; Burke, John M

2015-03-11

Shoot branching is an important determinant of plant architecture and influences various aspects of growth and development. Selection on branching has also played an important role in the domestication of crop plants, including sunflower (Helianthus annuus L.). Here, we describe an investigation of the genetic basis of variation in branching in sunflower via association mapping in a diverse collection of cultivated sunflower lines. Detailed phenotypic analyses revealed extensive variation in the extent and type of branching within the focal population. After correcting for population structure and kinship, association analyses were performed using a genome-wide collection of SNPs to identify genomic regions that influence a variety of branching-related traits. This work resulted in the identification of multiple previously unidentified genomic regions that contribute to variation in branching. Genomic regions that were associated with apical and mid-apical branching were generally distinct from those associated with basal and mid-basal branching. Homologs of known branching genes from other study systems (i.e., Arabidopsis, rice, pea, and petunia) were also identified from the draft assembly of the sunflower genome and their map positions were compared to those of associations identified herein. Numerous candidate branching genes were found to map in close proximity to significant branching associations. In sunflower, variation in branching is genetically complex and overall branching patterns (i.e., apical vs. basal) were found to be influenced by distinct genomic regions. Moreover, numerous candidate branching genes mapped in close proximity to significant branching associations. Although the sunflower genome exhibits localized islands of elevated linkage disequilibrium (LD), these non-random associations are known to decay rapidly elsewhere. 
The subset of candidate genes that co-localized with significant associations in regions of low LD represents the most promising target for future functional analyses.
Mitochondrial genomic variation associated with higher mitochondrial copy number: the Cache County Study on Memory Health and Aging.

PubMed

Ridge, Perry G; Maxwell, Taylor J; Foutz, Spencer J; Bailey, Matthew H; Corcoran, Christopher D; Tschanz, JoAnn T; Norton, Maria C; Munger, Ronald G; O'Brien, Elizabeth; Kerber, Richard A; Cawthon, Richard M; Kauwe, John S K

2014-01-01

The mitochondria are essential organelles and are the location of cellular respiration, which is responsible for the majority of ATP production. Each cell contains multiple mitochondria, and each mitochondrion contains multiple copies of its own circular genome. The ratio of mitochondrial genomes to nuclear genomes is referred to as mitochondrial copy number. Decreases in mitochondrial copy number are known to occur in many tissues as people age, and in certain diseases. The regulation of mitochondrial copy number by nuclear genes has been studied extensively. While mitochondrial variation has been associated with longevity and some of the diseases known to have reduced mitochondrial copy number, the role that the mitochondrial genome itself has in regulating mitochondrial copy number remains poorly understood. We analyzed the complete mitochondrial genomes from 1007 individuals randomly selected from the Cache County Study on Memory Health and Aging utilizing the inferred evolutionary history of the mitochondrial haplotypes present in our dataset to identify sequence variation and mitochondrial haplotypes associated with changes in mitochondrial copy number. Three variants belonging to mitochondrial haplogroups U5A1 and T2 were significantly associated with higher mitochondrial copy number in our dataset. We identified three variants associated with higher mitochondrial copy number and suggest several hypotheses for how these variants influence mitochondrial copy number by interacting with known regulators of mitochondrial copy number. Our results are the first to report sequence variation in the mitochondrial genome that causes changes in mitochondrial copy number. The identification of these variants that increase mtDNA copy number has important implications in understanding the pathological processes that underlie these phenotypes.
Dynamics of genome size evolution in birds and mammals

PubMed Central

Feschotte, Cédric

2017-01-01

Genome size in mammals and birds shows remarkably little interspecific variation compared with other taxa. However, genome sequencing has revealed that many mammal and bird lineages have experienced differential rates of transposable element (TE) accumulation, which would be predicted to cause substantial variation in genome size between species. Thus, we hypothesize that there has been covariation between the amount of DNA gained by transposition and lost by deletion during mammal and avian evolution, resulting in genome size equilibrium. To test this model, we develop computational methods to quantify the amount of DNA gained by TE expansion and lost by deletion over the last 100 My in the lineages of 10 species of eutherian mammals and 24 species of birds. The results reveal extensive variation in the amount of DNA gained via lineage-specific transposition, but that DNA loss counteracted this expansion to various extents across lineages. Our analysis of the rate and size spectrum of deletion events implies that DNA removal in both mammals and birds has proceeded mostly through large segmental deletions (>10 kb). These findings support a unified “accordion” model of genome size evolution in eukaryotes whereby DNA loss counteracting TE expansion is a major determinant of genome size. Furthermore, we propose that extensive DNA loss, and not necessarily a dearth of TE activity, has been the primary force maintaining the greater genomic compaction of flying birds and bats relative to their flightless relatives. PMID:28179571
Are we Genomic Mosaics? Variations of the Genome of Somatic Cells can Contribute to Diversify our Phenotypes.

PubMed

Astolfi, P A; Salamini, F; Sgaramella, V

2010-09-01

Theoretical and experimental evidence supports the hypothesis that the genomes and the epigenomes may be different in the somatic cells of complex organisms. In the genome, the differences range from single base substitutions to chromosome number; in the epigenome, they entail multiple postsynthetic modifications of the chromatin. Somatic genome variations (SGV) may accumulate during development in response both to genetic programs, which may differ from tissue to tissue, and to environmental stimuli, which are often undetected and generally irreproducible. SGV may jeopardize physiological cellular functions, but also create novel coding and regulatory sequences, to be exposed to intraorganismal Darwinian selection. Genomes acknowledged as comparatively poor in genes, such as humans', could thus increase their pristine informational endowment. A better understanding of SGV will contribute to basic issues such as the "nature vs nurture" dualism and the inheritance of acquired characters. On the applied side, they may explain the low yield of cloning via somatic cell nuclear transfer, provide clues to some of the problems associated with transdifferentiation, and interfere with individual DNA analysis. SGV may be unique in the different cell types and in the different developmental stages, and thus explain the several hundred gaps persisting in the human genomes "completed" so far. They may compound the variations associated with our epigenomes and make each of us an "(epi)genomic" mosaic. An ensuing paradigm is the possibility that a single genome (the ephemeral one assembled at fertilization) has the capacity to generate several different brains in response to different environments.
Advances in biotechnology and linking outputs to variation in complex traits: Plant and Animal Genome meeting January 2012.

PubMed

Appels, R; Barrero, R; Bellgard, M

2012-03-01

The Plant and Animal Genome (PAG, held annually) meeting in January 2012 provided insights into the advances in plant, animal, and microbe genome studies particularly as they impact on our understanding of complex biological systems. The diverse areas of biology covered included the advances in technologies, variation in complex traits, genome change in evolution, and targeting phenotypic changes, across the broad spectrum of life forms. This overview aims to summarize the major advances in research areas presented in the plenary lectures and does not attempt to summarize the diverse research activities covered throughout the PAG in workshops, posters, presentations, and displays by suppliers of cutting-edge technologies.
Genomics, transcriptomics and proteomics to elucidate the pathogenesis of rheumatoid arthritis.

PubMed

Song, Xinqiang; Lin, Qingsong

2017-08-01

Rheumatoid arthritis is an autoimmune disease that affects several organs and tissues, predominantly the synovial joints. The pathogenesis of this disease is not completely understood, but it may involve genomic variations, gene expression, protein translation and post-translational modifications. These system-level variations in genomics, transcriptomics and proteomics are dynamic in nature and their crosstalk is overwhelmingly complex, so analyzing them separately may not be very informative. However, various '-omics' techniques developed in recent years have opened up new possibilities for clarifying disease pathways and thereby facilitating early diagnosis and specific therapies. This review examines how recent advances in the fields of genomics, transcriptomics and proteomics have contributed to our understanding of rheumatoid arthritis.
Genome Size Variation in the Genus Carthamus (Asteraceae, Cardueae): Systematic Implications and Additive Changes During Allopolyploidization

PubMed Central

GARNATJE, TERESA; GARCIA, SÒNIA; VILATERSANA, ROSER; VALLÈS, JOAN

2006-01-01

• Background and Aims Plant genome size is an important biological characteristic, with relationships to systematics, ecology and distribution. Currently, there is no information regarding nuclear DNA content for any Carthamus species. In addition to improving the knowledge base, this research focuses on interspecific variation and its implications for the infrageneric classification of this genus. Genome size variation in the process of allopolyploid formation is also addressed. • Methods Nuclear DNA samples from 34 populations of 16 species of the genus Carthamus were assessed by flow cytometry using propidium iodide. • Key Results The 2C values ranged from 2·26 pg for C. leucocaulos to 7·46 pg for C. turkestanicus, and monoploid genome size (1Cx-value) ranged from 1·13 pg in C. leucocaulos to 1·53 pg in C. alexandrinus. Mean genome sizes differed significantly, based on sectional classification. Both allopolyploid species (C. creticus and C. turkestanicus) exhibited nuclear DNA contents in accordance with the sum of the putative parental C-values (in one case with a slight reduction, frequent in polyploids), supporting their hybrid origin. • Conclusions Genome size represents a useful tool in elucidating systematic relationships between closely related species. A considerable reduction in monoploid genome size, possibly due to the hybrid formation, is also reported within these taxa. PMID:16390843
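The two quantities used in this abstract can be related with a short sketch. It assumes the usual convention that the monoploid genome size (1Cx) is the 2C value divided by the ploidy level, and that an allopolyploid's expected 2C value is the sum of its parents' 2C values. The function names and the parental values in the second example are hypothetical.

```python
def monoploid_genome_size(two_c_pg: float, ploidy: int) -> float:
    """1Cx (pg): the 2C value divided by the ploidy level."""
    if ploidy <= 0:
        raise ValueError("ploidy must be a positive integer")
    return two_c_pg / ploidy


def expected_allopolyploid_2c(parental_2c_pg):
    """Additive expectation: sum of the parental 2C values (pg)."""
    return sum(parental_2c_pg)


# Values from the abstract: C. leucocaulos has 2C = 2.26 pg and 1Cx = 1.13 pg,
# which is consistent with a ploidy level of 2.
print(round(monoploid_genome_size(2.26, 2), 2))

# Hypothetical parental 2C values (not taken from the abstract).
print(round(expected_allopolyploid_2c([2.5, 2.75]), 2))
```

A measured allopolyploid 2C value slightly below the additive expectation would correspond to the "slight reduction" the abstract mentions.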
Genome-wide association studies in Africans and African Americans: Expanding the Framework of the Genomics of Human Traits and Disease

PubMed Central

Peprah, Emmanuel; Xu, Huichun; Tekola-Ayele, Fasil; Royal, Charmaine D.

2014-01-01

Genomic research is one of the tools for elucidating the pathogenesis of diseases of global health relevance, and paving the research dimension to clinical and public health translation. Recent advances in genomic research and technologies have increased our understanding of human diseases, genes associated with these disorders, and the relevant mechanisms. Genome-wide association studies (GWAS) have proliferated since the first studies were published several years ago, and have become an important tool in helping researchers comprehend human variation and the role genetic variants play in disease. However, the need to expand the diversity of populations in GWAS has become increasingly apparent as new knowledge is gained about genetic variation. Inclusion of diverse populations in genomic studies is critical to a more complete understanding of human variation and elucidation of the underpinnings of complex diseases. In this review, we summarize the available data on GWAS in recent-African ancestry populations within the western hemisphere (i.e. African Americans and peoples of the Caribbean) and continental African populations. Furthermore, we highlight ways in which genomic studies in populations of recent African ancestry have led to advances in the areas of malaria, HIV, prostate cancer, and other diseases. Finally, we discuss the advantages of conducting GWAS in recent African ancestry populations in the context of addressing existing and emerging global health conditions. PMID:25427668
Single-molecule optical genome mapping of a human HapMap and a colorectal cancer cell line.

PubMed

Teo, Audrey S M; Verzotto, Davide; Yao, Fei; Nagarajan, Niranjan; Hillmer, Axel M

2015-01-01

Next-generation sequencing (NGS) technologies have changed our understanding of the variability of the human genome. However, the identification of genome structural variations based on NGS approaches with read lengths of 35-300 bases remains a challenge. Single-molecule optical mapping technologies allow the analysis of DNA molecules of up to 2 Mb and as such are suitable for the identification of large-scale genome structural variations, and for de novo genome assemblies when combined with short-read NGS data. Here we present optical mapping data for two human genomes: the HapMap cell line GM12878 and the colorectal cancer cell line HCT116. High molecular weight DNA was obtained by embedding GM12878 and HCT116 cells, respectively, in agarose plugs, followed by DNA extraction under mild conditions. Genomic DNA was digested with KpnI and 310,000 and 296,000 DNA molecules (≥ 150 kb and 10 restriction fragments), respectively, were analyzed per cell line using the Argus optical mapping system. Maps were aligned to the human reference by OPTIMA, a new glocal alignment method. Genome coverage of 6.8× and 5.7× was obtained, respectively; 2.9× and 1.7× more than the coverage obtained with previously available software. Optical mapping allows the resolution of large-scale structural variations of the genome, and the scaffold extension of NGS-based de novo assemblies. OPTIMA is an efficient new alignment method; our optical mapping data provide a resource for genome structure analyses of the human HapMap reference cell line GM12878, and the colorectal cancer cell line HCT116.
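For orientation, raw fold coverage is the total length of the analyzed molecules divided by the genome size. The sketch below is a back-of-the-envelope illustration, not part of OPTIMA; the function name is hypothetical and the 3.1 Gb human genome size is an assumption.

```python
def raw_coverage(n_molecules: int, mean_length_bp: float, genome_size_bp: float) -> float:
    """Raw fold coverage = total molecule length / genome size."""
    return n_molecules * mean_length_bp / genome_size_bp


# Assumption: a haploid human genome of roughly 3.1 Gb.
GENOME_SIZE_BP = 3.1e9

# 310,000 GM12878 molecules, each at least 150 kb (figures from the abstract).
# Using the 150 kb minimum as the length gives a lower bound on raw coverage.
lower_bound = raw_coverage(310_000, 150_000, GENOME_SIZE_BP)
print(f"raw coverage >= {lower_bound:.1f}x")
```

The 6.8× reported in the abstract is coverage obtained after alignment to the reference, so it need not match this raw figure.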

Genomic suppression subtractive hybridization as a tool to identify differences in mycorrhizal fungal genomes.

PubMed

Murat, Claude; Zampieri, Elisa; Vallino, Marta; Daghino, Stefania; Perotto, Silvia; Bonfante, Paola

2011-05-01

Characterization of genomic variation among different microbial species, or different strains of the same species, is a field of significant interest with a wide range of potential applications. We have investigated the genomic variation in mycorrhizal fungal genomes through genomic suppressive subtractive hybridization. The comparison was between phylogenetically distant and close truffle species (Tuber spp.), and between isolates of the ericoid mycorrhizal fungus Oidiodendron maius featuring different degrees of metal tolerance. In the interspecies experiment, almost all the sequences that were identified in the Tuber melanosporum genome and absent in Tuber borchii and Tuber indicum corresponded to transposable elements. In the intraspecies comparison, some specific sequences corresponded to regions coding for enzymes, among them a glutathione synthetase known to be involved in metal tolerance. This approach is a quick and rather inexpensive tool to develop molecular markers for mycorrhizal fungi tracking and barcoding, to identify functional genes and to investigate the genome plasticity, adaptation and evolution. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
EGenBio: A Data Management System for Evolutionary Genomics and Biodiversity

PubMed Central

Nahum, Laila A; Reynolds, Matthew T; Wang, Zhengyuan O; Faith, Jeremiah J; Jonna, Rahul; Jiang, Zhi J; Meyer, Thomas J; Pollock, David D

2006-01-01

Background Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio) to begin to address this. Description EGenBio is a system for manipulation and filtering of large numbers of sequences, integrating curated sequence alignments and phylogenetic trees, managing evolutionary analyses, and visualizing their output. EGenBio is organized into three conceptual divisions, Evolution, Genomics, and Biodiversity. The Genomics division includes tools for selecting pre-aligned sequences from different genes and species, and for modifying and filtering these alignments for further analysis. Species searches are handled through queries that can be modified based on a tree-based navigation system and saved. The Biodiversity division contains tools for analyzing individual sequences or sequence alignments, whereas the Evolution division contains tools involving phylogenetic trees. Alignments are annotated with analytical results and modification history using our PRAED format. A miscellaneous Tools section and Help framework are also available. EGenBio was developed around our comparative genomic research and a prototype database of mtDNA genomes. It utilizes MySQL relational databases and dynamic page generation, and calls numerous custom programs. Conclusion EGenBio was designed to serve as a platform for tools and resources to ease combined analysis in evolution, genomics, and biodiversity. PMID:17118150
Effect of temperature and relative humidity during transportation on green coffee bean moisture content and ochratoxin A production.

PubMed

Palacios-Cabrera, Hector A; Menezes, Hilary C; Iamanaka, Beatriz T; Canepa, Frederico; Teixeira, Aldir A; Carvalhaes, Nelson; Santi, Domenico; Leme, Plinio T Z; Yotsuyanagi, Katumi; Taniwaki, Marta H

2007-01-01

Changes in temperature, relative humidity, and moisture content of green coffee beans were monitored during transportation of coffee from Brazil to Italy. Six containers (three conventional and three prototype) were stowed in three different places (hold, first floor, and deck) on the ship. Each prototype was located next to a conventional container. The moisture content of the coffee in the container located on the first floor was less affected by environmental variations (0.7%) than that in the hold and on the deck. Coffee located in the hold showed the highest variation in moisture content (3%); in addition, the container showed visible condensation. Coffee transported on the deck showed an intermediary variation in moisture (2%), and there was no visible condensation. The variation in coffee moisture content of the prototype containers was similar to that of the conventional ones, especially in the top layers of coffee bags (2 to 3%), while the increase in water activity was 0.70. This suggests that diffusion of moisture occurs very slowly inside the cargo and that there are thus sufficient time and conditions for fungal growth. The regions of the container near the wall and ceiling are susceptible to condensation since they are close to the headspace with its high relative humidity. Ochratoxin A production occurred in coffee located at the top of the container on the deck and in the wet bags from the hold (those found to be wet on opening the containers at the final destination).
Effects of assortative mate choice on the genomic and morphological structure of a hybrid zone between two bird subspecies.

PubMed

Semenov, Georgy A; Scordato, Elizabeth S C; Khaydarov, David R; Smith, Chris C R; Kane, Nolan C; Safran, Rebecca J

2017-11-01

Phenotypic differentiation plays an important role in the formation and maintenance of reproductive barriers. In some cases, variation in a few key aspects of phenotype can promote and maintain divergence; hence, the identification of these traits and their associations with patterns of genomic divergence is crucial for understanding the patterns and processes of population differentiation. We studied hybridization between the alba and personata subspecies of the white wagtail (Motacilla alba), and quantified divergence and introgression of multiple morphological traits and 19,437 SNP loci on a 3,000 km transect. Our goal was to identify traits that may contribute to reproductive barriers and to assess how variation in these traits corresponds to patterns of genome-wide divergence. Variation in only one trait, head plumage patterning, was consistent with reproductive isolation. Transitions in head plumage were steep and occurred over otherwise morphologically and genetically homogeneous populations, whereas cline centres for other traits and genomic ancestry were displaced over 100 km from the head cline. Field observational data show that social pairs mated assortatively by head plumage, suggesting that these phenotypes are maintained by divergent mating preferences. In contrast, variation in all other traits and genetic markers could be explained by neutral diffusion, although weak ecological selection cannot be ruled out. Our results emphasize that assortative mating may maintain phenotypic differences independent of other processes shaping genome-wide variation, consistent with other recent findings that raise questions about the relative importance of mate choice, ecological selection and selectively neutral processes for divergent evolution. © 2017 John Wiley & Sons Ltd.
Distribution, functional impact, and origin mechanisms of copy number variation in the barley genome

PubMed Central

2013-01-01

Background There is growing evidence for the prevalence of copy number variation (CNV) and its role in phenotypic variation in many eukaryotic species. Here we use array comparative genomic hybridization to explore the extent of this type of structural variation in domesticated barley cultivars and wild barleys. Results A collection of 14 barley genotypes including eight cultivars and six wild barleys were used for comparative genomic hybridization. CNV affects 14.9% of all the sequences that were assessed. Higher levels of CNV diversity are present in the wild accessions relative to cultivated barley. CNVs are enriched near the ends of all chromosomes except 4H, which exhibits the lowest frequency of CNVs. CNV affects 9.5% of the coding sequences represented on the array and the genes affected by CNV are enriched for sequences annotated as disease-resistance proteins and protein kinases. Sequence-based comparisons of CNV between cultivars Barke and Morex provided evidence that DNA repair mechanisms of double-strand breaks via single-stranded annealing and synthesis-dependent strand annealing play an important role in the origin of CNV in barley. Conclusions We present the first catalog of CNVs in a diploid Triticeae species, which opens the door for future genome diversity research in a tribe that comprises the economically important cereal species wheat, barley, and rye. Our findings constitute a valuable resource for the identification of CNV affecting genes of agronomic importance. We also identify potential mechanisms that can generate variation in copy number in plant genomes. PMID:23758725
Read clouds uncover variation in complex regions of the human genome

PubMed Central

Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E.; West, Robert; Sidow, Arend; Batzoglou, Serafim

2015-01-01

Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. PMID:26286554
Production and genetic analysis of resynthesized Brassica napus from a B. rapa landrace from the Qinghai-Tibet Plateau and B. alboglabra.

PubMed

Liu, H D; Zhao, Z G; Du, D Z; Deng, C R; Fu, G

2016-01-08

This study aimed to reveal the genetic and epigenetic variations involved in a resynthesized Brassica napus (AACC) generated from a hybridization between a B. rapa (AA) landrace and B. alboglabra (CC). Amplified fragment length polymorphism (AFLP), methylation-sensitive amplified polymorphism, and the cDNA-AFLP technique were performed to detect changes between different generations at the genome, methylation, and transcription levels. We obtained 30 lines of resynthesized B. napus with a mean 1000-seed weight of over 7.50 g. All of the lines were self-compatible, probably because both parents were self-compatible. At the genome level, the S0 generation had the lowest frequency of variations (0.18%) and the S3 generation had the highest (6.07%). The main variation pattern was the elimination of amplified restriction fragments on the CC genome from the S0 to the S4 generations. At the methylation level, we found three loci that exhibited altered methylation patterns on the parental A genome; the variance rate was 1.35%. At the transcription level, we detected 43.77% reverse mutations and 37.56% deletion mutations that mainly occurred on the A and C genomes, respectively, in the S3 generation. Our results highlight the genetic variations that occur during the diploidization of resynthesized B. napus.
High intraspecific genome diversity in the model arbuscular mycorrhizal symbiont Rhizophagus irregularis.

PubMed

Chen, Eric C H; Morin, Emmanuelle; Beaudet, Denis; Noel, Jessica; Yildirir, Gokalp; Ndikumana, Steve; Charron, Philippe; St-Onge, Camille; Giorgi, John; Krüger, Manuela; Marton, Timea; Ropars, Jeanne; Grigoriev, Igor V; Hainaut, Matthieu; Henrissat, Bernard; Roux, Christophe; Martin, Francis; Corradi, Nicolas

2018-01-22

Arbuscular mycorrhizal fungi (AMF) are known to improve plant fitness through the establishment of mycorrhizal symbioses. Genetic and phenotypic variations among closely related AMF isolates can significantly affect plant growth, but the genomic changes underlying this variability are unclear. To address this issue, we improved the genome assembly and gene annotation of the model strain Rhizophagus irregularis DAOM197198, and compared its gene content with five isolates of R. irregularis sampled in the same field. All isolates harbor striking genome variations, with large numbers of isolate-specific genes, gene family expansions, and evidence of interisolate genetic exchange. The observed variability affects all gene ontology terms and PFAM protein domains, as well as putative mycorrhiza-induced small secreted effector-like proteins and other symbiosis differentially expressed genes. High variability is also found in active transposable elements. Overall, these findings indicate a substantial divergence in the functioning capacity of isolates harvested from the same field, and thus their genetic potential for adaptation to biotic and abiotic changes. Our data also provide a first glimpse into the genome diversity that resides within natural populations of these symbionts, and open avenues for future analyses of plant-AMF interactions that link AMF genome variation with plant phenotype and fitness. © 2018 The Authors. New Phytologist © 2018 New Phytologist Trust.
Genetic and epigenetic alterations induced by different levels of rye genome integration in wheat recipient.

PubMed

Zheng, X L; Zhou, J P; Zang, L L; Tang, A T; Liu, D Q; Deng, K J; Zhang, Y

2016-06-17

The narrow genetic variation present in common wheat (Triticum aestivum) varieties has greatly restricted the improvement of crop yield in modern breeding systems. Alien addition lines have proven to be an effective means to broaden the genetic diversity of common wheat. Wheat-rye addition lines, which are the direct bridge materials for wheat improvement, have been widely used to produce new wheat cultivars carrying alien rye germplasm. In this study, we investigated the genetic and epigenetic alterations in two sets of wheat-rye disomic addition lines (1R-7R) and the corresponding triticales. We used expressed sequence tag-simple sequence repeat, amplified fragment length polymorphism, and methylation-sensitive amplification polymorphism analyses to analyze the effects of introducing alien chromosomes (either the entire genome or a sub-genome) into the wheat genetic background. We found clear and diverse variations in the genomic primary structure, as well as alterations in the extent and pattern of the genomic DNA methylation of the recipient. These results also showed that introduction of different rye chromosomes could induce different genetic and epigenetic alterations in the recipient, and that the genetic background of the parents is an important factor for genomic and epigenetic variation induced by alien chromosome addition.
Ensemble Analysis of Variational Assimilation of Hydrologic and Hydrometeorological Data into Distributed Hydrologic Model

NASA Astrophysics Data System (ADS)

Lee, H.; Seo, D.; Koren, V.

2008-12-01

A prototype 4DVAR (four-dimensional variational) data assimilator for gridded Sacramento soil-moisture accounting and kinematic-wave routing models in the Hydrology Laboratory's Research Distributed Hydrologic Model (HL-RDHM) has been developed. The prototype assimilates streamflow and in-situ soil moisture data and adjusts gridded precipitation and climatological potential evaporation data to reduce uncertainty in the model initial conditions for improved monitoring and prediction of streamflow and soil moisture at the outlet and interior locations within the catchment. Due to large degrees of freedom involved, data assimilation (DA) into distributed hydrologic models is complex. To understand and assess sensitivity of the performance of DA to uncertainties in the model initial conditions and in the data, two synthetic experiments have been carried out in an ensemble framework. Results from the synthetic experiments shed much light on the potential and limitations with DA into distributed models. For initial real-world assessment, the prototype DA has also been applied to the headwater basin at Eldon near the Oklahoma-Arkansas border. We present these results and describe the next steps.
Whole Genome Sequencing of Greater Amberjack (Seriola dumerili) for SNP Identification on Aligned Scaffolds and Genome Structural Variation Analysis Using Parallel Resequencing

PubMed Central

Aoki, Jun-ya; Kawase, Junya; Hamada, Kazuhisa; Fujimoto, Hiroshi; Yamamoto, Ikki; Usuki, Hironori

2018-01-01

Greater amberjack (Seriola dumerili) is distributed in tropical and temperate waters worldwide and is an important aquaculture fish. We carried out de novo sequencing of the greater amberjack genome to construct a reference genome sequence to identify single nucleotide polymorphisms (SNPs) for breeding amberjack by marker-assisted or gene-assisted selection as well as to identify functional genes for biological traits. We obtained 200 times coverage and constructed a high-quality genome assembly using next generation sequencing technology. The assembled sequences were aligned onto a yellowtail (Seriola quinqueradiata) radiation hybrid (RH) physical map by sequence homology. A total of 215 of the longest amberjack sequences, with a total length of 622.8 Mbp (92% of the total length of the genome scaffolds), were lined up on the yellowtail RH map. We resequenced the whole genomes of 20 greater amberjacks and mapped the resulting sequences onto the reference genome sequence. About 186,000 nonredundant SNPs were successfully ordered on the reference genome. Further, we found differences in the genome structural variations between two greater amberjack populations using BreakDancer. We also analyzed the greater amberjack transcriptome and mapped the annotated sequences onto the reference genome sequence. PMID:29785397
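The scaffold figures in this abstract imply a total scaffold length that can be recovered with one division. The sketch below only rearranges the two numbers quoted above (622.8 Mbp anchored, 92% of the total); the variable names are illustrative.

```python
# Figures quoted in the abstract.
anchored_mbp = 622.8        # length of the 215 scaffolds placed on the RH map
anchored_fraction = 0.92    # share of the total scaffold length they represent

# Implied totals (derived here, not stated in the abstract).
total_scaffold_mbp = anchored_mbp / anchored_fraction
unanchored_mbp = total_scaffold_mbp - anchored_mbp

print(f"implied total scaffold length: {total_scaffold_mbp:.1f} Mbp")
print(f"implied unanchored length:     {unanchored_mbp:.1f} Mbp")
```

Because the 92% is itself rounded, the implied total of roughly 677 Mbp should be read as approximate.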
Genome size variation affects song attractiveness in grasshoppers: evidence for sexual selection against large genomes.

PubMed

Schielzeth, Holger; Streitner, Corinna; Lampe, Ulrike; Franzke, Alexandra; Reinhold, Klaus

2014-12-01

Genome size is largely uncorrelated to organismal complexity and adaptive scenarios. Genetic drift as well as intragenomic conflict have been put forward to explain this observation. We here study the impact of genome size on sexual attractiveness in the bow-winged grasshopper Chorthippus biguttulus. Grasshoppers show particularly large variation in genome size due to the high prevalence of supernumerary chromosomes that are considered (mildly) selfish, as evidenced by non-Mendelian inheritance and fitness costs if present in high numbers. We ranked male grasshoppers by song characteristics that are known to affect female preferences in this species and scored genome sizes of attractive and unattractive individuals from the extremes of this distribution. We find that attractive singers have significantly smaller genomes, demonstrating that genome size is reflected in male courtship songs and that females prefer songs of males with small genomes. Such a genome size dependent mate preference effectively selects against selfish genetic elements that tend to increase genome size. The data therefore provide a novel example of how sexual selection can reinforce natural selection and can act as an agent in an intragenomic arms race. Furthermore, our findings indicate an underappreciated route of how choosy females could gain indirect benefits. © 2014 The Author(s). Evolution © 2014 The Society for the Study of Evolution.
Genome-wide copy number variant analysis in Holstein cattle reveals variants associated with 10 production traits including residual feed intake and dry matter intake

USDA-ARS's Scientific Manuscript database

Copy number variation (CNV) is an important type of genetic variation contributing to phenotypic differences among mammals and may serve as an alternative molecular marker to single nucleotide polymorphism (SNP) for genome-wide association study (GWAS). Recently, GWAS analysis using CNV has been app...
Digital Charge Coupled Device (CCD) Camera System Architecture

NASA Astrophysics Data System (ADS)

Babey, S. K.; Anger, C. D.; Green, B. D.

1987-03-01

We propose a modeling system for generic objects in order to recognize different objects from the same category with only one generic model. The representation consists of a prototype, represented by parts and their configuration. Parts are modeled by superquadric volumetric primitives which are combined via Boolean operations to form objects. Variations between objects within a category are described by allowable changes in structure and shape deformations of prototypical parts. Each prototypical part and relation has a set of associated features that can be recognized in the images. These features are used for selecting models from the model data base. The selected hypothetical models are then verified on the geometric level by deforming the prototype in allowable ways to match the data. We base our design of the modeling system upon the current psychological theories of categorization and of human visual perception.
RSAT 2018: regulatory sequence analysis tools 20th anniversary.

PubMed

Nguyen, Nga Thi Thuy; Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Santana-Garcia, Walter; Ossio, Raul; Robles-Espinoza, Carla Daniela; Bahin, Mathieu; Collombet, Samuel; Vincens, Pierre; Thieffry, Denis; van Helden, Jacques; Medina-Rivera, Alejandra; Thomas-Chollier, Morgane

2018-05-02

RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets like ChIP-seq/ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations, (v) comparative genomics. Six public servers jointly support 10 000 genomes from all kingdoms. Six novel or refactored programs have been added since the 2015 NAR Web Software Issue, including updated programs to analyse regulatory variants (retrieve-variation-seq, variation-scan, convert-variations), along with tools to extract sequences from a list of coordinates (retrieve-seq-bed), to select motifs from motif collections (retrieve-matrix), and to extract orthologs based on Ensembl Compara (get-orthologs-compara). Three use cases illustrate the integration of new and refactored tools to the suite. This Anniversary update gives a 20-year perspective on the software suite. RSAT is well-documented and available through Web sites, SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services, virtual machines and stand-alone programs at http://www.rsat.eu/.
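Motif scanning, one of the applications listed above, can be sketched generically as sliding a log-odds position weight matrix along a sequence. This is a minimal textbook illustration, not RSAT's implementation or its command-line interface; the function names, the pseudocount, the uniform background and the toy motif are all assumptions.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds scores against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def scan(sequence, matrix):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


# Toy motif built from hypothetical counts that strongly favour "TATA".
toy_counts = [{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
pwm = log_odds_matrix(toy_counts)
best_start, best_score = max(scan("GGTATAGG", pwm), key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

Real scanners typically add a background model estimated from the genome and a score threshold or P-value; both are omitted here to keep the sketch short.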
Genomic variation in parthenogenetic lizard Darevskia armeniaca: evidence from DNA fingerprinting data.

PubMed

Malysheva, D N; Tokarskaya, Olga N; Petrosyan, Varos G; Danielyan, Felix D; Darevsky, Iliya S; Ryskov, Alexei P

2007-01-01

Microsatellites, or short tandem repeats, are abundant across genomes of most organisms. It is evident that the most straightforward and conclusive way of studying mutations in microsatellite-containing loci is to use clonally transmitted genomes or DNA sequences inherited in multigeneration pedigrees. At present, little is known about the origin of genetic variation in species that lack effective genetic recombination. DNA fingerprinting in 43 families of the parthenogenetic lizard species Darevskia armeniaca (131 siblings), using (GACA)₄, (GGCA)₄, (GATA)₄, and (CAC)₅ probes, revealed mutant fingerprints in siblings that differed from their mothers in several restriction DNA fragments. In some cases, the mutant fingerprints detected in siblings were also found in population samples. The mutation rate for new restriction fragment lengths, estimated using multilocus probes, varied from 0.8 × 10⁻² to 4.9 × 10⁻² per band per sibling. Most of the variations detected as restriction fragment length polymorphisms probably have a germ-line origin, but somatic changes of (CAC)ₙ fingerprints in adult lizards were also observed. These results provide new evidence of existing unstable regions in genomes of parthenogenetic vertebrate animals, which provide genetic variation in unisexual populations.
SMART on FHIR Genomics: facilitating standardized clinico-genomic apps.

PubMed

Alterovitz, Gil; Warner, Jeremy; Zhang, Peijin; Chen, Yishen; Ullman-Cullere, Mollie; Kreda, David; Kohane, Isaac S

2015-11-01

Supporting clinical decision support for personalized medicine will require linking genome and phenome variants to a patient's electronic health record (EHR), at times on a vast scale. Clinico-genomic data standards will be needed to unify how genomic variant data are accessed from different sequencing systems. A specification for the basis of a clinico-genomic standard, building upon the current Health Level Seven International Fast Healthcare Interoperability Resources (FHIR®) standard, was developed. An FHIR application protocol interface (API) layer was attached to proprietary sequencing platforms and EHRs in order to expose gene variant data for presentation to the end-user. Three representative apps based on the SMART platform were built to test end-to-end feasibility, including integration of genomic and clinical data. Successful design, deployment, and use of the API was demonstrated and adopted by HL7 Clinical Genomics Workgroup. Feasibility was shown through development of three apps by various types of users with background levels and locations. This prototyping work suggests that an entirely data (and web) standards-based approach could prove both effective and efficient for advancing personalized medicine. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
A survey of single nucleotide polymorphisms identified from whole-genome sequencing and their functional effect in the porcine genome.

PubMed

Keel, B N; Nonneman, D J; Rohrer, G A

2017-08-01

Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a more significant effect on phenotypic variation than do other types of genetic variants. Hence, a comprehensive list of these functional variants would be of considerable interest in swine genomic studies, particularly those targeting fertility and production traits. Whole-genome sequence was obtained from 72 of the founders of an intensely phenotyped experimental swine herd at the U.S. Meat Animal Research Center (USMARC). These animals included all 24 of the founding boars (12 Duroc and 12 Landrace) and 48 Yorkshire-Landrace composite sows. Sequence reads were mapped to the Sscrofa10.2 genome build, resulting in a mean of 6.1 fold (×) coverage per genome. A total of 22 342 915 high confidence SNPs were identified from the sequenced genomes. These included 21 million previously reported SNPs and 79% of the 62 163 SNPs on the PorcineSNP60 BeadChip assay. Variation was detected in the coding sequence or untranslated regions (UTRs) of 87.8% of the genes in the porcine genome: loss-of-function variants were predicted in 504 genes, 10 202 genes contained nonsynonymous variants, 10 773 had variation in UTRs and 13 010 genes contained synonymous variants. Approximately 139 000 SNPs were classified as loss-of-function, nonsynonymous or regulatory, which suggests that over 99% of the variation detected in our pigs could potentially be ignored, allowing us to focus on a much smaller number of functional SNPs during future analyses. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
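The "over 99%" figure in the abstract above follows directly from the two counts it quotes. A minimal sketch of that arithmetic, assuming the approximate count of 139 000 functional SNPs is taken at face value (the function name is illustrative, not from the study):

```python
# Counts quoted in the abstract above.
total_snps = 22_342_915      # high-confidence SNPs identified
functional_snps = 139_000    # approximate: loss-of-function, nonsynonymous or regulatory


def functional_fraction(functional: int, total: int) -> float:
    """Return the share of SNPs that fall into the functional classes."""
    if total <= 0:
        raise ValueError("total must be positive")
    return functional / total


fraction = functional_fraction(functional_snps, total_snps)
# Prints: functional: 0.62%, remaining: 99.38%
print(f"functional: {fraction:.2%}, remaining: {1 - fraction:.2%}")
```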
Host Genetic Control of the Microbiome in Humans and Maize or Relating Host Genetic Variation to the Microbiome (2011 JGI User Meeting)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ley, Ruth

2011-03-23

The U.S. Department of Energy Joint Genome Institute (JGI) invited scientists interested in the application of genomics to bioenergy and environmental issues, as well as all current and prospective users and collaborators, to attend the annual DOE JGI Genomics of Energy & Environment Meeting held March 22-24, 2011 in Walnut Creek, Calif. The emphasis of this meeting was on the genomics of renewable energy strategies, carbon cycling, environmental gene discovery, and engineering of fuel-producing organisms. The meeting features presentations by leading scientists advancing these topics. Ruth Ley of Cornell University gives a presentation on "Relating Host Genetic Variation to the Microbiome" at the 6th annual Genomics of Energy & Environment Meeting on March 23, 2011.
Contextual Variation in Automatic Evaluative Bias to Racially-Ambiguous Faces

PubMed Central

Ito, Tiffany A.; Willadsen-Jensen, Eve C.; Kaye, Jesse T.; Park, Bernadette

2011-01-01

Three studies examined the implicit evaluative associations activated by racially-ambiguous Black-White faces. In the context of both Black and White faces, Study 1 revealed a graded pattern of bias against racially-ambiguous faces that was weaker than the bias to Black faces but stronger than that to White faces. Study 2 showed that significant bias was present when racially-ambiguous faces appeared in the context of only White faces, but not in the context of only Black faces. Study 3 demonstrated that context produces perceptual contrast effects on racial-prototypicality judgments. Racially-ambiguous faces were perceived as more prototypically Black in a White-only than mixed-race context, and less prototypically Black in a Black-only context. Conversely, they were seen as more prototypically White in a Black-only than mixed context, and less prototypically White in a White-only context. The studies suggest that both race-related featural properties within a face (i.e., racial ambiguity) and external contextual factors affect automatic evaluative associations. PMID:21691437

Natural positive selection and north-south genetic diversity in East Asia.

PubMed

Suo, Chen; Xu, Haiyan; Khor, Chiea-Chuen; Ong, Rick Th; Sim, Xueling; Chen, Jieming; Tay, Wan-Ting; Sim, Kar-Seng; Zeng, Yi-Xin; Zhang, Xuejun; Liu, Jianjun; Tai, E-Shyong; Wong, Tien-Yin; Chia, Kee-Seng; Teo, Yik-Ying

2012-01-01

Recent reports have identified a north-south cline in genetic variation in East and South-East Asia, but these studies have not formally explored the basis of these clinal differences. Understanding the origins of these variations may provide valuable insights in tracking down the functional variants in genomic regions identified by genetic association studies. Here we investigate the genetic basis of these differences with genome-wide data from the HapMap, the Human Genome Diversity Project and the Singapore Genome Variation Project. We implemented four bioinformatic measures to discover genomic regions that are considerably differentiated either between two Han Chinese populations in the north and south of China, or across 22 populations in East and South-East Asia. These measures prioritized genomic stretches with: (i) regional differences in the allelic spectrum for SNPs common to the two Han Chinese populations; (ii) differential evidence of positive selection between the two populations as quantified by integrated haplotype score (iHS) and cross-population extended haplotype homozygosity (XP-EHH); (iii) significant correlation between allele frequencies and geographical latitudes of the 22 populations. We also explored the extent of linkage disequilibrium variations in these regions, which is important in combining genetic association studies from North and South Chinese. Two of the regions that emerged are found in HLA class I and II, suggesting that the HLA imputation panel from the HapMap may not be directly applicable to every Chinese sample. This has important implications to autoimmune studies that plan to impute the classical HLA alleles to fine map the SNP association signals.
Natural positive selection and north–south genetic diversity in East Asia

PubMed Central

Suo, Chen; Xu, Haiyan; Khor, Chiea-Chuen; Ong, Rick TH; Sim, Xueling; Chen, Jieming; Tay, Wan-Ting; Sim, Kar-Seng; Zeng, Yi-Xin; Zhang, Xuejun; Liu, Jianjun; Tai, E-Shyong; Wong, Tien-Yin; Chia, Kee-Seng; Teo, Yik-Ying

2012-01-01

Recent reports have identified a north–south cline in genetic variation in East and South-East Asia, but these studies have not formally explored the basis of these clinal differences. Understanding the origins of these variations may provide valuable insights in tracking down the functional variants in genomic regions identified by genetic association studies. Here we investigate the genetic basis of these differences with genome-wide data from the HapMap, the Human Genome Diversity Project and the Singapore Genome Variation Project. We implemented four bioinformatic measures to discover genomic regions that are considerably differentiated either between two Han Chinese populations in the north and south of China, or across 22 populations in East and South-East Asia. These measures prioritized genomic stretches with: (i) regional differences in the allelic spectrum for SNPs common to the two Han Chinese populations; (ii) differential evidence of positive selection between the two populations as quantified by integrated haplotype score (iHS) and cross-population extended haplotype homozygosity (XP-EHH); (iii) significant correlation between allele frequencies and geographical latitudes of the 22 populations. We also explored the extent of linkage disequilibrium variations in these regions, which is important in combining genetic association studies from North and South Chinese. Two of the regions that emerged are found in HLA class I and II, suggesting that the HLA imputation panel from the HapMap may not be directly applicable to every Chinese sample. This has important implications to autoimmune studies that plan to impute the classical HLA alleles to fine map the SNP association signals. PMID:21792231
Comparative genomics of Enterococcus faecalis from healthy Norwegian infants

PubMed Central

Solheim, Margrete; Aakra, Ågot; Snipen, Lars G; Brede, Dag A; Nes, Ingolf F

2009-01-01

Background: Enterococcus faecalis, traditionally considered a harmless commensal of the intestinal tract, is now ranked among the leading causes of nosocomial infections. In an attempt to gain insight into the genetic make-up of commensal E. faecalis, we have studied genomic variation in a collection of community-derived E. faecalis isolated from the feces of Norwegian infants. Results: The E. faecalis isolates were first sequence typed by multilocus sequence typing (MLST) and characterized with respect to antibiotic resistance and properties associated with virulence. A subset of the isolates was compared to the vancomycin resistant strain E. faecalis V583 (V583) by whole genome microarray comparison (comparative genomic hybridization (CGH)). Several of the putative enterococcal virulence factors were found to be highly prevalent among the commensal baby isolates. The genomic variation as observed by CGH was less between isolates displaying the same MLST sequence type than between isolates belonging to different evolutionary lineages. Conclusion: The variations in gene content observed among the investigated commensal E. faecalis are comparable to the genetic variation previously reported among strains of various origins thought to be representative of the major E. faecalis lineages. Previous MLST analyses of E. faecalis have identified so-called high-risk enterococcal clonal complexes (HiRECC), defined as genetically distinct subpopulations, epidemiologically associated with enterococcal infections. The observed correlation between CGH and MLST presented here may offer a method for the identification of lineage-specific genes, and may therefore add clues on how to distinguish pathogenic from commensal E. faecalis. In this work, information on the core genome of E. faecalis is also substantially extended. PMID:19393078
Applications of the 1000 Genomes Project resources

PubMed Central

Zheng-Bradley, Xiangqun

2017-01-01

The 1000 Genomes Project created a valuable, worldwide reference for human genetic variation. Common uses of the 1000 Genomes dataset include genotype imputation supporting Genome-wide Association Studies, mapping expression Quantitative Trait Loci, filtering non-pathogenic variants from exome, whole genome and cancer genome sequencing projects, and genetic analysis of population structure and molecular evolution. In this article, we will highlight some of the multiple ways that the 1000 Genomes data can be and has been utilized for genetic studies. PMID:27436001
Complex multifractal nature in Mycobacterium tuberculosis genome

PubMed Central

Mandal, Saurav; Roychowdhury, Tanmoy; Chirom, Keilash; Bhattacharya, Alok; Brojen Singh, R. K.

2017-01-01

The multifractal and long-range correlation (C(r)) properties of strings, such as nucleotide sequences, can be useful parameters for identification of underlying patterns and variations. In this study, C(r) and the multifractal singularity function f(α) have been used to study variations in the genomes of the pathogenic bacterium Mycobacterium tuberculosis. Genomic sequences of M. tuberculosis isolates displayed significant variations in C(r) and f(α), reflecting inherent differences in sequences among isolates. M. tuberculosis isolates can be categorised into different subgroups based on sensitivity to drugs: DS (drug sensitive isolates), MDR (multi-drug resistant isolates) and XDR (extremely drug resistant isolates). C(r) follows significantly different scaling rules in different subgroups of isolates, but all the isolates follow a one-parameter scaling law. The richness in complexity of each subgroup can be quantified by the measures of multifractal parameters, displaying a pattern in which XDR isolates have the highest value and drug sensitive isolates the lowest. Therefore C(r) and multifractal functions can be useful parameters for analysis of genomic sequences. PMID:28440326
Complex multifractal nature in Mycobacterium tuberculosis genome

NASA Astrophysics Data System (ADS)

Mandal, Saurav; Roychowdhury, Tanmoy; Chirom, Keilash; Bhattacharya, Alok; Brojen Singh, R. K.

2017-04-01

The multifractal and long-range correlation (C(r)) properties of strings, such as nucleotide sequences, can be useful parameters for identification of underlying patterns and variations. In this study, C(r) and the multifractal singularity function f(α) have been used to study variations in the genomes of the pathogenic bacterium Mycobacterium tuberculosis. Genomic sequences of M. tuberculosis isolates displayed significant variations in C(r) and f(α), reflecting inherent differences in sequences among isolates. M. tuberculosis isolates can be categorised into different subgroups based on sensitivity to drugs: DS (drug sensitive isolates), MDR (multi-drug resistant isolates) and XDR (extremely drug resistant isolates). C(r) follows significantly different scaling rules in different subgroups of isolates, but all the isolates follow a one-parameter scaling law. The richness in complexity of each subgroup can be quantified by the measures of multifractal parameters, displaying a pattern in which XDR isolates have the highest value and drug sensitive isolates the lowest. Therefore C(r) and multifractal functions can be useful parameters for analysis of genomic sequences.
De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits.

PubMed

Li, Ying-hui; Zhou, Guangyu; Ma, Jianxin; Jiang, Wenkai; Jin, Long-guo; Zhang, Zhouhao; Guo, Yong; Zhang, Jinbo; Sui, Yi; Zheng, Liangtao; Zhang, Shan-shan; Zuo, Qiyang; Shi, Xue-hui; Li, Yan-fei; Zhang, Wan-ke; Hu, Yiyao; Kong, Guanyi; Hong, Hui-long; Tan, Bing; Song, Jian; Liu, Zhang-xiong; Wang, Yaoshen; Ruan, Hang; Yeung, Carol K L; Liu, Jian; Wang, Hailong; Zhang, Li-juan; Guan, Rong-xia; Wang, Ke-jing; Li, Wen-bin; Chen, Shou-yi; Chang, Ru-zhen; Jiang, Zhi; Jackson, Scott A; Li, Ruiqiang; Qiu, Li-juan

2014-10-01

Wild relatives of crops are an important source of genetic diversity for agriculture, but their gene repertoire remains largely unexplored. We report the establishment and analysis of a pan-genome of Glycine soja, the wild relative of cultivated soybean Glycine max, by sequencing and de novo assembly of seven phylogenetically and geographically representative accessions. Intergenomic comparisons identified lineage-specific genes and genes with copy number variation or large-effect mutations, some of which show evidence of positive selection and may contribute to variation of agronomic traits such as biotic resistance, seed composition, flowering and maturity time, organ size and final biomass. Approximately 80% of the pan-genome was present in all seven accessions (core), whereas the rest was dispensable and exhibited greater variation than the core genome, perhaps reflecting a role in adaptation to diverse environments. This work will facilitate the harnessing of untapped genetic diversity from wild soybean for enhancement of elite cultivars.
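The core versus dispensable split described in the abstract above can be illustrated with plain set operations. This is only a toy sketch: the accession names and gene identifiers are hypothetical, and it works at the level of gene presence/absence, whereas the study's 80% figure comes from its own assembly-based analysis.

```python
from typing import Dict, Set, Tuple


def split_pan_genome(accessions: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split a pan-genome into core genes (present in every accession)
    and dispensable genes (missing from at least one accession)."""
    if not accessions:
        raise ValueError("at least one accession is required")
    gene_sets = list(accessions.values())
    pan = set().union(*gene_sets)            # every gene seen anywhere
    core = set.intersection(*gene_sets)      # genes shared by all accessions
    return core, pan - core


# Hypothetical presence/absence data for three accessions.
toy = {
    "acc1": {"g1", "g2", "g3", "g4"},
    "acc2": {"g1", "g2", "g3", "g5"},
    "acc3": {"g1", "g2", "g3", "g4", "g5"},
}

core, dispensable = split_pan_genome(toy)
core_share = len(core) / (len(core) + len(dispensable))
print(sorted(core), sorted(dispensable), f"{core_share:.0%}")
```

With real data the same function would take one gene set per sequenced accession; how genes are clustered into families beforehand is the harder step and is not shown here.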
Landscape genomic prediction for restoration of a Eucalyptus foundation species under climate change.

PubMed

Supple, Megan Ann; Bragg, Jason G; Broadhurst, Linda M; Nicotra, Adrienne B; Byrne, Margaret; Andrew, Rose L; Widdup, Abigail; Aitken, Nicola C; Borevitz, Justin O

2018-04-24

As species face rapid environmental change, we can build resilient populations through restoration projects that incorporate predicted future climates into seed sourcing decisions. Eucalyptus melliodora is a foundation species of a critically endangered community in Australia that is a target for restoration. We examined genomic and phenotypic variation to make empirically based recommendations for seed sourcing. We examined isolation by distance and isolation by environment, determining high levels of gene flow extending for 500 km and correlations with climate and soil variables. Growth experiments revealed extensive phenotypic variation both within and among sampling sites, but no site-specific differentiation in phenotypic plasticity. Model predictions suggest that seed can be sourced broadly across the landscape, providing ample diversity for adaptation to environmental change. Application of our landscape genomic model to E. melliodora restoration projects can identify genomic variation suitable for predicted future climates, thereby increasing the long term probability of successful restoration. © 2018, Supple et al.
Resequencing a core collection of upland cotton identifies genomic variation and loci influencing fiber quality and yield.

PubMed

Ma, Zhiying; He, Shoupu; Wang, Xingfen; Sun, Junling; Zhang, Yan; Zhang, Guiyin; Wu, Liqiang; Li, Zhikun; Liu, Zhihao; Sun, Gaofei; Yan, Yuanyuan; Jia, Yinhua; Yang, Jun; Pan, Zhaoe; Gu, Qishen; Li, Xueyuan; Sun, Zhengwen; Dai, Panhong; Liu, Zhengwen; Gong, Wenfang; Wu, Jinhua; Wang, Mi; Liu, Hengwei; Feng, Keyun; Ke, Huifeng; Wang, Junduo; Lan, Hongyu; Wang, Guoning; Peng, Jun; Wang, Nan; Wang, Liru; Pang, Baoyin; Peng, Zhen; Li, Ruiqiang; Tian, Shilin; Du, Xiongming

2018-05-07

Upland cotton is the most important natural-fiber crop. The genomic variation of diverse germplasms and alleles underpinning fiber quality and yield should be extensively explored. Here, we resequenced a core collection comprising 419 accessions with 6.55-fold coverage depth and identified approximately 3.66 million SNPs for evaluating the genomic variation. We performed phenotyping across 12 environments and conducted genome-wide association study of 13 fiber-related traits. 7,383 unique SNPs were significantly associated with these traits and were located within or near 4,820 genes; more associated loci were detected for fiber quality than fiber yield, and more fiber genes were detected in the D than the A subgenome. Several previously undescribed causal genes for days to flowering, fiber length, and fiber strength were identified. Phenotypic selection for these traits increased the frequency of elite alleles during domestication and breeding. These results provide targets for molecular selection and genetic manipulation in cotton improvement.
An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder.

PubMed

Werling, Donna M; Brand, Harrison; An, Joon-Yong; Stone, Matthew R; Zhu, Lingxue; Glessner, Joseph T; Collins, Ryan L; Dong, Shan; Layer, Ryan M; Markenscoff-Papadimitriou, Eirene; Farrell, Andrew; Schwartz, Grace B; Wang, Harold Z; Currall, Benjamin B; Zhao, Xuefang; Dea, Jeanselle; Duhn, Clif; Erdman, Carolyn A; Gilson, Michael C; Yadav, Rachita; Handsaker, Robert E; Kashin, Seva; Klei, Lambertus; Mandell, Jeffrey D; Nowakowski, Tomasz J; Liu, Yuwen; Pochareddy, Sirisha; Smith, Louw; Walker, Michael F; Waterman, Matthew J; He, Xin; Kriegstein, Arnold R; Rubenstein, John L; Sestan, Nenad; McCarroll, Steven A; Neale, Benjamin M; Coon, Hilary; Willsey, A Jeremy; Buxbaum, Joseph D; Daly, Mark J; State, Matthew W; Quinlan, Aaron R; Marth, Gabor T; Roeder, Kathryn; Devlin, Bernie; Talkowski, Michael E; Sanders, Stephan J

2018-05-01

Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here we present a comparable framework to evaluate rare and de novo noncoding single-nucleotide variants, insertion/deletions, and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any categories after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite excluding previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism, the contribution of de novo noncoding variation is probably modest in comparison to that of de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple-testing burden.
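To make the multiple-testing burden mentioned above concrete, the sketch below applies a Bonferroni-style threshold to the 4,123 effective tests quoted in the abstract. The significance level of 0.05, the use of a plain Bonferroni rule, and the category names and p-values are all assumptions for illustration; the abstract does not specify them.

```python
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_categories(p_values: Dict[str, float], alpha: float, n_tests: int) -> List[str]:
    """Return the names of categories whose p-value survives correction."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [name for name, p in p_values.items() if p < threshold]


EFFECTIVE_TESTS = 4_123   # from the abstract
ALPHA = 0.05              # assumed family-wise significance level

# Hypothetical nominal p-values for three annotation categories.
toy_p_values = {"category_a": 0.003, "category_b": 4.0e-5, "category_c": 2.0e-6}

threshold = bonferroni_threshold(ALPHA, EFFECTIVE_TESTS)
passing = significant_categories(toy_p_values, ALPHA, EFFECTIVE_TESTS)
print(f"threshold = {threshold:.2e}; passing = {passing}")
```

In this toy input, p-values of 0.003 and 4.0e-5 would look notable without correction but fall short of the corrected threshold of roughly 1.2e-5, which is the kind of effect the abstract warns about.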
WhopGenome: high-speed access to whole-genome variation and sequence data in R.

PubMed

Wittelsbürger, Ulrich; Pfeifer, Bastian; Lercher, Martin J

2015-02-01

The statistical programming language R has become a de facto standard for the analysis of many types of biological data, and is well suited for the rapid development of new algorithms. However, variant call data from population-scale resequencing projects are typically too large to be read and processed efficiently with R's built-in I/O capabilities. WhopGenome can efficiently read whole-genome variation data stored in the widely used variant call format (VCF) file format into several R data types. VCF files can be accessed either on local hard drives or on remote servers. WhopGenome can associate variants with annotations such as those available from the UCSC genome browser, and can accelerate the reading process by filtering loci according to user-defined criteria. WhopGenome can also read other Tabix-indexed files and create indices to allow fast selective access to FASTA-formatted sequence files. The WhopGenome R package is available on CRAN at http://cran.r-project.org/web/packages/WhopGenome/. A Bioconductor package has been submitted. Contact: lercher@cs.uni-duesseldorf.de. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
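The idea of filtering loci while reading, rather than loading a whole VCF file first, can be sketched in a few lines. The example below is not the WhopGenome API (which is an R package); it is a standard-library Python illustration with hypothetical names, and it scans lines sequentially instead of using a Tabix index.

```python
from typing import Iterable, Iterator, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float


def filter_vcf(lines: Iterable[str], chrom: str, start: int, end: int,
               min_qual: float = 0.0) -> Iterator[Variant]:
    """Yield variants on `chrom` within [start, end] whose QUAL is at least min_qual."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank, meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue                      # not a complete VCF record
        c, pos, _id, ref, alt, qual = fields[:6]
        if c != chrom:
            continue
        position = int(pos)
        if not start <= position <= end:
            continue
        quality = 0.0 if qual == "." else float(qual)   # missing QUAL treated as 0 (assumption)
        if quality >= min_qual:
            yield Variant(c, position, ref, alt, quality)


# Hypothetical in-memory VCF content.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\tPASS\t.",
    "1\t250\t.\tC\tT\t10\tPASS\t.",
    "2\t120\t.\tG\tA\t99\tPASS\t.",
]

hits = list(filter_vcf(toy_vcf, chrom="1", start=50, end=300, min_qual=30))
print(hits)
```

Because `filter_vcf` is a generator, it can be fed an open file handle and never holds more than one record in memory, which is the property that matters for population-scale files.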
Characterization of Genome-Wide Variation in Four-Row Wax, a Waxy Maize Landrace with a Reduced Kernel Row Phenotype

PubMed Central

Liu, Hanmei; Wang, Xuewen; Wei, Bin; Wang, Yongbin; Liu, Yinghong; Zhang, Junjie; Hu, Yufeng; Yu, Guowu; Li, Jian; Xu, Zhanbin; Huang, Yubi

2016-01-01

In southwest China, some maize landraces have long been isolated geographically, and have phenotypes that differ from those of widely grown cultivars. These landraces may harbor rich genetic variation responsible for those phenotypes. Four-row Wax is one such landrace, with four rows of kernels on the cob. We resequenced the genome of Four-row Wax, obtaining 50.46 Gb sequence at 21.87× coverage, then identified and characterized 3,252,194 SNPs, 213,181 short InDels (1–5 bp) and 39,631 structural variations (greater than 5 bp). Of those, 312,511 (9.6%) SNPs were novel compared to the most detailed haplotype map (HapMap) SNP database of maize. Characterization of variations in reported kernel row number (KRN) related genes and KRN QTL regions revealed potential causal mutations in fea2, td1, kn1, and te1. Genome-wide comparisons revealed abundant genetic variations in Four-row Wax, which may be associated with environmental adaptation. The sequence and SNP variations described here enrich genetic resources of maize, and provide guidance into study of seed numbers for crop yield improvement. PMID:27242868
Genome size variation in wild and cultivated maize along altitudinal gradients

PubMed Central

Díez, Concepción M.; Gaut, Brandon S.; Meca, Esteban; Scheinvar, Enrique; Montes-Hernandez, Salvador; Eguiarte, Luis E.; Tenaillon, Maud I.

2014-01-01

It is still an open question as to whether genome size (GS) variation is shaped by natural selection. One approach to address this question is a population-level survey that assesses both the variation in GS and the relationship of GS to ecological variants. We assessed GS in Zea mays, a species that includes the cultivated crop, maize, and its closest wild relatives, the teosintes. We measured GS in five plants of each of 22 maize landraces and 21 teosinte populations from Mexico sampled from parallel altitudinal gradients. GS was significantly smaller in landraces than in teosintes, but the largest component of GS variation was among landraces and among populations. In maize, GS correlated negatively with altitude; more generally, the best GS predictors were linked to geography. By contrast, GS variation in teosintes was best explained by temperature and precipitation. Overall, our results further document the size flexibility of the Zea genome, but also point to a drastic shift in patterns of GS variation since domestication. We argue that such patterns may reflect the indirect action of selection on GS, through a multiplicity of phenotypes and life-history traits. PMID:23550586
Population Genomics of Infectious and Integrated Wolbachia pipientis Genomes in Drosophila ananassae

PubMed Central

Choi, Jae Young; Bubnell, Jaclyn E.; Aquadro, Charles F.

2015-01-01

Coevolution between Drosophila and its endosymbiont Wolbachia pipientis has many intriguing aspects. For example, Drosophila ananassae hosts two forms of W. pipientis genomes: One being the infectious bacterial genome and the other integrated into the host nuclear genome. Here, we characterize the infectious and integrated genomes of W. pipientis infecting D. ananassae (wAna), by genome sequencing 15 strains of D. ananassae that have either the infectious or integrated wAna genomes. Results indicate evolutionarily stable maternal transmission for the infectious wAna genome suggesting a relatively long-term coevolution with its host. In contrast, the integrated wAna genome showed pseudogene-like characteristics accumulating many variants that are predicted to have deleterious effects if present in an infectious bacterial genome. Phylogenomic analysis of sequence variation together with genotyping by polymerase chain reaction of large structural variations indicated several wAna variants among the eight infectious wAna genomes. In contrast, only a single wAna variant was found among the seven integrated wAna genomes examined in lines from Africa, south Asia, and south Pacific islands suggesting that the integration occurred once from a single infectious wAna genome and then spread geographically. Further analysis revealed that for all D. ananassae we examined with the integrated wAna genomes, the majority of the integrated wAna genomic regions is represented in at least two copies suggesting a double integration or single integration followed by an integrated genome duplication. The possible evolutionary mechanism underlying the widespread geographical presence of the duplicate integration of the wAna genome is an intriguing question remaining to be answered. PMID:26254486
Natural genetic variation of the cardiac transcriptome in non-diseased donors and patients with dilated cardiomyopathy.

PubMed

Heinig, Matthias; Adriaens, Michiel E; Schafer, Sebastian; van Deutekom, Hanneke W M; Lodder, Elisabeth M; Ware, James S; Schneider, Valentin; Felkin, Leanne E; Creemers, Esther E; Meder, Benjamin; Katus, Hugo A; Rühle, Frank; Stoll, Monika; Cambien, François; Villard, Eric; Charron, Philippe; Varro, Andras; Bishopric, Nanette H; George, Alfred L; Dos Remedios, Cristobal; Moreno-Moral, Aida; Pesce, Francesco; Bauerfeind, Anja; Rüschendorf, Franz; Rintisch, Carola; Petretto, Enrico; Barton, Paul J; Cook, Stuart A; Pinto, Yigal M; Bezzina, Connie R; Hubner, Norbert

2017-09-14

Genetic variation is an important determinant of RNA transcription and splicing, which in turn contributes to variation in human traits, including cardiovascular diseases. Here we report the first in-depth survey of heart transcriptome variation using RNA-sequencing in 97 patients with dilated cardiomyopathy and 108 non-diseased controls. We reveal extensive differences of gene expression and splicing between dilated cardiomyopathy patients and controls, affecting known as well as novel dilated cardiomyopathy genes. Moreover, we show a widespread effect of genetic variation on the regulation of transcription, isoform usage, and allele-specific expression. Systematic annotation of genome-wide association SNPs identifies 60 functional candidate genes for heart phenotypes, representing 20% of all published heart genome-wide association loci. Focusing on the dilated cardiomyopathy phenotype we found that eQTL variants are also enriched for dilated cardiomyopathy genome-wide association signals in two independent cohorts. RNA transcription, splicing, and allele-specific expression are each important determinants of the dilated cardiomyopathy phenotype and are controlled by genetic factors. Our results represent a powerful resource for the field of cardiovascular genetics.
The legacy of domestication: accumulation of deleterious mutations in the dog genome.

PubMed

Cruz, Fernando; Vilà, Carles; Webster, Matthew T

2008-11-01

Dogs exhibit more phenotypic variation than any other mammal and are affected by a wide variety of genetic diseases. However, the origin and genetic basis of this variation is still poorly understood. We examined the effect of domestication on the dog genome by comparison with its wild ancestor, the gray wolf. We compared variation in dog and wolf genes using whole-genome single nucleotide polymorphism (SNP) data. The d(N)/d(S) ratio (omega) was around 50% greater for SNPs found in dogs than in wolves, indicating that a higher proportion of nonsynonymous alleles segregate in dogs compared with nonfunctional genetic variation. We suggest that the majority of these alleles are slightly deleterious and that two main factors may have contributed to their increase. The first is a relaxation of selective constraint due to a population bottleneck and altered breeding patterns accompanying domestication. The second is a reduction of effective population size at loci linked to those under positive selection due to Hill-Robertson interference. An increase in slightly deleterious genetic variation could contribute to the prevalence of disease in modern dog breeds.
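The d(N)/d(S) ratio (omega) used in the abstract above compares the rate of nonsynonymous change per nonsynonymous site with the rate of synonymous change per synonymous site. The sketch below uses the simplest uncorrected form of that ratio; the counts are hypothetical and chosen only so that the dog value comes out 50% higher than the wolf value, mirroring the reported direction of the effect rather than the study's data or estimation method.

```python
def omega(nonsyn_changes: int, nonsyn_sites: float,
          syn_changes: int, syn_sites: float) -> float:
    """Uncorrected dN/dS: (nonsynonymous changes per nonsynonymous site)
    divided by (synonymous changes per synonymous site)."""
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    if syn_changes == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    d_n = nonsyn_changes / nonsyn_sites
    d_s = syn_changes / syn_sites
    return d_n / d_s


# Hypothetical counts for illustration only.
wolf_omega = omega(nonsyn_changes=100, nonsyn_sites=3000.0, syn_changes=200, syn_sites=1000.0)
dog_omega = omega(nonsyn_changes=150, nonsyn_sites=3000.0, syn_changes=200, syn_sites=1000.0)

print(f"wolf = {wolf_omega:.3f}, dog = {dog_omega:.3f}, ratio = {dog_omega / wolf_omega:.2f}")
```

Published estimates normally add a correction for multiple substitutions at the same site and a codon model for counting sites; those refinements are left out here to keep the ratio itself visible.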
Thermal energy conversion by coupled shape memory and piezoelectric effects

NASA Astrophysics Data System (ADS)

Zakharov, Dmitry; Lebedev, Gor; Cugat, Orphee; Delamare, Jerome; Viala, Bernard; Lafont, Thomas; Gimeno, Leticia; Shelyakov, Alexander

2012-09-01

This work gives experimental evidence of a promising method of thermal-to-electric energy conversion by coupling shape memory effect (SME) and direct piezoelectric effect (DPE) for harvesting quasi-static ambient temperature variations. Two original prototypes of thermal energy harvesters have been fabricated and tested experimentally. The first is a hybrid laminated composite consisting of TiNiCu shape memory alloy (SMA) and macro fiber composite piezoelectric. This composite comprises 0.1 cm³ of active materials and harvests 75 µJ of energy for each temperature variation of 60 °C. The second prototype is a SME/DPE ‘machine’ which uses the thermally induced linear strains of the SMA to bend a bulk PZT ceramic plate through a specially designed mechanical structure. The SME/DPE ‘machine’ with 0.2 cm³ of active material harvests 90 µJ over a temperature increase of 35 °C (60 µJ when cooling). In contrast to pyroelectric materials, such harvesters are also compatible with both small and slow temperature variations.
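The figures quoted in this abstract can be put on a common footing by dividing harvested energy by active-material volume. The short sketch below does only that arithmetic. The function name is hypothetical, and the comparison ignores the different temperature swings of the two prototypes.

```python
def energy_density_uj_per_cm3(energy_uj, volume_cm3):
    """Harvested energy per unit volume of active material (hypothetical helper)."""
    return energy_uj / volume_cm3

# Values as quoted in the abstract.
composite = energy_density_uj_per_cm3(75, 0.1)     # per 60 °C variation
machine_heat = energy_density_uj_per_cm3(90, 0.2)  # 35 °C increase
machine_cool = energy_density_uj_per_cm3(60, 0.2)  # cooling

print(round(composite), round(machine_heat), round(machine_cool))
```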
Genome-wide Association Studies from the Cancer Genetic Markers of Susceptibility (CGEMS) Initiative | Office of Cancer Genomics

Cancer.gov

CGEMS identifies common inherited genetic variations associated with a number of cancers, including breast and prostate. Data from these genome-wide association studies (GWAS) are available through the Division of Cancer Epidemiology & Genetics website.
Genome-Wide Survey on Genomic Variation, Expression Divergence, and Evolution in Two Contrasting Rice Genotypes under High Salinity Stress

PubMed Central

Jiang, Shu-Ye; Ma, Ali; Ramamoorthy, Rengasamy; Ramachandran, Srinivasan

2013-01-01

Expression profiling is one of the most important tools for dissecting biological functions of genes and the upregulation or downregulation of gene expression is sufficient for recreating phenotypic differences. Expression divergence of genes significantly contributes to phenotypic variations. However, little is known on the molecular basis of expression divergence and evolution among rice genotypes with contrasting phenotypes. In this study, we have implemented an integrative approach using bioinformatics and experimental analyses to provide insights into genomic variation, expression divergence, and evolution between salinity-sensitive rice variety Nipponbare and tolerant rice line Pokkali under normal and high salinity stress conditions. We have detected thousands of differentially expressed genes between these two genotypes and thousands of up- or downregulated genes under high salinity stress. Many genes were first detected with expression evidence using custom microarray analysis. Some gene families were preferentially regulated by high salinity stress and might play key roles in stress-responsive biological processes. Genomic variations in promoter regions resulting from single nucleotide polymorphisms, indels (1–10 bp of insertion/deletion), and structural variations significantly contributed to the expression divergence and regulation. Our data also showed that tandem and segmental duplication, CACTA and hAT elements played roles in the evolution of gene expression divergence and regulation between these two contrasting genotypes under normal or high salinity stress conditions. PMID:24121498
Genomic data reveal a loss of diversity in two species of tuco-tucos (genus Ctenomys) following a volcanic eruption.

PubMed

Hsu, Jeremy L; Crawford, Jeremy Chase; Tammone, Mauro N; Ramakrishnan, Uma; Lacey, Eileen A; Hadly, Elizabeth A

2017-11-24

Marked reductions in population size can trigger corresponding declines in genetic variation. Understanding the precise genetic consequences of such reductions, however, is often challenging due to the absence of robust pre- and post-reduction datasets. Here, we use heterochronous genomic data from samples obtained before and immediately after the 2011 eruption of the Puyehue-Cordón Caulle volcanic complex in Patagonia to explore the genetic impacts of this event on two parapatric species of rodents, the colonial tuco-tuco (Ctenomys sociabilis) and the Patagonian tuco-tuco (C. haigi). Previous analyses using microsatellites revealed no post-eruption changes in genetic variation in C. haigi, but an unexpected increase in variation in C. sociabilis. To explore this outcome further, we used targeted gene capture to sequence over 2,000 putatively neutral regions for both species. Our data revealed that, contrary to the microsatellite analyses, the eruption was associated with a small but significant decrease in genetic variation in both species. We suggest that genome-level analyses provide greater power than traditional molecular markers to detect the genetic consequences of population size changes, particularly changes that are recent, short-term, or modest in size. Consequently, genomic analyses promise to generate important new insights into the effects of specific environmental events on demography and genetic variation.

Race and Ethnicity in the Genome Era: The Complexity of the Constructs

ERIC Educational Resources Information Center

Bonham, Vence L.; Warshauer-Baker, Esther; Collins, Francis S.

2005-01-01

The vast amount of biological information that is now available through the completion of the Human Genome Project presents opportunities and challenges. The genomic era has the potential to advance an understanding of human genetic variation and its role in human health and disease. A challenge for genomics research is to understand the…
Efficient identification of context dependent subgroups of risk from genome wide association studies

PubMed Central

Dyson, Greg; Sing, Charles F.

2014-01-01

We have developed a modified Patient Rule-Induction Method (PRIM) as an alternative strategy for analyzing representative samples of non-experimental human data to estimate and test the role of genomic variations as predictors of disease risk in etiologically heterogeneous sub-samples. A computational limit of the proposed strategy is encountered when the number of genomic variations (predictor variables) under study is large (> 500) because permutations are used to generate a null distribution to test the significance of a term (defined by values of particular variables) that characterizes a sub-sample of individuals through the peeling and pasting processes. As an alternative, in this paper we introduce a theoretical strategy that facilitates the quick calculation of Type I and Type II errors in the evaluation of terms in the peeling and pasting processes carried out in the execution of a PRIM analysis that are underestimated and non-existent, respectively, when a permutation-based hypothesis test is employed. The resultant savings in computational time makes possible the consideration of larger numbers of genomic variations (an example genome wide association study is given) in the selection of statistically significant terms in the formulation of PRIM prediction models. PMID:24570412
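To make the computational bottleneck mentioned in this abstract concrete, the sketch below shows a generic permutation test for one candidate subgroup: the outcome labels are shuffled many times to build a null distribution for the subgroup's mean outcome. This is not the authors' PRIM implementation. The function name, arguments, and toy data are hypothetical, and the one-sided test on the mean is an assumption chosen for brevity. Repeating such a loop for every candidate term and every predictor is what becomes costly when the number of genomic variations is large.

```python
import random

def permutation_p_value(outcomes, in_subgroup, n_permutations=1000, seed=0):
    """One-sided permutation p-value for a subgroup's mean outcome."""
    rng = random.Random(seed)
    k = sum(in_subgroup)  # subgroup size
    observed = sum(y for y, m in zip(outcomes, in_subgroup) if m) / k
    shuffled = list(outcomes)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # The first k shuffled outcomes act as a random subgroup of the same size.
        if sum(shuffled[:k]) / k >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Toy data: 5 affected individuals, all inside the candidate subgroup.
outcomes = [1] * 5 + [0] * 15
in_subgroup = [True] * 5 + [False] * 15
print(permutation_p_value(outcomes, in_subgroup))
```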
Genome-wide variation within and between wild and domestic yak.

PubMed

Wang, Kun; Hu, Quanjun; Ma, Hui; Wang, Lizhong; Yang, Yongzhi; Luo, Wenchun; Qiu, Qiang

2014-07-01

The yak is one of the few animals that can thrive in the harsh environment of the Qinghai-Tibetan Plateau and adjacent Alpine regions. Yak provides essential resources allowing Tibetans to live at high altitudes. However, genetic variation within and between wild and domestic yak remains unknown. Here, we present a genome-wide study of the genetic variation within and between wild and domestic yak. Using next-generation sequencing technology, we resequenced three wild and three domestic yak with a mean of fivefold coverage using our published domestic yak genome as a reference. We identified a total of 8.38 million SNPs (7.14 million novel), 383,241 InDels and 126,352 structural variants between the six yak. We observed higher linkage disequilibrium in domestic yak than in wild yak and a modest but distinct genetic divergence between these two groups. We further identified more than a thousand potential selected regions (PSRs) for the three domestic yak by scanning the whole genome. These genomic resources can be further used to study genetic diversity and select superior breeds of yak and other bovid species. © 2014 John Wiley & Sons Ltd.
Evolutionary genomics of animal personality.

PubMed

van Oers, Kees; Mueller, Jakob C

2010-12-27

Research on animal personality can be approached from both a phenotypic and a genetic perspective. Using a phenotypic approach, one can measure present selection on personality traits and their combinations. However, this approach cannot reconstruct the historical trajectory that was taken by evolution. Therefore, it is essential for our understanding of the causes and consequences of personality diversity to link phenotypic variation in personality traits with polymorphisms in genomic regions that code for this trait variation. Identifying genes or genome regions that underlie personality traits will open exciting possibilities to study natural selection at the molecular level, gene-gene and gene-environment interactions, pleiotropic effects and how gene expression shapes personality phenotypes. In this paper, we will discuss how genome information revealed by already established approaches and some more recent techniques such as high-throughput sequencing of genomic regions in a large number of individuals can be used to infer micro-evolutionary processes, historical selection and finally the maintenance of personality trait variation. We will do this by reviewing recent advances in molecular genetics of animal personality, but will also use advanced human personality studies as case studies of how molecular information may be used in animal personality research in the near future.
Genome-wide patterns of copy number variation in the diversified chicken genomes using next-generation sequencing.

PubMed

Yi, Guoqiang; Qu, Lujiang; Liu, Jianfeng; Yan, Yiyuan; Xu, Guiyun; Yang, Ning

2014-11-07

Copy number variation (CNV) is important and widespread in the genome, and is a major cause of disease and phenotypic diversity. Herein, we performed a genome-wide CNV analysis in 12 diversified chicken genomes based on whole genome sequencing. A total of 8,840 CNV regions (CNVRs) covering 98.2 Mb and representing 9.4% of the chicken genome were identified, ranging in size from 1.1 to 268.8 kb with an average of 11.1 kb. Sequencing-based predictions were confirmed at a high validation rate by two independent approaches, including array comparative genomic hybridization (aCGH) and quantitative PCR (qPCR). The Pearson's correlation coefficients between sequencing and aCGH results ranged from 0.435 to 0.755, and qPCR experiments revealed a positive validation rate of 91.71% and a false negative rate of 22.43%. In total, 2,214 (25.0%) predicted CNVRs span 2,216 (36.4%) RefSeq genes associated with specific biological functions. Besides two previously reported copy number variable genes EDN3 and PRLR, we also found some promising genes with potential in phenotypic variation. Two genes, FZD6 and LIMS1, related to disease susceptibility/resistance are covered by CNVRs. The highly duplicated SOCS2 may lead to higher bone mineral density. Entire or partial duplication of some genes like POPDC3 may have great economic importance in poultry breeding. Our results based on extensive genetic diversity provide a more refined chicken CNV map and genome-wide gene copy number estimates, and warrant future CNV association studies for important traits in chickens.
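The summary statistics in this abstract can be cross-checked against one another with simple arithmetic, as sketched below. The variable names are hypothetical. The implied genome size is only what follows from dividing the quoted CNVR span by the quoted genome fraction, and it is not a figure stated in the abstract.

```python
# Values as quoted in the abstract.
cnvr_count = 8840
cnvr_total_mb = 98.2
genome_fraction = 0.094       # 9.4% of the chicken genome
gene_spanning_cnvrs = 2214

mean_cnvr_kb = cnvr_total_mb * 1000 / cnvr_count         # abstract reports 11.1 kb
implied_genome_mb = cnvr_total_mb / genome_fraction      # derived, not reported
gene_spanning_share = gene_spanning_cnvrs / cnvr_count   # abstract reports 25.0%

print(round(mean_cnvr_kb, 1), round(implied_genome_mb), round(gene_spanning_share * 100, 1))
```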
Population Genomics of Paramecium Species.

PubMed

Johri, Parul; Krenek, Sascha; Marinov, Georgi K; Doak, Thomas G; Berendonk, Thomas U; Lynch, Michael

2017-05-01

Population-genomic analyses are essential to understanding factors shaping genomic variation and lineage-specific sequence constraints. The dearth of such analyses for unicellular eukaryotes prompted us to assess genomic variation in Paramecium, one of the most well-studied ciliate genera. The Paramecium aurelia complex consists of ∼15 morphologically indistinguishable species that diverged subsequent to two rounds of whole-genome duplications (WGDs, as long as 320 MYA) and possess extremely streamlined genomes. We examine patterns of both nuclear and mitochondrial polymorphism, by sequencing whole genomes of 10-13 worldwide isolates of each of three species belonging to the P. aurelia complex: P. tetraurelia, P. biaurelia, P. sexaurelia, as well as two outgroup species that do not share the WGDs: P. caudatum and P. multimicronucleatum. An apparent absence of global geographic population structure suggests continuous or recent dispersal of Paramecium over long distances. Intergenic regions are highly constrained relative to coding sequences, especially in P. caudatum and P. multimicronucleatum that have shorter intergenic distances. Sequence diversity and divergence are reduced up to ∼100-150 bp both upstream and downstream of genes, suggesting strong constraints imposed by the presence of densely packed regulatory modules. In addition, comparison of sequence variation at non-synonymous and synonymous sites suggests similar recent selective pressures on paralogs within and orthologs across the deeply diverging species. This study presents the first genome-wide population-genomic analysis in ciliates and provides a valuable resource for future studies in evolutionary and functional genetics in Paramecium. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Extreme Recombination Frequencies Shape Genome Variation and Evolution in the Honeybee, Apis mellifera

PubMed Central

Wallberg, Andreas; Glémin, Sylvain; Webster, Matthew T.

2015-01-01

Meiotic recombination is a fundamental cellular process, with important consequences for evolution and genome integrity. However, we know little about how recombination rates vary across the genomes of most species and the molecular and evolutionary determinants of this variation. The honeybee, Apis mellifera, has extremely high rates of meiotic recombination, although the evolutionary causes and consequences of this are unclear. Here we use patterns of linkage disequilibrium in whole genome resequencing data from 30 diploid honeybees to construct a fine-scale map of rates of crossing over in the genome. We find that, in contrast to vertebrate genomes, the recombination landscape is not strongly punctate. Crossover rates strongly correlate with levels of genetic variation, but not divergence, which indicates a pervasive impact of selection on the genome. Germ-line methylated genes have reduced crossover rate, which could indicate a role of methylation in suppressing recombination. Controlling for the effects of methylation, we do not infer a strong association between gene expression patterns and recombination. The site frequency spectrum is strongly skewed from neutral expectations in honeybees: rare variants are dominated by AT-biased mutations, whereas GC-biased mutations are found at higher frequencies, indicative of a major influence of GC-biased gene conversion (gBGC), which we infer to generate an allele fixation bias 5 – 50 times the genomic average estimated in humans. We uncover further evidence that this repair bias specifically affects transitions and favours fixation of CpG sites. Recombination, via gBGC, therefore appears to have profound consequences on genome evolution in honeybees and interferes with the process of natural selection. These findings have important implications for our understanding of the forces driving molecular evolution. PMID:25902173
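The recombination map in this abstract is built from patterns of linkage disequilibrium (LD). As background only, the sketch below computes the basic pairwise LD statistic r² from phased two-site haplotypes. The function name and the toy haplotypes are hypothetical. The study's actual inference of crossover rates from LD is considerably more involved and is not reproduced here.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic sites.

    haplotypes: list of (allele_at_site1, allele_at_site2), alleles coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at site 1
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at site 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a site is monomorphic; r^2 is undefined")
    return d * d / denom

print(ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)]))  # alleles always co-occur
print(ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)]))  # alleles are independent
```

Low r² between nearby sites is the kind of signal that LD-based methods interpret as evidence of frequent historical recombination.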
The draft genome of a socially polymorphic halictid bee, Lasioglossum albipes

PubMed Central

2013-01-01

Background Taxa that harbor natural phenotypic variation are ideal for ecological genomic approaches aimed at understanding how the interplay between genetic and environmental factors can lead to the evolution of complex traits. Lasioglossum albipes is a polymorphic halictid bee that expresses variation in social behavior among populations, and common-garden experiments have suggested that this variation is likely to have a genetic component. Results We present the L. albipes genome assembly to characterize the genetic and ecological factors associated with the evolution of social behavior. The de novo assembly is comparable to other published social insect genomes, with an N50 scaffold length of 602 kb. Gene families unique to L. albipes are associated with integrin-mediated signaling and DNA-binding domains, and several appear to be expanded in this species, including the glutathione-s-transferases and the inositol monophosphatases. L. albipes has an intact DNA methylation system, and in silico analyses suggest that methylation occurs primarily in exons. Comparisons to other insect genomes indicate that genes associated with metabolism and nucleotide binding undergo accelerated evolution in the halictid lineage. Whole-genome resequencing data from one solitary and one social L. albipes female identify six genes that appear to be rapidly diverging between social forms, including a putative odorant receptor and a cuticular protein. Conclusions L. albipes represents a novel genetic model system for understanding the evolution of social behavior. It represents the first published genome sequence of a primitively social insect, thereby facilitating comparative genomic studies across the Hymenoptera as a whole. PMID:24359881
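The N50 scaffold length quoted in this abstract is a standard assembly contiguity statistic. A minimal sketch of the usual definition follows: sort scaffolds from longest to shortest and report the length at which the running total first reaches half of the assembly. The function name and the toy lengths are hypothetical and unrelated to the L. albipes assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length

# Toy assembly with scaffold lengths in kb.
print(n50([800, 600, 400, 200, 100]))
```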
Novel origins of copy number variation in the dog genome

PubMed Central

2012-01-01

Background Copy number variants (CNVs) account for substantial variation between genomes and are a major source of normal and pathogenic phenotypic differences. The dog is an ideal model to investigate mutational mechanisms that generate CNVs as its genome lacks a functional ortholog of the PRDM9 gene implicated in recombination and CNV formation in humans. Here we comprehensively assay CNVs using high-density array comparative genomic hybridization in 50 dogs from 17 dog breeds and 3 gray wolves. Results We use a stringent new method to identify a total of 430 high-confidence CNV loci, which range in size from 9 kb to 1.6 Mb and span 26.4 Mb, or 1.08%, of the assayed dog genome, overlapping 413 annotated genes. Of CNVs observed in each breed, 98% are also observed in multiple breeds. CNVs predicted to disrupt gene function are significantly less common than expected by chance. We identify a significant overrepresentation of peaks of GC content, previously shown to be enriched in dog recombination hotspots, in the vicinity of CNV breakpoints. Conclusions A number of the CNVs identified by this study are candidates for generating breed-specific phenotypes. Purifying selection seems to be a major factor shaping structural variation in the dog genome, suggesting that many CNVs are deleterious. Localized peaks of GC content appear to be novel sites of CNV formation in the dog genome by non-allelic homologous recombination, potentially activated by the loss of PRDM9. These sequence features may have driven genome instability and chromosomal rearrangements throughout canid evolution. PMID:22916802
Sequence Analysis and Characterization of Active Human Alu Subfamilies Based on the 1000 Genomes Pilot Project.

PubMed

Konkel, Miriam K; Walker, Jerilyn A; Hotard, Ashley B; Ranck, Megan C; Fontenot, Catherine C; Storer, Jessica; Stewart, Chip; Marth, Gabor T; Batzer, Mark A

2015-08-29

The goal of the 1000 Genomes Consortium is to characterize human genome structural variation (SV), including forms of copy number variations such as deletions, duplications, and insertions. Mobile element insertions, particularly Alu elements, are major contributors to genomic SV among humans. During the pilot phase of the project we experimentally validated 645 (611 intergenic and 34 exon targeted) polymorphic "young" Alu insertion events, absent from the human reference genome. Here, we report high resolution sequencing of 343 (322 unique) recent Alu insertion events, along with their respective target site duplications, precise genomic breakpoint coordinates, subfamily assignment, percent divergence, and estimated A-rich tail lengths. All the sequenced Alu loci were derived from the AluY lineage with no evidence of retrotransposition activity involving older Alu families (e.g., AluJ and AluS). AluYa5 is currently the most active Alu subfamily in the human lineage, followed by AluYb8, and many others including three newly identified subfamilies we have termed AluYb7a3, AluYb8b1, and AluYa4a1. This report provides the structural details of 322 unique Alu variants from individual human genomes collectively adding about 100 kb of genomic variation. Many Alu subfamilies are currently active in human populations, including a surprising level of AluY retrotransposition. Human Alu subfamilies exhibit continuous evolution with potential drivers sprouting new Alu lineages. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Fine mapping of copy number variations on two cattle genome assemblies using high density SNP array

USDA-ARS's Scientific Manuscript database

Btau_4.0 and UMD3.1 are two distinct cattle reference genome assemblies. In our previous study using the low density BovineSNP50 array, we reported a copy number variation (CNV) analysis on Btau_4.0 with 521 animals of 21 cattle breeds, yielding 682 CNV regions with a total length of 139.8 megabases...
Toward Integration of Comparative Genetic, Physical, Diversity, and Cytomolecular Maps for Grasses and Grains, Using the Sorghum Genome as a Foundation

PubMed Central

Draye, Xavier; Lin, Yann-Rong; Qian, Xiao-yin; Bowers, John E.; Burow, Gloria B.; Morrell, Peter L.; Peterson, Daniel G.; Presting, Gernot G.; Ren, Shu-xin; Wing, Rod A.; Paterson, Andrew H.

2001-01-01

The small genome of sorghum (Sorghum bicolor L. Moench.) provides an important template for study of closely related large-genome crops such as maize (Zea mays) and sugarcane (Saccharum spp.), and is a logical complement to distantly related rice (Oryza sativa) as a “grass genome model.” Using a high-density RFLP map as a framework, a robust physical map of sorghum is being assembled by integrating hybridization and fingerprint data with comparative data from related taxa such as rice and using new methods to resolve genomic duplications into locus-specific groups. By taking advantage of allelic variation revealed by heterologous probes, the positions of corresponding loci on the wheat (Triticum aestivum), rice, maize, sugarcane, and Arabidopsis genomes are being interpolated on the sorghum physical map. Bacterial artificial chromosomes for the small genome of rice are shown to close several gaps in the sorghum contigs; the emerging rice physical map and assembled sequence will further accelerate progress. An important motivation for developing genomic tools is to relate molecular level variation to phenotypic diversity. “Diversity maps,” which depict the levels and patterns of variation in different gene pools, shed light on relationships of allelic diversity with chromosome organization, and suggest possible locations of genomic regions that are under selection due to major gene effects (some of which may be revealed by quantitative trait locus mapping). Both physical maps and diversity maps suggest interesting features that may be integrally related to the chromosomal context of DNA—progress in cytology promises to provide a means to elucidate such relationships. 
We seek to provide a detailed picture of the structure, function, and evolution of the genome of sorghum and its relatives, together with molecular tools such as locus-specific sequence-tagged site DNA markers and bacterial artificial chromosome contigs that will have enduring value for many aspects of genome analysis. PMID:11244113
Improved genomic resources and new bioinformatic workflow for the carcinogenic parasite Clonorchis sinensis: Biotechnological implications.

PubMed

Wang, Daxi; Korhonen, Pasi K; Gasser, Robin B; Young, Neil D

Clonorchis sinensis (family Opisthorchiidae) is an important foodborne parasite that has a major socioeconomic impact on ~35 million people predominantly in China, Vietnam, Korea and the Russian Far East. In humans, infection with C. sinensis causes clonorchiasis, a complex hepatobiliary disease that can induce cholangiocarcinoma (CCA), a malignant cancer of the bile ducts. Central to understanding the epidemiology of this disease is knowledge of genetic variation within and among populations of this parasite. Although most published molecular studies seem to suggest that C. sinensis represents a single species, evidence of karyotypic variation within C. sinensis and cryptic species within a related opisthorchiid fluke (Opisthorchis viverrini) emphasise the importance of studying and comparing the genes and genomes of geographically distinct isolates of C. sinensis. Recently, we sequenced, assembled and characterised a draft nuclear genome of a C. sinensis isolate from Korea and compared it with a published draft genome of a Chinese isolate of this species using a bioinformatic workflow established for comparing draft genome assemblies and their gene annotations. We identified that 50.6% and 51.3% of the Korean and Chinese C. sinensis genomic scaffolds were syntenic, respectively. Within aligned syntenic blocks, the genomes had a high level of nucleotide identity (99.1%) and encoded 15 variable proteins likely to be involved in diverse biological processes. Here, we review current technical challenges of using draft genome assemblies to undertake comparative genomic analyses to quantify genetic variation between isolates of the same species. Using a workflow that overcomes these challenges, we report on a high-quality draft genome for C. sinensis from Korea and comparative genomic analyses, as a basis for future investigations of the genetic structures of C. sinensis populations, and discuss the biotechnological implications of these explorations. 
Copyright © 2018 Elsevier Inc. All rights reserved.
Anthocyanin inhibits propidium iodide DNA fluorescence in Euphorbia pulcherrima: implications for genome size variation and flow cytometry.

PubMed

Bennett, Michael D; Price, H James; Johnston, J Spencer

2008-04-01

Measuring genome size by flow cytometry assumes direct proportionality between nuclear DNA staining and DNA amount. By 1997 it was recognized that secondary metabolites may affect DNA staining, thereby causing inaccuracy. Here experiments are reported with poinsettia (Euphorbia pulcherrima) with green leaves and red bracts rich in phenolics. DNA content was estimated as fluorescence of propidium iodide (PI)-stained nuclei of poinsettia and/or pea (Pisum sativum) using flow cytometry. Tissue was chopped, or two tissues co-chopped, in Galbraith buffer alone or with six concentrations of cyanidin-3-rutinoside (a cyanidin-3-rhamnoglucoside contributing to red coloration in poinsettia). There were large differences in PI staining (35-70 %) between 2C nuclei from green leaf and red bract tissue in poinsettia. These largely disappeared when pea leaflets were co-chopped with poinsettia tissue as an internal standard. However, smaller (2.8-6.9 %) differences remained, and red bracts gave significantly lower 1C genome size estimates (1.69-1.76 pg) than green leaves (1.81 pg). Chopping pea or poinsettia tissue in buffer with 0-200 µM cyanidin-3-rutinoside showed that the effects of natural inhibitors in red bracts of poinsettia on PI staining were largely reproduced in a dose-dependent way by this anthocyanin. Given their near-ubiquitous distribution, many suspected roles and known effects on DNA staining, anthocyanins are a potent, potential cause of significant error variation in genome size estimations for many plant tissues and taxa. This has important implications of wide practical and theoretical significance. When choosing genome size calibration standards it seems prudent to select materials producing little or no anthocyanin.
Reviewing the literature identifies clear examples in which claims of intraspecific variation in genome size are probably artefacts caused by natural variation in anthocyanin levels or correlated with environmental factors known to induce variation in pigmentation.
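The proportionality assumption named at the start of this abstract can be written as a one-line calculation: the sample's DNA amount is the standard's known DNA amount scaled by the ratio of fluorescence peaks. The sketch below shows that calculation and how a staining inhibitor that affects only the sample lowers the estimate. The function name, the peak positions, and the standard's DNA amount are hypothetical placeholders and are not calibration values from the study.

```python
def genome_size_pg(sample_peak, standard_peak, standard_pg):
    """Estimate DNA amount from fluorescence peaks, assuming staining is
    directly proportional to DNA amount (hypothetical helper)."""
    return standard_pg * sample_peak / standard_peak

STANDARD_PG = 9.0  # placeholder DNA amount for the internal standard

uninhibited = genome_size_pg(sample_peak=200, standard_peak=1000, standard_pg=STANDARD_PG)
# Same nuclei, but with sample staining reduced by 5% by an inhibitor.
inhibited = genome_size_pg(sample_peak=190, standard_peak=1000, standard_pg=STANDARD_PG)

print(round(uninhibited, 2), round(inhibited, 2))
```

When an inhibitor reduces staining of sample and standard equally, the ratio is unchanged. That is consistent with the abstract's observation that co-chopping with an internal standard removed most, though not all, of the difference.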
Minimal Contribution of APOBEC3-Induced G-to-A Hypermutation to HIV-1 Recombination and Genetic Variation

PubMed Central

Nikolaitchik, Olga A.; Burdick, Ryan C.; Gorelick, Robert J.; Keele, Brandon F.; Hu, Wei-Shau; Pathak, Vinay K.

2016-01-01

Although the predominant effect of host restriction APOBEC3 proteins on HIV-1 infection is to block viral replication, they might inadvertently increase retroviral genetic variation by inducing G-to-A hypermutation. Numerous studies have disagreed on the contribution of hypermutation to viral genetic diversity and evolution. Confounding factors contributing to the debate include the extent of lethal (stop codon) and sublethal hypermutation induced by different APOBEC3 proteins, the inability to distinguish between G-to-A mutations induced by APOBEC3 proteins and error-prone viral replication, the potential impact of hypermutation on the frequency of retroviral recombination, and the extent to which viral recombination occurs in vivo, which can reassort mutations in hypermutated genomes. Here, we determined the effects of hypermutation on the HIV-1 recombination rate and its contribution to genetic variation through recombination to generate progeny genomes containing portions of hypermutated genomes without lethal mutations. We found that hypermutation did not significantly affect the rate of recombination, and recombination between hypermutated and wild-type genomes only increased the viral mutation rate by 3.9 × 10⁻⁵ mutations/bp/replication cycle in heterozygous virions, which is similar to the HIV-1 mutation rate. Since copackaging of hypermutated and wild-type genomes occurs very rarely in vivo, recombination between hypermutated and wild-type genomes does not significantly contribute to the genetic variation of replicating HIV-1. We also analyzed previously reported hypermutated sequences from infected patients and determined that the frequency of sublethal mutagenesis for A3G and A3F is negligible (4 × 10⁻²¹ and 1 × 10⁻¹¹, respectively) and its contribution to viral mutations is far below mutations generated during error-prone reverse transcription.
Taken together, we conclude that the contribution of APOBEC3-induced hypermutation to HIV-1 genetic variation is substantially lower than that from mutations during error-prone replication. PMID:27186986
Minimal Contribution of APOBEC3-Induced G-to-A Hypermutation to HIV-1 Recombination and Genetic Variation.

PubMed

Delviks-Frankenberry, Krista A; Nikolaitchik, Olga A; Burdick, Ryan C; Gorelick, Robert J; Keele, Brandon F; Hu, Wei-Shau; Pathak, Vinay K

2016-05-01

Although the predominant effect of host restriction APOBEC3 proteins on HIV-1 infection is to block viral replication, they might inadvertently increase retroviral genetic variation by inducing G-to-A hypermutation. Numerous studies have disagreed on the contribution of hypermutation to viral genetic diversity and evolution. Confounding factors contributing to the debate include the extent of lethal (stop codon) and sublethal hypermutation induced by different APOBEC3 proteins, the inability to distinguish between G-to-A mutations induced by APOBEC3 proteins and error-prone viral replication, the potential impact of hypermutation on the frequency of retroviral recombination, and the extent to which viral recombination occurs in vivo, which can reassort mutations in hypermutated genomes. Here, we determined the effects of hypermutation on the HIV-1 recombination rate and its contribution to genetic variation through recombination to generate progeny genomes containing portions of hypermutated genomes without lethal mutations. We found that hypermutation did not significantly affect the rate of recombination, and recombination between hypermutated and wild-type genomes only increased the viral mutation rate by 3.9 × 10⁻⁵ mutations/bp/replication cycle in heterozygous virions, which is similar to the HIV-1 mutation rate. Since copackaging of hypermutated and wild-type genomes occurs very rarely in vivo, recombination between hypermutated and wild-type genomes does not significantly contribute to the genetic variation of replicating HIV-1. We also analyzed previously reported hypermutated sequences from infected patients and determined that the frequency of sublethal mutagenesis for A3G and A3F is negligible (4 × 10⁻²¹ and 1 × 10⁻¹¹, respectively) and its contribution to viral mutations is far below mutations generated during error-prone reverse transcription.
Taken together, we conclude that the contribution of APOBEC3-induced hypermutation to HIV-1 genetic variation is substantially lower than that from mutations during error-prone replication.
Assembly and comparison of two closely related Brassica napus genomes.

PubMed

Bayer, Philipp E; Hurgobin, Bhavna; Golicz, Agnieszka A; Chan, Chon-Kit Kenneth; Yuan, Yuxuan; Lee, HueyTyng; Renton, Michael; Meng, Jinling; Li, Ruiyuan; Long, Yan; Zou, Jun; Bancroft, Ian; Chalhoub, Boulos; King, Graham J; Batley, Jacqueline; Edwards, David

2017-12-01

As an increasing number of plant genome sequences become available, it is clear that gene content varies between individuals, and the challenge arises to predict the gene content of a species. However, genome comparison is often confounded by variation in assembly and annotation. Differentiating between true gene absence and variation in assembly or annotation is essential for the accurate identification of conserved and variable genes in a species. Here, we present the de novo assembly of the B. napus cultivar Tapidor and comparison with an improved assembly of the Brassica napus cultivar Darmor-bzh. Both cultivars were annotated using the same method to allow comparison of gene content. We identified genes unique to each cultivar and differentiate these from artefacts due to variation in the assembly and annotation. We demonstrate that using a common annotation pipeline can result in different gene predictions, even for closely related cultivars, and repeat regions which collapse during assembly impact whole genome comparison. After accounting for differences in assembly and annotation, we demonstrate that the genome of Darmor-bzh contains a greater number of genes than the genome of Tapidor. Our results are the first step towards comparison of the true differences between B. napus genomes and highlight the potential sources of error in future production of a B. napus pangenome. © 2017 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Survival and divergence in a small group: The extraordinary genomic history of the endangered Apennine brown bear stragglers

PubMed Central

Benazzo, Andrea; Trucchi, Emiliano; Cahill, James A.; Maisano Delser, Pierpaolo; Mona, Stefano; Fumagalli, Matteo; Cornetti, Luca; Ghirotto, Silvia; Girardi, Matteo; Ometto, Lino; Panziera, Alex; Rota-Stabelli, Omar; Zanetti, Enrico; Karamanlidis, Alexandros; Groff, Claudio; Paule, Ladislav; Gentile, Leonardo; Vicario, Saverio; Boitani, Luigi; Fuselli, Silvia; Vernesi, Cristiano; Bertorelle, Giorgio

2017-01-01

About 100 km east of Rome, in the central Apennine Mountains, a critically endangered population of ∼50 brown bears live in complete isolation. Mating outside this population is prevented by several 100 km of bear-free territories. We exploited this natural experiment to better understand the gene and genomic consequences of surviving at extremely small population size. We found that brown bear populations in Europe lost connectivity since Neolithic times, when farming communities expanded and forest burning was used for land clearance. In central Italy, this resulted in a 40-fold population decline. The overall genomic impact of this decline included the complete loss of variation in the mitochondrial genome and along long stretches of the nuclear genome. Several private and deleterious amino acid changes were fixed by random drift; predicted effects include energy deficit, muscle weakness, anomalies in cranial and skeletal development, and reduced aggressiveness. Despite this extreme loss of diversity, Apennine bear genomes show nonrandom peaks of high variation, possibly maintained by balancing selection, at genomic regions significantly enriched for genes associated with immune and olfactory systems. Challenging the paradigm of increased extinction risk in small populations, we suggest that random fixation of deleterious alleles (i) can be an important driver of divergence in isolation, (ii) can be tolerated when balancing selection prevents random loss of variation at important genes, and (iii) is followed by or results directly in favorable behavioral changes. PMID:29078308
Survival and divergence in a small group: The extraordinary genomic history of the endangered Apennine brown bear stragglers.

PubMed

Benazzo, Andrea; Trucchi, Emiliano; Cahill, James A; Maisano Delser, Pierpaolo; Mona, Stefano; Fumagalli, Matteo; Bunnefeld, Lynsey; Cornetti, Luca; Ghirotto, Silvia; Girardi, Matteo; Ometto, Lino; Panziera, Alex; Rota-Stabelli, Omar; Zanetti, Enrico; Karamanlidis, Alexandros; Groff, Claudio; Paule, Ladislav; Gentile, Leonardo; Vilà, Carles; Vicario, Saverio; Boitani, Luigi; Orlando, Ludovic; Fuselli, Silvia; Vernesi, Cristiano; Shapiro, Beth; Ciucci, Paolo; Bertorelle, Giorgio

2017-11-07

About 100 km east of Rome, in the central Apennine Mountains, a critically endangered population of ∼50 brown bears live in complete isolation. Mating outside this population is prevented by several 100 km of bear-free territories. We exploited this natural experiment to better understand the gene and genomic consequences of surviving at extremely small population size. We found that brown bear populations in Europe lost connectivity since Neolithic times, when farming communities expanded and forest burning was used for land clearance. In central Italy, this resulted in a 40-fold population decline. The overall genomic impact of this decline included the complete loss of variation in the mitochondrial genome and along long stretches of the nuclear genome. Several private and deleterious amino acid changes were fixed by random drift; predicted effects include energy deficit, muscle weakness, anomalies in cranial and skeletal development, and reduced aggressiveness. Despite this extreme loss of diversity, Apennine bear genomes show nonrandom peaks of high variation, possibly maintained by balancing selection, at genomic regions significantly enriched for genes associated with immune and olfactory systems. Challenging the paradigm of increased extinction risk in small populations, we suggest that random fixation of deleterious alleles (i) can be an important driver of divergence in isolation, (ii) can be tolerated when balancing selection prevents random loss of variation at important genes, and (iii) is followed by or results directly in favorable behavioral changes. Published under the PNAS license.
A Genome Wide Survey of SNP Variation Reveals the Genetic Structure of Sheep Breeds

USDA-ARS's Scientific Manuscript database

The genetic structure of sheep reflects their domestication and subsequent formation into discrete breeds. Understanding genetic structure is essential for achieving genetic improvement through genome-wide association studies, genomic selection and the dissection of quantitative traits. After identi...

Genome-wide association as a means to understanding the mammary gland

USDA-ARS's Scientific Manuscript database

Next-generation sequencing and related technologies have facilitated the creation of enormous public databases that catalogue genomic variation. These databases have facilitated a variety of approaches to discover new genes that regulate normal biology as well as disease. Genome wide association (...
Phenotypic and genomic analysis of a fast neutron mutant population resource in soybean

USDA-ARS's Scientific Manuscript database

Mutagenized populations have become indispensable resources for introducing variation and studying gene function in plant genomics research. We utilized fast neutron radiation to induce deletion mutations in the soybean genome and phenotypically screened the resulting population. We exposed approxim...
Quantifying the Variation in the Effective Population Size Within a Genome

PubMed Central

Gossmann, Toni I.; Woolfit, Megan; Eyre-Walker, Adam

2011-01-01

The effective population size (Ne) is one of the most fundamental parameters in population genetics. It is thought to vary across the genome as a consequence of differences in the rate of recombination and the density of selected sites due to the processes of genetic hitchhiking and background selection. Although it is known that there is intragenomic variation in the effective population size in some species, it is not known whether this is widespread or how much variation in the effective population size there is. Here, we test whether the effective population size varies across the genome, between protein-coding genes, in 10 eukaryotic species by considering whether there is significant variation in neutral diversity, taking into account differences in the mutation rate between loci by using the divergence between species. In most species we find significant evidence of variation. We investigate whether the variation in Ne is correlated to recombination rate and the density of selected sites in four species, for which these data are available. We find that Ne is positively correlated to recombination rate in one species, Drosophila melanogaster, and negatively correlated to a measure of the density of selected sites in two others, humans and Arabidopsis thaliana. However, much of the variation remains unexplained. We use a hierarchical Bayesian analysis to quantify the amount of variation in the effective population size and show that it is quite modest in all species—most genes have an Ne that is within a few fold of all other genes. Nonetheless we show that this modest variation in Ne is sufficient to cause significant differences in the efficiency of natural selection across the genome, by demonstrating that the ratio of the number of nonsynonymous to synonymous polymorphisms is significantly correlated to synonymous diversity and estimates of Ne, even taking into account the obvious nonindependence between these measures. PMID:21954163
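The idea of correcting neutral diversity for locus-specific mutation rates by using between-species divergence can be illustrated with a minimal sketch. It assumes the standard neutral expectations (diversity roughly 4·Ne·μ, divergence roughly 2·μ·T with a shared divergence time T), so that the diversity/divergence ratio is proportional to Ne. The function name and any input values are hypothetical, and this is not the hierarchical Bayesian analysis used in the study.

```python
def relative_ne(diversity, divergence):
    """Rough per-gene proxy for relative effective population size.

    Assumption (not from the abstract): neutral diversity ~ 4*Ne*mu and
    neutral divergence ~ 2*mu*T, with the same divergence time T for all
    genes. Dividing diversity by divergence cancels mu, leaving a value
    proportional to Ne. Results are scaled so the mean ratio equals 1.
    """
    if len(diversity) != len(divergence) or not diversity:
        raise ValueError("need two non-empty lists of equal length")
    if any(d <= 0 for d in divergence):
        raise ValueError("divergence must be positive for every gene")
    ratios = [pi / d for pi, d in zip(diversity, divergence)]
    mean_ratio = sum(ratios) / len(ratios)
    return [r / mean_ratio for r in ratios]
```

Two genes with the same divergence but a twofold difference in diversity would come out with a twofold difference in this proxy, whereas genes whose diversity and divergence scale together (a pure mutation-rate difference) would come out equal.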
Comparison and correlation of Simple Sequence Repeats distribution in genomes of Brucella species

PubMed Central

Kiran, Jangampalli Adi Pradeep; Chakravarthi, Veeraraghavulu Praveen; Kumar, Yellapu Nanda; Rekha, Somesula Swapna; Kruti, Srinivasan Shanthi; Bhaskar, Matcha

2011-01-01

Computational genomics is one of the important tools to understand the distribution of closely related genomes including simple sequence repeats (SSRs) in an organism, which gives valuable information regarding genetic variations. The central objective of the present study was to screen the SSRs distributed in coding and non-coding regions among different human Brucella species which are involved in a range of pathological disorders. Computational analysis of the SSRs in the Brucella indicates few deviations from expected random models. Statistical analysis also reveals that tri-nucleotide SSRs are overrepresented and tetranucleotide SSRs underrepresented in Brucella genomes. From the data, it can be suggested that over expressed tri-nucleotide SSRs in genomic and coding regions might be responsible in the generation of functional variation of proteins expressed which in turn may lead to different pathogenicity, virulence determinants, stress response genes, transcription regulators and host adaptation proteins of Brucella genomes. Abbreviations SSRs - Simple Sequence Repeats, ORFs - Open Reading Frames. PMID:21738309
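A screen for SSRs of the kind described above can be sketched with a regular-expression search for perfect tandem repeats. This is only an illustration: the abstract does not name the tool the authors used, and the function name and the minimum repeat counts below are hypothetical choices, not values from the study.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
DEFAULT_MIN_REPEATS = {1: 6, 2: 4, 3: 3, 4: 3}

def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, copies) for perfect SSRs in a DNA string."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"), which are reported at the shorter length.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            copies = len(match.group(0)) // unit_len
            hits.append((match.start(), motif, copies))
    return hits
```

Counting the hits per motif length in coding and non-coding regions, and comparing them with counts from shuffled sequences, would be one simple way to ask whether a class such as tri-nucleotide repeats is over- or underrepresented.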
Horizontal gene transfer of chromosomal Type II toxin-antitoxin systems of Escherichia coli.

PubMed

Ramisetty, Bhaskar Chandra Mohan; Santhosh, Ramachandran Sarojini

2016-02-01

Type II toxin-antitoxin systems (TAs) are small autoregulated bicistronic operons that encode a toxin protein with the potential to inhibit metabolic processes and an antitoxin protein to neutralize the toxin. Most of the bacterial genomes encode multiple TAs. However, the diversity and accumulation of TAs on bacterial genomes and its physiological implications are highly debated. Here we provide evidence that Escherichia coli chromosomal TAs (encoding RNase toxins) are 'acquired' DNA likely originated from heterologous DNA and are the smallest known autoregulated operons with the potential for horizontal propagation. Sequence analyses revealed that integration of TAs into the bacterial genome is unique and contributes to variations in the coding and/or regulatory regions of flanking host genome sequences. Plasmids and genomes encoding identical TAs of natural isolates are mutually exclusive. Chromosomal TAs might play significant roles in the evolution and ecology of bacteria by contributing to host genome variation and by moderation of plasmid maintenance. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
A continuum of admixture in the Western Hemisphere revealed by the African Diaspora genome

PubMed Central

Mathias, Rasika Ann; Taub, Margaret A.; Gignoux, Christopher R.; Fu, Wenqing; Musharoff, Shaila; O'Connor, Timothy D.; Vergara, Candelaria; Torgerson, Dara G.; Pino-Yanes, Maria; Shringarpure, Suyash S.; Huang, Lili; Rafaels, Nicholas; Boorgula, Meher Preethi; Johnston, Henry Richard; Ortega, Victor E.; Levin, Albert M.; Song, Wei; Torres, Raul; Padhukasahasram, Badri; Eng, Celeste; Mejia-Mejia, Delmy-Aracely; Ferguson, Trevor; Qin, Zhaohui S.; Scott, Alan F.; Yazdanbakhsh, Maria; Wilson, James G.; Marrugo, Javier; Lange, Leslie A.; Kumar, Rajesh; Avila, Pedro C.; Williams, L. Keoki; Watson, Harold; Ware, Lorraine B.; Olopade, Christopher; Olopade, Olufunmilayo; Oliveira, Ricardo; Ober, Carole; Nicolae, Dan L.; Meyers, Deborah; Mayorga, Alvaro; Knight-Madden, Jennifer; Hartert, Tina; Hansel, Nadia N.; Foreman, Marilyn G.; Ford, Jean G.; Faruque, Mezbah U.; Dunston, Georgia M.; Caraballo, Luis; Burchard, Esteban G.; Bleecker, Eugene; Araujo, Maria Ilma; Herrera-Paz, Edwin Francisco; Gietzen, Kimberly; Grus, Wendy E.; Bamshad, Michael; Bustamante, Carlos D.; Kenny, Eimear E.; Hernandez, Ryan D.; Beaty, Terri H.; Ruczinski, Ingo; Akey, Joshua; Campbell, Monica; Chavan, Sameer; Foster, Cassandra; Gao, Li; Horowitz, Edward; Ortiz, Romina; Potee, Joseph; Gao, Jingjing; Hu, Yijuan; Hansen, Mark; Deshpande, Aniket; Locke, Devin P.; Grammer, Leslie; Kim, Kwang-YounA; Schleimer, Robert; De La Vega, Francisco M.; Szpiech, Zachary A.; Oluwole, Oluwafemi; Arinola, Ganiyu; Correa, Adolfo; Musani, Solomon; Chong, Jessica; Nickerson, Deborah; Reiner, Alexander; Maul, Pissamai; Maul, Trevor; Martinez, Beatriz; Meza, Catherine; Ayestas, Gerardo; Landaverde-Torres, Pamela; Erazo, Said Omar Leiva; Martinez, Rosella; Mayorga, Luis F.; Ramos, Hector; Saenz, Allan; Varela, Gloria; Vasquez, Olga Marina; Samms-Vaughan, Maureen; Wilks, Rainford J.; Adegnika, Akim; Ateba-Ngoa, Ulysse; Barnes, Kathleen C.

2016-01-01

The African Diaspora in the Western Hemisphere represents one of the largest forced migrations in history and had a profound impact on genetic diversity in modern populations. To date, the fine-scale population structure of descendants of the African Diaspora remains largely uncharacterized. Here we present genetic variation from deeply sequenced genomes of 642 individuals from North and South American, Caribbean and West African populations, substantially increasing the lexicon of human genomic variation and suggesting much variation remains to be discovered in African-admixed populations in the Americas. We summarize genetic variation in these populations, quantifying the postcolonial sex-biased European gene flow across multiple regions. Moreover, we refine estimates on the burden of deleterious variants carried across populations and how this varies with African ancestry. Our data are an important resource for empowering disease mapping studies in African-admixed individuals and will facilitate gene discovery for diseases disproportionately affecting individuals of African ancestry. PMID:27725671
Adapting legume crops to climate change using genomic approaches.

PubMed

Mousavi-Derazmahalleh, Mahsa; Bayer, Philipp E; Hane, James K; Valliyodan, Babu; Nguyen, Henry T; Nelson, Matthew N; Erskine, William; Varshney, Rajeev K; Papa, Roberto; Edwards, David

2018-03-30

Our agricultural system, and hence food security, is threatened by a combination of events, such as an increasing population, the impacts of climate change, and the need for more sustainable development. Evolutionary adaptation may help some species to overcome environmental changes through new selection pressures driven by climate change. However, the success of evolutionary adaptation depends on various factors, one of which is the extent of genetic variation available within species. Genomic approaches provide an exceptional opportunity to identify genetic variation that can be employed in crop improvement programs. In this review, we illustrate some of the routinely used genomics-based methods as well as recent breakthroughs, which facilitate assessment of genetic variation and discovery of adaptive genes in legumes. Although additional information is needed, the current utility of selection tools indicates a robust ability to utilize existing variation among legumes to address the challenges of climate uncertainty. © 2018 The Authors. Plant, Cell & Environment Published by John Wiley & Sons Ltd.
A worldwide survey of genome sequence variation provides insight into the evolutionary history of the honeybee Apis mellifera.

PubMed

Wallberg, Andreas; Han, Fan; Wellhagen, Gustaf; Dahle, Bjørn; Kawata, Masakado; Haddad, Nizar; Simões, Zilá Luz Paulino; Allsopp, Mike H; Kandemir, Irfan; De la Rúa, Pilar; Pirk, Christian W; Webster, Matthew T

2014-10-01

The honeybee Apis mellifera has major ecological and economic importance. We analyze patterns of genetic variation at 8.3 million SNPs, identified by sequencing 140 honeybee genomes from a worldwide sample of 14 populations at a combined total depth of 634×. These data provide insight into the evolutionary history and genetic basis of local adaptation in this species. We find evidence that population sizes have fluctuated greatly, mirroring historical fluctuations in climate, although contemporary populations have high genetic diversity, indicating the absence of domestication bottlenecks. Levels of genetic variation are strongly shaped by natural selection and are highly correlated with patterns of gene expression and DNA methylation. We identify genomic signatures of local adaptation, which are enriched in genes expressed in workers and in immune system- and sperm motility-related genes that might underlie geographic variation in reproduction, dispersal and disease resistance. This study provides a framework for future investigations into responses to pathogens and climate change in honeybees.
Does the central dogma still stand?

PubMed Central

2012-01-01

Prions are agents of analog, protein conformation-based inheritance that can confer beneficial phenotypes to cells, especially under stress. Combined with genetic variation, prion-mediated inheritance can be channeled into prion-independent genomic inheritance. Latest screening shows that prions are common, at least in fungi. Thus, there is non-negligible flow of information from proteins to the genome in modern cells, in a direct violation of the Central Dogma of molecular biology. The prion-mediated heredity that violates the Central Dogma appears to be a specific, most radical manifestation of the widespread assimilation of protein (epigenetic) variation into genetic variation. The epigenetic variation precedes and facilitates genetic adaptation through a general ‘look-ahead effect’ of phenotypic mutations. This direction of the information flow is likely to be one of the important routes of environment-genome interaction and could substantially contribute to the evolution of complex adaptive traits. Reviewers: This article was reviewed by Jerzy Jurka, Pierre Pontarotti and Juergen Brosius. For the complete reviews, see the Reviewers’ Reports section. PMID:22913395
Novel functions of prototype foamy virus Gag glycine-arginine-rich boxes in reverse transcription and particle morphogenesis.

PubMed

Müllers, Erik; Uhlig, Tobias; Stirnnagel, Kristin; Fiebig, Uwe; Zentgraf, Hanswalter; Lindemann, Dirk

2011-02-01

Prototype foamy virus (PFV) Gag lacks the characteristic orthoretroviral Cys-His motifs that are essential for various steps of the orthoretroviral replication cycle, such as RNA packaging, reverse transcription, infectivity, integration, and viral assembly. Instead, it contains three glycine-arginine-rich boxes (GR boxes) in its C terminus that putatively represent a functional equivalent. We used a four-plasmid replication-deficient PFV vector system, with uncoupled RNA genome packaging and structural protein translation, to analyze the effects of deletion and various substitution mutations within each GR box on particle release, particle-associated protein composition, RNA packaging, DNA content, infectivity, particle morphology, and intracellular localization. The degree of viral particle release by all mutants was similar to that of the wild type. Only minimal effects on Pol encapsidation, exogenous reverse transcriptase (RT) activity, and genomic viral RNA packaging were observed. In contrast, particle-associated DNA content and infectivity were drastically reduced for all deletion mutants and were undetectable for all alanine substitution mutants. Furthermore, GR box I mutants had significant changes in particle morphology, and GR box II mutants lacked the typical nuclear localization pattern of PFV Gag. Finally, it could be shown that GR boxes I and III, but not GR box II, can functionally complement each other. It therefore appears that, similar to the orthoretroviral Cys-His motifs, the PFV Gag GR boxes are important for RNA encapsidation, genome reverse transcription, and virion infectivity as well as for particle morphogenesis.
Molecular Epidemiology of Adenovirus Type 21 Respiratory Strains Isolated From US Military Trainees (1996-2014).

PubMed

Kajon, Adriana E; Hang, Jun; Hawksworth, Anthony; Metzgar, David; Hage, Elias; Hansen, Christian J; Kuschner, Robert A; Blair, Patrick; Russell, Kevin L; Jarman, Richard G

2015-09-15

The circulation of human adenovirus type 21 (HAdV21) in the United States has been documented since the 1960s in association with outbreaks of febrile respiratory illness (FRI) in military boot camps and civilian cases of respiratory disease. To describe the molecular epidemiology of HAdV21 respiratory infections across the country, 150 clinical respiratory isolates obtained from continuous surveillance of military recruit FRI, and 23 respiratory isolates recovered from pediatric and adult civilian cases of acute respiratory infection were characterized to compile molecular typing data spanning 37 years (1978-2014). Restriction enzyme analysis and genomic sequencing identified 2 clusters of closely related genomic variants readily distinguishable from the prototype and designated 21a-like and 21b-like. A-like variants predominated until 1999. A shift to b-like variants was noticeable by 2007 after a 7-year period (2000-2006) of cocirculation of the 2 genome types. US strains are phylogenetically more closely related to European and Asian strains isolated over the last 4 decades than to the Saudi Arabian prototype strain AV-1645 isolated in 1956. Knowledge of circulating HAdV21 variants and their epidemic behavior will be of significant value to local and global FRI surveillance efforts. © The Author 2015. Published by Oxford University Press on behalf of the Infectious Diseases Society of America. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Cytoplasmic utilization of human immunodeficiency virus type 1 genomic RNA is not dependent on a nuclear interaction with gag.

PubMed

Grewe, Bastian; Hoffmann, Bianca; Ohs, Inga; Blissenbach, Maik; Brandt, Sabine; Tippler, Bettina; Grunwald, Thomas; Uberla, Klaus

2012-03-01

In some retroviruses, such as Rous sarcoma virus and prototype foamy virus, Gag proteins are known to shuttle between the nucleus and the cytoplasm and are implicated in nuclear export of the viral genomic unspliced RNA (gRNA) for subsequent encapsidation. A similar function has been proposed for human immunodeficiency virus type 1 (HIV-1) Gag based on the identification of nuclear localization and export signals. However, the ability of HIV-1 Gag to transit through the nucleus has never been confirmed. In addition, the lentiviral Rev protein promotes efficient nuclear gRNA export, and previous reports indicate a cytoplasmic interaction between Gag and gRNA. Therefore, functional effects of HIV-1 Gag on gRNA and its usage were explored. Expression of gag in the absence of Rev was not able to increase cytoplasmic gRNA levels of subgenomic, proviral, or lentiviral vector constructs, and gene expression from genomic reporter plasmids could not be induced by Gag provided in trans. Furthermore, Gag lacking the reported nuclear localization and export signals was still able to mediate an efficient packaging process. Although small amounts of Gag were detectable in the nuclei of transfected cells, a Crm1-dependent nuclear export signal in Gag could not be confirmed. Thus, our study does not provide any evidence for a nuclear function of HIV-1 Gag. The encapsidation process of HIV-1 therefore clearly differs from that of Rous sarcoma virus and prototype foamy virus.
Commentary: The Materials Project: A materials genome approach to accelerating materials innovation

NASA Astrophysics Data System (ADS)

Jain, Anubhav; Ong, Shyue Ping; Hautier, Geoffroy; Chen, Wei; Richards, William Davidson; Dacek, Stephen; Cholia, Shreyas; Gunter, Dan; Skinner, David; Ceder, Gerbrand; Persson, Kristin A.

2013-07-01

Accelerating the discovery of advanced materials is essential for human welfare and sustainable, clean energy. In this paper, we introduce the Materials Project (www.materialsproject.org), a core program of the Materials Genome Initiative that uses high-throughput computing to uncover the properties of all known inorganic materials. This open dataset can be accessed through multiple channels for both interactive exploration and data mining. The Materials Project also seeks to create open-source platforms for developing robust, sophisticated materials analyses. Future efforts will enable users to perform "rapid-prototyping" of new materials in silico, and provide researchers with new avenues for cost-effective, data-driven materials design.
Large Diversity of Nonstandard Genes and Dynamic Evolution of Chloroplast Genomes in Siphonous Green Algae (Bryopsidales, Chlorophyta)

PubMed Central

Leliaert, Frederik; Marcelino, Vanessa R

2018-01-01

Chloroplast genomes have undergone tremendous alterations through the evolutionary history of the green algae (Chloroplastida). This study focuses on the evolution of chloroplast genomes in the siphonous green algae (order Bryopsidales). We present five new chloroplast genomes, which along with existing sequences, yield a data set representing all but one family of the order. Using comparative phylogenetic methods, we investigated the evolutionary dynamics of genomic features in the order. Our results show extensive variation in chloroplast genome architecture and intron content. Variation in genome size is accounted for by the amount of intergenic space and freestanding open reading frames that do not show significant homology to standard plastid genes. We show the diversity of these nonstandard genes based on their conserved protein domains, which are often associated with mobile functions (reverse transcriptase/intron maturase, integrases, phage- or plasmid-DNA primases, transposases, ligases). Investigation of the introns showed proliferation of group II introns in the early evolution of the order and their subsequent loss in the core Halimedineae, possibly through RT-mediated intron loss. PMID:29635329
Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat.

PubMed

Teng, Huajing; Zhang, Yaohua; Shi, Chengmin; Mao, Fengbiao; Hou, Lingling; Guo, Hongling; Sun, Zhongsheng; Zhang, Jianxu

2016-07-07

Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches. Copyright © 2016 Teng et al.
Eucalyptus applied genomics: from gene sequences to breeding tools.

PubMed

Grattapaglia, Dario; Kirst, Matias

2008-01-01

Eucalyptus is the most widely planted hardwood crop in the tropical and subtropical world because of its superior growth, broad adaptability and multipurpose wood properties. Plantation forestry of Eucalyptus supplies high-quality woody biomass for several industrial applications while reducing the pressure on tropical forests and associated biodiversity. This review links current eucalypt breeding practices with existing and emerging genomic tools. A brief discussion provides a background to modern eucalypt breeding together with some current applications of molecular markers in support of operational breeding. Quantitative trait locus (QTL) mapping and genetical genomics are reviewed and an in-depth perspective is provided on the power of association genetics to dissect quantitative variation in this highly diverse organism. Finally, some challenges and opportunities to integrate genomic information into directional selective breeding are discussed in light of the upcoming draft of the Eucalyptus grandis genome. Given the extraordinary genetic variation that exists in the genus Eucalyptus, the ingenuity of most breeders, and the powerful genomic tools that have become available, the prospects of applied genomics in Eucalyptus forest production are encouraging.
Genome-wide Association Study Identifies Loci for the Polled Phenotype in Yak

PubMed Central

Wu, Xiaoyun; Wang, Kun; Ding, Xuezhi; Wang, Mingcheng; Chu, Min; Xie, Xiuyue; Qiu, Qiang; Yan, Ping

2016-01-01

The absence of horns, known as the polled phenotype, is an economically important trait in modern yak husbandry, but the genomic structure and genetic basis of this phenotype have yet to be discovered. Here, we conducted a genome-wide association study with a panel of 10 horned and 10 polled yaks using whole genome sequencing. We mapped the POLLED locus to a 200-kb interval, which comprises three protein-coding genes. Further characterization of the candidate region showed recent artificial selection signals resulting from the breeding process. We suggest that expressional variations rather than structural variations in protein probably contribute to the polled phenotype. Our results not only represent the first and important step in establishing the genomic structure of the polled region in yak, but also add to our understanding of the polled trait in bovid species. PMID:27389700
Characteristics of products generated by selective sintering and stereolithography rapid prototyping processes

NASA Technical Reports Server (NTRS)

Cariapa, Vikram

1993-01-01

The trend in the modern global economy towards free market policies has motivated companies to use rapid prototyping technologies not only to reduce product development cycle time but also to maintain their competitive edge. A rapid prototyping technology is one which combines computer-aided design with computer-controlled tracking of a focused high-energy source (e.g. lasers, heat) on modern ceramic powders, metallic powders, plastics or photosensitive liquid resins in order to produce prototypes or models. At present, except for the process of shape melting, most rapid prototyping processes generate products that are only dimensionally similar to the desired end product. There is an urgent need, therefore, to enhance the understanding of the characteristics of these processes in order to realize their potential for production. Currently, the commercial market is dominated by four rapid prototyping processes, namely selective laser sintering, stereolithography, fused deposition modelling and laminated object manufacturing. This phase of the research has focused on the selective laser sintering and stereolithography rapid prototyping processes. A theoretical model for these processes is under development. Different rapid prototyping sites supplied test specimens (based on ASTM 638-84, Type I) that have been measured and tested to provide a database on surface finish, dimensional variation and ultimate tensile strength. Further plans call for developing and verifying the theoretical models by carefully designed experiments. This will be a joint effort between NASA and other prototyping centers to generate a larger database, thus encouraging more widespread usage by product designers.
Non-codingRNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer.

PubMed

Wojcik, Sylwia E; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z; Rai, Kanti R; Kipps, Thomas J; Keating, Michael J; Croce, Carlo M; Calin, George A

2010-02-01

Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas.
Non-codingRNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer

PubMed Central

Wojcik, Sylwia E.; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S.; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z.; Rai, Kanti R.; Kipps, Thomas J.; Keating, Michael J.

2010-01-01

Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas. PMID:19926640

Identification of Genomic Regions Associated with Phenotypic Variation between Dog Breeds using Selection Mapping

PubMed Central

Derrien, Thomas; Axelsson, Erik; Rosengren Pielberg, Gerli; Sigurdsson, Snaevar; Fall, Tove; Seppälä, Eija H.; Hansen, Mark S. T.; Lawley, Cindy T.; Karlsson, Elinor K.; Bannasch, Danika; Vilà, Carles; Lohi, Hannes; Galibert, Francis; Fredholm, Merete; Häggström, Jens; Hedhammar, Åke; André, Catherine; Lindblad-Toh, Kerstin; Hitte, Christophe; Webster, Matthew T.

2011-01-01

The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease. PMID:22022279
Identification of genomic regions associated with phenotypic variation between dog breeds using selection mapping.

PubMed

Vaysse, Amaury; Ratnakumar, Abhirami; Derrien, Thomas; Axelsson, Erik; Rosengren Pielberg, Gerli; Sigurdsson, Snaevar; Fall, Tove; Seppälä, Eija H; Hansen, Mark S T; Lawley, Cindy T; Karlsson, Elinor K; Bannasch, Danika; Vilà, Carles; Lohi, Hannes; Galibert, Francis; Fredholm, Merete; Häggström, Jens; Hedhammar, Ake; André, Catherine; Lindblad-Toh, Kerstin; Hitte, Christophe; Webster, Matthew T

2011-10-01

The extraordinary phenotypic diversity of dog breeds has been sculpted by a unique population history accompanied by selection for novel and desirable traits. Here we perform a comprehensive analysis using multiple test statistics to identify regions under selection in 509 dogs from 46 diverse breeds using a newly developed high-density genotyping array consisting of >170,000 evenly spaced SNPs. We first identify 44 genomic regions exhibiting extreme differentiation across multiple breeds. Genetic variation in these regions correlates with variation in several phenotypic traits that vary between breeds, and we identify novel associations with both morphological and behavioral traits. We next scan the genome for signatures of selective sweeps in single breeds, characterized by long regions of reduced heterozygosity and fixation of extended haplotypes. These scans identify hundreds of regions, including 22 blocks of homozygosity longer than one megabase in certain breeds. Candidate selection loci are strongly enriched for developmental genes. We chose one highly differentiated region, associated with body size and ear morphology, and characterized it using high-throughput sequencing to provide a list of variants that may directly affect these traits. This study provides a catalogue of genomic regions showing extreme reduction in genetic variation or population differentiation in dogs, including many linked to phenotypic variation. The many blocks of reduced haplotype diversity observed across the genome in dog breeds are the result of both selection and genetic drift, but extended blocks of homozygosity on a megabase scale appear to be best explained by selection. Further elucidation of the variants under selection will help to uncover the genetic basis of complex traits and disease.
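The megabase-scale blocks of homozygosity mentioned above can be located, in the simplest case, by scanning ordered genotype calls for uninterrupted homozygous stretches. The sketch below is a minimal version of that idea under stated assumptions: the positions, genotypes and the function name `homozygous_runs` are hypothetical, and unlike practical tools it does not tolerate occasional heterozygous or missing calls inside a run.

```python
def homozygous_runs(positions, genotypes, min_length_bp=1_000_000):
    """Return (start, end) spans of consecutive homozygous SNPs at least min_length_bp long.

    positions: sorted base-pair coordinates on one chromosome.
    genotypes: alternate-allele dosage per SNP (0 or 2 = homozygous, 1 = heterozygous).
    """
    runs = []
    start = None
    end = None
    for pos, gt in zip(positions, genotypes):
        if gt in (0, 2):
            if start is None:
                start = pos          # open a new run at the first homozygous SNP
            end = pos                # extend the current run
        else:
            if start is not None and end - start >= min_length_bp:
                runs.append((start, end))
            start = None             # a heterozygous call closes the run
    if start is not None and end - start >= min_length_bp:
        runs.append((start, end))    # close a run that reaches the last SNP
    return runs

# Hypothetical SNP coordinates and genotype calls for one dog on one chromosome.
positions = [100_000, 400_000, 900_000, 1_500_000, 1_600_000, 1_700_000, 3_000_000]
genotypes = [0, 2, 2, 0, 1, 2, 2]
runs = homozygous_runs(positions, genotypes)
print(runs)
```

The one-megabase default mirrors the threshold quoted in the abstract; any other cutoff can be passed through `min_length_bp`.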
Systematic pharmacogenomics analysis of a Malay whole genome: proof of concept for personalized medicine.

PubMed

Salleh, Mohd Zaki; Teh, Lay Kek; Lee, Lian Shien; Ismet, Rose Iszati; Patowary, Ashok; Joshi, Kandarp; Pasha, Ayesha; Ahmed, Azni Zain; Janor, Roziah Mohd; Hamzah, Ahmad Sazali; Adam, Aishah; Yusoff, Khalid; Hoh, Boon Peng; Hatta, Fazleen Haslinda Mohd; Ismail, Mohamad Izwan; Scaria, Vinod; Sivasubbu, Sridhar

2013-01-01

With higher throughput and lower sequencing cost, second-generation sequencing technology has immense potential for translation into clinical practice and for the realization of pharmacogenomics-based patient care. The systematic analysis of whole genome sequences to assess patient-to-patient variability in pharmacokinetic and pharmacodynamic responses to drugs would be the next step in future medicine, in line with the vision of personalizing medicine. Genomic DNA obtained from a 55-year-old, self-declared healthy, anonymous male of Malay descent was sequenced. The subject's mother died of lung cancer, and his father had a history of schizophrenia and died at the age of 65. A systematic, intuitive computational workflow/pipeline integrating a custom algorithm with large datasets of variant annotations and gene functions was developed to identify genetic variations with pharmacogenomic impact. A comprehensive pathway map of drug transport, metabolism and action was used as a template to map non-synonymous variations with potential functional consequences. Over 3 million known variations and 100,898 novel variations in the Malay genome were identified. Further in-depth pharmacogenetics analysis revealed a total of 607 unique variants in 563 proteins, with the eventual identification of 4 drug transport genes, 2 drug metabolizing enzyme genes and 33 target genes harboring deleterious SNVs involved in pharmacological pathways, which could have a potential role in clinical settings. The current study demonstrates the potential of personal genome sequencing for understanding functionally relevant variations with potential influence on drug transport, metabolism and differential therapeutic outcomes. These will be essential for realizing personalized medicine through the use of a comprehensive computational pipeline for systematic data mining and analysis.
Genome-wide SNP analysis reveals a genetic basis for sea-age variation in a wild population of Atlantic salmon (Salmo salar).

PubMed

Johnston, Susan E; Orell, Panu; Pritchard, Victoria L; Kent, Matthew P; Lien, Sigbjørn; Niemelä, Eero; Erkinaro, Jaakko; Primmer, Craig R

2014-07-01

Delaying sexual maturation can lead to larger body size and higher reproductive success, but carries an increased risk of death before reproducing. Classical life history theory predicts that trade-offs between reproductive success and survival should lead to the evolution of an optimal strategy in a given population. However, variation in mating strategies generally persists, and in general, there remains a poor understanding of genetic and physiological mechanisms underlying this variation. One extreme case of this is in the Atlantic salmon (Salmo salar), which can show variation in the age at which they return from their marine migration to spawn (i.e. their 'sea age'). This results in large size differences between strategies, with direct implications for individual fitness. Here, we used an Illumina Infinium SNP array to identify regions of the genome associated with variation in sea age in a large population of Atlantic salmon in Northern Europe, implementing individual-based genome-wide association studies (GWAS) and population-based FST outlier analyses. We identified several regions of the genome which vary in association with phenotype and/or selection between sea ages, with nearby genes having functions related to muscle development, metabolism, immune response and mate choice. In addition, we found that individuals of different sea ages belong to different, yet sympatric populations in this system, indicating that reproductive isolation may be driven by divergence between stable strategies. Overall, this study demonstrates how genome-wide methodologies can be integrated with samples collected from wild, structured populations to understand their ecology and evolution in a natural context. © 2014 John Wiley & Sons Ltd.
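To make the population-based FST outlier idea concrete, the sketch below computes a basic heterozygosity-based FST for single biallelic SNPs from allele frequencies in two equally weighted groups, then flags loci above a cutoff. This is a textbook estimator and not necessarily the outlier method used in the study; the SNP names, frequencies and the 0.25 cutoff are hypothetical.

```python
def fst_two_populations(p1, p2):
    """F_ST for one biallelic SNP from allele frequencies in two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                      # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                                         # monomorphic in both groups
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-group heterozygosity
    return (h_total - h_within) / h_total

# Hypothetical allele frequencies in two sea-age groups: (group 1, group 2).
snp_freqs = {
    "snp_a": (0.50, 0.52),
    "snp_b": (0.10, 0.90),
    "snp_c": (0.30, 0.35),
}
fst = {name: fst_two_populations(*freqs) for name, freqs in snp_freqs.items()}

# Arbitrary illustrative cutoff; real outlier scans compare each locus
# against a genome-wide or simulated null distribution instead.
outliers = [name for name, value in fst.items() if value > 0.25]
print(fst, outliers)
```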
Genomic variation and DNA repair associated with soybean transgenesis: a comparison to cultivars and mutagenized plants.

PubMed

Anderson, Justin E; Michno, Jean-Michel; Kono, Thomas J Y; Stec, Adrian O; Campbell, Benjamin W; Curtin, Shaun J; Stupar, Robert M

2016-05-12

The safety of mutagenized and genetically transformed plants remains a subject of scrutiny. Data gathered and communicated on the phenotypic and molecular variation induced by gene transfer technologies will provide a scientific-based means to rationally address such concerns. In this study, genomic structural variation (e.g. large deletions and duplications) and single nucleotide polymorphism rates were assessed among a sample of soybean cultivars, fast neutron-derived mutants, and five genetically transformed plants developed through Agrobacterium based transformation methods. On average, the number of genes affected by structural variations in transgenic plants was one order of magnitude less than that of fast neutron mutants and two orders of magnitude less than the rates observed between cultivars. Structural variants in transgenic plants, while rare, occurred adjacent to the transgenes, and at unlinked loci on different chromosomes. DNA repair junctions at both transgenic and unlinked sites were consistent with sequence microhomology across breakpoints. The single nucleotide substitution rates were modest in both fast neutron and transformed plants, exhibiting fewer than 100 substitutions genome-wide, while inter-cultivar comparisons identified over one-million single nucleotide polymorphisms. Overall, these patterns provide a fresh perspective on the genomic variation associated with high-energy induced mutagenesis and genetically transformed plants. The genetic transformation process infrequently results in novel genetic variation and these rare events are analogous to genetic variants occurring spontaneously, already present in the existing germplasm, or induced through other types of mutagenesis. It remains unclear how broadly these results can be applied to other crops or transformation methods.
Analysis of Copy Number Variation in the Abp Gene Regions of Two House Mouse Subspecies Suggests Divergence during the Gene Family Expansions

PubMed Central

Pezer, Željka; Chung, Amanda G.; Karn, Robert C.

2017-01-01

The Androgen-binding protein (Abp) gene region of the mouse genome contains 64 genes, some encoding pheromones that influence assortative mating between mice from different subspecies. Using CNVnator and quantitative PCR, we explored copy number variation in this gene family in natural populations of Mus musculus domesticus (Mmd) and Mus musculus musculus (Mmm), two subspecies of house mice that form a narrow hybrid zone in Central Europe. We found that copy number variation in the center of the Abp gene region is very common in wild Mmd, primarily representing the presence/absence of the final duplications described for the mouse genome. Clustering of Mmd individuals based on this variation did not reflect their geographical origin, suggesting no population divergence in the Abp gene cluster. However, copy number variation patterns differ substantially between Mmd and other mouse taxa. Large blocks of Abp genes are absent in Mmm, Mus musculus castaneus and an outgroup, Mus spretus, although with differences in variation and breakpoint locations. Our analysis calls into question the reliance on a reference genome for interpreting the detailed organization of genes in taxa more distant from the Mmd reference genome. The polymorphic nature of the gene family expansion in all four taxa suggests that the number of Abp genes, especially in the central gene region, is not critical to the survival and reproduction of the mouse. However, Abp haplotypes of variable length may serve as a source of raw genetic material for new signals influencing reproductive communication and thus speciation of mice. PMID:28575204
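Read-depth tools such as CNVnator infer copy number from how many reads map to each genomic window. The sketch below shows only the core ratio idea, assuming a known diploid baseline depth; it is not CNVnator's actual algorithm, which also involves GC correction and mean-shift segmentation. The depths, baseline and function name `estimate_copy_number` are hypothetical.

```python
def estimate_copy_number(window_depths, baseline_depth, ploidy=2):
    """Estimate integer copy number per window as ploidy * depth / baseline, rounded."""
    return [round(ploidy * depth / baseline_depth) for depth in window_depths]

# Hypothetical mean read depths for ten consecutive windows across a gene cluster.
depths = [30, 31, 29, 15, 14, 0, 1, 45, 61, 30]
baseline = 30.0   # assumed average depth over regions known to be diploid

copy_numbers = estimate_copy_number(depths, baseline)
print(copy_numbers)
```

Windows with an estimate of 0 correspond to sequence absent from the sample relative to the reference, which is the situation described above for large blocks of Abp genes in taxa other than Mmd.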
Membrane-containing virus particles exhibit the mechanics of a composite material for genome protection.

PubMed

Azinas, S; Bano, F; Torca, I; Bamford, D H; Schwartz, G A; Esnaola, J; Oksanen, H M; Richter, R P; Abrescia, N G

2018-04-26

The protection of the viral genome during extracellular transport is an absolute requirement for virus survival and replication. In addition to the almost universal proteinaceous capsids, certain viruses add a membrane layer that encloses their double-stranded (ds) DNA genome within the protein shell. Using the membrane-containing enterobacterial virus PRD1 as a prototype, and a combination of nanoindentation assays by atomic force microscopy and finite element modelling, we show that PRD1 provides a greater stability against mechanical stress than that achieved by the majority of dsDNA icosahedral viruses that lack a membrane. We propose that the combination of a stiff and brittle proteinaceous shell coupled with a soft and compliant membrane vesicle yields a tough composite nanomaterial well-suited to protect the viral DNA during extracellular transport.
Read clouds uncover variation in complex regions of the human genome.

PubMed

Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E; West, Robert; Sidow, Arend; Batzoglou, Serafim

2015-10-01

Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. © 2015 Bishara et al.; Published by Cold Spring Harbor Laboratory Press.
Whole genome sequencing of Turkish genomes reveals functional private alleles and impact of genetic interactions with Europe, Asia and Africa.

PubMed

Alkan, Can; Kavak, Pinar; Somel, Mehmet; Gokcumen, Omer; Ugurlu, Serkan; Saygi, Ceren; Dal, Elif; Bugra, Kuyas; Güngör, Tunga; Sahinalp, S Cenk; Özören, Nesrin; Bekpen, Cemalettin

2014-11-07

Turkey is a crossroads of major population movements throughout history and has been a hotspot of cultural interactions. Several studies have investigated the complex population history of Turkey through a limited set of genetic markers. However, to date, there have been no studies to assess the genetic variation at the whole genome level using whole genome sequencing. Here, we present whole genome sequences of 16 Turkish individuals resequenced at high coverage (32×-48×). We show that the genetic variation of the contemporary Turkish population clusters with South European populations, as expected, but also shows signatures of relatively recent contribution from ancestral East Asian populations. In addition, we document a significant enrichment of non-synonymous private alleles, consistent with recent observations in European populations. A number of variants associated with skin color and total cholesterol levels show frequency differentiation between the Turkish populations and European populations. Furthermore, we have analyzed the 17q21.31 inversion polymorphism region (MAPT locus) and found increased allele frequency of 31.25% for H1/H2 inversion polymorphism when compared to European populations that show about 25% of allele frequency. This study provides the first map of common genetic variation from 16 western Asian individuals and thus helps fill an important geographical gap in analyzing natural human variation and human migration. Our data will help develop population-specific experimental designs for studies investigating disease associations and demographic history in Turkey.
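The reported 31.25% can be checked against the sample size with simple arithmetic: 16 diploid individuals carry 32 copies of the 17q21.31 region, and 31.25% of 32 is exactly 10 chromosomes. The sketch below works through this; the count of 10 is inferred from the percentage and is not stated in the abstract.

```python
from fractions import Fraction

individuals = 16                          # sample size given in the abstract
chromosomes = 2 * individuals             # diploid: two copies of the region per person
reported_frequency = Fraction("0.3125")   # 31.25% as an exact fraction

# Number of chromosomes carrying the allele implied by the reported frequency.
implied_copies = reported_frequency * chromosomes

def allele_frequency(copies, n_chromosomes):
    """Allele frequency as an exact fraction of sampled chromosomes."""
    return Fraction(copies, n_chromosomes)

# For comparison, a frequency of about 25% would be 8 of 32 chromosomes here.
european_like_copies = Fraction(1, 4) * chromosomes

print(implied_copies, float(allele_frequency(10, chromosomes)), european_like_copies)
```

In a sample of this size the gap between 31.25% and 25% amounts to two chromosomes, which is worth keeping in mind when interpreting the difference.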
Sauropod dinosaurs evolved moderately sized genomes unrelated to body size.

PubMed

Organ, Chris L; Brusatte, Stephen L; Stein, Koen

2009-12-22

Sauropodomorph dinosaurs include the largest land animals to have ever lived, some reaching up to 10 times the mass of an African elephant. Despite their status defining the upper range for body size in land animals, it remains unknown whether sauropodomorphs evolved larger-sized genomes than non-avian theropods, their sister taxon, or whether a relationship exists between genome size and body size in dinosaurs, two questions critical for understanding broad patterns of genome evolution in dinosaurs. Here we report inferences of genome size for 10 sauropodomorph taxa. The estimates are derived from a Bayesian phylogenetic generalized least squares approach that generates posterior distributions of regression models relating genome size to osteocyte lacunae volume in extant tetrapods. We estimate that the average genome size of sauropodomorphs was 2.02 pg (range of species means: 1.77-2.21 pg), a value in the upper range of extant birds (mean = 1.42 pg, range: 0.97-2.16 pg) and near the average for extant non-avian reptiles (mean = 2.24 pg, range: 1.05-5.44 pg). The results suggest that the variation in size and architecture of genomes in extinct dinosaurs was lower than the variation found in mammals. A substantial difference in genome size separates the two major clades within dinosaurs, Ornithischia (large genomes) and Saurischia (moderate to small genomes). We find no relationship between body size and estimated genome size in extinct dinosaurs, which suggests that neutral forces did not dominate the evolution of genome size in this group.
Evaluation of genome-enabled selection for bacterial cold water disease resistance using progeny performance data in Rainbow Trout: Insights on genotyping methods and genomic prediction models

USDA-ARS's Scientific Manuscript database

Bacterial cold water disease (BCWD) causes significant economic losses in salmonid aquaculture, and traditional family-based breeding programs aimed at improving BCWD resistance have been limited to exploiting only between-family variation. We used genomic selection (GS) models to predict genomic br...
Completed Genome Sequences of Strains from 36 Serotypes of Salmonella

PubMed Central

Robertson, James; Yoshida, Catherine; Gurnik, Simone; Rankin, Marisa

2018-01-01

We report here the completed closed genome sequences of strains representing 36 serotypes of Salmonella. These genome sequences will provide useful references for understanding the genetic variation between serotypes, particularly as references for mapping of raw reads or to create assemblies of higher quality, as well as to aid in studies of comparative genomics of Salmonella. PMID:29348347
Genome expansion and gene loss in powdery mildew fungi reveal functional tradeoffs in extreme parasitism

USDA-ARS's Scientific Manuscript database

Eukaryotic genomes vary in size over five orders of magnitude ranging from microsporidia (~2.9Mb) to the lung-fish (~1.2Tb). This extraordinary variation is largely a result of the proliferation of mobile DNA elements also referred to as “genomic parasites.” The constraints on genome size may be imp...
Genomic selection in plant breeding.

PubMed

Newell, Mark A; Jannink, Jean-Luc

2014-01-01

Genomic selection (GS) is a method to predict the genetic value of selection candidates based on the genomic estimated breeding value (GEBV) predicted from high-density markers positioned throughout the genome. Unlike marker-assisted selection, the GEBV is based on all markers including both minor and major marker effects. Thus, the GEBV may capture more of the genetic variation for the particular trait under selection.
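In its simplest additive form, a GEBV is the sum over all markers of genotype dosage times estimated marker effect. The sketch below illustrates that calculation and the resulting ranking of candidates. The marker effects, genotypes and line names are hypothetical; in practice the effects would first be estimated from a phenotyped training population, for example with a penalized regression.

```python
def gebv(dosages, marker_effects):
    """Genomic estimated breeding value: sum of allele dosage times marker effect over all markers."""
    if len(dosages) != len(marker_effects):
        raise ValueError("one effect is required per marker")
    return sum(x * b for x, b in zip(dosages, marker_effects))

# Hypothetical effects for five markers: one major effect and several minor ones.
marker_effects = [0.8, -0.1, 0.05, 0.3, -0.02]

# Hypothetical selection candidates with allele dosages (0, 1 or 2) at each marker.
candidates = {
    "line_1": [2, 0, 1, 1, 2],
    "line_2": [0, 2, 2, 0, 1],
    "line_3": [1, 1, 0, 2, 0],
}

scores = {name: gebv(dosages, marker_effects) for name, dosages in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

Because every marker contributes, small effects that marker-assisted selection would ignore still shift the ranking, which is the contrast drawn in the abstract.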
Genome structure of Bacillus cereus tsu1 and genes involved in cellulose degradation and poly-3-hydroxybutyrate synthesis

USDA-ARS's Scientific Manuscript database

In previous work, we reported on the isolation and genome sequence analysis of Bacillus cereus strain tsu1 NCBI accession number JPYN00000000. The 36 scaffolds in the assembled tsu1 genome were all aligned with B. cereus B4264 genome with variations. Genes encoding for xylanase and cellulase and the...
Whole genome sequences in pulse crops: a global community resource to expedite translational genomics and knowledge-based crop improvement.

PubMed

Bohra, Abhishek; Singh, Narendra P

2015-08-01

Unprecedented developments in legume genomics over the last decade have resulted in the acquisition of a wide range of modern genomic resources to underpin genetic improvement of grain legumes. The genome enabled insights direct investigators in various ways that primarily include unearthing novel structural variations, retrieving the lost genetic diversity, introducing novel/exotic alleles from wider gene pools, finely resolving the complex quantitative traits and so forth. To this end, ready availability of cost-efficient and high-density genotyping assays allows genome wide prediction to be increasingly recognized as the key selection criterion in crop breeding. Further, the high-dimensional measurements of agronomically significant phenotypes obtained by using new-generation screening techniques will empower reference based resequencing as well as allele mining and trait mapping methods to comprehensively associate genome diversity with the phenome scale variation. Besides stimulating the forward genetic systems, accessibility to precisely delineated genomic segments reveals novel candidates for reverse genetic techniques like targeted genome editing. The shifting paradigm in plant genomics in turn necessitates optimization of crop breeding strategies to enable the most efficient integration of advanced omics knowledge and tools. We anticipate that the crop improvement schemes will be bolstered remarkably with rational deployment of these genome-guided approaches, ultimately resulting in expanded plant breeding capacities and improved crop performance.
Influence of genome and bio-ecology on the prevalence of genome exchange in unisexuals of the Ambystoma complex.

PubMed

Beauregard, France; Angers, Bernard

2018-05-31

Unisexuals of the blue-spotted salamander complex are thought to reproduce by kleptogenesis. Genome exchanges associated with this sperm-dependent mode of reproduction are expected to result in a higher genetic variation and multiple ploidy levels compared to clonality. However, the existence of some populations exclusively formed of genetically identical individuals suggests that factors could prevent genome exchanges. This study aimed at assessing the prevalence of genome exchange among unisexuals of the Ambystoma laterale-jeffersonianum complex from 10 sites in the northern part of their distribution. A total of 235 individuals, including 207 unisexuals, were genotyped using microsatellite loci and AFLP. Unisexual individuals could be sorted in five genetically distinct groups, likely derived from the same paternal A. jeffersonianum haplome. One of these groups exclusively reproduced clonally, even when found in sympatry with lineages presenting signature of genome exchange. Genome exchange was site-dependent for another group. Genome exchange was detected at all sites for the three remaining groups. Prevalence of genome exchange appears to be associated with ecological conditions such as availability of effective sperm donors. Intrinsic genomic factors may also affect this process, since different lineages in sympatry present highly variable rate of genome exchange. The coexistence of clonal and genetically diversified lineages opens the door to further research on alternatives to genetic variation.
Applications of the 1000 Genomes Project resources.

PubMed

Zheng-Bradley, Xiangqun; Flicek, Paul

2017-05-01

The 1000 Genomes Project created a valuable, worldwide reference for human genetic variation. Common uses of the 1000 Genomes dataset include genotype imputation supporting Genome-wide Association Studies, mapping expression Quantitative Trait Loci, filtering non-pathogenic variants from exome, whole genome and cancer genome sequencing projects, and genetic analysis of population structure and molecular evolution. In this article, we will highlight some of the multiple ways that the 1000 Genomes data can be and has been utilized for genetic studies. © The Author 2016. Published by Oxford University Press.
Nucleotide diversity maps reveal variation in diversity among wheat genomes and chromosomes

USDA-ARS's Scientific Manuscript database

A strategy for a genome-wide assessment of nucleotide diversity in a polyploid species must minimize the inclusion of homoeologous sequences into diversity estimates and reliably allocate individual haplotypes into respective genomes. In this study, nucle...
Phenotypic and genomic analyses of a fast neutron mutant population resource in soybean

USDA-ARS's Scientific Manuscript database

Mutagenized populations have become indispensable resources for introducing variation and studying gene function in plant genomics research. In this study, fast neutron (FN) radiation was used to induce deletion mutations in the soybean (Glycine max (L.) Merrill) genome. Approximately 120,000 soybea...

Intraspecific variation in mitochondrial genome sequence, structure, and gene content in Silene vulgaris, an angiosperm with pervasive cytoplasmic male sterility.

PubMed

Sloan, Daniel B; Müller, Karel; McCauley, David E; Taylor, Douglas R; Storchová, Helena

2012-12-01

In angiosperms, mitochondrial-encoded genes can cause cytoplasmic male sterility (CMS), resulting in the coexistence of female and hermaphroditic individuals (gynodioecy). We compared four complete mitochondrial genomes from the gynodioecious species Silene vulgaris and found unprecedented amounts of intraspecific diversity for plant mitochondrial DNA (mtDNA). Remarkably, only about half of overall sequence content is shared between any pair of genomes. The four mtDNAs range in size from 361 to 429 kb and differ in gene complement, with rpl5 and rps13 being intact in some genomes but absent or pseudogenized in others. The genomes exhibit essentially no conservation of synteny and are highly repetitive, with evidence of reciprocal recombination occurring even across short repeats (< 250 bp). Some mitochondrial genes exhibit atypically high degrees of nucleotide polymorphism, while others are invariant. The genomes also contain a variable number of small autonomously mapping chromosomes, which have only recently been identified in angiosperm mtDNA. Southern blot analysis of one of these chromosomes indicated a complex in vivo structure consisting of both monomeric circles and multimeric forms. We conclude that S. vulgaris harbors an unusually large degree of variation in mtDNA sequence and structure and discuss the extent to which this variation might be related to CMS. © 2012 The Authors. New Phytologist © 2012 New Phytologist Trust.
A population genomics approach shows widespread geographical distribution of cryptic genomic forms of the symbiotic fungus Rhizophagus irregularis.

PubMed

Savary, Romain; Masclaux, Frédéric G; Wyss, Tania; Droh, Germain; Cruz Corella, Joaquim; Machado, Ana Paula; Morton, Joseph B; Sanders, Ian R

2018-01-01

Arbuscular mycorrhizal fungi (AMF; phylum Glomeromycota) associate with plants forming one of the most successful microbe-plant associations. The fungi promote plant diversity and have a potentially important role in global agriculture. Plant growth depends on both inter- and intra-specific variation in AMF. It was recently reported that an unusually large number of AMF taxa have an intercontinental distribution, suggesting long-distance gene flow for many AMF species, facilitated by either long-distance natural dispersal mechanisms or human-assisted dispersal. However, the intercontinental distribution of AMF species has been questioned because the use of very low-resolution markers may be unsuitable to detect genetic differences among geographically separated AMF, as seen with some other fungi. This has been untestable because of the lack of population genomic data, with high resolution, for any AMF taxa. Here we use phylogenetics and population genomics to test for intra-specific variation in Rhizophagus irregularis, an AMF species for which genome sequence information already exists. We used ddRAD sequencing to obtain thousands of markers distributed across the genomes of 81 R. irregularis isolates and related species. Based on 6 888 variable positions, we observed significant genetic divergence into four main genetic groups within R. irregularis, highlighting that previous studies have not captured underlying genetic variation. Despite considerable genetic divergence, surprisingly, the variation could not be explained by geographical origin, thus also supporting the hypothesis for at least one AMF species of widely dispersed AMF genotypes at an intercontinental scale. Such information is crucial for understanding AMF ecology, and how these fungi can be used in an environmentally safe way in distant locations.
Distribution and diversity of cytotypes in Dianthus broteri as evidenced by genome size variations.

PubMed

Balao, Francisco; Casimiro-Soriguer, Ramón; Talavera, María; Herrera, Javier; Talavera, Salvador

2009-10-01

Studying the spatial distribution of cytotypes and genome size in plants can provide valuable information about the evolution of polyploid complexes. Here, the spatial distribution of cytological races and the amount of DNA in Dianthus broteri, an Iberian carnation with several ploidy levels, is investigated. Sample chromosome counts and flow cytometry (using propidium iodide) were used to determine overall genome size (2C value) and ploidy level in 244 individuals of 25 populations. Both fresh and dried samples were investigated. Differences in 2C and 1Cx values among ploidy levels within biogeographical provinces were tested using ANOVA. Geographical correlations of genome size were also explored. Extensive variation in chromosome numbers (2n = 2x = 30, 2n = 4x = 60, 2n = 6x = 90 and 2n = 12x = 180) was detected, and the dodecaploid cytotype is reported for the first time in this genus. As regards cytotype distribution, six populations were diploid, 11 were tetraploid, three were hexaploid and five were dodecaploid. Except for one diploid population containing some triploid plants (2n = 45), the remaining populations showed a single cytotype. Diploids appeared in two disjunct areas (south-east and south-west), and so did tetraploids (although with a considerably wider geographic range). Dehydrated leaf samples provided reliable measurements of DNA content. Genome size varied significantly among some cytotypes, and also extensively within diploid (up to 1.17-fold) and tetraploid (1.22-fold) populations. Nevertheless, variations were not straightforwardly congruent with ecology and geographical distribution. Dianthus broteri shows the highest diversity of cytotypes known to date in the genus Dianthus. Moreover, some cytotypes present remarkable internal genome size variation. The evolution of the complex is discussed in terms of autopolyploidy, with primary and secondary contact zones.
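The chromosome numbers quoted in this abstract all follow from a single base number, x = 15, implied by the diploid count 2n = 2x = 30. The short Python sketch below is only an illustration of that arithmetic; the names `BASE_X`, `populations` and `somatic_count` are hypothetical and do not come from the study.

```python
# Base chromosome number implied by the diploid count in the abstract (2n = 2x = 30).
BASE_X = 30 // 2  # x = 15

# Ploidy level -> number of populations, as reported in the abstract.
populations = {2: 6, 4: 11, 6: 3, 12: 5}


def somatic_count(ploidy: int, base: int = BASE_X) -> int:
    """Return the expected somatic chromosome number (2n) for a ploidy level."""
    return ploidy * base


for ploidy, n_pops in populations.items():
    print(f"{ploidy}x: 2n = {somatic_count(ploidy)} ({n_pops} populations)")

# The triploid plants found in one diploid population.
print("3x: 2n =", somatic_count(3))
# The population counts should add up to the 25 populations sampled.
print("total populations:", sum(populations.values()))
```

The same multiplication reproduces the triploid count (2n = 45), and the four cytotype tallies sum to the 25 populations sampled.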
Genome-Wide Fine-Scale Recombination Rate Variation in Drosophila melanogaster

PubMed Central

Song, Yun S.

2012-01-01

Estimating fine-scale recombination maps of Drosophila from population genomic data is a challenging problem, in particular because of the high background recombination rate. In this paper, a new computational method is developed to address this challenge. Through an extensive simulation study, it is demonstrated that the method allows more accurate inference, and exhibits greater robustness to the effects of natural selection and noise, compared to a well-used previous method developed for studying fine-scale recombination rate variation in the human genome. As an application, a genome-wide analysis of genetic variation data is performed for two Drosophila melanogaster populations, one from North America (Raleigh, USA) and the other from Africa (Gikongoro, Rwanda). It is shown that fine-scale recombination rate variation is widespread throughout the D. melanogaster genome, across all chromosomes and in both populations. At the fine-scale, a conservative, systematic search for evidence of recombination hotspots suggests the existence of a handful of putative hotspots each with at least a tenfold increase in intensity over the background rate. A wavelet analysis is carried out to compare the estimated recombination maps in the two populations and to quantify the extent to which recombination rates are conserved. In general, similarity is observed at very broad scales, but substantial differences are seen at fine scales. The average recombination rate of the X chromosome appears to be higher than that of the autosomes in both populations, and this pattern is much more pronounced in the African population than the North American population. The correlation between various genomic features—including recombination rates, diversity, divergence, GC content, gene content, and sequence quality—is examined using the wavelet analysis, and it is shown that the most notable difference between D. melanogaster and humans is in the correlation between recombination and diversity. 
PMID:23284288
Genome-Wide Sequence Variation Identification and Floral-Associated Trait Comparisons Based on the Re-sequencing of the ‘Nagafu No. 2’ and ‘Qinguan’ Varieties of Apple (Malus domestica Borkh.)

PubMed Central

Xing, Libo; Zhang, Dong; Song, Xiaomin; Weng, Kai; Shen, Yawen; Li, Youmei; Zhao, Caiping; Ma, Juanjuan; An, Na; Han, Mingyu

2016-01-01

Apple (Malus domestica Borkh.) is a commercially important fruit worldwide. Detailed information on genomic DNA polymorphisms, which are important for understanding phenotypic traits, is lacking for the apple. We re-sequenced two elite apple varieties, ‘Nagafu No. 2’ and ‘Qinguan,’ which have different characteristics. We identified many genomic variations, including 2,771,129 single nucleotide polymorphisms (SNPs), 82,663 structural variations (SVs), and 1,572,803 insertion/deletions (INDELs) in ‘Nagafu No. 2’ and 2,262,888 SNPs, 63,764 SVs, and 1,294,060 INDELs in ‘Qinguan.’ The ‘SNP,’ ‘INDEL,’ and ‘SV’ distributions were non-random, with variation-rich or -poor regions throughout the genomes. In ‘Nagafu No. 2’ and ‘Qinguan’ there were 171,520 and 147,090 non-synonymous SNPs spanning 23,111 and 21,400 genes, respectively; 3,963 and 3,196 SVs in 3,431 and 2,815 genes, respectively; and 1,834 and 1,451 INDELs in 1,681 and 1,345 genes, respectively. Genetic linkage maps of 190 flowering genes associated with multiple flowering pathways in ‘Nagafu No. 2,’ ‘Qinguan,’ and ‘Golden Delicious,’ identified complex regulatory mechanisms involved in floral induction, flower bud formation, and flowering characteristics, which might reflect the genetic variation of the flowering genes. Expression profiling of key flowering genes in buds and leaves suggested that the photoperiod and autonomous flowering pathways are major contributors to the different floral-associated traits between ‘Nagafu No. 2’ and ‘Qinguan.’ The genome variation data provided a foundation for the further exploration of apple diversity and gene–phenotype relationships, and for future research on molecular breeding to improve apple and related species. PMID:27446138
To peep into Pif1 helicase: multifaceted all the way from genome stability to repair-associated DNA synthesis.

PubMed

Chung, Woo-Hyun

2014-02-01

Pif1 DNA helicase is the prototypical member of a 5' to 3' helicase superfamily conserved from bacteria to humans. In Saccharomyces cerevisiae, Pif1 and its homologue Rrm3, localize in both mitochondria and nucleus playing multiple roles in the maintenance of genomic homeostasis. They display relatively weak processivities in vitro, but have largely non-overlapping functions on common genomic loci such as mitochondrial DNA, telomeric ends, and many replication forks especially at hard-to-replicate regions including ribosomal DNA and G-quadruplex structures. Recently, emerging evidence shows that Pif1, but not Rrm3, has a significant new role in repair-associated DNA synthesis with Polδ during homologous recombination stimulating D-loop migration for conservative DNA replication. Comparative genetic and biochemical studies on the structure and function of Pif1 family helicases across different biological systems are further needed to elucidate both diversity and specificity of their mechanisms of action that contribute to genome stability.
Mutation Rates across Budding Yeast Chromosome VI Are Correlated with Replication Timing

PubMed Central

Lang, Gregory I.; Murray, Andrew W.

2011-01-01

Previous experimental studies suggest that the mutation rate is nonuniform across the yeast genome. To characterize this variation across the genome more precisely, we measured the mutation rate of the URA3 gene integrated at 43 different locations tiled across Chromosome VI. We show that mutation rate varies 6-fold across a single chromosome, that this variation is correlated with replication timing, and we propose a model to explain this variation that relies on the temporal separation of two processes for replicating past damaged DNA: error-free DNA damage tolerance and translesion synthesis. This model is supported by the observation that eliminating translesion synthesis decreases this variation. PMID:21666225
Breaking the 1000-gene barrier for Mimivirus using ultra-deep genome and transcriptome sequencing.

PubMed

Legendre, Matthieu; Santini, Sébastien; Rico, Alain; Abergel, Chantal; Claverie, Jean-Michel

2011-03-04

Mimivirus, a giant dsDNA virus infecting Acanthamoeba, is the prototype of the mimiviridae family, the latest addition to the family of the nucleocytoplasmic large DNA viruses (NCLDVs). Its 1.2 Mb-genome was initially predicted to encode 917 genes. A subsequent RNA-Seq analysis precisely mapped many transcript boundaries and identified 75 new genes. We now report a much deeper analysis using the SOLiD™ technology combining RNA-Seq of the Mimivirus transcriptome during the infectious cycle (202.4 Million reads), and a complete genome re-sequencing (45.3 Million reads). This study corrected the genome sequence and identified several single nucleotide polymorphisms. Our results also provided clear evidence of previously overlooked transcription units, including an important RNA polymerase subunit distantly related to Euryarchea homologues. The total Mimivirus gene count is now 1018, 11% greater than the original annotation. This study highlights the huge progress brought about by ultra-deep sequencing for the comprehensive annotation of virus genomes, opening the door to a complete one-nucleotide resolution level description of their transcriptional activity, and to the realistic modeling of the viral genome expression at the ultimate molecular level. This work also illustrates the need to go beyond bioinformatics-only approaches for the annotation of short protein and non-coding genes in viral genomes.
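The "11% greater" figure in this abstract can be checked directly from the two gene counts it reports (917 originally predicted, 1018 now). A minimal Python sketch of that arithmetic follows; the function name `percent_increase` is a hypothetical helper, not something from the paper.

```python
def percent_increase(old: int, new: int) -> float:
    """Return the relative increase from old to new, as a percentage."""
    return 100.0 * (new - old) / old


original_annotation = 917   # genes predicted in the original annotation
updated_count = 1018        # total gene count reported in this study

increase = percent_increase(original_annotation, updated_count)
print(f"{updated_count - original_annotation} additional genes, "
      f"{increase:.1f}% more than the original annotation")
```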
Development of a system to measure local measurement conditions around textile electrodes.

PubMed

Kim, Saim; Oliveira, Joana; Roethlingshoefer, Lisa; Leonhard, Steffen

2010-01-01

The three main factors influencing the interface between textile electrodes and skin are temperature, contact pressure and relative humidity. This paper presents first results of a prototype that measures these local measurement conditions around textile electrodes. The wearable prototype is a data acquisition system based on a microcontroller with a flexible sensor sleeve. Validation measurements included variation of ambient temperature, contact pressures and sleeve material. Results show a good correlation with data found in literature.
Closing gaps between open software and public data in a hackathon setting: User-centered software prototyping

PubMed Central

Busby, Ben; Lesko, Matthew; Federer, Lisa

2016-01-01

In genomics, bioinformatics and other areas of data science, gaps exist between extant public datasets and the open-source software tools built by the community to analyze similar data types. The purpose of biological data science hackathons is to assemble groups of genomics or bioinformatics professionals and software developers to rapidly prototype software to address these gaps. The only two rules for the NCBI-assisted hackathons run so far are that 1) data either must be housed in public data repositories or be deposited to such repositories shortly after the hackathon’s conclusion, and 2) all software comprising the final pipeline must be open-source or open-use. Proposed topics, as well as suggested tools and approaches, are distributed to participants at the beginning of each hackathon and refined during the event. Software, scripts, and pipelines are developed and published on GitHub, a web service providing publicly available, free-usage tiers for collaborative software development. The code resulting from each hackathon is published at https://github.com/NCBI-Hackathons/ with separate directories or repositories for each team. PMID:27134733
Clinical documentation variations and NLP system portability: a case study in asthma birth cohorts across institutions.

PubMed

Sohn, Sunghwan; Wang, Yanshan; Wi, Chung-Il; Krusemark, Elizabeth A; Ryu, Euijung; Ali, Mir H; Juhn, Young J; Liu, Hongfang

2017-11-30

To assess clinical documentation variations across health care institutions using different electronic medical record systems and investigate how they affect natural language processing (NLP) system portability. Birth cohorts from Mayo Clinic and Sanford Children's Hospital (SCH) were used in this study (n = 298 for each). Documentation variations regarding asthma between the 2 cohorts were examined in various aspects: (1) overall corpus at the word level (ie, lexical variation), (2) topics and asthma-related concepts (ie, semantic variation), and (3) clinical note types (ie, process variation). We compared those statistics and explored NLP system portability for asthma ascertainment in 2 stages: prototype and refinement. There exist notable lexical variations (word-level similarity = 0.669) and process variations (differences in major note types containing asthma-related concepts). However, semantic-level corpora were relatively homogeneous (topic similarity = 0.944, asthma-related concept similarity = 0.971). The NLP system for asthma ascertainment had an F-score of 0.937 at Mayo, and produced 0.813 (prototype) and 0.908 (refinement) when applied at SCH. The criteria for asthma ascertainment are largely dependent on asthma-related concepts. Therefore, we believe that semantic similarity is important to estimate NLP system portability. As the Mayo Clinic and SCH corpora were relatively homogeneous at a semantic level, the NLP system, developed at Mayo Clinic, was imported to SCH successfully with proper adjustments to deal with the intrinsic corpus heterogeneity. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.
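For readers unfamiliar with the F-score used to report portability here, the sketch below shows how such a score is usually computed from precision and recall. The abstract does not say which variant was used, so the balanced F1 default (beta = 1) is an assumption, and the precision and recall values are hypothetical placeholders, not figures from the study.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 gives the balanced F1 score."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical precision and recall, chosen only to illustrate the formula.
example_precision = 0.90
example_recall = 0.80
print(f"F1 = {f_score(example_precision, example_recall):.3f}")
```

Because F1 is a harmonic mean, it sits closer to the lower of the two inputs, which is why a drop in either precision or recall at a new site pulls the score down noticeably.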
Genomic Basis of Aging and Life History Evolution in Drosophila melanogaster

PubMed Central

Remolina, Silvia C.; Chang, Peter L.; Leips, Jeff; Nuzhdin, Sergey V.; Hughes, Kimberly A.

2015-01-01

Natural diversity in aging and other life history patterns is a hallmark of organismal variation. Related species, populations, and individuals within populations show genetically based variation in life span and other aspects of age-related performance. Population differences are especially informative because these differences can be large relative to within-population variation and because they occur in organisms with otherwise similar genomes. We used experimental evolution to produce populations divergent for life span and late-age fertility and then used deep genome sequencing to detect sequence variants with nucleotide-level resolution. Several genes and genome regions showed strong signatures of selection, and the same regions were implicated in independent comparisons, suggesting that the same alleles were selected in replicate lines. Genes related to oogenesis, immunity, and protein degradation were implicated as important modifiers of late-life performance. Expression profiling and functional annotation narrowed the list of strong candidate genes to 38, most of which are novel candidates for regulating aging. Life span and early-age fecundity were negatively correlated among populations; therefore the alleles we identified also are candidate regulators of a major life-history trade-off. More generally, we argue that hitchhiking mapping can be a powerful tool for uncovering the molecular bases of quantitative genetic variation. PMID:23106705
Expanding probe repertoire and improving reproducibility in human genomic hybridization

PubMed Central

Dorman, Stephanie N.; Shirley, Ben C.; Knoll, Joan H. M.; Rogan, Peter K.

2013-01-01

Diagnostic DNA hybridization relies on probes composed of single copy (sc) genomic sequences. Sc sequences in probe design ensure high specificity and avoid cross-hybridization to other regions of the genome, which could lead to ambiguous results that are difficult to interpret. We examine how the distribution and composition of repetitive sequences in the genome affects sc probe performance. A divide and conquer algorithm was implemented to design sc probes. With this approach, sc probes can include divergent repetitive elements, which hybridize to unique genomic targets under higher stringency experimental conditions. Genome-wide custom probe sets were created for fluorescent in situ hybridization (FISH) and microarray genomic hybridization. The scFISH probes were developed for detection of copy number changes within small tumour suppressor genes and oncogenes. The microarrays demonstrated increased reproducibility by eliminating cross-hybridization to repetitive sequences adjacent to probe targets. The genome-wide microarrays exhibited lower median coefficients of variation (17.8%) for two HapMap family trios. The coefficients of variations of commercial probes within 300 nt of a repetitive element were 48.3% higher than the nearest custom probe. Furthermore, the custom microarray called a chromosome 15q11.2q13 deletion more consistently. This method for sc probe design increases probe coverage for FISH and lowers variability in genomic microarrays. PMID:23376933
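The reproducibility claims in this abstract rest on the coefficient of variation (CV), the standard deviation expressed as a percentage of the mean. A minimal Python sketch of that statistic is given below; the use of the sample standard deviation and the replicate intensity values are assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV (sample standard deviation / mean) as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


# Hypothetical replicate intensities for a single probe.
replicate_intensities = [8, 10, 12]
print(f"CV = {coefficient_of_variation(replicate_intensities):.1f}%")
```

A lower CV across replicates means a probe gives more consistent signal, which is the sense in which the custom microarrays are described as more reproducible.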
Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo (Cucurbitaceae)

PubMed Central

Alverson, Andrew J.; Wei, XiaoXin; Rice, Danny W.; Stern, David B.; Barry, Kerrie; Palmer, Jeffrey D.

2010-01-01

The mitochondrial genomes of seed plants are unusually large and vary in size by at least an order of magnitude. Much of this variation occurs within a single family, the Cucurbitaceae, whose genomes range from an estimated 390 to 2,900 kb in size. We sequenced the mitochondrial genomes of Citrullus lanatus (watermelon: 379,236 nt) and Cucurbita pepo (zucchini: 982,833 nt)—the two smallest characterized cucurbit mitochondrial genomes—and determined their RNA editing content. The relatively compact Citrullus mitochondrial genome actually contains more and longer genes and introns, longer segmental duplications, and more discernibly nuclear-derived DNA. The large size of the Cucurbita mitochondrial genome reflects the accumulation of unprecedented amounts of both chloroplast sequences (>113 kb) and short repeated sequences (>370 kb). A low mutation rate has been hypothesized to underlie increases in both genome size and RNA editing frequency in plant mitochondria. However, despite its much larger genome, Cucurbita has a significantly higher synonymous substitution rate (and presumably mutation rate) than Citrullus but comparable levels of RNA editing. The evolution of mutation rate, genome size, and RNA editing are apparently decoupled in Cucurbitaceae, reflecting either simple stochastic variation or governance by different factors. PMID:20118192
Human genetics: international projects and personalized medicine.

PubMed

Apellaniz-Ruiz, Maria; Gallego, Cristina; Ruiz-Pinto, Sara; Carracedo, Angel; Rodríguez-Antona, Cristina

2016-03-01

In this article, we present the progress driven by the recent technological advances and new revolutionary massive sequencing technologies in the field of human genetics. We discuss this knowledge in relation with drug response prediction, from the germline genetic variation compiled in the 1000 Genomes Project or in the Genotype-Tissue Expression project, to the phenome-genome archives, the international cancer projects, such as The Cancer Genome Atlas or the International Cancer Genome Consortium, and the epigenetic variation and its influence in gene expression, including the regulation of drug metabolism. This review is based on the lectures presented by the speakers of the Symposium "Human Genetics: International Projects & New Technologies" from the VII Conference of the Spanish Pharmacogenetics and Pharmacogenomics Society, held on the 20th and 21st of April 2015.
The haplotype-resolved genome and epigenome of the aneuploid HeLa cancer cell line.

PubMed

Adey, Andrew; Burton, Joshua N; Kitzman, Jacob O; Hiatt, Joseph B; Lewis, Alexandra P; Martin, Beth K; Qiu, Ruolan; Lee, Choli; Shendure, Jay

2013-08-08

The HeLa cell line was established in 1951 from cervical cancer cells taken from a patient, Henrietta Lacks. This was the first successful attempt to immortalize human-derived cells in vitro. The robust growth and unrestricted distribution of HeLa cells resulted in its broad adoption--both intentionally and through widespread cross-contamination--and for the past 60 years it has served a role analogous to that of a model organism. The cumulative impact of the HeLa cell line on research is demonstrated by its occurrence in more than 74,000 PubMed abstracts (approximately 0.3%). The genomic architecture of HeLa remains largely unexplored beyond its karyotype, partly because like many cancers, its extensive aneuploidy renders such analyses challenging. We carried out haplotype-resolved whole-genome sequencing of the HeLa CCL-2 strain, examined point- and indel-mutation variations, mapped copy-number variations and loss of heterozygosity regions, and phased variants across full chromosome arms. We also investigated variation and copy-number profiles for HeLa S3 and eight additional strains. We find that HeLa is relatively stable in terms of point variation, with few new mutations accumulating after early passaging. Haplotype resolution facilitated reconstruction of an amplified, highly rearranged region of chromosome 8q24.21 at which integration of the human papilloma virus type 18 (HPV-18) genome occurred and that is likely to be the event that initiated tumorigenesis. We combined these maps with RNA-seq and ENCODE Project data sets to phase the HeLa epigenome. This revealed strong, haplotype-specific activation of the proto-oncogene MYC by the integrated HPV-18 genome approximately 500 kilobases upstream, and enabled global analyses of the relationship between gene dosage and expression. 
These data provide an extensively phased, high-quality reference genome for past and future experiments relying on HeLa, and demonstrate the value of haplotype resolution for characterizing cancer genomes and epigenomes.
The Rules of Variation Expanded, Implications for the Research on Compatible Genomics.

PubMed

Castro-Chavez, Fernando

2011-05-12

The main focus of this article is to present the practical aspect of the code rules of variation and the search for a second set of genomic rules, including comparison of sequences to understand how to preserve compatible organisms in danger of extinction and how to generate biodiversity. Three new rules of variation are introduced: 1) homologous recombination, 2) a healthy fertile offspring, and 3) comparison of compatible genomes. The novel search in the natural world for fully compatible genomes capable of homologous recombination is explored by using examples of human polymorphisms in the LDLRAP1 gene, and by the production of fertile offspring by crossbreeding. Examples of dogs, llamas and finches will be presented by a rational control of: natural crossbreeding of organisms with compatible genomes (something already happening in nature), the current work focuses on the generation of new varieties after a careful plan. This study is presented within the context of biosemiotics, which studies the processing of information, signaling and signs by living systems. I define a group of organisms having compatible genomes as a single theme: the genomic species or population, able to speak the same molecular language through different accents, with each variety within a theme being a different version of the same book. These studies have a molecular, compatible genetics context. Population and ecosystem biosemiotics will be exemplified by a possible genetic damage capable of causing mutations by breaking the rules of variation through the coordinated patterns of atoms present in the 9/11 World Trade Center contaminated dust (U, Ba, La, Ce, Sr, Rb, K, Mn, Mg, etc.), combination that may be able to overload the molecular quality control mechanisms of the human body. I introduce here the balance of codons in the circular genetic code: 2[1(1)+1(3)+1(4)+4(2)]=2[2(2)+3(4)].
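The closing identity can at least be checked arithmetically. The sketch below assumes that each term n(k) is read as the product n times k, which is an interpretation not spelled out in the abstract; under that reading both sides evaluate to 32, and together they give 64, the number of triplet codons.

```python
# Assumption: n(k) in the abstract's expression means n multiplied by k.
left_side = 2 * (1 * 1 + 1 * 3 + 1 * 4 + 4 * 2)
right_side = 2 * (2 * 2 + 3 * 4)

print("left side:", left_side)
print("right side:", right_side)
print("sum:", left_side + right_side, "| number of triplet codons:", 4 ** 3)
```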
The Rules of Variation Expanded, Implications for the Research on Compatible Genomics

PubMed Central

Castro-Chavez, Fernando

2011-01-01

The main focus of this article is to present the practical aspect of the code rules of variation and the search for a second set of genomic rules, including comparison of sequences to understand how to preserve compatible organisms in danger of extinction and how to generate biodiversity. Three new rules of variation are introduced: 1) homologous recombination, 2) a healthy fertile offspring, and 3) comparison of compatible genomes. The novel search in the natural world for fully compatible genomes capable of homologous recombination is explored by using examples of human polymorphisms in the LDLRAP1 gene, and by the production of fertile offspring by crossbreeding. Examples of dogs, llamas and finches will be presented by a rational control of: natural crossbreeding of organisms with compatible genomes (something already happening in nature), the current work focuses on the generation of new varieties after a careful plan. This study is presented within the context of biosemiotics, which studies the processing of information, signaling and signs by living systems. I define a group of organisms having compatible genomes as a single theme: the genomic species or population, able to speak the same molecular language through different accents, with each variety within a theme being a different version of the same book. These studies have a molecular, compatible genetics context. Population and ecosystem biosemiotics will be exemplified by a possible genetic damage capable of causing mutations by breaking the rules of variation through the coordinated patterns of atoms present in the 9/11 World Trade Center contaminated dust (U, Ba, La, Ce, Sr, Rb, K, Mn, Mg, etc.), combination that may be able to overload the molecular quality control mechanisms of the human body. I introduce here the balance of codons in the circular genetic code: 2[1(1)+1(3)+1(4)+4(2)]=2[2(2)+3(4)]. PMID:21743816
Efficient infectious cell culture systems of the hepatitis C virus (HCV) prototype strains HCV-1 and H77.

PubMed

Li, Yi-Ping; Ramirez, Santseharay; Mikkelsen, Lotte; Bukh, Jens

2015-01-01

The first discovered and sequenced hepatitis C virus (HCV) genome and the first in vivo infectious HCV clones originated from the HCV prototype strains HCV-1 and H77, respectively, both widely used in research of this important human pathogen. In the present study, we developed efficient infectious cell culture systems for these genotype 1a strains by using the HCV-1/SF9_A and H77C in vivo infectious clones. We initially adapted a genome with the HCV-1 5'UTR-NS5A (where UTR stands for untranslated region) and the JFH1 NS5B-3'UTR (5-5A recombinant), including the genotype 2a-derived mutations F1464L/A1672S/D2979G (LSG), to grow efficiently in Huh7.5 cells, thus identifying the E2 mutation S399F. The combination of LSG/S399F and reported TNcc(1a)-adaptive mutations A1226G/Q1773H/N1927T/Y2981F/F2994S promoted adaptation of the full-length HCV-1 clone. An HCV-1 recombinant with 17 mutations (HCV1cc) replicated efficiently in Huh7.5 cells and produced supernatant infectivity titers of 10^4.0 focus-forming units (FFU)/ml. Eight of these mutations were identified from passaged HCV-1 viruses, and the A970T/I1312V/C2419R/A2919T mutations were essential for infectious particle production. Using CD81-deficient Huh7 cells, we further demonstrated the importance of A970T/I1312V/A2919T or A970T/C2419R/A2919T for virus assembly and that the I1312V/C2419R combination played a major role in virus release. Using a similar approach, we found that NS5B mutation F2994R, identified here from culture-adapted full-length TN viruses and a common NS3 helicase mutation (S1368P) derived from viable H77C and HCV-1 5-5A recombinants, initiated replication and culture adaptation of H77C containing LSG and TNcc(1a)-adaptive mutations. An H77C recombinant harboring 19 mutations (H77Ccc) replicated and spread efficiently after transfection and subsequent infection of naive Huh7.5 cells, reaching titers of 10^3.5 and 10^4.4 FFU/ml, respectively.
Hepatitis C virus (HCV) was discovered in 1989 with the cloning of the prototype strain HCV-1 genome. In 1997, two molecular clones of H77, the other HCV prototype strain, were shown to be infectious in chimpanzees, but not in vitro. HCV research was hampered by a lack of infectious cell culture systems, which became available only in 2005 with the discovery of JFH1 (genotype 2a), a genome that could establish infection in Huh7.5 cells. Recently, we developed in vitro infectious clones for genotype 1a (TN), 2a (J6), and 2b (J8, DH8, and DH10) strains by identifying key adaptive mutations. Globally, genotype 1 is the most prevalent. Studies using HCV-1 and H77 prototype sequences have generated important knowledge on HCV. Thus, the in vitro infectious clones developed here for these 1a strains will be of particular value in advancing HCV research. Moreover, our findings open new avenues for the culture adaptation of HCV isolates of different genotypes. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Comparative population genomics of latitudinal variation in Drosophila simulans and Drosophila melanogaster

PubMed Central

MACHADO, HEATHER E.; BERGLAND, ALAN O.; O’BRIEN, KATHERINE R.; BEHRMAN, EMILY L.; SCHMIDT, PAUL S.; PETROV, DMITRI A.

2016-01-01

Examples of clinal variation in phenotypes and genotypes across latitudinal transects have served as important models for understanding how spatially varying selection and demographic forces shape variation within species. Here, we examine the selective and demographic contributions to latitudinal variation through the largest comparative genomic study to date of Drosophila simulans and Drosophila melanogaster, with genomic sequence data from 382 individual fruit flies, collected across a spatial transect of 19 degrees latitude and at multiple time points over 2 years. Consistent with phenotypic studies, we find less clinal variation in D. simulans than D. melanogaster, particularly for the autosomes. Moreover, we find that clinally varying loci in D. simulans are less stable over multiple years than comparable clines in D. melanogaster. D. simulans shows a significantly weaker pattern of isolation by distance than D. melanogaster and we find evidence for a stronger contribution of migration to D. simulans population genetic structure. While population bottlenecks and migration can plausibly explain the differences in stability of clinal variation between the two species, we also observe a significant enrichment of shared clinal genes, suggesting that the selective forces associated with climate are acting on the same genes and phenotypes in D. simulans and D. melanogaster. PMID:26523848

Genome size expansion and the relationship between nuclear DNA content and spore size in the Asplenium monanthes fern complex (Aspleniaceae)

PubMed Central

2013-01-01

Background: Homosporous ferns are distinctive amongst the land plant lineages for their high chromosome numbers and enigmatic genomes. Genome size measurements are an underexploited tool in homosporous ferns and show great potential to provide an overview of the mechanisms that define genome evolution in these ferns. The aim of this study is to investigate the evolution of genome size and the relationship between genome size and spore size within the apomictic Asplenium monanthes fern complex and related lineages. Results: Comparative analyses to test for a relationship between spore size and genome size show that they are not correlated. The data do, however, provide evidence for marked genome size variation between species in this group. These results indicate that Asplenium monanthes has undergone a two-fold expansion in genome size. Conclusions: Our findings challenge the widely held assumption that spore size can be used to infer ploidy levels within apomictic fern complexes. We argue that the observed genome size variation is likely to have arisen via increases in both chromosome number due to polyploidy and chromosome size due to amplification of repetitive DNA (e.g. transposable elements, especially retrotransposons). However, to date the latter has not been considered to be an important process of genome evolution within homosporous ferns. We infer that genome evolution, at least in some homosporous fern lineages, is a more dynamic process than existing studies would suggest. PMID:24354467
Using Single-Nucleotide Polymorphisms To Discriminate Disease-Associated from Carried Genomes of Neisseria meningitidis

PubMed Central

Katz, Lee S.; Sharma, Nitya V.; Harcourt, Brian H.; Thomas, Jennifer Dolan; Wang, Xin; Mayer, Leonard W.; Jordan, I. King

2011-01-01

Neisseria meningitidis is one of the main agents of bacterial meningitis, causing substantial morbidity and mortality worldwide. However, most of the time N. meningitidis is carried as a commensal not associated with invasive disease. The genomic basis of the difference between disease-associated and carried isolates of N. meningitidis may provide critical insight into mechanisms of virulence, yet it has remained elusive. Here, we have taken a comparative genomics approach to interrogate the difference between disease-associated and carried isolates of N. meningitidis at the level of individual nucleotide variations (i.e., single nucleotide polymorphisms [SNPs]). We aligned complete genome sequences of 8 disease-associated and 4 carried isolates of N. meningitidis to search for SNPs that show mutually exclusive patterns of variation between the two groups. We found 63 SNPs that distinguish the 8 disease-associated genomes from the 4 carried genomes of N. meningitidis, which is far more than can be expected by chance alone given the level of nucleotide variation among the genomes. The putative list of SNPs that discriminate between disease-associated and carriage genomes may be expected to change with increased sampling or changes in the identities of the isolates being compared. Nevertheless, we show that these discriminating SNPs are more likely to reflect phenotypic differences than shared evolutionary history. Discriminating SNPs were mapped to genes, and the functions of the genes were evaluated for possible connections to virulence mechanisms. A number of overrepresented functional categories related to virulence were uncovered among SNP-associated genes, including genes related to the category “symbiosis, encompassing mutualism through parasitism.” PMID:21622743
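The search for SNPs with "mutually exclusive patterns of variation between the two groups" can be illustrated with a minimal sketch. It assumes one plausible reading of that phrase, namely that an alignment column is discriminating when no allele is shared between the two groups. The function name, isolate labels, and toy sequences are hypothetical, and the sketch does not reproduce the chance-expectation test described in the abstract.

```python
def discriminating_sites(alignment, group_a, group_b):
    """Return 0-based alignment columns where the two groups share no allele.

    alignment: dict mapping isolate name -> aligned sequence (equal lengths).
    group_a, group_b: lists of isolate names (hypothetical labels).
    """
    length = len(next(iter(alignment.values())))
    sites = []
    for pos in range(length):
        alleles_a = {alignment[name][pos] for name in group_a}
        alleles_b = {alignment[name][pos] for name in group_b}
        # Skip columns containing gaps or ambiguous bases in either group.
        if not (alleles_a | alleles_b) <= set("ACGT"):
            continue
        # Discriminating column: the two allele sets do not overlap.
        if alleles_a.isdisjoint(alleles_b):
            sites.append(pos)
    return sites


# Toy alignment; names and sequences are invented for illustration only.
alignment = {
    "disease_1": "ACGTAC",
    "disease_2": "ACGTAT",
    "carried_1": "ATGTAC",
    "carried_2": "ATGTAT",
}
sites = discriminating_sites(
    alignment,
    ["disease_1", "disease_2"],
    ["carried_1", "carried_2"],
)
print(sites)
```

In the toy data only the second column separates the groups; the last column varies but the same alleles occur in both groups, so it is not reported.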
[Annotation of the mobilomes of nine teleost species].

PubMed

Gao, Bo; Shen, Dan; Chen, Cai; Wang, Saisai; Yang, Kunlun; Chen, Wei; Wang, Wei; Zhang, Li; Song, Chengyi

2018-01-25

In this study, the mobilomes of nine teleost species were annotated using bioinformatics methods. Both mobilome size and composition differed significantly among the nine species. Ranked from highest to lowest mobilome content, the species were zebrafish, medaka, tilapia, coelacanth, platyfish, cod, stickleback, tetraodon and fugu. Mobilome content and genome size were positively correlated. DNA transposons displayed higher diversity and larger variation in teleosts (0.50% to 38.37%) and were a major determinant of differences among teleost mobilomes; the hAT and Tc/Mariner superfamilies were the major DNA transposons. RNA transposons also exhibited high diversity in teleosts: LINE transposons accounted for 0.53% to 5.75% of teleost genomic sequences, and 14 superfamilies were detected. L1, L2, RTE and Rex retrotransposons showed significant amplification. LTR retrotransposons displayed low amplification in most teleosts, covering less than 2% of the genome, except in zebrafish and stickleback, where they reach 5.58% and 2.51% of genome coverage, respectively. Six LTR superfamilies (Copia, DIRS, ERV, Gypsy, Ngaro and Pao) were detected in teleosts, with Gypsy showing obvious amplification among them. SINEs are the least amplified type in teleosts: they account for 3.28% and 5.64% of genome coverage in zebrafish and coelacanth, respectively, and for less than 1% of the genome in the other seven species. The tRNA, 5S and MIR families of SINEs show some degree of amplification in some teleosts. This study shows that teleosts display high diversity and large variation in their mobilomes, that genome size variation correlates strongly with mobilome content, and that the mobilome is an important factor in determining teleost genome size.
Cryptosporidium as a testbed for single cell genome characterization of unicellular eukaryotes.

PubMed

Troell, Karin; Hallström, Björn; Divne, Anna-Maria; Alsmark, Cecilia; Arrighi, Romanico; Huss, Mikael; Beser, Jessica; Bertilsson, Stefan

2016-06-23

Infectious disease involving multiple genetically distinct populations of pathogens is frequently concurrent, but difficult to detect or describe with current routine methodology. Cryptosporidium sp. is a widespread gastrointestinal protozoan of global significance in both animals and humans. It cannot be easily maintained in culture and infections of multiple strains have been reported. To explore the potential use of single cell genomics methodology for revealing genome-level variation in clinical samples from Cryptosporidium-infected hosts, we sorted individual oocysts for subsequent genome amplification and full-genome sequencing. Cells were identified with fluorescent antibodies with an 80 % success rate for the entire single cell genomics workflow, demonstrating that the methodology can be applied directly to purified fecal samples. Ten amplified genomes from sorted single cells were selected for genome sequencing and compared both to the original population and a reference genome in order to evaluate the accuracy and performance of the method. Single cell genome coverage was on average 81 % even with a moderate sequencing effort and by combining the 10 single cell genomes, the full genome was accounted for. By a comparison to the original sample, biological variation could be distinguished and separated from noise introduced in the amplification. As a proof of principle, we have demonstrated the power of applying single cell genomics to dissect infectious disease caused by closely related parasite species or subtypes. The workflow can easily be expanded and adapted to target other protozoans, and potential applications include mapping genome-encoded traits, virulence, pathogenicity, host specificity and resistance at the level of cells as truly meaningful biological units.
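The distinction between average per-cell coverage and the coverage obtained by combining single-cell genomes can be made concrete with a small sketch. The genome length and covered positions below are invented toy values, not data from the study, and representing coverage as sets of positions is an assumption made purely for readability.

```python
def coverage_fraction(covered_positions, genome_length):
    """Fraction of genome positions that have at least one read."""
    return len(covered_positions) / genome_length


# Toy example: a 10-position "genome" and three hypothetical single cells,
# each missing a different part of the genome after amplification.
genome_length = 10
cells = [
    set(range(0, 8)),   # positions 0-7 covered
    set(range(2, 10)),  # positions 2-9 covered
    set(range(1, 9)),   # positions 1-8 covered
]

per_cell = [coverage_fraction(c, genome_length) for c in cells]
mean_cov = sum(per_cell) / len(per_cell)

# Combining cells: a position counts as covered if any cell covers it.
combined = coverage_fraction(set().union(*cells), genome_length)

print(f"mean per-cell coverage: {mean_cov:.2f}")
print(f"combined coverage:      {combined:.2f}")
```

Because the gaps fall in different places in each cell, the union covers every position even though no single cell does.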
Population genomics and the causes of local differentiation.

PubMed

Tonsor, Stephen J

2012-11-01

Exactly 50 years ago, a revolution in empirical population genetics began with the introduction of methods for detecting allelic variation using protein electrophoresis (Throckmorton 1962; Hubby 1963; Lewontin & Hubby 1966). These pioneering scientists showed that populations are chock-full of genetic variation. This variation was a surprise that required a re-thinking of evolutionary genetic heuristics. Understanding the causes for the maintenance of this variation became and remains a major area of research. In the process of addressing the causes, this same group of scientists documented geographical genetic structure (Prakash et al. 1969), spawning the continued accumulation of what is now a huge case study catalogue of geographical differentiation (e.g. Loveless & Hamrick 1984; Linhart & Grant 1996). Geographical differentiation is clearly quite common. Yet, a truly general understanding of the patterns in and causes of spatial genetic structure across the genome remains elusive. To what extent is spatial structure driven by drift and phylogeography vs. geographical differences in environmental sources of selection? What proportion of the genome participates? A general understanding requires range-wide data on spatial patterning of variation across the entire genome. In this issue of Molecular Ecology, Lasky et al. (2012) make important strides towards addressing these issues, taking advantage of three contemporary revolutions in evolutionary biology. Two are technological: high-throughput sequencing and burgeoning computational power. One is cultural: open access to data from the community of scientists and especially data sets that result from large collaborative efforts. Together, these developments may at last put answers within reach.
Whole-genome analysis of a patient with early-stage small-cell lung cancer.

PubMed

Han, J-Y; Lee, Y-S; Kim, B C; Lee, G K; Lee, S; Kim, E-H; Kim, H-M; Bhak, J

2014-12-01

We performed whole-genome sequencing (WGS) of a case of early-stage small-cell lung cancer (SCLC) to analyze the genomic features. WGS revealed many single-nucleotide variations (SNVs), small insertions/deletions and chromosomal abnormalities. Chromosomes 4p, 5q, 13q, 15q, 17p and 22q contained many block deletions. In particular, copy loss was observed in tumor suppressor genes RB1 and TP53, and copy gain in oncogene hTERT. Somatic mutations were found in TP53 and CREBBP. Novel nonsynonymous (ns) SNVs in C6ORF103 and SLC5A4 genes were also found. Sanger sequencing of the SLC5A4 gene in 23 independent SCLC samples showed another nsSNV in the SLC5A4 gene, indicating that nsSNVs in the SLC5A4 gene are recurrent in SCLC. WGS of an early-stage SCLC identified novel recurrent mutations and validated known variations, including copy number variations. These findings provide insight into the genomic landscape contributing to SCLC development.
The Diversity Present in 5140 Human Mitochondrial Genomes

PubMed Central

Pereira, Luísa; Freitas, Fernando; Fernandes, Verónica; Pereira, Joana B.; Costa, Marta D.; Costa, Stephanie; Máximo, Valdemar; Macaulay, Vincent; Rocha, Ricardo; Samuels, David C.

2009-01-01

We analyzed the current status (as of the end of August 2008) of human mitochondrial genomes deposited in GenBank, amounting to 5140 complete or coding-region sequences, in order to present an overall picture of the diversity present in the mitochondrial DNA of the global human population. To perform this task, we developed mtDNA-GeneSyn, a computer tool that identifies and exhaustively classifies the diversity present in large genetic data sets. The diversity observed in the 5140 human mitochondrial genomes was compared with all possible transitions and transversions from the standard human mitochondrial reference genome. This comparison showed that tRNA and rRNA secondary structures have a large effect in limiting the diversity of the human mitochondrial sequences, whereas for the protein-coding genes there is a bias toward less variation at the second codon positions. The analysis of the observed amino acid variations showed a tolerance of variations that convert between the amino acids V, I, A, M, and T. This defines a group of amino acids with similar chemical properties that can interconvert by a single transition. PMID:19426953
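The comparison against "all possible transitions and transversions" from a reference can be sketched as follows. This is a minimal illustration, not the mtDNA-GeneSyn implementation; the function names and the four-base toy reference are hypothetical. A transition swaps a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T); any other substitution is a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref, alt):
    """Classify a single-base change as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def possible_substitutions(reference):
    """Yield (position, ref, alt, class) for every single-base change."""
    for pos, ref in enumerate(reference.upper()):
        for alt in "ACGT":
            if alt != ref:
                yield pos, ref, alt, classify_substitution(ref, alt)


# Toy reference (hypothetical); each position allows exactly one
# transition and two transversions.
subs = list(possible_substitutions("ACGT"))
n_transitions = sum(1 for s in subs if s[3] == "transition")
n_transversions = len(subs) - n_transitions
print(n_transitions, n_transversions)

# Example relevant to the V/I/A/M/T group: codon GTT (Val) to ATT (Ile)
# differs by G>A at the first position, which is a transition.
print(classify_substitution("G", "A"))
```

Observed variants from a data set could then be tallied against this enumeration to see which classes of change are under-represented.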
Recombination rate variation in mice from an isolated island

PubMed Central

Wang, Richard J.; Gray, Melissa M.; Parmenter, Michelle D.; Broman, Karl W.; Payseur, Bret A.

2016-01-01

Recombination rate is a heritable trait that varies among individuals. Despite the major impact of recombination rate on patterns of genetic diversity and the efficacy of selection, natural variation in this phenotype remains poorly characterized. We present a comparison of genetic maps, sampling 1,212 meioses, from a unique population of wild house mice (Mus musculus domesticus) that recently colonized remote Gough Island. Crosses to a mainland reference strain (WSB/EiJ) reveal pervasive variation in recombination rate among Gough Island mice, including sub-chromosomal intervals spanning up to 28% of the genome. In spite of this high level of polymorphism, the genome-wide recombination rate does not significantly vary. In general, we find that recombination rate varies more when measured in smaller genomic intervals. Using the current standard genetic map of the laboratory mouse to polarize intervals with divergent recombination rates, we infer that the majority of evolutionary change occurred in one of the two tested lines of Gough Island mice. Our results confirm that natural populations harbor a high level of recombination rate polymorphism and highlight the disparities in recombination rate evolution across genomic scales. PMID:27864900
Genome-wide DNA methylation alterations of Alternanthera philoxeroides in natural and manipulated habitats: implications for epigenetic regulation of rapid responses to environmental fluctuation and phenotypic variation.

PubMed

Gao, Lexuan; Geng, Yupeng; Li, Bo; Chen, Jiakuan; Yang, Ji

2010-11-01

Alternanthera philoxeroides (alligator weed) is an invasive weed that can colonize both aquatic and terrestrial habitats. Individuals growing in different habitats exhibit extensive phenotypic variation but little genetic differentiation in its introduced range. The mechanisms underpinning the wide range of phenotypic variation and rapid adaptation to novel and changing environments remain uncharacterized. In this study, we examined the epigenetic variation and its correlation with phenotypic variation in plants exposed to natural and manipulated environmental variability. Genome-wide methylation profiling using methylation-sensitive amplified fragment length polymorphism (MSAP) revealed considerable DNA methylation polymorphisms within and between natural populations. Plants of different source populations not only underwent significant morphological changes in common garden environments, but also underwent a genome-wide epigenetic reprogramming in response to different treatments. Methylation alterations associated with response to different water availability were detected in 78.2% (169/216) of common garden induced polymorphic sites, demonstrating the environmental sensitivity and flexibility of the epigenetic regulatory system. These data provide evidence of the correlation between epigenetic reprogramming and the reversible phenotypic response of alligator weed to particular environmental factors. © 2010 Blackwell Publishing Ltd.
Genome-wide DNA methylation map of human neutrophils reveals widespread inter-individual epigenetic variation

PubMed Central

Chatterjee, Aniruddha; Stockwell, Peter A.; Rodger, Euan J.; Duncan, Elizabeth J.; Parry, Matthew F.; Weeks, Robert J.; Morison, Ian M.

2015-01-01

The extent of variation in DNA methylation patterns in healthy individuals is not yet well documented. Identification of inter-individual epigenetic variation is important for understanding phenotypic variation and disease susceptibility. Using neutrophils from a cohort of healthy individuals, we generated base-resolution DNA methylation maps to document inter-individual epigenetic variation. We identified 12851 autosomal inter-individual variably methylated fragments (iVMFs). Gene promoters were the least variable, whereas gene body and upstream regions showed higher variation in DNA methylation. The iVMFs were relatively enriched in repetitive elements compared to non-iVMFs, and were associated with genome regulation and chromatin function elements. Further, variably methylated genes were disproportionately associated with regulation of transcription, responsive function and signal transduction pathways. Transcriptome analysis indicates that iVMF methylation at differentially expressed exons has a positive correlation and local effect on the inclusion of that exon in the mRNA transcript. PMID:26612583
TUMOR HAPLOTYPE ASSEMBLY ALGORITHMS FOR CANCER GENOMICS

PubMed Central

AGUIAR, DEREK; WONG, WENDY S.W.; ISTRAIL, SORIN

2014-01-01

The growing availability of inexpensive high-throughput sequence data is enabling researchers to sequence tumor populations within a single individual at high coverage. However, cancer genome sequence evolution and mutational phenomena like driver mutations and gene fusions are difficult to investigate without first reconstructing tumor haplotype sequences. Haplotype assembly of single individual tumor populations is an exceedingly difficult task complicated by tumor haplotype heterogeneity, tumor or normal cell sequence contamination, polyploidy, and complex patterns of variation. While computational and experimental haplotype phasing of diploid genomes has seen much progress in recent years, haplotype assembly in cancer genomes remains uncharted territory. In this work, we describe HapCompass-Tumor, a computational modeling and algorithmic framework for haplotype assembly of copy number variable cancer genomes containing haplotypes at different frequencies and complex variation. We extend our polyploid haplotype assembly model and present novel algorithms for (1) complex variations, including copy number changes, as varying numbers of disjoint paths in an associated graph, (2) variable haplotype frequencies and contamination, and (3) computation of tumor haplotypes using simple cycles of the compass graph which constrain the space of haplotype assembly solutions. The model and algorithm are implemented in the software package HapCompass-Tumor which is available for download from http://www.brown.edu/Research/Istrail_Lab/. PMID:24297529
Proteogenomic Investigation of Strain Variation in Clinical Mycobacterium tuberculosis Isolates.

PubMed

Heunis, Tiaan; Dippenaar, Anzaan; Warren, Robin M; van Helden, Paul D; van der Merwe, Ruben G; Gey van Pittius, Nicolaas C; Pain, Arnab; Sampson, Samantha L; Tabb, David L

2017-10-06

Mycobacterium tuberculosis consists of a large number of different strains that display unique virulence characteristics. Whole-genome sequencing has revealed substantial genetic diversity among clinical M. tuberculosis isolates, and elucidating the phenotypic variation encoded by this genetic diversity will be of the utmost importance to fully understand M. tuberculosis biology and pathogenicity. In this study, we integrated whole-genome sequencing and mass spectrometry (GeLC-MS/MS) to reveal strain-specific characteristics in the proteomes of two clinical M. tuberculosis Latin American-Mediterranean isolates. Using this approach, we identified 59 peptides containing single amino acid variants, which covered ∼9% of all coding nonsynonymous single nucleotide variants detected by whole-genome sequencing. Furthermore, we identified 29 distinct peptides that mapped to a hypothetical protein not present in the M. tuberculosis H37Rv reference proteome. Here, we provide evidence for the expression of this protein in the clinical M. tuberculosis SAWC3651 isolate. The strain-specific databases enabled confirmation of genomic differences (i.e., large genomic regions of difference and nonsynonymous single nucleotide variants) in these two clinical M. tuberculosis isolates and allowed strain differentiation at the proteome level. Our results contribute to the growing field of clinical microbial proteogenomics and can improve our understanding of phenotypic variation in clinical M. tuberculosis isolates.
Comparative genomic analysis of Helicobacter pylori from Malaysia identifies three distinct lineages suggestive of differential evolution

PubMed Central

Kumar, Narender; Mariappan, Vanitha; Baddam, Ramani; Lankapalli, Aditya K.; Shaik, Sabiha; Goh, Khean-Lee; Loke, Mun Fai; Perkins, Tim; Benghezal, Mohammed; Hasnain, Seyed E.; Vadivelu, Jamuna; Marshall, Barry J.; Ahmed, Niyaz

2015-01-01

The discordant prevalence of Helicobacter pylori and its related diseases, for a long time, fostered certain enigmatic situations observed in the countries of the southern world. Variation in H. pylori infection rates and disease outcomes among different populations in multi-ethnic Malaysia provides a unique opportunity to understand dynamics of host–pathogen interaction and genome evolution. In this study, we extensively analyzed and compared genomes of 27 Malaysian H. pylori isolates and identified three major phylogeographic lineages: hspEastAsia, hpEurope and hpSouthIndia. The analysis of the virulence genes within the core genome, however, revealed a comparable pathogenic potential of the strains. In addition, we identified four genes limited to strains of East-Asian lineage. Our analyses identified a few strain-specific genes encoding restriction modification systems and outlined 311 core genes possibly under differential evolutionary constraints, among the strains representing different ethnic groups. The cagA and vacA genes also showed variations in accordance with the host genetic background of the strains. Moreover, restriction modification genes were found to be significantly enriched in East-Asian strains. An understanding of these variations in the genome content would provide significant insights into various adaptive and host modulation strategies harnessed by H. pylori to effectively persist in a host-specific manner. PMID:25452339
Transcriptome sequencing reveals genome-wide variation in molecular evolutionary rate among ferns.

PubMed

Grusz, Amanda L; Rothfels, Carl J; Schuettpelz, Eric

2016-08-30

Transcriptomics in non-model plant systems has recently reached a point where the examination of nuclear genome-wide patterns in understudied groups is an achievable reality. This progress is especially notable in evolutionary studies of ferns, for which molecular resources to date have been derived primarily from the plastid genome. Here, we utilize transcriptome data in the first genome-wide comparative study of molecular evolutionary rate in ferns. We focus on the ecologically diverse family Pteridaceae, which comprises about 10% of fern diversity and includes the enigmatic vittarioid ferns, an epiphytic, tropical lineage known for dramatically reduced morphologies and radically elongated phylogenetic branch lengths. Using expressed sequence data for 2091 loci, we perform pairwise comparisons of molecular evolutionary rate among 12 species spanning the three largest clades in the family and ask whether previously documented heterogeneity in plastid substitution rates is reflected in their nuclear genomes. We then inquire whether variation in evolutionary rate is being shaped by genes belonging to specific functional categories and test for differential patterns of selection. We find significant, genome-wide differences in evolutionary rate for vittarioid ferns relative to all other lineages within the Pteridaceae, but we recover few significant correlations between faster/slower vittarioid loci and known functional gene categories. We demonstrate that the faster rates characteristic of the vittarioid ferns are likely not driven by positive selection, nor are they unique to any particular type of nucleotide substitution. Our results reinforce recently reviewed mechanisms hypothesized to shape molecular evolutionary rates in vittarioid ferns and provide novel insight into substitution rate variation both within and among fern nuclear genomes.
Human structural variation: mechanisms of chromosome rearrangements

PubMed Central

Weckselblatt, Brooke; Rudd, M. Katharine

2015-01-01

Chromosome structural variation (SV) is a normal part of variation in the human genome, but some classes of SV can cause neurodevelopmental disorders. Analysis of the DNA sequence at SV breakpoints can reveal mutational mechanisms and risk factors for chromosome rearrangement. Large-scale SV breakpoint studies have become possible recently owing to advances in next-generation sequencing (NGS) including whole-genome sequencing (WGS). These findings have shed light on complex forms of SV such as triplications, inverted duplications, insertional translocations, and chromothripsis. Sequence-level breakpoint data resolve SV structure and determine how genes are disrupted, fused, and/or misregulated by breakpoints. Recent improvements in breakpoint sequencing have also revealed non-allelic homologous recombination (NAHR) between paralogous long interspersed nuclear element (LINE) or human endogenous retrovirus (HERV) repeats as a cause of deletions, duplications, and translocations. This review covers the genomic organization of simple and complex constitutional SVs, as well as the molecular mechanisms of their formation. PMID:26209074
Variation in recombination rate may bias human genetic disease mapping studies.

PubMed

Boyle, A Susannah; Noor, Mohamed A F

2004-11-01

The availability of the human genome sequence and variability information (as from the International HapMap project) will enhance our ability to map genetic disorders and choose targets for therapeutic intervention. However, several factors, such as regional variation in recombination rate, can bias conclusions from genetic mapping studies. Here, we examine the impact of regional variation in recombination rate across the human genome. Through computer simulations and literature surveys, we conclude that genetic disorders have been mapped to regions of low recombination more often than expected if such diseases were randomly distributed across the genome. This concentration in low recombination regions may be an artifact, and disorders appearing to be caused by a few genes of large effect may be polygenic. Future genetic mapping studies should be conscious of this potential complication by noting the regional recombination rate of regions implicated in diseases.
Chromosome-scale selective sweeps shape Caenorhabditis elegans genomic diversity

PubMed Central

Andersen, Erik C.; Gerke, Justin P.; Shapiro, Joshua A.; Crissman, Jonathan R.; Ghosh, Rajarshi; Bloom, Joshua S.; Félix, Marie-Anne; Kruglyak, Leonid

2011-01-01

The nematode Caenorhabditis elegans is central to research in molecular, cell, and developmental biology, but nearly all of this research has been conducted on a single strain. Comparatively little is known about the population genomic and evolutionary history of this species. We characterized C. elegans genetic variation by high-throughput selective sequencing of a worldwide collection of 200 wild strains, identifying 41,188 single nucleotide polymorphisms. Unexpectedly, C. elegans genome variation is dominated by a set of commonly shared haplotypes on four of the six chromosomes, each spanning many megabases. Population-genetic modeling shows that this pattern was generated by chromosome-scale selective sweeps that have reduced variation worldwide; at least one of these sweeps likely occurred in the past few hundred years. These sweeps, which we hypothesize to be a result of human activity, have dramatically reshaped the global C. elegans population in the recent past. PMID:22286215
Design and Fabrication of a 5-kWe Free-Piston Stirling Power Conversion System

NASA Technical Reports Server (NTRS)

Chapman, Peter A.; Walter, Thomas J.; Brandhorst, Henry W., Jr.

2008-01-01

Progress in the design and fabrication of a 5-kWe free-piston Stirling power conversion system is described. A scaled-down version of the successful 12.5-kWe Component Test Power Converter (CTPC) developed under NAS3-25463, this single cylinder prototype incorporates cost effective and readily available materials (steel versus beryllium) and components (a commercial linear alternator). The design consists of a displacer suspended on internally pumped gas bearings and a power piston/alternator supported on flexures. Non-contacting clearance seals are used between internal volumes. Heat to and from the prototype is supplied via pumped liquid loops passing through shell and tube heat exchangers. The control system incorporates several novel ideas such as a pulse start capability and a piston stroke set point control strategy that provides the ability to throttle the engine to match the required output power. It also ensures stable response to various disturbances such as electrical load variations while providing useful data regarding the position of both power piston and displacer. All design and analysis activities are complete and fabrication is underway. Prototype test is planned for summer 2008 at Foster-Miller to characterize the dynamics and steady-state operation of the prototype and determine maximum power output and system efficiency. Further tests will then be performed at Auburn University to determine start-up and shutdown characteristics and assess transient response to temperature and load variations.
Sauropod dinosaurs evolved moderately sized genomes unrelated to body size

PubMed Central

Organ, Chris L.; Brusatte, Stephen L.; Stein, Koen

2009-01-01

Sauropodomorph dinosaurs include the largest land animals to have ever lived, some reaching up to 10 times the mass of an African elephant. Despite their status defining the upper range for body size in land animals, it remains unknown whether sauropodomorphs evolved larger-sized genomes than non-avian theropods, their sister taxon, or whether a relationship exists between genome size and body size in dinosaurs, two questions critical for understanding broad patterns of genome evolution in dinosaurs. Here we report inferences of genome size for 10 sauropodomorph taxa. The estimates are derived from a Bayesian phylogenetic generalized least squares approach that generates posterior distributions of regression models relating genome size to osteocyte lacunae volume in extant tetrapods. We estimate that the average genome size of sauropodomorphs was 2.02 pg (range of species means: 1.77–2.21 pg), a value in the upper range of extant birds (mean = 1.42 pg, range: 0.97–2.16 pg) and near the average for extant non-avian reptiles (mean = 2.24 pg, range: 1.05–5.44 pg). The results suggest that the variation in size and architecture of genomes in extinct dinosaurs was lower than the variation found in mammals. A substantial difference in genome size separates the two major clades within dinosaurs, Ornithischia (large genomes) and Saurischia (moderate to small genomes). We find no relationship between body size and estimated genome size in extinct dinosaurs, which suggests that neutral forces did not dominate the evolution of genome size in this group. PMID:19793755
Impact of retrotransposons in pluripotent stem cells.

PubMed

Tanaka, Yoshiaki; Chung, Leeyup; Park, In-Hyun

2012-12-01

Retrotransposons, which constitute approximately 40% of the human genome, have the capacity to 'jump' across the genome. Their mobility contributes to oncogenesis, evolution, and genomic plasticity of the host genome. Induced pluripotent stem cells as well as embryonic stem cells are more susceptible than differentiated cells to genomic aberrations including insertion, deletion and duplication. Recent studies have revealed specific behaviors of retrotransposons in pluripotent cells. Here, we review recent progress in understanding retrotransposons and provide a perspective on the relationship between retrotransposons and genomic variation in pluripotent stem cells.

Analysis of Copy Number Variation in the Abp Gene Regions of Two House Mouse Subspecies Suggests Divergence during the Gene Family Expansions.

PubMed

Pezer, Željka; Chung, Amanda G; Karn, Robert C; Laukaitis, Christina M

2017-06-01

The Androgen-binding protein (Abp) gene region of the mouse genome contains 64 genes, some encoding pheromones that influence assortative mating between mice from different subspecies. Using CNVnator and quantitative PCR, we explored copy number variation in this gene family in natural populations of Mus musculus domesticus (Mmd) and Mus musculus musculus (Mmm), two subspecies of house mice that form a narrow hybrid zone in Central Europe. We found that copy number variation in the center of the Abp gene region is very common in wild Mmd, primarily representing the presence/absence of the final duplications described for the mouse genome. Clustering of Mmd individuals based on this variation did not reflect their geographical origin, suggesting no population divergence in the Abp gene cluster. However, copy number variation patterns differ substantially between Mmd and other mouse taxa. Large blocks of Abp genes are absent in Mmm, Mus musculus castaneus and an outgroup, Mus spretus, although with differences in variation and breakpoint locations. Our analysis calls into question the reliance on a reference genome for interpreting the detailed organization of genes in taxa more distant from the Mmd reference genome. The polymorphic nature of the gene family expansion in all four taxa suggests that the number of Abp genes, especially in the central gene region, is not critical to the survival and reproduction of the mouse. However, Abp haplotypes of variable length may serve as a source of raw genetic material for new signals influencing reproductive communication and thus speciation of mice. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis.

PubMed

Jakupciak, John P; Wells, Jeffrey M; Karalus, Richard J; Pawlowski, David R; Lin, Jeffrey S; Feldman, Andrew B

2013-01-01

Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations.
Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis

PubMed Central

Jakupciak, John P.; Wells, Jeffrey M.; Karalus, Richard J.; Pawlowski, David R.; Lin, Jeffrey S.; Feldman, Andrew B.

2013-01-01

Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations. PMID:24455204
The complete chloroplast genome of Capsicum annuum var. glabriusculum using Illumina sequencing.

PubMed

Raveendar, Sebastin; Na, Young-Wang; Lee, Jung-Ro; Shim, Donghwan; Ma, Kyung-Ho; Lee, Sok-Young; Chung, Jong-Wook

2015-07-20

Chloroplast (cp) genome sequences provide a valuable source for DNA barcoding. Molecular phylogenetic studies have concentrated on DNA sequencing of conserved gene loci. However, this approach is time consuming and more difficult to implement when gene organization differs among species. Here we report the complete re-sequencing of the cp genome of Capsicum pepper (Capsicum annuum var. glabriusculum) using the Illumina platform. The total length of the cp genome is 156,817 bp with a 37.7% overall GC content. A pair of inverted repeats (IRs) of 50,284 bp were separated by a small single copy (SSC; 18,948 bp) and a large single copy (LSC; 87,446 bp). The number of cp genes in C. annuum var. glabriusculum is the same as that in other Capsicum species. Variations in the lengths of the LSC, SSC and IR regions were the main contributors to the size variation in the cp genome of this species. A total of 125 simple sequence repeats (SSRs) and 48 insertion or deletion variants were found by sequence alignment of Capsicum cp genomes. These findings provide a foundation for further investigation of cp genome evolution in Capsicum and other higher plants.
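The overall GC content quoted above is a simple base-composition statistic. A minimal sketch of how such a figure can be computed is given below; the function name and the toy sequence are hypothetical illustrations, not the authors' pipeline or Capsicum data.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the unambiguous bases of seq."""
    # Keep only A, C, G and T so that N or gap characters do not skew the result.
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical 10-base toy sequence (4 of 10 bases are G or C).
toy_sequence = "ATGCGCATTA"
print(f"GC content: {gc_content(toy_sequence):.1f}%")
```

Applied to a full chloroplast genome read from a FASTA file, the same function would give the genome-wide percentage.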
The BIG Data Center: from deposition to integration to translation

PubMed Central

2017-01-01

Biological data are generated at unprecedentedly exponential rates, posing considerable challenges in big data deposition, integration and translation. The BIG Data Center, established at Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, provides a suite of database resources, including (i) Genome Sequence Archive, a data repository specialized for archiving raw sequence reads, (ii) Gene Expression Nebulas, a data portal of gene expression profiles based entirely on RNA-Seq data, (iii) Genome Variation Map, a comprehensive collection of genome variations for featured species, (iv) Genome Warehouse, a centralized resource housing genome-scale data with particular focus on economically important animals and plants, (v) Methylation Bank, an integrated database of whole-genome single-base resolution methylomes and (vi) Science Wikis, a central access point for biological wikis developed for community annotations. The BIG Data Center is dedicated to constructing and maintaining biological databases through big data integration and value-added curation, conducting basic research to translate big data into big knowledge and providing freely open access to a variety of data resources in support of worldwide research activities in both academia and industry. All of these resources are publicly available and can be found at http://bigd.big.ac.cn. PMID:27899658
Genome size variation among sex types in dioecious and triecious Caricaceae species

USDA-ARS's Scientific Manuscript database

Caricaceae is a small family consisting of 35 species of varying sexual systems and includes the economically important fruit crop, Carica papaya, and other species of “highland papayas”. Flow cytometry was used to obtain genome sizes for 11 species in three genera of Caricaceae to determine if genome s...
Dissecting the human microbiome with single-cell genomics.

PubMed

Tolonen, Andrew C; Xavier, Ramnik J

2017-06-14

Recent advances in genome sequencing of single microbial cells enable the assignment of functional roles to members of the human microbiome that cannot currently be cultured. This approach can reveal the genomic basis of phenotypic variation between closely related strains and can be applied to the targeted study of immunogenic bacteria in disease.
Feast and famine in plant genomes.

Treesearch

Jonathan F. Wendel; Richard C. Cronn; J. Spencer Johnston; H. James Price

2002-01-01

Plant genomes vary over several orders of magnitude in size, even among closely related species, yet the origin, genesis and significance of this variation are not clear. Because DNA content varies over a sevenfold range among diploid species in the cotton genus (Gossypium) and its allies, this group offers opportunities for exploring patterns and mechanisms of genome...
Comprehensive Genome-wide Screen for Genes with Cis-acting Regulatory Elements That Respond to Marek's Disease Virus Infection

USDA-ARS's Scientific Manuscript database

The comprehensive identification of genes underlying phenotypic variation of complex traits such as disease resistance remains one of the greatest challenges in biology despite having genome sequences and more powerful tools. Most genome-wide screens lack sufficient resolving power as they typically...
Africa: continent of genome contrasts with implications for biomedical research and health.

PubMed

Ramsay, Michèle

2012-08-31

The genomic architecture of African populations is poorly understood and there is considerable variation between ethno-linguistic groups. Genome-wide approaches have been extensively applied to search for genetic associations to complex traits in Europeans, but rarely in Africans. This is largely attributed to lower levels of funding, poor infrastructure and public health systems, and to the small pool of trained scientists. High levels of genetic variation and underlying population structure in Africans present significant challenges, but lower levels of linkage disequilibrium provide an opportunity for more effective localisation of causal variants. High throughput technologies, including dense genotyping arrays, genome sequencing and epigenome studies, together with plummeting costs, are making research more affordable, even for African scientists. Understanding the interactions between genome structure and environmental influences is essential to interpreting their contributions to the increase in infectious diseases and non-communicable diseases, exacerbated by adverse environments and lifestyle choices. The unique genome dynamics in African populations have an important role to play in understanding human health and susceptibility to disease. Copyright © 2012. Published by Elsevier B.V.
Contrasting support for alternative models of genomic variation based on microhabitat preference: species-specific effects of climate change in alpine sedges.

PubMed

Massatti, Rob; Knowles, L Lacey

2016-08-01

Deterministic processes may uniquely affect codistributed species' phylogeographic patterns such that discordant genetic variation among taxa is predicted. Yet, explicitly testing expectations of genomic discordance in a statistical framework remains challenging. Here, we construct spatially and temporally dynamic models to investigate the hypothesized effect of microhabitat preferences on the permeability of glaciated regions to gene flow in two closely related montane species. Utilizing environmental niche models from the Last Glacial Maximum and the present to inform demographic models of changes in habitat suitability over time, we evaluate the relative probabilities of two alternative models using approximate Bayesian computation (ABC) in which glaciated regions are either (i) permeable or (ii) a barrier to gene flow. Results based on the fit of the empirical data to data sets simulated using a spatially explicit coalescent under alternative models indicate that genomic data are consistent with predictions about the hypothesized role of microhabitat in generating discordant patterns of genetic variation among the taxa. Specifically, a model in which glaciated areas acted as a barrier was much more probable based on patterns of genomic variation in Carex nova, a wet-adapted species. However, in the dry-adapted Carex chalciolepis, the permeable model was more probable, although the difference in the support of the models was small. This work highlights how statistical inferences can be used to distinguish deterministic processes that are expected to result in discordant genomic patterns among species, including species-specific responses to climate change. © 2016 John Wiley & Sons Ltd.
A map of human genome variation from population-scale sequencing.

PubMed

Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A

2010-10-28

The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
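To put the per-base mutation rate quoted above in perspective, the following back-of-the-envelope sketch converts it into an expected number of new substitutions per child. The haploid genome size of about 3.1 billion base pairs is an assumption added here for illustration and does not come from the abstract; the function name is likewise hypothetical.

```python
MUTATION_RATE = 1e-8        # per base pair per generation (figure from the abstract)
HAPLOID_GENOME_BP = 3.1e9   # ASSUMPTION: approximate human haploid genome size


def expected_de_novo(rate: float, haploid_bp: float, copies: int = 2) -> float:
    """Expected de novo substitutions per generation across `copies` genome copies."""
    return rate * haploid_bp * copies


# A diploid child inherits two haploid genomes, one from each parent.
per_child = expected_de_novo(MUTATION_RATE, HAPLOID_GENOME_BP)
print(f"Expected de novo substitutions per child: about {per_child:.0f}")
```

Under these assumptions the estimate is on the order of a few dozen new substitutions per generation; the exact value depends on the genome size used.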
Sequence Polymorphisms and Structural Variations among Four Grapevine (Vitis vinifera L.) Cultivars Representing Sardinian Agriculture

PubMed Central

Mercenaro, Luca; Nieddu, Giovanni; Porceddu, Andrea; Pezzotti, Mario; Camiolo, Salvatore

2017-01-01

The genetic diversity among grapevine (Vitis vinifera L.) cultivars that underlies differences in agronomic performance and wine quality reflects the accumulation of single nucleotide polymorphisms (SNPs) and small indels as well as larger genomic variations. A combination of high throughput sequencing and mapping against the grapevine reference genome allows the creation of comprehensive sequence variation maps. We used next generation sequencing and bioinformatics to generate an inventory of SNPs and small indels in four widely cultivated Sardinian grape cultivars (Bovale sardo, Cannonau, Carignano and Vermentino). More than 3,200,000 SNPs were identified with high statistical confidence. Some of the SNPs caused the appearance of premature stop codons and thus identified putative pseudogenes. The analysis of SNP distribution along chromosomes led to the identification of large genomic regions with uninterrupted series of homozygous SNPs. We used a digital comparative genomic hybridization approach to identify 6526 genomic regions with significant differences in copy number among the four cultivars compared to the reference sequence, including 81 regions shared between all four cultivars and 4953 specific to single cultivars (representing 1.2 and 75.9% of total copy number variation, respectively). Reads mapping at a distance that was not compatible with the insert size were used to identify a dataset of putative large deletions with cultivar Cannonau revealing the highest number. The analysis of genes mapping to these regions provided a list of candidates that may explain some of the phenotypic differences among the Bovale sardo, Cannonau, Carignano and Vermentino cultivars. PMID:28775732
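The two percentages reported for shared and cultivar-specific copy number regions follow directly from the counts in the abstract. A short check, using only those reported numbers (the variable and function names are illustrative):

```python
TOTAL_REGIONS = 6526       # regions with significant copy number differences
SHARED_BY_ALL = 81         # regions shared between all four cultivars
CULTIVAR_SPECIFIC = 4953   # regions specific to a single cultivar


def percent(part: int, total: int) -> float:
    """Percentage of `total` represented by `part`, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


print(percent(SHARED_BY_ALL, TOTAL_REGIONS))      # share of regions common to all four
print(percent(CULTIVAR_SPECIFIC, TOTAL_REGIONS))  # share of single-cultivar regions

# Regions that are neither shared by all four cultivars nor specific to one.
remaining = TOTAL_REGIONS - SHARED_BY_ALL - CULTIVAR_SPECIFIC
print(remaining, percent(remaining, TOTAL_REGIONS))
```

The first two values reproduce the 1.2% and 75.9% given in the abstract; the remainder is simply what is left over and is not a figure stated by the authors.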
Phenotype-loci associations in networks of patients with rare disorders: application to assist in the diagnosis of novel clinical cases.

PubMed

Bueno, Anibal; Rodríguez-López, Rocío; Reyes-Palomares, Armando; Rojano, Elena; Corpas, Manuel; Nevado, Julián; Lapunzina, Pablo; Sánchez-Jiménez, Francisca; Ranea, Juan A G

2018-06-26

Copy number variations (CNVs) are genomic structural variations (deletions, duplications, or translocations) that account for 4.8-9.5% of human genome variation in healthy individuals. In some cases, CNVs can also lead to disease, being the etiology of many known rare genetic/genomic disorders. Despite the latest advances in genomic sequencing and diagnosis, the pathological effects of many rare genetic variations remain unresolved, largely due to the low number of patients available for these cases, making it difficult to identify consistent patterns of genotype-phenotype relationships. We aimed to improve the identification of statistically consistent genotype-phenotype relationships by integrating all the genetic and clinical data of thousands of patients with rare genomic disorders (obtained from the DECIPHER database) into a phenotype-patient-genotype tripartite network. Then we assessed how our network approach could help in the characterization and diagnosis of novel cases in clinical genetics. The systematic approach implemented in this work is able to better define the relationships between phenotypes and specific loci, by exploiting large-scale association networks of phenotypes and genotypes in thousands of rare disease patients. The application of the described methodology facilitated the diagnosis of novel clinical cases, ranking phenotypes by locus specificity and reporting putative new clinical features that may suggest additional clinical follow-ups. In this work, the proof of concept developed over a set of novel clinical cases demonstrates that this network-based methodology might help improve the precision of patient clinical records and the characterization of rare syndromes.
Variation in Genomic Methylation in Natural Populations of Chinese White Poplar

PubMed Central

Ma, Kaifeng; Song, Yuepeng; Yang, Xiaohui; Zhang, Zhiyi; Zhang, Deqiang

2013-01-01

Background: It is thought that methylcytosine can be inherited through meiosis and mitosis, and that epigenetic variation may be under genetic control or correlation may be caused by neutral drift. However, DNA methylation also varies with tissue, developmental stage, and environmental factors. Eliminating these factors, we analyzed the levels and patterns, diversity and structure of genomic methylcytosine in the xylem of nine natural populations of Chinese white poplar. Principal Findings: On average, the relative total methylation and non-methylation levels were approximately 26.567% and 42.708% (P<0.001), respectively. Also, the relative CNG methylation level was higher than the relative CG methylation level. The relative methylation/non-methylation levels were significantly different among the nine natural populations. Epigenetic diversity ranged from 0.811 (Gansu) to 1.211 (Shaanxi), and the coefficients of epigenetic differentiation (GST = 0.159) were assessed by Shannon’s diversity index. Co-inertia analysis indicated that methylation-sensitive polymorphism (MSP) and genomic methylation pattern (CG-CNG) profiles gave similar distributions. Using a between-group eigen analysis, we found that the Hebei and Shanxi populations were independent of each other, but the Henan population intersected with the other populations, to some degree. Conclusions: Genome methylation in Populus tomentosa presented tissue-specific characteristics and the relative 5′-CCGG methylation level was higher in xylem than in leaves. Meanwhile, the genome methylation in the xylem shows great epigenetic variation and could be fixed and inherited through mitosis. Compared to genetic structure, data suggest that epigenetic and genetic variation do not completely match. PMID:23704963
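Shannon's diversity index, mentioned above, has the general form H = -Σ p_i ln p_i over category frequencies p_i. The sketch below shows that general form only; how the authors derived frequencies from their methylation profiles is not described in the abstract, and the band counts used here are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over categories with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for count in counts:
        if count > 0:  # empty categories contribute nothing to the index
            p = count / total
            h -= p * math.log(p)
    return h


# Hypothetical counts of three methylation pattern classes in one population.
example_counts = [30, 50, 20]
print(f"H = {shannon_index(example_counts):.3f}")
```

Higher values indicate that observations are spread more evenly across pattern classes; a population with a single class has H = 0.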
Genomic Selection in Multi-environment Crop Trials.

PubMed

Oakey, Helena; Cullis, Brian; Thompson, Robin; Comadran, Jordi; Halpin, Claire; Waugh, Robbie

2016-05-03

Genomic selection in crop breeding introduces modeling challenges not found in animal studies. These include the need to accommodate replicate plants for each line, consider spatial variation in field trials, address line by environment interactions, and capture nonadditive effects. Here, we propose a flexible single-stage genomic selection approach that resolves these issues. Our linear mixed model incorporates spatial variation through environment-specific terms, and also randomization-based design terms. It considers marker, and marker by environment interactions using ridge regression best linear unbiased prediction to extend genomic selection to multiple environments. Since the approach uses the raw data from line replicates, the line genetic variation is partitioned into marker and nonmarker residual genetic variation (i.e., additive and nonadditive effects). This results in a more precise estimate of marker genetic effects. Using barley height data from trials, in 2 different years, of up to 477 cultivars, we demonstrate that our new genomic selection model improves predictions compared to current models. Analyzing single trials revealed improvements in predictive ability of up to 5.7%. For the multiple environment trial (MET) model, combining both year trials improved predictive ability up to 11.4% compared to a single environment analysis. Benefits were significant even when fewer markers were used. Compared to a single-year standard model run with 3490 markers, our partitioned MET model achieved the same predictive ability using between 500 and 1000 markers depending on the trial. Our approach can be used to increase accuracy and confidence in the selection of the best lines for breeding and/or, to reduce costs by using fewer markers. Copyright © 2016 Oakey et al.
Chromosome-level assembly of Arabidopsis thaliana Ler reveals the extent of translocation and inversion polymorphisms.

PubMed

Zapata, Luis; Ding, Jia; Willing, Eva-Maria; Hartwig, Benjamin; Bezdan, Daniela; Jiao, Wen-Biao; Patel, Vipul; Velikkakam James, Geo; Koornneef, Maarten; Ossowski, Stephan; Schneeberger, Korbinian

2016-07-12

Resequencing or reference-based assemblies reveal large parts of the small-scale sequence variation. However, they typically fail to separate such local variation into colinear and rearranged variation, because they usually do not recover the complement of large-scale rearrangements, including transpositions and inversions. Despite the availability of hundreds of genomes of diverse Arabidopsis thaliana accessions, there is so far only one full-length assembled genome: the reference sequence. We have assembled 117 Mb of the A. thaliana Landsberg erecta (Ler) genome into five chromosome-equivalent sequences using a combination of short Illumina reads, long PacBio reads, and linkage information. Whole-genome comparison against the reference sequence revealed 564 transpositions and 47 inversions comprising ∼3.6 Mb, in addition to 4.1 Mb of nonreference sequence, mostly originating from duplications. Although rearranged regions are not different in local divergence from colinear regions, they are drastically depleted for meiotic recombination in heterozygotes. Using a 1.2-Mb inversion as an example, we show that such rearrangement-mediated reduction of meiotic recombination can lead to genetically isolated haplotypes in the worldwide population of A. thaliana. Moreover, we found 105 single-copy genes, which were only present in the reference sequence or the Ler assembly, and 334 single-copy orthologs, which showed an additional copy in only one of the genomes. To our knowledge, this work gives the first insights into the degree and type of variation that will be revealed once complete assemblies replace resequencing or other reference-dependent methods.
Sex reduces genetic variation: a multidisciplinary review.

PubMed

Gorelick, Root; Heng, Henry H Q

2011-04-01

For over a century, the paradigm has been that sex invariably increases genetic variation, despite many renowned biologists asserting that sex decreases most genetic variation. Sex is usually perceived as the source of additive genetic variance that drives eukaryotic evolution vis-à-vis adaptation and Fisher's fundamental theorem. However, evidence for sex decreasing genetic variation appears in ecology, paleontology, population genetics, and cancer biology. The common thread among many of these disciplines is that sex acts like a coarse filter, weeding out major changes, such as chromosomal rearrangements (that are almost always deleterious), but letting minor variation, such as changes at the nucleotide or gene level (that are often neutral), flow through the sexual sieve. Sex acts as a constraint on genomic and epigenetic variation, thereby limiting adaptive evolution. The diverse reasons for sex reducing genetic variation (especially at the genome level) and slowing down evolution may provide a sufficient benefit to offset the famed costs of sex. © 2010 The Author(s). Evolution © 2010 The Society for the Study of Evolution.
Genomics-Based Security Protocols: From Plaintext to Cipherprotein

NASA Technical Reports Server (NTRS)

Shaw, Harry; Hussein, Sayed; Helgert, Hermann

2011-01-01

The evolving nature of the internet will require continual advances in authentication and confidentiality protocols. Nature provides some clues as to how this can be accomplished in a distributed manner through molecular biology. Cryptography and molecular biology share certain aspects and operations that allow for a set of unified principles to be applied to problems in either venue. A concept for developing security protocols that can be instantiated at the genomics level is presented. A DNA (Deoxyribonucleic acid) inspired hash code system is presented that utilizes concepts from molecular biology. It is a keyed-Hash Message Authentication Code (HMAC) capable of being used in secure mobile Ad hoc networks. It is targeted for applications without an available public key infrastructure. Mechanics of creating the HMAC are presented as well as a prototype HMAC protocol architecture. Security concepts related to the implementation differences between electronic domain security and genomics domain security are discussed.
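For readers unfamiliar with keyed-hash message authentication, the sketch below shows the conventional electronic-domain construction using Python's standard `hmac` module with SHA-256. It is not the DNA-inspired hash system described in the abstract, whose details are not given here; the key, message and function names are hypothetical.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 authentication tag as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_tag(key: bytes, message: bytes, tag: str) -> bool:
    """Check a tag in constant time to avoid leaking information through timing."""
    return hmac.compare_digest(make_tag(key, message), tag)


# Hypothetical pre-shared key and message; no public key infrastructure is involved.
shared_key = b"pre-shared-secret"
message = b"node-17: route update"

tag = make_tag(shared_key, message)
print(verify_tag(shared_key, message, tag))              # untampered message
print(verify_tag(shared_key, b"node-17: forged", tag))   # altered message
```

Because both parties only need the shared key, this style of authentication suits settings without a public key infrastructure, which is the application area the abstract targets.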
Nuclear fusion and genome encounter during yeast zygote formation.

PubMed

Tartakoff, Alan Michael; Jaiswal, Purnima

2009-06-01

When haploid cells of Saccharomyces cerevisiae are crossed, parental nuclei congress and fuse with each other. To investigate underlying mechanisms, we have developed assays that evaluate the impact of drugs and mutations. Nuclear congression is inhibited by drugs that perturb the actin and tubulin cytoskeletons. Nuclear envelope (NE) fusion consists of at least five steps in which preliminary modifications are followed by controlled flux of first outer and then inner membrane proteins, all before visible dilation of the waist of the nucleus or coalescence of the parental spindle pole bodies. Flux of nuclear pore complexes occurs after dilation. Karyogamy requires both the Sec18p/NSF ATPase and ER/NE luminal homeostasis. After fusion, chromosome tethering keeps tagged parental genomes separate from each other. The process of NE fusion and evidence of genome independence in yeast provide a prototype for understanding related events in higher eukaryotes.

Antigenic and genetic analyses of isolate APMV/wigeon/Italy/3920-1/2005 indicate that it represents a new avian paramyxovirus (APMV-12).

PubMed

Terregino, C; Aldous, E W; Heidari, A; Fuller, C M; De Nardi, R; Manvell, R J; Beato, M S; Shell, W M; Monne, I; Brown, I H; Alexander, D J; Capua, I

2013-11-01

Isolate wigeon/Italy/3920-1/2005 (3920-1) was obtained during surveillance of wild birds in November 2005 in the Rovigo province of Northern Italy and shown to be a paramyxovirus. Analysis of cross-haemagglutination-inhibition tests between 3920-1 and representative avian paramyxoviruses showed only a low-level relationship to APMV-1. Phylogenetic analysis of the whole genome and each of the six genes indicated that while 3920-1 grouped with APMV-1 and APMV-9 viruses, it was quite distinct from these two. In the whole-genome analysis, 3920-1 had 52.1% nucleotide sequence identity to the closest APMV-1 virus, 50.1% identity to the APMV-9 genome, and less than 42% identity to representatives of the other avian paramyxovirus groups. We propose isolate wigeon/Italy/3920-1/2005 as the prototype strain of a further APMV group, APMV-12.
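Nucleotide sequence identity figures such as those above are typically computed from a pairwise alignment. The sketch below illustrates one common convention, counting matches over aligned columns that contain no gap in either sequence; the abstract does not state which convention the authors used, and the short aligned strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # ASSUMPTION: columns containing a gap are excluded
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not APMV sequence data.
print(percent_identity("ACGTAC-T", "ACGAACGT"))
```

Other conventions divide by the full alignment length or by the shorter sequence, so reported identities are only comparable when the same definition is used.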
Sequence variation of the feline immunodeficiency virus genome and its clinical relevance.

PubMed

Stickney, A L; Dunowska, M; Cave, N J

2013-06-08

The ongoing evolution of feline immunodeficiency virus (FIV) has resulted in the existence of a diverse continuum of viruses. FIV isolates differ with regard to their mutation and replication rates, plasma viral loads, cell tropism and the ability to induce apoptosis. Clinical disease in FIV-infected cats is also inconsistent. Genomic sequence variation of FIV is likely to be responsible for some of the variation in viral behaviour. The specific genetic sequences that influence these key viral properties remain to be determined. With knowledge of the specific key determinants of pathogenicity, there is the potential for veterinarians in the future to apply this information for prognostic purposes. Genomic sequence variation of FIV also presents an obstacle to effective vaccine development. Most challenge studies demonstrate acceptable efficacy of a dual-subtype FIV vaccine (Fel-O-Vax FIV) against FIV infection under experimental settings; however, vaccine efficacy in the field still remains to be proven. It is important that we discover the key determinants of immunity induced by this vaccine; such data would complement vaccine field efficacy studies and provide the basis to make informed recommendations on its use.
Mitochondrial DNA: impacting central and peripheral nervous systems

PubMed Central

Carelli, Valerio

2014-01-01

Because of their high-energy metabolism, neurons are highly dependent on mitochondria, which generate cellular ATP through oxidative phosphorylation. The mitochondrial genome encodes for critical components of the oxidative phosphorylation pathway machinery, and therefore mutations in mitochondrial DNA (mtDNA) cause energy production defects that frequently have severe neurological manifestations. Here, we review the principles of mitochondrial genetics and focus on prototypical mitochondrial diseases to illustrate how primary defects in mtDNA or secondary defects in mtDNA due to nuclear genome mutations can cause prominent neurological and multisystem features. In addition, we discuss the pathophysiological mechanisms underlying mitochondrial diseases, the cellular mechanisms that protect mitochondrial integrity, and the prospects for therapy. PMID:25521375
[Sequencing and analysis of complete genome of rabies viruses isolated from Chinese Ferret-Badger and dog in Zhejiang province].

PubMed

Lei, Yong-Liang; Wang, Xiao-Guang; Tao, Xiao-Yan; Li, Hao; Meng, Sheng-Li; Chen, Xiu-Ying; Liu, Fu-Ming; Ye, Bi-Feng; Tang, Qing

2010-01-01

We sequenced the full-length genomes of four rabies viruses isolated from Chinese ferret-badgers and dogs in order to analyze rabies virus genetic variation at the molecular level, obtain information on rabies virus prevalence and variation in Zhejiang, and enrich the genome database of rabies virus street strains isolated in China. The viruses were isolated in suckling mice, overlapping fragments were amplified by RT-PCR, and full-length genomes were assembled; nucleotide and deduced protein similarities were determined, and phylogenetic analyses were performed against strains from Chinese ferret-badger, dog, sika deer and vole and against the vaccine strains in use. The four genomes were sequenced completely and shared the same genetic structure, with a length of 11,923 nt or 11,925 nt comprising a 58-nt leader, 1353-nt NP, 894-nt PP, 609-nt MP, 1575-nt GP, 6386-nt LP, intergenic regions (IGRs) of 2, 5 and 5 nt, a 423-nt pseudogene-like sequence (psi), and a 70-nt trailer. BLAST and multiple sequence alignment showed that the four genomes were consistent with the properties of the genus Lyssavirus, family Rhabdoviridae. Nucleotide and amino acid similarity was highest among the Chinese strains, especially among isolates from the same animal species. For the four genomes, similarity at the amino acid level was markedly higher than at the nucleotide level, indicating that most of the nucleotide mutations in these genomes were synonymous. Compared with the reference rabies viruses, the lengths of the five protein-coding regions were unchanged and no recombination was detected, only a few point mutations; the five proteins therefore appear to be stable. The variation sites and types of the four genomes were similar to those of the reference vaccine or street strains. Multiple sequence alignment and phylogenetic analyses placed the four strains in genotype 1, with distinct regional characteristics of China. Therefore, these four rabies viruses are likely to be street viruses already existing in the natural world.
The complex hybrid origins of the root knot nematodes revealed through comparative genomics

PubMed Central

Kumar, Sujai; Koutsovoulos, Georgios; Blaxter, Mark L.

2014-01-01

Root knot nematodes (RKN) can infect most of the world’s agricultural crop species and are among the most important of all plant pathogens. As yet, however, we have little understanding of their origins or the genomic basis of their extreme polyphagy. The most damaging pathogens reproduce by obligatory mitotic parthenogenesis and it has been suggested that these species originated from interspecific hybridizations between unknown parental taxa. We have sequenced the genome of the diploid meiotic parthenogen Meloidogyne floridensis, and use a comparative genomic approach to test the hypothesis that this species was involved in the hybrid origin of the tropical mitotic parthenogen Meloidogyne incognita. Phylogenomic analysis of gene families from M. floridensis, M. incognita and an outgroup species Meloidogyne hapla was carried out to trace the evolutionary history of these species’ genomes, and we demonstrate that M. floridensis was one of the parental species in the hybrid origins of M. incognita. Analysis of the M. floridensis genome itself revealed many gene loci present in divergent copies, as they are in M. incognita, indicating that it too had a hybrid origin. The triploid M. incognita is shown to be a complex double-hybrid between M. floridensis and a third, unidentified, parent. The agriculturally important RKN have very complex origins involving the mixing of several parental genomes by hybridization and their extreme polyphagy and success in agricultural environments may be related to this hybridization, producing transgressive variation on which natural selection can act. It is now clear that studying RKN variation via individual marker loci may fail due to the species’ convoluted origins, and multi-species population genomics is essential to understand the hybrid diversity and adaptive variation of this important species complex. This comparative genomic analysis provides a compelling example of the importance and complexity of hybridization in generating animal species diversity more generally. PMID:24860695
Integrated analysis of chromosome copy number variation and gene expression in cervical carcinoma

PubMed Central

Yan, Deng; Yi, Song; Chiu, Wang Chi; Qin, Liu Gui; Kin, Wong Hoi; Kwok Hung, Chung Tony; Linxiao, Han; Wai, Choy Kwong; Yi, Sui; Tao, Yang; Tao, Tang

2017-01-01

Objective: This study was conducted to explore chromosomal copy number variations (CNV) and transcript expression and to examine pathways in cervical pathogenesis using genome-wide high resolution microarrays. Methods: Genome-wide chromosomal CNVs were investigated in 6 cervical cancer cell lines by Human Genome CGH Microarray Kit (4x44K). Gene expression profiles in cervical cancer cell lines, primary cervical carcinoma and normal cervical epithelium tissues were also studied using the Whole Human Genome Microarray Kit (4x44K). Results: Fifty common chromosomal CNVs were identified in the cervical cancer cell lines. Correlation analysis revealed that gene up-regulation or down-regulation is significantly correlated with genomic amplification (P=0.009) or deletion (P=0.006) events. Expression profiles were identified through cluster analysis. Gene annotation analysis indicated that cell cycle pathways were significantly (P=1.15E-08) affected in cervical cancer. Common CNVs were associated with cervical cancer. Conclusion: Chromosomal CNVs may contribute to transcript expression in cervical cancer. PMID:29312578
Museum genomics: low-cost and high-accuracy genetic data from historical specimens.

PubMed

Rowe, Kevin C; Singhal, Sonal; Macmanes, Matthew D; Ayroles, Julien F; Morelli, Toni Lyn; Rubidge, Emily M; Bi, Ke; Moritz, Craig C

2011-11-01

Natural history collections are unparalleled repositories of geographical and temporal variation in faunal conditions. Molecular studies offer an opportunity to uncover much of this variation; however, genetic studies of historical museum specimens typically rely on extracting highly degraded and chemically modified DNA samples from skins, skulls or other dried samples. Despite this limitation, obtaining short fragments of DNA sequences using traditional PCR amplification of DNA has been the primary method for genetic study of historical specimens. Few laboratories have succeeded in obtaining genome-scale sequences from historical specimens and then only with considerable effort and cost. Here, we describe a low-cost approach using high-throughput next-generation sequencing to obtain reliable genome-scale sequence data from a traditionally preserved mammal skin and skull using a simple extraction protocol. We show that single-nucleotide polymorphisms (SNPs) from the genome sequences obtained independently from the skin and from the skull are highly repeatable compared to a reference genome. © 2011 Blackwell Publishing Ltd.
Contribution of Mobile Group II Introns to Sinorhizobium meliloti Genome Evolution.

PubMed

Toro, Nicolás; Martínez-Abarca, Francisco; Molina-Sánchez, María D; García-Rodríguez, Fernando M; Nisa-Martínez, Rafael

2018-01-01

Mobile group II introns are ribozymes and retroelements that probably originate from bacteria. Sinorhizobium meliloti, the nitrogen-fixing endosymbiont of legumes of genus Medicago, harbors a large number of these retroelements. One of these elements, RmInt1, has been particularly successful at colonizing this multipartite genome. Many studies have improved our understanding of RmInt1 and phylogenetically related group II introns, their mobility mechanisms, spread and dynamics within S. meliloti and closely related species. Although RmInt1 conserves the ancient retroelement behavior, its evolutionary history suggests that this group II intron has played a role in the short- and long-term evolution of the S. meliloti genome. We will discuss its proposed role in genome evolution by controlling the spread and coexistence of potentially harmful mobile genetic elements, by ectopic transposition to different genetic loci as a source of early genomic variation and by generating sequence variation after a very slow degradation process, through intron remnants that may have continued to evolve, contributing to bacterial speciation.
Contribution of Mobile Group II Introns to Sinorhizobium meliloti Genome Evolution

PubMed Central

Toro, Nicolás; Martínez-Abarca, Francisco; Molina-Sánchez, María D.; García-Rodríguez, Fernando M.; Nisa-Martínez, Rafael

2018-01-01

Mobile group II introns are ribozymes and retroelements that probably originate from bacteria. Sinorhizobium meliloti, the nitrogen-fixing endosymbiont of legumes of genus Medicago, harbors a large number of these retroelements. One of these elements, RmInt1, has been particularly successful at colonizing this multipartite genome. Many studies have improved our understanding of RmInt1 and phylogenetically related group II introns, their mobility mechanisms, spread and dynamics within S. meliloti and closely related species. Although RmInt1 conserves the ancient retroelement behavior, its evolutionary history suggests that this group II intron has played a role in the short- and long-term evolution of the S. meliloti genome. We will discuss its proposed role in genome evolution by controlling the spread and coexistence of potentially harmful mobile genetic elements, by ectopic transposition to different genetic loci as a source of early genomic variation and by generating sequence variation after a very slow degradation process, through intron remnants that may have continued to evolve, contributing to bacterial speciation. PMID:29670598
Substantial variation in the extent of mitochondrial genome fragmentation among blood-sucking lice of mammals.

PubMed

Jiang, Haowei; Barker, Stephen C; Shao, Renfu

2013-01-01

Blood-sucking lice of humans have extensively fragmented mitochondrial (mt) genomes. Human head louse and body louse have their 37 mt genes on 20 minichromosomes. In human pubic louse, the 34 mt genes known are on 14 minichromosomes. To understand the process of mt genome fragmentation in the blood-sucking lice of mammals, we sequenced the mt genomes of the domestic pig louse, Haematopinus suis, and the wild pig louse, H. apri, which diverged from human lice approximately 65 Ma. The 37 mt genes of the pig lice are on nine circular minichromosomes; each minichromosome is 3-4 kb in size. The pig lice have four genes per minichromosome on average, in contrast to two genes per minichromosome in the human lice. One minichromosome of the pig lice has eight genes and is the most gene-rich minichromosome found in the sucking lice. Our results indicate substantial variation in the rate and extent of mt genome fragmentation among different lineages of the sucking lice.
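The genes-per-minichromosome comparison in this abstract is simple arithmetic on the counts it reports. As a quick check, the sketch below divides the reported gene counts by the reported minichromosome counts; the dictionary layout and names are illustrative assumptions, and the plain ratio ignores any gene that might be carried on more than one minichromosome.

```python
# (mt genes reported, minichromosomes reported) as given in the abstract.
lineages = {
    "human head/body louse": (37, 20),
    "human pubic louse": (34, 14),
    "pig lice": (37, 9),
}

# Average number of genes per minichromosome for each lineage.
genes_per_minichromosome = {
    name: genes / minichromosomes
    for name, (genes, minichromosomes) in lineages.items()
}

for name, ratio in genes_per_minichromosome.items():
    print(f"{name}: {ratio:.2f} genes per minichromosome")
```

The ratios come out at roughly 1.9 for the human head and body louse, 2.4 for the pubic louse and 4.1 for the pig lice, in line with the "two" versus "four genes per minichromosome on average" stated above.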
Integrated analysis of chromosome copy number variation and gene expression in cervical carcinoma.

PubMed

Yan, Deng; Yi, Song; Chiu, Wang Chi; Qin, Liu Gui; Kin, Wong Hoi; Kwok Hung, Chung Tony; Linxiao, Han; Wai, Choy Kwong; Yi, Sui; Tao, Yang; Tao, Tang

2017-12-12

This study was conducted to explore chromosomal copy number variations (CNV) and transcript expression and to examine pathways in cervical pathogenesis using genome-wide high resolution microarrays. Genome-wide chromosomal CNVs were investigated in 6 cervical cancer cell lines by Human Genome CGH Microarray Kit (4x44K). Gene expression profiles in cervical cancer cell lines, primary cervical carcinoma and normal cervical epithelium tissues were also studied using the Whole Human Genome Microarray Kit (4x44K). Fifty common chromosomal CNVs were identified in the cervical cancer cell lines. Correlation analysis revealed that gene up-regulation or down-regulation is significantly correlated with genomic amplification (P=0.009) or deletion (P=0.006) events. Expression profiles were identified through cluster analysis. Gene annotation analysis indicated that cell cycle pathways were significantly (P=1.15E-08) affected in cervical cancer. Common CNVs were associated with cervical cancer. Chromosomal CNVs may contribute to transcript expression in cervical cancer.
Substantial Variation in the Extent of Mitochondrial Genome Fragmentation among Blood-Sucking Lice of Mammals

PubMed Central

Jiang, Haowei; Barker, Stephen C.; Shao, Renfu

2013-01-01

Blood-sucking lice of humans have extensively fragmented mitochondrial (mt) genomes. Human head louse and body louse have their 37 mt genes on 20 minichromosomes. In human pubic louse, the 34 mt genes known are on 14 minichromosomes. To understand the process of mt genome fragmentation in the blood-sucking lice of mammals, we sequenced the mt genomes of the domestic pig louse, Haematopinus suis, and the wild pig louse, H. apri, which diverged from human lice approximately 65 Ma. The 37 mt genes of the pig lice are on nine circular minichromosomes; each minichromosome is 3–4 kb in size. The pig lice have four genes per minichromosome on average, in contrast to two genes per minichromosome in the human lice. One minichromosome of the pig lice has eight genes and is the most gene-rich minichromosome found in the sucking lice. Our results indicate substantial variation in the rate and extent of mt genome fragmentation among different lineages of the sucking lice. PMID:23781098
Effective de novo assembly of fish genome using haploid larvae.

PubMed

Iwasaki, Yuki; Nishiki, Issei; Nakamura, Yoji; Yasuike, Motoshige; Kai, Wataru; Nomura, Kazuharu; Yoshida, Kazunori; Nomura, Yousuke; Fujiwara, Atushi; Kobayashi, Takanori; Ototake, Mitsuru

2016-02-01

Recent improvements in next-generation sequencing technology have made it possible to perform whole-genome sequencing even on non-model eukaryote species with no available reference genomes. However, de novo assembly of diploid genomes is still a big challenge because of allelic variation. The aim of this study was to determine the feasibility of utilizing the genome of haploid fish larvae for de novo assembly of whole-genome sequences. We compared the efficiency of assembly using the haploid genome of yellowtail (Seriola quinqueradiata) with that using the diploid genome obtained from the dam. De novo assembly from the haploid and the diploid sequence reads (100 million reads per dataset) generated by the Ion Proton sequencer (200 bp) was done under two different assembly algorithms, namely overlap-layout-consensus (OLC) and de Bruijn graph (DBG). This revealed that the assembly of the haploid genome significantly reduced (approximately 22% for OLC, 9% for DBG) the total number of contigs (with longer average and N50 contig lengths) when compared to the diploid genome assembly. The haploid assembly also improved the quality of the scaffolds by reducing the number of regions with unassigned nucleotides (Ns) (total length of Ns: 45,331,916 bp for haploids and 67,724,360 bp for diploids) in OLC-based assemblies. It appears clear that the haploid genome assembly is better because the allelic variation in the diploid genome disrupts the extension of contigs during the assembly process. Our results indicate that utilizing the genome of haploid larvae leads to a significant improvement in the de novo assembly process, thus providing a novel strategy for the construction of reference genomes from non-model diploid organisms such as fish. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
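The abstract compares assemblies by contig count and N50 contig length. For readers unfamiliar with the metric, here is a minimal sketch of the usual N50 definition (the length of the contig at which the cumulative length, taken from longest to shortest, first reaches half of the total assembly length). The function name and the toy contig lengths are hypothetical; they are not the yellowtail data.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # cumulative length has reached half the total
            return length


# Invented example: same total length, fewer and longer contigs in the second set.
fragmented = [100, 80, 60, 40, 20]
contiguous = [150, 100, 50]
print(n50(fragmented), n50(contiguous))
```

With equal total length, the less fragmented set yields the larger N50, which is the sense in which the haploid assembly is described as having longer N50 contig lengths.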
Development of genome- and transcriptome-derived microsatellites in related species of snapping shrimps with highly duplicated genomes.

PubMed

Gaynor, Kaitlyn M; Solomon, Joseph W; Siller, Stefanie; Jessell, Linnet; Duffy, J Emmett; Rubenstein, Dustin R

2017-11-01

Molecular markers are powerful tools for studying patterns of relatedness and parentage within populations and for making inferences about social evolution. However, the development of molecular markers for simultaneous study of multiple species presents challenges, particularly when species exhibit genome duplication or polyploidy. We developed microsatellite markers for Synalpheus shrimp, a genus in which species exhibit not only great variation in social organization, but also interspecific variation in genome size and partial genome duplication. From the four primary clades within Synalpheus, we identified microsatellites in the genomes of four species and in the consensus transcriptome of two species. Ultimately, we designed and tested primers for 143 microsatellite markers across 25 species. Although the majority of markers were disomic, many markers were polysomic for certain species. Surprisingly, we found no relationship between genome size and the number of polysomic markers. As expected, markers developed for a given species amplified better for closely related species than for more distant relatives. Finally, the markers developed from the transcriptome were more likely to work successfully and to be disomic than those developed from the genome, suggesting that consensus transcriptomes are likely to be conserved across species. Our findings suggest that the transcriptome, particularly consensus sequences from multiple species, can be a valuable source of molecular markers for taxa with complex, duplicated genomes. © 2017 John Wiley & Sons Ltd.
Genome-Wide Mapping of Copy Number Variation in Humans: Comparative Analysis of High Resolution Array Platforms

PubMed Central

Haraksingh, Rajini R.; Abyzov, Alexej; Gerstein, Mark; Urban, Alexander E.; Snyder, Michael

2011-01-01

Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, genome-wide CNV association type studies, cytogenetics research and diagnostics, and independent validation of CNVs identified from sequencing-based technologies. Numerous array-based platforms for CNV detection exist, utilizing array Comparative Genome Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping or both. We have quantitatively assessed the abilities of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878, and found significant differences in performance. The technologies analyzed were the NimbleGen 4.2 M, 2.1 M and 3×720 K Whole Genome and CNV focused arrays, the Agilent 1×1 M CGH and High Resolution and 2×400 K CNV and SNP+CGH arrays, the Illumina Human Omni1Quad array and the Affymetrix SNP 6.0 array. The Gold Standards used were a 1000 Genomes Project sequencing-based set of 3997 validated CNVs and an ultra high-resolution aCGH-based set of 756 validated CNVs. We found that sensitivity, total number, size range and breakpoint resolution of CNV calls were highest for CNV focused arrays. Our results are important for cost-effective CNV detection and validation for both basic and clinical applications. PMID:22140474
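Sensitivity against a Gold Standard set, as reported above, is the fraction of gold-standard CNVs recovered by a platform's calls. The abstract does not say how a call was matched to a gold-standard CNV, so the sketch below assumes a 50 % reciprocal-overlap rule, a commonly used criterion but not necessarily the one the authors applied. The `CNV` type, function names and coordinates are all hypothetical.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def reciprocal_overlap(a: CNV, b: CNV, min_fraction: float = 0.5) -> bool:
    """True if the shared span covers at least min_fraction of both intervals."""
    if a.chrom != b.chrom:
        return False
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return False
    return (shared >= min_fraction * (a.end - a.start)
            and shared >= min_fraction * (b.end - b.start))


def sensitivity(gold, calls, min_fraction: float = 0.5) -> float:
    """Fraction of gold-standard CNVs matched by at least one call."""
    if not gold:
        raise ValueError("gold standard set is empty")
    detected = sum(
        any(reciprocal_overlap(g, c, min_fraction) for c in calls) for g in gold
    )
    return detected / len(gold)


# Invented intervals for illustration only.
gold_set = [CNV("chr1", 100, 200), CNV("chr1", 500, 900), CNV("chr2", 0, 1000)]
call_set = [CNV("chr1", 120, 210), CNV("chr1", 850, 1500), CNV("chr3", 0, 1000)]
toy_sensitivity = sensitivity(gold_set, call_set)
print(f"sensitivity = {toy_sensitivity:.2f}")
```

Here only the first gold-standard interval is matched: the second shares just 50 bp of its 400 bp with a call, and the third has no call on its chromosome.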
Evolutionary growth process of highly conserved sequences in vertebrate genomes.

PubMed

Ishibashi, Minaka; Noda, Akiko Ogura; Sakate, Ryuichi; Imanishi, Tadashi

2012-08-01

Genome sequence comparison between evolutionarily distant species revealed ultraconserved elements (UCEs) among mammals under strong purifying selection. Most of them were also conserved among vertebrates. Because they tend to be located in the flanking regions of developmental genes, they would have fundamental roles in creating vertebrate body plans. However, the evolutionary origin and selection mechanism of these UCEs remain unclear. Here we report that UCEs arose in primitive vertebrates, and gradually grew in vertebrate evolution. We searched for UCEs in two teleost fishes, Tetraodon nigroviridis and Oryzias latipes, and found 554 UCEs with 100% identity over 100 bp. Comparison of teleost and mammalian UCEs revealed 43 pairs of common, jawed-vertebrate UCEs (jUCE) with high sequence identities, ranging from 83.1% to 99.2%. Ten of them retain lower similarities to the Petromyzon marinus genome, and the substitution rates of four non-exonic jUCEs were reduced after the teleost-mammal divergence, suggesting that robust conservation had been acquired in the jawed vertebrate lineage. Our results indicate that prototypical UCEs originated before the divergence of jawed and jawless vertebrates and have been frozen as perfect conserved sequences in the jawed vertebrate lineage. In addition, our comparative sequence analyses of UCEs and neighboring regions resulted in a discovery of lineage-specific conserved sequences. They were added progressively to prototypical UCEs, suggesting step-wise acquisition of novel regulatory roles. Our results indicate that conserved non-coding elements (CNEs) consist of blocks with distinct evolutionary history, each having been frozen since a different evolutionary era along the vertebrate lineage. Copyright © 2012 Elsevier B.V. All rights reserved.
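The UCE criterion quoted above, 100% identity over 100 bp, can be illustrated as a scan for gap-free runs of identical columns in a pairwise alignment. The abstract does not describe the authors' search procedure, so this is only a sketch of the criterion itself; the function name, the `min_len` parameter and the toy sequences are assumptions made for illustration.

```python
def identical_runs(aligned_a: str, aligned_b: str, min_len: int = 100):
    """Return (start, end) column ranges, end exclusive, of gap-free identical runs."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    run_start = None
    for i, (a, b) in enumerate(zip(aligned_a, aligned_b)):
        same = a == b and a != "-"
        if same and run_start is None:
            run_start = i  # a new identical run begins here
        elif not same and run_start is not None:
            if i - run_start >= min_len:
                runs.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the alignment.
    if run_start is not None and len(aligned_a) - run_start >= min_len:
        runs.append((run_start, len(aligned_a)))
    return runs


# Toy alignment with a single mismatch at column 8 (invented sequences).
toy_a = "ACGTACGT" + "T" + "GGGCCC"
toy_b = "ACGTACGT" + "A" + "GGGCCC"
print(identical_runs(toy_a, toy_b, min_len=8))
print(identical_runs(toy_a, toy_b, min_len=6))
```

The short `min_len` values are used only so that the toy example produces output; the default of 100 corresponds to the threshold stated in the abstract.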
The 1000 Genomes Project: data management and community access.

PubMed

Clarke, Laura; Zheng-Bradley, Xiangqun; Smith, Richard; Kulesha, Eugene; Xiao, Chunlin; Toneva, Iliana; Vaughan, Brendan; Preuss, Don; Leinonen, Rasko; Shumway, Martin; Sherry, Stephen; Flicek, Paul

2012-04-27

The 1000 Genomes Project was launched as one of the largest distributed data collection and analysis projects ever undertaken in biology. In addition to the primary scientific goals of creating both a deep catalog of human genetic variation and extensive methods to accurately discover and characterize variation using new sequencing technologies, the project makes all of its data publicly available. Members of the project data coordination center have developed and deployed several tools to enable widespread data access.
Genome-wide association implicates numerous genes and pleiotropy underlying ecological trait variation in natural populations of Populus trichocarpa

DOE Office of Scientific and Technical Information (OSTI.GOV)

McKown, Athena; Klapste, Jaroslav; Guy, Robert

2014-01-01

To uncover the genetic basis of phenotypic trait variation, we used 448 unrelated wild accessions of black cottonwood (Populus trichocarpa Torr. & Gray) from natural populations throughout western North America. Extensive information from large-scale trait phenotyping (with spatial and temporal replications within a common garden) and genotyping (with a 34K Populus SNP array) of all accessions were used for gene discovery in a genome-wide association study (GWAS).
Efficient genotype compression and analysis of large genetic variation datasets

PubMed Central

Layer, Ryan M.; Kindlon, Neil; Karczewski, Konrad J.; Quinlan, Aaron R.

2015-01-01

Genotype Query Tools (GQT) is a new indexing strategy that expedites analyses of genome variation datasets in VCF format based on sample genotypes, phenotypes and relationships. GQT’s compressed genotype index minimizes decompression for analysis, and performance relative to existing methods improves with cohort size. We show substantial (up to 443 fold) performance gains over existing methods and demonstrate GQT’s utility for exploring massive datasets involving thousands to millions of genomes. PMID:26550772
An Overview of Genomic Sequence Variation Markup Language (GSVML)

PubMed Central

Nakaya, Jun; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Kimura, Michio

2006-01-01

Internationally accumulated data on human genomic sequence variation require an interoperable data exchange format. We developed GSVML as such a format. GSVML is oriented toward human health and has three categories. We analyzed use cases in the human health domain and surveyed existing databases and markup languages. We also examined its ability to interface with the Health Level Seven Genotype Model. GSVML provides a sharable platform for both clinical and research applications.

Continuous Morphological Variation Correlated with Genome Size Indicates Frequent Introgressive Hybridization among Diphasiastrum Species (Lycopodiaceae) in Central Europe

PubMed Central

Hanušová, Kristýna; Ekrt, Libor; Vít, Petr; Kolář, Filip; Urfus, Tomáš

2014-01-01

Introgressive hybridization is an important evolutionary process frequently contributing to diversification and speciation of angiosperms. Its extent in other groups of land plants has only rarely been studied, however. We therefore examined the levels of introgression in the genus Diphasiastrum, a taxonomically challenging group of Lycopodiophytes, using flow cytometry and numerical and geometric morphometric analyses. Patterns of morphological and cytological variation were evaluated in an extensive dataset of 561 individuals from 57 populations of six taxa from Central Europe, the region with the largest known taxonomic complexity. In addition, genome size values of 63 individuals from Northern Europe were acquired for comparative purposes. Within Central European populations, we detected a continuous pattern in both morphological variation and genome size (the two being strongly correlated), suggesting extensive levels of interspecific gene flow within this region, including several large hybrid swarm populations. The secondary character of habitats of Central European hybrid swarm populations suggests that man-made landscape changes might have enhanced unnatural contact of species, resulting in extensive hybridization within this area. In contrast, a distinct pattern of genome size variation among individuals from other parts of Europe indicates that pure populations prevail outside Central Europe. All in all, introgressive hybridization among Diphasiastrum species in Central Europe represents a unique case of extensive interspecific gene flow among spore-producing vascular plants that causes serious complications for taxon delimitation. PMID:24932509
Assessing signatures of selection through variation in linkage disequilibrium between taurine and indicine cattle

PubMed Central

2014-01-01

Background: Signatures of selection are regions in the genome that have been preferentially increased in frequency and fixed in a population because of their functional importance in specific processes. These regions can be detected because of their lower genetic variability and specific regional linkage disequilibrium (LD) patterns. Methods: By comparing the differences in regional LD variation between dairy and beef cattle types, and between indicine and taurine subspecies, we aimed to find signatures of selection for production and adaptation in cattle breeds. The VarLD method was applied to compare the LD variation in the autosomal genome between breeds, including Angus and Brown Swiss, representing taurine breeds, and Nelore and Gir, representing indicine breeds. Genomic regions containing the top 0.01 and 0.1 percentile of signals were characterized using the UMD3.1 Bos taurus genome assembly to identify genes in those regions and compared with previously reported selection signatures and regions with copy number variation. Results: For all comparisons, the top 0.01 and 0.1 percentile included 26 and 165 signals and 17 and 125 genes, respectively, including TECRL, BT.23182 or FPPS, CAST, MYOM1, UVRAG and DNAJA1. Conclusions: The VarLD method is a powerful tool to identify differences in linkage disequilibrium between cattle populations and putative signatures of selection with potential adaptive and productive importance. PMID:24592996
Development of a new medium frequency EM device: Mapping soil water content variations using electrical conductivity and dielectric permittivity

NASA Astrophysics Data System (ADS)

Kessouri, P.; Buvat, S.; Tabbagh, A.

2012-12-01

Both electrical conductivity and dielectric permittivity of soil are influenced by its water content. Dielectric permittivity is usually measured in the high frequency range, using GPR or TDR, where the sensitivity to water content is high. However, its evaluation is limited by a low investigation depth, especially for clay-rich soils. Electrical conductivity is closely related not only to soil water content, but also to clay content and soil structure. A simultaneous estimation of these electrical parameters can allow the mapping of soil water content variations for an investigation depth close to 1 m. In order to estimate simultaneously both soil electrical conductivity and dielectric permittivity, an electromagnetic device working in the medium frequency range (between 100 kHz and 10 MHz) has been designed. We adopted Slingram geometry for the EM prototype: its PERP configuration (vertical transmission loop Tx and horizontal measuring loop Rx) was defined using 1D ground models. As the required investigation depth is around 1 m, the coil spacing was fixed to 1.2 m. This prototype works in a frequency range between 1 and 5 MHz. After calibration, we tested the response of the prototype to objects with known properties. The first in situ measurements were conducted on experimental sites with different types of soils and different water content variations (artificially created or natural): sandy alluvium on a plot of INRA (French National Institute for Agricultural Research) in Orléans (Centre, France), a clay-loam soil on an experimental site in Estrée-Mons (Picardie, France) and fractured limestone in the vicinity of Grand (Vosges, France). In the case of the sandy alluvium, the values of dielectric permittivity measured are close to those of HF permittivity and allow the use of existing theoretical models to determine the soil water content. For soils containing a higher amount of clay, the coupled information brought by the electrical conductivity and the dielectric permittivity is used. Variations of water content detected by the EM prototype are confirmed by additional DC electrical profiling and direct mass water content measurements along depth. For the clay-loam soil, containing more than 20% of clay, the relative dielectric permittivity values, ranging from 63 to 138, are much higher than those expected in the high frequency range (above 20 MHz, the highest measured permittivity is equal to 81 for water). In the medium frequency range, those values are very likely due to interfacial polarization. This effect, also known as Maxwell-Wagner polarization, should increase with the soil clay content. The first measuring trial is consistent with the gravimetric water content as well as DC electrical profiling measurements. For a clay-rich soil, the EM prototype is able to detect water content variations for an investigation depth close to 1 m with both electrical conductivity and dielectric permittivity in the medium frequency range. Other field experiments are scheduled to confirm these results on other types of soils.
Copy Number Variation across European Populations

PubMed Central

Chen, Wanting; Hayward, Caroline; Wright, Alan F.; Hicks, Andrew A.; Vitart, Veronique; Knott, Sara; Wild, Sarah H.; Pramstaller, Peter P.; Wilson, James F.; Rudan, Igor; Porteous, David J.

2011-01-01

Genome analysis provides a powerful approach to test for evidence of genetic variation within and between geographical regions and local populations. Copy number variants which comprise insertions, deletions and duplications of genomic sequence provide one such convenient and informative source. Here, we investigate copy number variants from genome wide scans of single nucleotide polymorphisms in three European population isolates, the island of Vis in Croatia, the islands of Orkney in Scotland and the South Tyrol in Italy. We show that whereas the overall copy number variant frequencies are similar between populations, their distribution is highly specific to the population of origin, a finding which is supported by evidence for increased kinship correlation for specific copy number variants within populations. PMID:21829696
Evolutionary constraints or opportunities?

PubMed

Sharov, Alexei A

2014-09-01

Natural selection is traditionally viewed as a leading factor of evolution, whereas variation is assumed to be random and non-directional. Any order in variation is attributed to epigenetic or developmental constraints that can hinder the action of natural selection. In contrast I consider the positive role of epigenetic mechanisms in evolution because they provide organisms with opportunities for rapid adaptive change. Because the term "constraint" has negative connotations, I use the term "regulated variation" to emphasize the adaptive nature of phenotypic variation, which helps populations and species to survive and evolve in changing environments. The capacity to produce regulated variation is a phenotypic property, which is not described in the genome. Instead, the genome acts as a switchboard, where mostly random mutations switch "on" or "off" preexisting functional capacities of organism components. Thus, there are two channels of heredity: informational (genomic) and structure-functional (phenotypic). Functional capacities of organisms most likely emerged in a chain of modifications and combinations of more simple ancestral functions. The role of DNA has been to keep records of these changes (without describing the result) so that they can be reproduced in the following generations. Evolutionary opportunities include adjustments of individual functions, multitasking, connection between various components of an organism, and interaction between organisms. The adaptive nature of regulated variation can be explained by the differential success of lineages in macro-evolution. Lineages with more advantageous patterns of regulated variation are likely to produce more species and secure more resources (i.e., long-term lineage selection). Published by Elsevier Ireland Ltd.
Host genetic variation impacts microbiome composition across human body sites.

PubMed

Blekhman, Ran; Goodrich, Julia K; Huang, Katherine; Sun, Qi; Bukowski, Robert; Bell, Jordana T; Spector, Timothy D; Keinan, Alon; Ley, Ruth E; Gevers, Dirk; Clark, Andrew G

2015-09-15

The composition of bacteria in and on the human body varies widely across human individuals, and has been associated with multiple health conditions. While microbial communities are influenced by environmental factors, some degree of genetic influence of the host on the microbiome is also expected. This study is part of an expanding effort to comprehensively profile the interactions between human genetic variation and the composition of this microbial ecosystem on a genome- and microbiome-wide scale. Here, we jointly analyze the composition of the human microbiome and host genetic variation. By mining the shotgun metagenomic data from the Human Microbiome Project for host DNA reads, we gathered information on host genetic variation for 93 individuals for whom bacterial abundance data are also available. Using this dataset, we identify significant associations between host genetic variation and microbiome composition in 10 of the 15 body sites tested. These associations are driven by host genetic variation in immunity-related pathways, and are especially enriched in host genes that have been previously associated with microbiome-related complex diseases, such as inflammatory bowel disease and obesity-related disorders. Lastly, we show that host genomic regions associated with the microbiome have high levels of genetic differentiation among human populations, possibly indicating host genomic adaptation to environment-specific microbiomes. Our results highlight the role of host genetic variation in shaping the composition of the human microbiome, and provide a starting point toward understanding the complex interaction between human genetics and the microbiome in the context of human evolution and disease.
Privacy-preserving GWAS analysis on federated genomic datasets.

PubMed

Constable, Scott D; Tang, Yuzhe; Wang, Shuang; Jiang, Xiaoqian; Chapin, Steve

2015-01-01

The biomedical community benefits from the increasing availability of genomic data to support meaningful scientific research, e.g., Genome-Wide Association Studies (GWAS). However, high quality GWAS usually requires a large number of samples, which can grow beyond the capability of a single institution. Federated genomic data analysis holds the promise of enabling cross-institution collaboration for effective GWAS, but it raises concerns about patient privacy and medical information confidentiality (as data are being exchanged across institutional boundaries), which becomes an inhibiting factor for practical use. We present a privacy-preserving GWAS framework on federated genomic datasets. Our method is to layer the GWAS computations on top of secure multi-party computation (MPC) systems. This approach allows two parties in a distributed system to mutually perform secure GWAS computations without exposing their private data outside. We demonstrate our technique by implementing a framework for minor allele frequency counting and χ² statistics calculation, one of the typical computations used in GWAS. For efficient prototyping, we use a state-of-the-art MPC framework, i.e., Portable Circuit Format (PCF). Our experimental results show promise in realizing both efficient and secure cross-institution GWAS computations.
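As a rough illustration of the two computations named in this abstract (minor allele frequency counting and a χ² statistic), the sketch below pools allele counts from two sites using toy additive secret sharing. This is not the authors' PCF-based framework: the modulus, the function names, the count layout and the example numbers are all assumptions made for illustration, and the sharing scheme is a simplified stand-in for a real MPC system.

```python
import random

# Hypothetical modulus for additive shares (an assumption, not from the paper).
MODULUS = 2**61 - 1


def share(value, modulus=MODULUS):
    """Split a non-negative integer into two additive shares."""
    r = random.randrange(modulus)
    return r, (value - r) % modulus


def pooled_counts(site_a, site_b, modulus=MODULUS):
    """Pool per-site counts so that only the totals are reconstructed.

    Each site holds [case_minor, case_major, control_minor, control_major].
    """
    pooled = []
    for x, y in zip(site_a, site_b):
        x1, x2 = share(x, modulus)  # site A keeps x1, sends x2
        y1, y2 = share(y, modulus)  # site B keeps y2, sends y1
        partial_1 = (x1 + y1) % modulus  # computed by party 1
        partial_2 = (x2 + y2) % modulus  # computed by party 2
        pooled.append((partial_1 + partial_2) % modulus)
    return pooled


def minor_allele_frequency(minor, major):
    """Fraction of alleles that are the minor allele."""
    return minor / (minor + major)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table sums to zero")
    return n * (a * d - b * c) ** 2 / denom


# Made-up example counts for two sites.
site_a = [30, 70, 20, 80]
site_b = [25, 75, 15, 85]
case_minor, case_major, ctrl_minor, ctrl_major = pooled_counts(site_a, site_b)
case_maf = minor_allele_frequency(case_minor, case_major)
ctrl_maf = minor_allele_frequency(ctrl_minor, ctrl_major)
chi2 = chi_square_2x2(case_minor, case_major, ctrl_minor, ctrl_major)
```

In this toy version the pooled totals are revealed in the clear before the statistic is computed; a full MPC protocol of the kind the abstract describes would also keep intermediate values hidden.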
Bottlenecks and selective sweeps during domestication have increased deleterious genetic variation in dogs.

PubMed

Marsden, Clare D; Ortega-Del Vecchyo, Diego; O'Brien, Dennis P; Taylor, Jeremy F; Ramirez, Oscar; Vilà, Carles; Marques-Bonet, Tomas; Schnabel, Robert D; Wayne, Robert K; Lohmueller, Kirk E

2016-01-05

Population bottlenecks, inbreeding, and artificial selection can all, in principle, influence levels of deleterious genetic variation. However, the relative importance of each of these effects on genome-wide patterns of deleterious variation remains controversial. Domestic and wild canids offer a powerful system to address the role of these factors in influencing deleterious variation because their history is dominated by known bottlenecks and intense artificial selection. Here, we assess genome-wide patterns of deleterious variation in 90 whole-genome sequences from breed dogs, village dogs, and gray wolves. We find that the ratio of amino acid changing heterozygosity to silent heterozygosity is higher in dogs than in wolves and, on average, dogs have 2-3% higher genetic load than gray wolves. Multiple lines of evidence indicate this pattern is driven by less efficient natural selection due to bottlenecks associated with domestication and breed formation, rather than recent inbreeding. Further, we find regions of the genome implicated in selective sweeps are enriched for amino acid changing variants and Mendelian disease genes. To our knowledge, these results provide the first quantitative estimates of the increased burden of deleterious variants directly associated with domestication and have important implications for selective breeding programs and the conservation of rare and endangered species. Specifically, they highlight the costs associated with selective breeding and question the practice favoring the breeding of individuals that best fit breed standards. Our results also suggest that maintaining a large population size, rather than just avoiding inbreeding, is a critical factor for preventing the accumulation of deleterious variants.
Bottlenecks and selective sweeps during domestication have increased deleterious genetic variation in dogs

PubMed Central

Marsden, Clare D.; Ortega-Del Vecchyo, Diego; O’Brien, Dennis P.; Taylor, Jeremy F.; Ramirez, Oscar; Vilà, Carles; Marques-Bonet, Tomas; Schnabel, Robert D.; Wayne, Robert K.; Lohmueller, Kirk E.

2016-01-01

Population bottlenecks, inbreeding, and artificial selection can all, in principle, influence levels of deleterious genetic variation. However, the relative importance of each of these effects on genome-wide patterns of deleterious variation remains controversial. Domestic and wild canids offer a powerful system to address the role of these factors in influencing deleterious variation because their history is dominated by known bottlenecks and intense artificial selection. Here, we assess genome-wide patterns of deleterious variation in 90 whole-genome sequences from breed dogs, village dogs, and gray wolves. We find that the ratio of amino acid changing heterozygosity to silent heterozygosity is higher in dogs than in wolves and, on average, dogs have 2–3% higher genetic load than gray wolves. Multiple lines of evidence indicate this pattern is driven by less efficient natural selection due to bottlenecks associated with domestication and breed formation, rather than recent inbreeding. Further, we find regions of the genome implicated in selective sweeps are enriched for amino acid changing variants and Mendelian disease genes. To our knowledge, these results provide the first quantitative estimates of the increased burden of deleterious variants directly associated with domestication and have important implications for selective breeding programs and the conservation of rare and endangered species. Specifically, they highlight the costs associated with selective breeding and question the practice favoring the breeding of individuals that best fit breed standards. Our results also suggest that maintaining a large population size, rather than just avoiding inbreeding, is a critical factor for preventing the accumulation of deleterious variants. PMID:26699508
High Variation in Pathogenicity of Genetically Closely Related Strains of Xanthomonas albilineans, the Sugarcane Leaf Scald Pathogen, in Guadeloupe.

PubMed

Champoiseau, P; Daugrois, J-H; Pieretti, I; Cociancich, S; Royer, M; Rott, P

2006-10-01

Pathogenicity of 75 strains of Xanthomonas albilineans from Guadeloupe was assessed by inoculation of sugarcane cv. B69566, which is susceptible to leaf scald, and 19 of the strains were selected as representative of the variation in pathogenicity observed based on stalk colonization. In vitro production of albicidin varied among these 19 strains, but the restriction fragment length polymorphism pattern of their albicidin biosynthesis genes was identical. Similarly, no genomic variation was found among strains by pulsed-field gel electrophoresis. Some variation among strains was found by amplified fragment length polymorphism, but no relationship between this genetic variation and variation in pathogenicity was found. Only 3 (pilB, rpfA, and xpsE) of 40 genes involved in pathogenicity of bacterial species closely related to X. albilineans could be amplified by polymerase chain reaction from total genomic DNA of all nine strains tested of X. albilineans differing in pathogenicity in Guadeloupe. Nucleotide sequences of these genes were 100% identical among strains, and a phylogenetic study with these genes and housekeeping genes efp and ihfA suggested that X. albilineans is on an evolutionary road between the X. campestris group and Xylella fastidiosa, another vascular plant pathogen. Sequencing of the complete genome of Xanthomonas albilineans could be the next step in deciphering molecular mechanisms involved in pathogenicity of X. albilineans.
Parallel Evolution of Copy-Number Variation across Continents in Drosophila melanogaster

PubMed Central

Schrider, Daniel R.; Hahn, Matthew W.; Begun, David J.

2016-01-01

Genetic differentiation across populations that is maintained in the presence of gene flow is a hallmark of spatially varying selection. In Drosophila melanogaster, the latitudinal clines across the eastern coasts of Australia and North America appear to be examples of this type of selection, with recent studies showing that a substantial portion of the D. melanogaster genome exhibits allele frequency differentiation with respect to latitude on both continents. As of yet there has been no genome-wide examination of differentiated copy-number variants (CNVs) in these geographic regions, despite their potential importance for phenotypic variation in Drosophila and other taxa. Here, we present an analysis of geographic variation in CNVs in D. melanogaster. We also present the first genomic analysis of geographic variation for copy-number variation in the sister species, D. simulans, in order to investigate patterns of parallel evolution in these close relatives. In D. melanogaster we find hundreds of CNVs, many of which show parallel patterns of geographic variation on both continents, lending support to the idea that they are influenced by spatially varying selection. These findings support the idea that polymorphic CNVs contribute to local adaptation in D. melanogaster. In contrast, we find very few CNVs in D. simulans that are geographically differentiated in parallel on both continents, consistent with earlier work suggesting that clinal patterns are weaker in this species. PMID:26809315
Genome-Wide Discovery and Deployment of Insertions and Deletions Markers Provided Greater Insights on Species, Genomes, and Sections Relationships in the Genus Arachis.

PubMed

Vishwakarma, Manish K; Kale, Sandip M; Sriswathi, Manda; Naresh, Talari; Shasidhar, Yaduru; Garg, Vanika; Pandey, Manish K; Varshney, Rajeev K

2017-01-01

Small insertions and deletions (InDels) are the second most prevalent and the most abundant structural variations in plant genomes. In order to deploy these genetic variations for genetic analysis in genus Arachis, we conducted comparative analysis of the draft genome assemblies of both the diploid progenitor species of cultivated tetraploid groundnut (Arachis hypogaea L.) i.e., Arachis duranensis (A subgenome) and Arachis ipaënsis (B subgenome) and identified 515,223 InDels. These InDels include 269,973 insertions identified in A. ipaënsis against A. duranensis while 245,250 deletions in A. duranensis against A. ipaënsis. The majority of the InDels were of single bp (43.7%) and 2-10 bp (39.9%) while the remaining were >10 bp (16.4%). Phylogenetic analysis using genotyping data for 86 (40.19%) polymorphic markers grouped 96 diverse Arachis accessions into eight clusters mostly by the affinity of their genome. This study also provided evidence for the existence of "K" genome, although distinct from both the "A" and "B" genomes, but more similar to "B" genome. The complete homology between A. monticola and A. hypogaea tetraploid taxa showed a very similar genome composition. The above analysis has provided greater insights into the phylogenetic relationship among accessions, genomes, sub species and sections. These InDel markers are very useful resource for groundnut research community for genetic analysis and breeding applications.
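The counts and size-class percentages reported in this abstract can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable and function names are illustrative, and percentages are held as integer tenths of a percent to avoid floating-point rounding.

```python
# Figures quoted in the abstract.
insertions = 269_973       # insertions in A. ipaensis against A. duranensis
deletions = 245_250        # deletions in A. duranensis against A. ipaensis
total_reported = 515_223   # total InDels reported

# Size-class shares in tenths of a percent (43.7%, 39.9%, 16.4%).
size_class_tenths = {"1 bp": 437, "2-10 bp": 399, ">10 bp": 164}


def counts_consistent(ins, dels, total):
    """True if insertions plus deletions equal the reported total."""
    return ins + dels == total


def shares_sum_to_100(tenths):
    """True if the size-class shares add up to exactly 100.0%."""
    return sum(tenths.values()) == 1000


print(counts_consistent(insertions, deletions, total_reported))
print(shares_sum_to_100(size_class_tenths))
```

Both checks pass: 269,973 + 245,250 = 515,223, and 43.7% + 39.9% + 16.4% = 100.0%.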
Genome Analysis of the Domestic Dog (Korean Jindo) by Massively Parallel Sequencing

PubMed Central

Kim, Ryong Nam; Kim, Dae-Soo; Choi, Sang-Haeng; Yoon, Byoung-Ha; Kang, Aram; Nam, Seong-Hyeuk; Kim, Dong-Wook; Kim, Jong-Joo; Ha, Ji-Hong; Toyoda, Atsushi; Fujiyama, Asao; Kim, Aeri; Kim, Min-Young; Park, Kun-Hyang; Lee, Kang Seon; Park, Hong-Seog

2012-01-01

Although pioneering sequencing projects have shed light on the boxer and poodle genomes, a number of challenges need to be met before the sequencing and annotation of the dog genome can be considered complete. Here, we present the DNA sequence of the Jindo dog genome, sequenced to 45-fold average coverage using Illumina massively parallel sequencing technology. A comparison of the sequence to the reference boxer genome led to the identification of 4 675 437 single nucleotide polymorphisms (SNPs, including 3 346 058 novel SNPs), 71 642 indels and 8131 structural variations. Of these, 339 non-synonymous SNPs and 3 indels are located within coding sequences (CDS). In particular, 3 non-synonymous SNPs and a 26-bp deletion occur in the TCOF1 locus, implying that the difference observed in cranial facial morphology between Jindo and boxer dogs might be influenced by those variations. Through the annotation of the Jindo olfactory receptor gene family, we found 2 unique olfactory receptor genes and 236 olfactory receptor genes harbouring non-synonymous homozygous SNPs that are likely to affect smelling capability. In addition, we determined the DNA sequence of the Jindo dog mitochondrial genome and identified Jindo dog-specific mtDNA genotypes. This Jindo genome data upgrade our understanding of dog genomic architecture and will be a very valuable resource for investigating not only dog genetics and genomics but also human and dog disease genetics and comparative genomics. PMID:22474061
Genome-Wide Discovery and Deployment of Insertions and Deletions Markers Provided Greater Insights on Species, Genomes, and Sections Relationships in the Genus Arachis

PubMed Central

Vishwakarma, Manish K.; Kale, Sandip M.; Sriswathi, Manda; Naresh, Talari; Shasidhar, Yaduru; Garg, Vanika; Pandey, Manish K.; Varshney, Rajeev K.

2017-01-01

Small insertions and deletions (InDels) are the second most prevalent and the most abundant structural variations in plant genomes. In order to deploy these genetic variations for genetic analysis in genus Arachis, we conducted comparative analysis of the draft genome assemblies of both the diploid progenitor species of cultivated tetraploid groundnut (Arachis hypogaea L.) i.e., Arachis duranensis (A subgenome) and Arachis ipaënsis (B subgenome) and identified 515,223 InDels. These InDels include 269,973 insertions identified in A. ipaënsis against A. duranensis while 245,250 deletions in A. duranensis against A. ipaënsis. The majority of the InDels were of single bp (43.7%) and 2–10 bp (39.9%) while the remaining were >10 bp (16.4%). Phylogenetic analysis using genotyping data for 86 (40.19%) polymorphic markers grouped 96 diverse Arachis accessions into eight clusters mostly by the affinity of their genome. This study also provided evidence for the existence of “K” genome, although distinct from both the “A” and “B” genomes, but more similar to “B” genome. The complete homology between A. monticola and A. hypogaea tetraploid taxa showed a very similar genome composition. The above analysis has provided greater insights into the phylogenetic relationship among accessions, genomes, sub species and sections. These InDel markers are very useful resource for groundnut research community for genetic analysis and breeding applications. PMID:29312366
Structural Divergence in Vertebrate Phylogeny of a Duplicated Prototype Galectin

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bhat, R.; Chakraborty, M.; Mian, I. S.

Prototype galectins, endogenously expressed animal lectins with a single carbohydrate recognition domain, are well-known regulators of tissue properties such as growth and adhesion. The earliest discovered and best studied of the prototype galectins is Galectin-1 (Gal-1). In the Gallus gallus (chicken) genome, Gal-1 is represented by two homologs: Gal-1A and Gal-1B, with distinct biochemical properties, tissue expression, and developmental functions. We investigated the origin of the Gal-1A/Gal-1B divergence to gain insight into when their developmental functions originated and how they could have contributed to vertebrate phenotypic evolution. Sequence alignment and phylogenetic tree construction showed that the Gal-1A/Gal-1B divergence can be traced back to the origin of the sauropsid lineage (consisting of extinct and extant reptiles and birds) although lineage-specific duplications also occurred in the amphibian and actinopterygian genomes. Gene synteny analysis showed that sauropsid gal-1b (the gene for Gal-1B) and its frog and actinopterygian gal-1 homologs share a similar chromosomal location, whereas sauropsid gal-1a has translocated to a new position. Surprisingly, we found that chicken Gal-1A, encoded by the translocated gal-1a, was more similar in its tertiary folding pattern than Gal-1B, encoded by the untranslocated gal-1b, to experimentally determined and predicted folds of nonsauropsid Gal-1s. This inference is consistent with our finding of a lower proportion of conserved residues in sauropsid Gal-1Bs, and evidence for positive selection of sauropsid gal-1b, but not gal-1a genes. We propose that the duplication and structural divergence of Gal-1B away from Gal-1A led to specialization in both expression and function in the sauropsid lineage.
Structural Divergence in Vertebrate Phylogeny of a Duplicated Prototype Galectin

DOE PAGES

Bhat, R.; Chakraborty, M.; Mian, I. S.; ...

2014-09-25

Prototype galectins, endogenously expressed animal lectins with a single carbohydrate recognition domain, are well-known regulators of tissue properties such as growth and adhesion. The earliest discovered and best studied of the prototype galectins is Galectin-1 (Gal-1). In the Gallus gallus (chicken) genome, Gal-1 is represented by two homologs: Gal-1A and Gal-1B, with distinct biochemical properties, tissue expression, and developmental functions. We investigated the origin of the Gal-1A/Gal-1B divergence to gain insight into when their developmental functions originated and how they could have contributed to vertebrate phenotypic evolution. Sequence alignment and phylogenetic tree construction showed that the Gal-1A/Gal-1B divergence can be traced back to the origin of the sauropsid lineage (consisting of extinct and extant reptiles and birds) although lineage-specific duplications also occurred in the amphibian and actinopterygian genomes. Gene synteny analysis showed that sauropsid gal-1b (the gene for Gal-1B) and its frog and actinopterygian gal-1 homologs share a similar chromosomal location, whereas sauropsid gal-1a has translocated to a new position. Surprisingly, we found that chicken Gal-1A, encoded by the translocated gal-1a, was more similar in its tertiary folding pattern than Gal-1B, encoded by the untranslocated gal-1b, to experimentally determined and predicted folds of nonsauropsid Gal-1s. This inference is consistent with our finding of a lower proportion of conserved residues in sauropsid Gal-1Bs, and evidence for positive selection of sauropsid gal-1b, but not gal-1a genes. We propose that the duplication and structural divergence of Gal-1B away from Gal-1A led to specialization in both expression and function in the sauropsid lineage.
Identifying Specific Genes Controlling Complex Traits Through A Genome-Wide Screen For cis-Acting Regulatory Elements - An Example Using Marek's Disease

USDA-ARS's Scientific Manuscript database

The identification of specific genes underlying phenotypic variation of complex traits remains one of the greatest challenges in biology despite having genome sequences and more powerful tools. Most genome-wide screens lack sufficient resolving power as they typically depend on linkage. One altern...
Variation across mitochondrial gene trees provides evidence for systematic error: How much gene tree variation is biological?

PubMed

Richards, Emilie J; Brown, Jeremy M; Barley, Anthony J; Chong, Rebecca A; Thomson, Robert C

2018-02-19

The use of large genomic datasets in phylogenetics has highlighted extensive topological variation across genes. Much of this discordance is assumed to result from biological processes. However, variation among gene trees can also be a consequence of systematic error driven by poor model fit, and the relative importance of biological versus methodological factors in explaining gene tree variation is a major unresolved question. Using mitochondrial genomes to control for biological causes of gene tree variation, we estimate the extent of gene tree discordance driven by systematic error and employ posterior prediction to highlight the role of model fit in producing this discordance. We find that the amount of discordance among mitochondrial gene trees is similar to the amount of discordance found in other studies that assume only biological causes of variation. This similarity suggests that the role of systematic error in generating gene tree variation is underappreciated and critical evaluation of fit between assumed models and the data used for inference is important for the resolution of unresolved phylogenetic questions.
Synthesis: Intertwining product and process

NASA Technical Reports Server (NTRS)

Weiss, David M.

1990-01-01

Synthesis is a proposed systematic process for rapidly creating different members of a program family. Family members are described by variations in their requirements. Requirements variations are mapped to variations on a standard design to generate production quality code and documentation. The approach is made feasible by using principles underlying design for change. Synthesis incorporates ideas from rapid prototyping, application generators, and domain analysis. The goals of Synthesis and the Synthesis process are discussed. The technology needed and the feasibility of the approach are also briefly discussed. The status of current efforts to implement Synthesis methodologies is presented.
Maintenance of genetic variation in human personality: Testing evolutionary models by estimating heritability due to common causal variants and investigating the effect of distant inbreeding

PubMed Central

Verweij, Karin J.H.; Yang, Jian; Lahti, Jari; Veijola, Juha; Hintsanen, Mirka; Pulkki-Råback, Laura; Heinonen, Kati; Pouta, Anneli; Pesonen, Anu-Katriina; Widen, Elisabeth; Taanila, Anja; Isohanni, Matti; Miettunen, Jouko; Palotie, Aarno; Penke, Lars; Service, Susan K.; Heath, Andrew C.; Montgomery, Grant W.; Raitakari, Olli; Kähönen, Mika; Viikari, Jorma; Räikkönen, Katri; Eriksson, Johan G; Keltikangas-Järvinen, Liisa; Lehtimäki, Terho; Martin, Nicholas G.; Järvelin, Marjo-Riitta; Visscher, Peter M.; Keller, Matthew C.; Zietsch, Brendan P.

2012-01-01

Personality traits are basic dimensions of behavioural variation, and twin, family, and adoption studies show that around 30% of the between-individual variation is due to genetic variation. There is rapidly-growing interest in understanding the evolutionary basis of this genetic variation. Several evolutionary mechanisms could explain how genetic variation is maintained in traits, and each of these makes predictions in terms of the relative contribution of rare and common genetic variants to personality variation, the magnitude of nonadditive genetic influences, and whether personality is affected by inbreeding. Using genome-wide SNP data from >8,000 individuals, we estimated that little variation in the Cloninger personality dimensions (7.2% on average) is due to the combined effect of common, additive genetic variants across the genome, suggesting that most heritable variation in personality is due to rare variant effects and/or a combination of dominance and epistasis. Furthermore, higher levels of inbreeding were associated with less socially-desirable personality trait levels in three of the four personality dimensions. These findings are consistent with genetic variation in personality traits having been maintained by mutation-selection balance. PMID:23025612
