Science.gov

Sample records for high sequence variation

  1. High intraindividual variation in internal transcribed spacer sequences in Aeschynanthus (Gesneriaceae): implications for phylogenetics.

    PubMed Central

    Denduangboripant, J; Cronk, Q C

    2000-01-01

    Aeschynanthus (Gesneriaceae) is a large genus of tropical epiphytes that is widely distributed from the Himalayas and China throughout South-East Asia to New Guinea and the Solomon Islands. Polymerase chain reaction (PCR) consensus sequences of the internal transcribed spacers (ITS) of Aeschynanthus nuclear ribosomal DNA showed sequence polymorphism that was difficult to interpret. Cloning individual sequences from the PCR product generated a phylogenetic tree of 23 Aeschynanthus species (two clones per species). The intraindividual clone pairs varied from 0 to 5.01%. We suggest that the high intraindividual sequence variation results from low molecular drive in the ITS of Aeschynanthus. However, this study shows that, despite the variation found within some individuals, it is still possible to use these data to reconstruct phylogenetic relationships of the species, suggesting that clone variation, although persistent, does not pre-date the divergence of Aeschynanthus species. The Aeschynanthus analysis revealed two major clades with different but overlapping geographic distributions and reflected classification based on morphology (particularly seed hair type). PMID:10983824

  2. The use of high-throughput DNA sequencing in the investigation of antigenic variation: application to Neisseria species.

    PubMed

    Davies, John K; Harrison, Paul F; Lin, Ya-Hsun; Bartley, Stephanie; Khoo, Chen Ai; Seemann, Torsten; Ryan, Catherine S; Kahler, Charlene M; Hill, Stuart A

    2014-01-01

    Antigenic variation occurs in a broad range of species. This process resembles gene conversion in that variant DNA is unidirectionally transferred from partial gene copies (or silent loci) into an expression locus. Previous studies of antigenic variation have involved the amplification and sequencing of individual genes from hundreds of colonies. Using the pilE gene from Neisseria gonorrhoeae we have demonstrated that it is possible to use PCR amplification, followed by high-throughput DNA sequencing and a novel assembly process, to detect individual antigenic variation events. The ability to detect these events was much greater than has previously been possible. In N. gonorrhoeae most silent loci contain multiple partial gene copies. Here we show that there is a bias towards using the copy at the 3' end of the silent loci (copy 1) as the donor sequence. The pilE gene of N. gonorrhoeae and some strains of Neisseria meningitidis encode class I pilin, but strains of N. meningitidis from clonal complexes 8 and 11 encode a class II pilin. We have confirmed that the class II pili of meningococcal strain FAM18 (clonal complex 11) are non-variable, and this is also true for the class II pili of strain NMB from clonal complex 8. In addition when a gene encoding class I pilin was moved into the meningococcal strain NMB background there was no evidence of antigenic variation. Finally we investigated several members of the opa gene family of N. gonorrhoeae, where it has been suggested that limited variation occurs. Variation was detected in the opaK gene that is located close to pilE, but not at the opaJ gene located elsewhere on the genome. The approach described here promises to dramatically improve studies of the extent and nature of antigenic variation systems in a variety of species.

  3. Comparative Analysis of Mycobacterium tuberculosis pe and ppe Genes Reveals High Sequence Variation and an Apparent Absence of Selective Constraints

    PubMed Central

    McEvoy, Christopher R. E.; Cloete, Ruben; Müller, Borna; Schürch, Anita C.; van Helden, Paul D.; Gagneux, Sebastien; Warren, Robin M.; Gey van Pittius, Nicolaas C.

    2012-01-01

    Mycobacterium tuberculosis complex (MTBC) genomes contain 2 large gene families termed pe and ppe. The function of pe/ppe proteins remains enigmatic but studies suggest that they are secreted or cell surface associated and are involved in bacterial virulence. Previous studies have also shown that some pe/ppe genes are polymorphic, a finding that suggests involvement in antigenic variation. Using comparative sequence analysis of 18 publicly available MTBC whole genome sequences, we have performed alignments of 33 pe (excluding pe_pgrs) and 66 ppe genes in order to detect the frequency and nature of genetic variation. This work has been supplemented by whole gene sequencing of 14 pe/ppe (including 5 pe_pgrs) genes in a cohort of 40 diverse and well defined clinical isolates covering all the main lineages of the M. tuberculosis phylogenetic tree. We show that nsSNPs in pe (excluding pgrs) and ppe genes are 3.0 and 3.3 times more frequent than in non-pe/ppe genes, respectively, and that numerous other mutation types are also present at a high frequency. It has previously been shown that non-pe/ppe M. tuberculosis genes display a remarkably low level of purifying selection. Here, we also show that compared to these genes those of the pe/ppe families show a further reduction of selection pressure that suggests neutral evolution. This is inconsistent with the positive selection pressure of “classical” antigenic variation. Finally, by analyzing such a large number of genes we were able to detect large differences in mutation type and frequency between both individual genes and gene sub-families. The high variation rates and absence of selective constraints provide valuable insights into potential pe/ppe function. Since pe/ppe proteins are highly antigenic and have been studied as potential vaccine components these results should also prove informative for aspects of M. tuberculosis vaccine design. PMID:22496726

  4. Comprehensive characterization of human genome variation by high coverage whole-genome sequencing of forty four Caucasians.

    PubMed

    Shen, Hui; Li, Jian; Zhang, Jigang; Xu, Chao; Jiang, Yan; Wu, Zikai; Zhao, Fuping; Liao, Li; Chen, Jun; Lin, Yong; Tian, Qing; Papasian, Christopher J; Deng, Hong-Wen

    2013-01-01

    Whole genome sequencing studies are essential to obtain a comprehensive understanding of the vast pattern of human genomic variations. Here we report the results of a high-coverage whole genome sequencing study for 44 unrelated healthy Caucasian adults, each sequenced to over 50-fold coverage (averaging 65.8×). We identified approximately 11 million single nucleotide polymorphisms (SNPs), 2.8 million short insertions and deletions, and over 500,000 block substitutions. We showed that, although previous studies, including the 1000 Genomes Project Phase 1 study, have catalogued the vast majority of common SNPs, many of the low-frequency and rare variants remain undiscovered. For instance, approximately 1.4 million SNPs and 1.3 million short indels that we found were novel to both the dbSNP and the 1000 Genomes Project Phase 1 data sets, the majority of which (∼96%) have a minor allele frequency less than 5%. On average, each individual genome carried ∼3.3 million SNPs and ∼492,000 indels/block substitutions, including approximately 179 variants that were predicted to cause loss of function of the gene products. Moreover, each individual genome carried an average of 44 such loss-of-function variants in a homozygous state, which would completely "knock out" the corresponding genes. Across all the 44 genomes, a total of 182 genes were "knocked-out" in at least one individual genome, among which 46 genes were "knocked out" in over 30% of our samples, suggesting that a number of genes are commonly "knocked-out" in general populations. Gene ontology analysis suggested that these commonly "knocked-out" genes are enriched in biological processes related to antigen processing and immune response. Our results contribute towards a comprehensive characterization of human genomic variation, especially for less-common and rare variants, and provide an invaluable resource for future genetic studies of human variation and diseases.

  5. Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions (2013 DOE JGI Genomics of Energy and Environment 8th Annual User Meeting)

    SciTech Connect

    Gordon, Sean

    2013-03-01

    Sean Gordon of the USDA presents "Natural variation in Brachypodium distachyon: Deep Sequencing of Highly Diverse Natural Accessions" at the 8th Annual Genomics of Energy & Environment Meeting on March 27, 2013 in Walnut Creek, Calif.

  6. Population structure of African buffalo inferred from mtDNA sequences and microsatellite loci: high variation but low differentiation.

    PubMed

    Simonsen, B T; Siegismund, H R; Arctander, P

    1998-02-01

    The African buffalo (Syncerus caffer) is widespread throughout sub-Saharan Africa and is found in most major vegetation types, wherever permanent sources of water are available, making it physically able to disperse through a wide range of habitats. Despite this, the buffalo has been assumed to be strongly philopatric and to form large aggregations that remain within separate home ranges with little interchange between units, but the level of differentiation within the species is unknown. Genetic differences between populations were assessed using mitochondrial DNA (control region) sequence data and analysis of variation at six microsatellite loci among 11 localities in eastern and southern Africa. High levels of genetic variability were found, suggesting that reported severe population bottlenecks due to outbreak of rinderpest during the last century did not strongly reduce the genetic variability within the species. The high level of genetic variation within the species was found to be evenly distributed among populations and only at the continental level were we able to consistently detect significant differentiation, contrasting with the assumed philopatric behaviour of the buffalo. Results of mtDNA and microsatellite data were found to be congruent, disagreeing with the alleged male-biased dispersal. We propose that the observed pattern of the distribution of genetic variation between buffalo populations at the regional level can be caused by fragmentation of a previous panmictic population due to human activity, and at the continental level, reflects an effect of geographical distance between populations.

  7. Comparative Mitogenomics of the Genus Odontobutis (Perciformes: Gobioidei: Odontobutidae) Revealed Conserved Gene Rearrangement and High Sequence Variations

    PubMed Central

    Ma, Zhihong; Yang, Xuefen; Bercsenyi, Miklos; Wu, Junjie; Yu, Yongyao; Wei, Kaijian; Fan, Qixue; Yang, Ruibin

    2015-01-01

    To understand the molecular evolution of mitochondrial genomes (mitogenomes) in the genus Odontobutis, the mitogenome of Odontobutis yaluensis was sequenced and compared with those of another four Odontobutis species. Our results displayed similar mitogenome features among species in genome organization, base composition, codon usage, and gene rearrangement. The identical gene rearrangement of trnS-trnL-trnH tRNA cluster observed in mitogenomes of these five closely related freshwater sleepers suggests that this unique gene order is conserved within Odontobutis. Additionally, the present gene order and the positions of associated intergenic spacers of these Odontobutis mitogenomes indicate that this unusual gene rearrangement results from tandem duplication and random loss of large-scale gene regions. Moreover, these mitogenomes exhibit a high level of sequence variation, mainly due to the differences of corresponding intergenic sequences in gene rearrangement regions and the heterogeneity of tandem repeats in the control regions. Phylogenetic analyses support Odontobutis species with shared gene rearrangement forming a monophyletic group, and the interspecific phylogenetic relationships are associated with structural differences among their mitogenomes. The present study contributes to understanding the evolutionary patterns of Odontobutidae species. PMID:26492246

  8. Local sequence assembly reveals a high-resolution profile of somatic structural variations in 97 cancer genomes

    PubMed Central

    Zhuang, Jiali; Weng, Zhiping

    2015-01-01

    Genomic structural variations (SVs) are pervasive in many types of cancers. Characterizing their underlying mechanisms and potential molecular consequences is crucial for understanding the basic biology of tumorigenesis. Here, we engineered a local assembly-based algorithm (laSV) that detects SVs with high accuracy from paired-end high-throughput genomic sequencing data and pinpoints their breakpoints at single base-pair resolution. By applying laSV to 97 tumor-normal paired genomic sequencing datasets across six cancer types produced by The Cancer Genome Atlas Research Network, we discovered that non-allelic homologous recombination is the primary mechanism for generating somatic SVs in acute myeloid leukemia. This finding contrasts with results for the other five types of solid tumors, in which non-homologous end joining and microhomology end joining are the predominant mechanisms. We also found that the genes recurrently mutated by single nucleotide alterations differed from the genes recurrently mutated by SVs, suggesting that these two types of genetic alterations play different roles during cancer progression. We further characterized how the gene structures of the oncogene JAK1 and the tumor suppressors KDM6A and RB1 are affected by somatic SVs and discussed the potential functional implications of intergenic SVs. PMID:26283183

  9. High-throughput sequencing and copy number variation detection using formalin fixed embedded tissue in metastatic gastric cancer.

    PubMed

    Kim, Seokhwi; Lee, Jeeyun; Hong, Min Eui; Do, In-Gu; Kang, So Young; Ha, Sang Yun; Kim, Seung Tae; Park, Se Hoon; Kang, Won Ki; Choi, Min-Gew; Lee, Jun Ho; Sohn, Tae Sung; Bae, Jae Moon; Kim, Sung; Kim, Duk-Hwan; Kim, Kyoung-Mee

    2014-01-01

    In the era of targeted therapy, mutation profiling of cancer is a crucial aspect of making therapeutic decisions. To characterize cancer at a molecular level, the use of formalin-fixed paraffin-embedded tissue is important. We tested the Ion AmpliSeq Cancer Hotspot Panel v2 and nCounter Copy Number Variation Assay in 89 formalin-fixed paraffin-embedded gastric cancer samples to determine whether they are applicable in archival clinical samples for personalized targeted therapies. We validated the results with Sanger sequencing, real-time quantitative PCR, fluorescence in situ hybridization and immunohistochemistry. Frequently detected somatic mutations included TP53 (28.17%), APC (10.1%), PIK3CA (5.6%), KRAS (4.5%), SMO (3.4%), STK11 (3.4%), CDKN2A (3.4%) and SMAD4 (3.4%). Amplifications of HER2, CCNE1, MYC, KRAS and EGFR genes were observed in 8 (8.9%), 4 (4.5%), 2 (2.2%), 1 (1.1%) and 1 (1.1%) cases, respectively. In the cases with amplification, fluorescence in situ hybridization for HER2 verified gene amplification and immunohistochemistry for HER2, EGFR and CCNE1 verified the overexpression of proteins in tumor cells. In conclusion, we successfully performed semiconductor-based sequencing and nCounter copy number variation analyses in formalin-fixed paraffin-embedded gastric cancer samples. High-throughput screening in archival clinical samples enables faster, more accurate and cost-effective detection of hotspot mutations or amplification in genes.

  10. Analysis of GADD45A sequence variations in French Canadian families with high risk of breast cancer.

    PubMed

    Desjardins, Sylvie; Ouellette, Geneviève; Labrie, Yvan; Simard, Jacques; Durocher, Francine

    2008-01-01

    GADD45A is an evolutionarily conserved gene whose expression is regulated by two major tumor suppressor proteins involved in breast cancer etiology, namely, p53 and BRCA1, and which acts primarily in the control of the G2/M cell-cycle transition, apoptosis, and DNA repair. Following genotoxic stress, the p53 protein activates GADD45A transcription, whereas in the absence of DNA damage, BRCA1 represses GADD45A expression through interaction with the zinc finger protein ZNF350. Moreover, BRCA1 can activate GADD45A gene expression through interactions with transcription factors binding to the gene promoter. On the basis of the intricate network of interactions between GADD45A, p53, and BRCA1, and the fact that both BRCA1 and TP53 mutations are involved in breast cancer tumorigenesis, we undertook the characterization of the entire coding sequence, intron/exon boundaries, and p53- and ZNF350-binding sequences of this potential breast cancer susceptibility candidate gene in a sample set of 96 women affected with breast cancer from non-BRCA1/BRCA2 French Canadian families with a high risk of breast cancer and 95 healthy controls from the same population. Although none of the 12 identified sequence variations shows a significant difference in frequency between the two sample sets, haplotype phasing and frequency estimations identified a common haplotype displaying a higher frequency among the control group. As the variants present on this particular haplotype are noncoding variants in either intron 2 or 3, this finding will have to be further investigated in larger cohorts and other populations. In this regard, our study also identified tagging single nucleotide polymorphisms (tSNPs), providing useful data for other large-scale association studies.

  11. Transcriptome analysis of the variations between autotetraploid Paulownia tomentosa and its diploid using high-throughput sequencing.

    PubMed

    Fan, Guoqiang; Wang, Limin; Deng, Minjie; Niu, Suyan; Zhao, Zhenli; Xu, Enkai; Cao, Xibin; Zhang, Xiaoshen

    2015-08-01

    Timber properties of autotetraploid Paulownia tomentosa are heritable with whole genome duplication, but the molecular mechanisms for the predominant characteristics remain unclear. To illuminate the genetic basis, high-throughput sequencing technology was used to identify the related unigenes. A total of 2677 unigenes were found to be significantly differentially expressed in autotetraploid P. tomentosa. In total, 30 photosynthesis-related, 21 transcription factor-related, and 22 lignin-related differentially expressed unigenes were detected, and the roles of the peroxidase in lignin biosynthesis, MYB DNA-binding proteins, and WRKY proteins associated with the regulation of relevant hormones are extensively discussed. The results provide transcriptome data that may bring a new perspective to explaining the polyploidy mechanism in the long growth cycle of plants and may aid future Paulownia breeding. PMID:25773315

  12. Sequence variation in transcription factor IIIA.

    PubMed Central

    Gaskins, C J; Hanas, J S

    1990-01-01

    Previous studies characterized macromolecular differences between Xenopus and Rana transcription factor IIIA (TFIIIA) (Gaskins et al., 1989, Nucl. Acids Res. 17, 781-794). In the present study, cDNAs for TFIIIA from Xenopus borealis and Rana catesbeiana (American bullfrog) were cloned and sequenced in order to gain molecular insight into the structure, function, and species variation of TFIIIA and the TFIIIA-type zinc finger. X. borealis and R. catesbeiana TFIIIAs have 339 and 335 amino acids respectively, 5 and 9 fewer than X. laevis TFIIIA. X. borealis TFIIIA exhibited 84% sequence homology (55 amino acid differences) with X. laevis TFIIIA and R. catesbeiana TFIIIA exhibited 63% homology (128 amino acid changes) with X. laevis TFIIIA. This sequence variation is not random; the C-terminal halves of these TFIIIAs contain substantially more non-conservative changes than the N-terminal halves. In particular, the N-terminal region of TFIIIA (that region forming strong DNA contacts) is the most conserved and the C-terminal tail (that region involved in transcription promotion) the most divergent. Hydropathy analyses of these sequences revealed zinc finger periodicity in the N-terminal halves, extreme hydrophilicity in the C-terminal halves, and a different C-terminal tail hydropathy for R. catesbeiana TFIIIA. Although considerable sequence variation exists in these TFIIIA zinc fingers, the Cys/His, Tyr/Phe and Leu residues are strictly conserved between X. laevis and X. borealis. Strict conservation of only the Cys/His motif is observed between X. laevis and R. catesbeiana TFIIIA. Overall, Cys/His zinc fingers in TFIIIA are much less conserved than Cys/Cys fingers in erythroid transcription factor (Eryf 1) and also less conserved than homeo box domains in segmentation genes. The collective evidence indicates that TFIIIA evolved from a common precursor containing up to 12 finger domains which subsequently evolved at different rates. PMID:2110661
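The percentages quoted in this abstract can be checked with simple arithmetic. The sketch below assumes that the denominator is the length of X. laevis TFIIIA, inferred from the abstract as 339 + 5 = 335 + 9 = 344 residues; the alignment length the authors actually used is not stated, so this is an illustration rather than a reproduction of their calculation. The function name `percent_identity` is hypothetical.

```python
def percent_identity(differences: int, reference_length: int) -> float:
    """Percentage of reference positions that are unchanged."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return 100.0 * (1 - differences / reference_length)


# Assumption: X. laevis TFIIIA length inferred from the abstract (339 + 5).
X_LAEVIS_LENGTH = 339 + 5

# Difference counts are taken directly from the abstract.
borealis = percent_identity(55, X_LAEVIS_LENGTH)
catesbeiana = percent_identity(128, X_LAEVIS_LENGTH)

print(f"X. borealis vs X. laevis:    {borealis:.1f}%")
print(f"R. catesbeiana vs X. laevis: {catesbeiana:.1f}%")
```

Under that assumption the rounded values agree with the 84% and 63% reported in the abstract.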

  13. Massively parallel sequencing approaches for characterization of structural variation.

    PubMed

    Koboldt, Daniel C; Larson, David E; Chen, Ken; Ding, Li; Wilson, Richard K

    2012-01-01

    The emergence of next-generation sequencing (NGS) technologies offers an incredible opportunity to comprehensively study DNA sequence variation in human genomes. Commercially available platforms from Roche (454), Illumina (Genome Analyzer and Hiseq 2000), and Applied Biosystems (SOLiD) have the capability to completely sequence individual genomes to high levels of coverage. NGS data is particularly advantageous for the study of structural variation (SV) because it offers the sensitivity to detect variants of various sizes and types, as well as the precision to characterize their breakpoints at base pair resolution. In this chapter, we present methods and software algorithms that have been developed to detect SVs and copy number changes using massively parallel sequencing data. We describe visualization and de novo assembly strategies for characterizing SV breakpoints and removing false positives.

  14. Transcriptome and genome sequencing uncovers functional variation in humans

    PubMed Central

    Lappalainen, Tuuli; Sammeth, Michael; Friedländer, Marc R; ‘t Hoen, Peter AC; Monlong, Jean; Rivas, Manuel A; Gonzàlez-Porta, Mar; Kurbatova, Natalja; Griebel, Thasso; Ferreira, Pedro G; Barann, Matthias; Wieland, Thomas; Greger, Liliana; van Iterson, Maarten; Almlöf, Jonas; Ribeca, Paolo; Pulyakhina, Irina; Esser, Daniela; Giger, Thomas; Tikhonov, Andrew; Sultan, Marc; Bertier, Gabrielle; MacArthur, Daniel G; Lek, Monkol; Lizano, Esther; Buermans, Henk PJ; Padioleau, Ismael; Schwarzmayr, Thomas; Karlberg, Olof; Ongen, Halit; Kilpinen, Helena; Beltran, Sergi; Gut, Marta; Kahlem, Katja; Amstislavskiy, Vyacheslav; Stegle, Oliver; Pirinen, Matti; Montgomery, Stephen B; Donnelly, Peter; McCarthy, Mark I; Flicek, Paul; Strom, Tim M; Lehrach, Hans; Schreiber, Stefan; Sudbrak, Ralf; Carracedo, Ángel; Antonarakis, Stylianos E; Häsler, Robert; Syvänen, Ann-Christine; van Ommen, Gert-Jan; Brazma, Alvis; Meitinger, Thomas; Rosenstiel, Philip; Guigó, Roderic; Gut, Ivo G; Estivill, Xavier; Dermitzakis, Emmanouil T

    2013-01-01

    Summary: Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of mRNA and miRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project – the first uniformly processed RNA-seq data from multiple human populations with high-quality genome sequences. We discovered extremely widespread genetic variation affecting regulation of the majority of genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on cellular mechanisms of regulatory and loss-of-function variation, and allowed us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome. PMID:24037378

  15. Mitochondrial DNA sequence variation in multiple sclerosis

    PubMed Central

    Santaniello, Adam; Caillier, Stacy J.; D'Alfonso, Sandra; Martinelli Boneschi, Filippo; Hauser, Stephen L.; Oksenberg, Jorge R.

    2015-01-01

    Objective: To assess the influence of common mitochondrial DNA (mtDNA) sequence variation on multiple sclerosis (MS) risk in cases and controls part of an international consortium. Methods: We analyzed 115 high-quality mtDNA variants and common haplogroups from a previously published genome-wide association study among 7,391 cases from the International Multiple Sclerosis Genetics Consortium and 14,568 controls from the Wellcome Trust Case Control Consortium 2 project from 7 countries. Significant single nucleotide polymorphism and haplogroup associations were replicated in 3,720 cases and 879 controls from the University of California, San Francisco. Results: An elevated risk of MS was detected among haplogroup JT carriers from 7 pooled clinic sites (odds ratio [OR] = 1.15, 95% confidence interval [CI] = 1.07–1.24, p = 0.0002) included in the discovery study. The increased risk of MS was observed for both haplogroup T (OR = 1.17, 95% CI = 1.06–1.29, p = 0.002) and haplogroup J carriers (OR = 1.11, 95% CI = 1.01–1.22, p = 0.03). These haplogroup associations with MS were not replicated in the independent sample set. An elevated risk of primary progressive (PP) MS was detected for haplogroup J participants from 3 European discovery populations (OR = 1.49, 95% CI = 1.10–2.01, p = 0.009). This elevated risk was borderline significant in the US replication population (OR = 1.43, 95% CI = 0.99–2.08, p = 0.058) and remained significant in pooled analysis of discovery and replication studies (OR = 1.43, 95% CI = 1.14–1.81, p = 0.002). No common individual mtDNA variants were associated with MS risk. Conclusions: Identification and validation of mitochondrial genetic variants associated with MS and PPMS may lead to new targets for treatment and diagnostic tests for identifying potential responders to interventions that target mitochondria. PMID:26136518
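The odds ratios and 95% confidence intervals reported above follow a standard pattern that can be sketched for a plain 2×2 table. The code below uses the unadjusted Wald interval on the log-odds scale; the study may well have used adjusted or pooled models, so this is not the authors' method, and the counts in the example are hypothetical. The helper `excludes_one` is also a hypothetical name, applied here to two intervals quoted in the abstract.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: carriers among cases       b: non-carriers among cases
    c: carriers among controls    d: non-carriers among controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def excludes_one(lower: float, upper: float) -> bool:
    """True when the interval does not contain OR = 1 (no effect)."""
    return lower > 1.0 or upper < 1.0


# Hypothetical counts, for illustration only.
or_, lo, hi = odds_ratio_ci(a=30, b=70, c=20, d=80)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")

# Intervals quoted in the abstract for haplogroup J and PPMS.
print(excludes_one(1.10, 2.01))  # discovery populations
print(excludes_one(0.99, 2.08))  # US replication population
```

The second interval contains 1, which is consistent with the abstract describing the replication result as only borderline significant (p = 0.058).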

  16. Sequence Variation Within the Fragile X Locus

    PubMed Central

    Mathews, Debra J.; Kashuk, Carl; Brightwell, Gale; Eichler, Evan E.; Chakravarti, Aravinda

    2001-01-01

    The human genome provides a reference sequence, which is a template for resequencing studies that aim to discover and interpret the record of common ancestry that exists in extant genomes. To understand the nature and pattern of variation and linkage disequilibrium comprising this history, we present a study of ∼31 kb spanning an ∼70 kb region of FMR1, sequenced in a sample of 20 humans (worldwide sample) and four great apes (chimp, bonobo, and gorilla). Twenty-five polymorphic sites and two insertion/deletions, distributed in 11 unique haplotypes, were identified among humans. Africans are the only geographic group that do not share any haplotypes with other groups. Parsimony analysis reveals two main clades and suggests that the four major human geographic groups are distributed throughout the phylogenetic tree and within each major clade. An African sample appears to be most closely related to the common ancestor shared with the three other geographic groups. Nucleotide diversity, π, for this sample is $2.63 \pm 6.28 \times 10^{-4}$. The mutation rate, μ, is $6.48 \times 10^{-10}$ per base pair per year, giving an ancestral population size of ∼6200 and a time to the most recent common ancestor of ∼320,000 ± 72,000 per base pair per year. Linkage disequilibrium (LD) at the FMR1 locus, evaluated by conventional LD analysis and by the length of segment shared between any two chromosomes, is extensive across the region. PMID:11483579
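Nucleotide diversity, π, is commonly defined as the average number of pairwise differences per site across all pairs of sampled sequences. The sketch below computes it for a small set of pre-aligned, equal-length sequences. The toy sequences are hypothetical, gaps and ambiguous bases are not handled, and the abstract does not say which estimator or software the authors used.

```python
from itertools import combinations


def nucleotide_diversity(sequences: list[str]) -> float:
    """Average pairwise differences per site for aligned sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and equally long")

    # Count mismatching sites over every unordered pair of sequences.
    total_differences = sum(
        sum(x != y for x, y in zip(s, t))
        for s, t in combinations(sequences, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_differences / (n_pairs * length)


# Hypothetical toy alignment: three sequences of eight sites.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pi = nucleotide_diversity(toy_alignment)
print(f"pi = {pi:.4f}")
```

In the toy alignment the three pairs differ at 1, 1 and 2 sites, so π = 4 / (3 × 8).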

  17. Understanding mechanisms underlying human gene expression variation with RNA sequencing

    PubMed Central

    Pickrell, Joseph K.; Marioni, John C.; Pai, Athma A.; Degner, Jacob F.; Engelhardt, Barbara E.; Nkadori, Everlyne; Veyrieras, Jean-Baptiste; Stephens, Matthew; Gilad, Yoav; Pritchard, Jonathan K.

    2011-01-01

    Understanding the genetic mechanisms underlying natural variation in gene expression is a central goal of both medical and evolutionary genetics, and studies of expression quantitative trait loci (eQTLs) have become an important tool for achieving this goal. Although all eQTL studies so far have assayed messenger RNA levels using expression microarrays, recent advances in RNA sequencing enable the analysis of transcript variation at unprecedented resolution. We sequenced RNA from 69 lymphoblastoid cell lines derived from unrelated Nigerian individuals that have been extensively genotyped by the International HapMap Project. By pooling data from all individuals, we generated a map of the transcriptional landscape of these cells, identifying extensive use of unannotated untranslated regions and more than 100 new putative protein-coding exons. Using the genotypes from the HapMap project, we identified more than a thousand genes at which genetic variation influences overall expression levels or splicing. We demonstrate that eQTLs near genes generally act by a mechanism involving allele-specific expression, and that variation that influences the inclusion of an exon is enriched within and near the consensus splice sites. Our results illustrate the power of high-throughput sequencing for the joint analysis of variation in transcription, splicing and allele-specific expression across individuals. PMID:20220758

  18. Identification of Sequence Variation in the Apolipoprotein A2 Gene and Their Relationship with Serum High-Density Lipoprotein Cholesterol Levels

    PubMed Central

    Bandarian, Fatemeh; Daneshpour, Maryam Sadat; Hedayati, Mehdi; Naseri, Mohsen; Azizi, Fereidoun

    2016-01-01

    Background: Apolipoprotein A2 (APOA2) is the second major apolipoprotein of the high-density lipoprotein cholesterol (HDL-C). The study aim was to identify APOA2 gene variation in individuals within two extreme tails of HDL-C levels and its relationship with HDL-C level. Methods: This cross-sectional survey was conducted on participants from the Tehran Lipid and Glucose Study (TLGS) at the Research Institute for Endocrine Sciences, Tehran, Iran from April 2012 to February 2013. In total, 79 individuals with extreme low HDL-C levels (≤5th percentile for age and gender) and 63 individuals with extreme high HDL-C levels (≥95th percentile for age and gender) were selected. Variants were identified using DNA amplification and direct sequencing. Results: Screening of all exons and the core promoter region of the APOA2 gene identified nine single nucleotide substitutions and one microsatellite, five of which were known and four of which were new variants. Of these nine variants, two were common tag single nucleotide polymorphisms (SNPs) and seven were rare SNPs. Both exonic substitutions were missense mutations and caused an amino acid change. There was a significant association between the new missense mutation (variant Chr.1:16119226, Ala98Pro) and HDL-C level. Conclusion: Neither of the two common tag SNPs, rs6413453 and rs5082, contributes to the HDL-C trait in the Iranian population, but a new missense mutation in APOA2 in our population has a significant association with HDL-C. PMID:26590203

  19. High-Throughput Sequencing Technologies

    PubMed Central

    Reuter, Jason A.; Spacek, Damek; Snyder, Michael P.

    2015-01-01

    Summary: The human genome sequence has profoundly altered our understanding of biology, human diversity and disease. The path from the first draft sequence to our nascent era of personal genomes and genomic medicine has been made possible only because of the extraordinary advancements in DNA sequencing technologies over the past ten years. Here, we discuss commonly used high-throughput sequencing platforms, the growing array of sequencing assays developed around them as well as the challenges facing current sequencing platforms and their clinical application. PMID:26000844

  20. Variations on strongly lacunary quasi Cauchy sequences

    NASA Astrophysics Data System (ADS)

    Kaplan, Huseyin; Cakalli, Huseyin

    2016-08-01

    We introduce a new function space, namely the space of $N_\theta(p)$-ward continuous functions, which turns out to be a closed subspace of the space of continuous functions for each positive integer $p$. $N_\theta^\alpha(p)$-ward continuity is also introduced and investigated for any fixed $0 < \alpha \le 1$, and for any fixed positive integer $p$. A real valued function $f$ defined on a subset $A$ of $\mathbb{R}$, the set of real numbers, is $N_\theta^\alpha(p)$-ward continuous if it preserves $N_\theta^\alpha(p)$-quasi-Cauchy sequences, i.e. $(f(x_n))$ is an $N_\theta^\alpha(p)$-quasi-Cauchy sequence whenever $(x_n)$ is an $N_\theta^\alpha(p)$-quasi-Cauchy sequence of points in $A$, where a sequence $(x_k)$ of points in $\mathbb{R}$ is called $N_\theta^\alpha(p)$-quasi-Cauchy if $\lim_{r \to \infty} \frac{1}{h_r^\alpha} \sum_{k \in I_r} |\Delta x_k|^p = 0$, where $\Delta x_k = x_{k+1} - x_k$ for each positive integer $k$, $p$ is a fixed positive integer, $\alpha$ is fixed in $]0, 1]$, $I_r = (k_{r-1}, k_r]$, and $\theta = (k_r)$ is a lacunary sequence, i.e. an increasing sequence of positive integers such that $k_0 \neq 0$ and $h_r = k_r - k_{r-1} \to \infty$.

  1. Using next-generation sequencing for high resolution multiplex analysis of copy number variation from nanogram quantities of DNA from formalin-fixed paraffin-embedded specimens.

    PubMed

    Wood, Henry M; Belvedere, Ornella; Conway, Caroline; Daly, Catherine; Chalkley, Rebecca; Bickerdike, Melissa; McKinley, Claire; Egan, Phil; Ross, Lisa; Hayward, Bruce; Morgan, Joanne; Davidson, Leslie; MacLennan, Ken; Ong, Thian K; Papagiannopoulos, Kostas; Cook, Ian; Adams, David J; Taylor, Graham R; Rabbitts, Pamela

    2010-08-01

    The use of next-generation sequencing technologies to produce genomic copy number data has recently been described. Most approaches, however, rely on optimal starting DNA, and are therefore unsuitable for the analysis of formalin-fixed paraffin-embedded (FFPE) samples, which largely precludes the analysis of many tumour series. We have sought to challenge the limits of this technique with regard to quality and quantity of starting material and the depth of sequencing required. We confirm that the technique can be used to interrogate DNA from cell lines, fresh frozen material and FFPE samples to assess copy number variation. We show that as little as 5 ng of DNA is needed to generate a copy number karyogram, and follow this up with data from a series of FFPE biopsies and surgical samples. We have used various levels of sample multiplexing to demonstrate the adjustable resolution of the methodology, depending on the number of samples and available resources. We also demonstrate reproducibility by use of replicate samples and comparison with microarray-based comparative genomic hybridization (aCGH) and digital PCR. This technique can be valuable in both the analysis of routine diagnostic samples and in examining large repositories of fixed archival material.

  2. Genome nucleotide composition shapes variation in simple sequence repeats.

    PubMed

    Tian, Xiangjun; Strassmann, Joan E; Queller, David C

    2011-02-01

    Simple sequence repeats (SSRs) or microsatellites are a common component of genomes but vary greatly across species in their abundance. We tested the hypothesis that this variation is due in part to AT/GC content of genomes, with genomes biased toward either high AT or high CG generating more short random repeats that are long enough to enhance expansion through slippage during replication. To test this hypothesis, we identified repeats with perfect tandem iterations of 1-6 bp from 25 protists with complete or near-complete genome sequences. As expected, the density and the frequency are highly related to genome AT content, with excellent fits to quadratic regressions with minima near a 50% AT content and rising toward both extremes. Within species, the same trends hold, except the limited variation in AT content within each species places each mainly on the descending (GC rich), middle, or ascending (AT rich) part of the curve. The base usages of repeat motifs are also significantly correlated with genome nucleotide compositions: Percentages of AT-rich motifs rise with the increase of genome AT content but vice versa for GC-rich subgroups. Amino acid homopolymer repeats also show the expected quadratic relationship, with higher abundance in species with AT content biased in either direction. Our results show that genome nucleotide composition explains up to half of the variance in the abundance and motif constitution of SSRs.
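
    The repeat-identification step described above (perfect tandem iterations of 1-6 bp) can be sketched in Python as follows. This is only an illustrative sketch, not the authors' pipeline: the function names and the minimum copy number (`min_copies=3`) are assumptions, since the abstract does not state a length threshold.

```python
import re

def at_content(seq):
    """Fraction of A/T among the unambiguous (A, C, G, T) bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("A") + seq.count("T")) / acgt if acgt else 0.0

def find_ssrs(seq, min_unit=1, max_unit=6, min_copies=3):
    """Return (start, motif, copies) for perfect tandem repeats.

    min_copies is an assumed threshold; the abstract does not give one.
    """
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # A motif of `unit` bases followed by at least min_copies - 1 exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (for example "AA" or "ATAT"), so each tract is reported once.
            if any(motif == motif[:d] * (unit // d)
                   for d in range(1, unit) if unit % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit))
    return hits

if __name__ == "__main__":
    example = "GGATATATATCC"
    print(find_ssrs(example), at_content(example))
```

    Repeat density per genome (hits per kilobase, for instance) could then be regressed against `at_content` to look for the quadratic relationship the abstract reports.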

  3. A variation on lacunary quasi Cauchy sequences

    NASA Astrophysics Data System (ADS)

    Cakalli, Huseyin; Et, Mikail; Sengul, Hacer

    2016-08-01

    In the present paper, we introduce a concept of ideal lacunary statistical quasi-Cauchy sequence of order $\alpha$ of real numbers in the sense that a sequence $(x_k)$ of points in $\mathbb{R}$ is called $I$-lacunary statistically quasi-Cauchy of order $\alpha$ if $\{ r \in \mathbb{N} : \frac{1}{h_r^\alpha} |\{ k \in I_r : |\Delta x_k| \geq \varepsilon \}| \geq \delta \} \in I$ for each $\varepsilon > 0$ and for each $\delta > 0$, where an ideal $I$ is a family of subsets of positive integers $\mathbb{N}$ which is closed under taking finite unions and subsets of its elements. The main purpose of this paper is to investigate ideal lacunary statistical ward continuity of order $\alpha$, where a function $f$ is called $I$-lacunary statistically ward continuous of order $\alpha$ if it preserves $I$-lacunary statistically quasi-Cauchy sequences of order $\alpha$, i.e. $(f(x_n))$ is an $S_\theta^\alpha(I)$-quasi-Cauchy sequence whenever $(x_n)$ is.

  4. Algorithm of detecting structural variations in DNA sequences

    NASA Astrophysics Data System (ADS)

    Nałecz-Charkiewicz, Katarzyna; Nowak, Robert

    2014-11-01

    Whole genome sequencing makes it possible to use the longest common subsequence algorithm to detect structural genetic variations. We propose searching for the positions of short unique fragments (genetic markers) to achieve acceptable time and space complexity. The markers are generated by algorithms that search the genetic sequence or its Fourier transform. The presented methods were tested on structural variations generated in silico in bacterial genomes and gave results comparable to or better than other solutions.
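
    The idea of running a longest common subsequence (LCS) over marker positions rather than over raw bases can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' algorithm: the marker labels are hypothetical, each marker is assumed to occur once per genome, and the helper name `displaced_markers` is introduced here only for the example.

```python
def lcs(a, b):
    """Longest common subsequence of two sequences (dynamic programming)."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back through the table to recover one optimal subsequence.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

def displaced_markers(ref_order, sample_order):
    """Markers of the reference that fall outside the LCS.

    Assumes unique markers; those missing from the LCS are candidates
    for deletion or rearrangement in the sample.
    """
    common = set(lcs(ref_order, sample_order))
    return [marker for marker in ref_order if marker not in common]

if __name__ == "__main__":
    ref = ["m1", "m2", "m3", "m4", "m5"]        # hypothetical marker order
    sample = ["m1", "m4", "m2", "m3", "m5"]     # m4 has moved
    print(displaced_markers(ref, sample))
```

    Working on a few thousand markers instead of millions of bases is what keeps the quadratic cost of the table manageable.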

  5. Identification of cis-regulatory sequence variations in individual genome sequences.

    PubMed

    Worsley-Hunt, Rebecca; Bernard, Virginie; Wasserman, Wyeth W

    2011-01-01

    Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. For instance, disrupting variations in an HNF4A transcription factor binding site upstream of the Factor IX gene contribute causally to hemophilia B Leyden. Although clinical genome sequence analysis currently focuses on the identification of protein-altering variation, the impact of cis-regulatory mutations can be similarly strong. New technologies are now enabling genome sequencing beyond exomes, revealing variation across the non-coding 98% of the genome responsible for developmental and physiological patterns of gene activity. The capacity to identify causal regulatory mutations is improving, but predicting functional changes in regulatory DNA sequences remains a great challenge. Here we explore the existing methods and software for prediction of functional variation situated in the cis-regulatory sequences governing gene transcription and RNA processing.

  6. Identification of cis-regulatory sequence variations in individual genome sequences

    PubMed Central

    2011-01-01

    Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. For instance, disrupting variations in an HNF4A transcription factor binding site upstream of the Factor IX gene contribute causally to hemophilia B Leyden. Although clinical genome sequence analysis currently focuses on the identification of protein-altering variation, the impact of cis-regulatory mutations can be similarly strong. New technologies are now enabling genome sequencing beyond exomes, revealing variation across the non-coding 98% of the genome responsible for developmental and physiological patterns of gene activity. The capacity to identify causal regulatory mutations is improving, but predicting functional changes in regulatory DNA sequences remains a great challenge. Here we explore the existing methods and software for prediction of functional variation situated in the cis-regulatory sequences governing gene transcription and RNA processing. PMID:21989199

  7. Mitochondrial DNA sequence variation in Greeks.

    PubMed

    Kouvatsi, A; Karaiskou, N; Apostolidis, A; Kirmizidis, G

    2001-12-01

    Mitochondrial DNA (mtDNA) control region sequences were determined in 54 unrelated Greeks, coming from different regions in Greece, for both segments HVR-I and HVR-II. Fifty-two different mtDNA haplotypes were revealed, one of which was shared by three individuals. Very low heterogeneity was found among Greek regions. No single cluster of lineages was specific to individuals coming from a certain region. The average pairwise difference distribution showed a value of 7.599. The data were compared with those for other European or neighboring populations (British, French, Germans, Tuscans, Bulgarians, and Turks). The genetic trees that were constructed revealed homogeneity between Europeans. Median networks revealed that most of the Greek mtDNA haplotypes are clustered into the five known haplogroups and that a number of haplotypes are shared among Greeks and other European and Near Eastern populations.
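
    For readers unfamiliar with the statistic, an average pairwise difference of the kind quoted above is the mean number of differing sites over all pairs of aligned sequences. The sketch below shows that calculation in Python; the toy sequences are invented and the function names are assumptions, and gap or ambiguity handling (which real analyses need) is left out.

```python
from itertools import combinations

def pairwise_differences(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

def mean_pairwise_difference(seqs):
    """Mean number of differences over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)

if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "TCGA"]              # invented example data
    print(mean_pairwise_difference(toy))        # differences 1, 2, 1 over 3 pairs
    print(len(set(toy)), "distinct haplotypes")
```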

  8. Using chaos to generate variations on movement sequences

    NASA Astrophysics Data System (ADS)

    Bradley, Elizabeth; Stuart, Joshua

    1998-12-01

    We describe a method for introducing variations into predefined motion sequences using a chaotic symbol-sequence reordering technique. A progression of symbols representing the body positions in a dance piece, martial arts form, or other motion sequence is mapped onto a chaotic trajectory, establishing a symbolic dynamics that links the movement sequence and the attractor structure. A variation on the original piece is created by generating a trajectory with slightly different initial conditions, inverting the mapping, and using special corpus-based graph-theoretic interpolation schemes to smooth any abrupt transitions. Sensitive dependence guarantees that the variation is different from the original; the attractor structure and the symbolic dynamics guarantee that the two resemble one another in both aesthetic and mathematical senses.
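
    The core of the reordering technique (attach symbols to a reference chaotic trajectory, run a second trajectory from a nearby initial condition, and read symbols back through the inverse mapping) can be sketched as below. This is a simplified illustration only: the logistic map, the nearest-point lookup, and all parameter values are assumptions chosen for brevity, and the smoothing by graph-theoretic interpolation mentioned in the abstract is omitted.

```python
def logistic_trajectory(x0, n, r=3.99):
    """First n points of the logistic map x -> r * x * (1 - x).

    The logistic map is a stand-in here; the abstract does not name the system.
    """
    xs, x = [], x0
    for _ in range(n):
        xs.append(x)
        x = r * x * (1.0 - x)
    return xs

def chaotic_variation(symbols, x0=0.3, delta=1e-6):
    """Reorder `symbols` by following a trajectory started at x0 + delta."""
    reference = logistic_trajectory(x0, len(symbols))    # point i carries symbols[i]
    perturbed = logistic_trajectory(x0 + delta, len(symbols))
    variation = []
    for y in perturbed:
        # Inverse mapping: emit the symbol attached to the closest reference point.
        nearest = min(range(len(reference)), key=lambda i: abs(reference[i] - y))
        variation.append(symbols[nearest])
    return variation

if __name__ == "__main__":
    moves = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")   # hypothetical body-position symbols
    print("".join(chaotic_variation(moves)))
```

    Early symbols tend to survive unchanged because the two trajectories stay close at first; once they diverge, the variation draws on the same vocabulary of positions in a new order.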

  9. Unraveling genomic variation from next generation sequencing data.

    PubMed

    Pavlopoulos, Georgios A; Oulas, Anastasis; Iacucci, Ernesto; Sifrim, Alejandro; Moreau, Yves; Schneider, Reinhard; Aerts, Jan; Iliopoulos, Ioannis

    2013-01-01

    Elucidating the content of a DNA sequence is critical to understanding and decoding the genetic information of any biological system. As next generation sequencing (NGS) techniques have become cheaper and more advanced in throughput over time, great innovations and breakthrough conclusions have been generated in various biological areas. A few of these areas, which are being shaped by the new technological advances, involve evolution of species, microbial mapping, population genetics, genome-wide association studies (GWAs), comparative genomics, variant analysis, gene expression, gene regulation, epigenetics and personalized medicine. While NGS techniques stand as key players in modern biological research, the analysis and interpretation of the vast amount of data produced is not an easy or trivial task and still remains a great challenge in the field of bioinformatics. Therefore, efficient tools to cope with information overload, tackle the high complexity and provide meaningful visualizations to make knowledge extraction easier are essential. In this article, we briefly refer to the sequencing methodologies and the available equipment to serve these analyses and we describe the data formats of the files they produce. We conclude with a thorough review of tools developed to efficiently store, analyze and visualize such data, with emphasis on structural variation analysis and comparative genomics. We finally comment on their functionality, strengths and weaknesses and we discuss how future applications could further develop in this field.

  10. A case study on the genetic origin of the high oleic acid trait through FAD2-1 DNA sequence variation in safflower (Carthamus tinctorius L.).

    PubMed

    Rapson, Sara; Wu, Man; Okada, Shoko; Das, Alpana; Shrestha, Pushkar; Zhou, Xue-Rong; Wood, Craig; Green, Allan; Singh, Surinder; Liu, Qing

    2015-01-01

    The safflower (Carthamus tinctorius L.) is considered a strongly domesticated species with a long history of cultivation. The hybridization of safflower with its wild relatives has played an important role in the evolution of cultivars and is of particular interest with regards to their production of high quality edible oils. Original safflower varieties were all rich in linoleic acid, while varieties rich in oleic acid have risen to prominence in recent decades. The high oleic acid trait is controlled by a partially recessive allele ol at a single locus OL. The ol allele was found to be a defective microsomal oleate desaturase FAD2-1. Here we present DNA sequence data and Southern blot analysis suggesting that there has been an ancient hybridization and introgression of the FAD2-1 gene into C. tinctorius from its wild relative C. palaestinus. It is from this gene that FAD2-1Δ was derived more recently. Identification and characterization of the genetic origin and diversity of FAD2-1 could aid safflower breeders in reducing population size and generations required for the development of new high oleic acid varieties by using perfect molecular marker-assisted selection.

  11. A case study on the genetic origin of the high oleic acid trait through FAD2-1 DNA sequence variation in safflower (Carthamus tinctorius L.)

    PubMed Central

    Rapson, Sara; Wu, Man; Okada, Shoko; Das, Alpana; Shrestha, Pushkar; Zhou, Xue-Rong; Wood, Craig; Green, Allan; Singh, Surinder; Liu, Qing

    2015-01-01

    The safflower (Carthamus tinctorius L.) is considered a strongly domesticated species with a long history of cultivation. The hybridization of safflower with its wild relatives has played an important role in the evolution of cultivars and is of particular interest with regards to their production of high quality edible oils. Original safflower varieties were all rich in linoleic acid, while varieties rich in oleic acid have risen to prominence in recent decades. The high oleic acid trait is controlled by a partially recessive allele ol at a single locus OL. The ol allele was found to be a defective microsomal oleate desaturase FAD2-1. Here we present DNA sequence data and Southern blot analysis suggesting that there has been an ancient hybridization and introgression of the FAD2-1 gene into C. tinctorius from its wild relative C. palaestinus. It is from this gene that FAD2-1Δ was derived more recently. Identification and characterization of the genetic origin and diversity of FAD2-1 could aid safflower breeders in reducing population size and generations required for the development of new high oleic acid varieties by using perfect molecular marker-assisted selection. PMID:26442008

  13. In Silico Detection of Sequence Variations Modifying Transcriptional Regulation

    PubMed Central

    Andersen, Malin C; Engström, Pär G; Lithwick, Stuart; Arenillas, David; Eriksson, Per; Lenhard, Boris; Wasserman, Wyeth W; Odeberg, Jacob

    2008-01-01

    Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation. PMID:18208319
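
    The binding-site half of such an approach can be illustrated by scoring the reference and variant alleles against a position weight matrix and comparing the best scores. The sketch below is not RAVEN's implementation: the count matrix is invented, the function names are assumptions, only the forward strand is scanned, and the phylogenetic footprinting filter is not shown.

```python
import math

# Hypothetical 4-position count matrix, invented purely for illustration.
COUNTS = {
    "A": [8, 1, 0, 9],
    "C": [1, 0, 0, 0],
    "G": [0, 8, 10, 0],
    "T": [1, 1, 0, 1],
}

def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert base counts per position into log2 odds against a uniform background."""
    width = len(next(iter(counts.values())))
    pwm = []
    for i in range(width):
        total = sum(counts[base][i] for base in "ACGT") + 4 * pseudocount
        pwm.append({base: math.log2(((counts[base][i] + pseudocount) / total) / background)
                    for base in "ACGT"})
    return pwm

def best_score(seq, pwm):
    """Highest-scoring window of seq (forward strand only, A/C/G/T only)."""
    width = len(pwm)
    return max(sum(pwm[i][seq[start + i]] for i in range(width))
               for start in range(len(seq) - width + 1))

def allele_score_change(ref_seq, alt_seq, pwm):
    """Best alternate-allele score minus best reference-allele score."""
    return best_score(alt_seq, pwm) - best_score(ref_seq, pwm)

if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS)
    # The G>C change falls inside the toy site AGGA, so the score drops.
    print(allele_score_change("TTAGGATT", "TTAGCATT", pwm))
```

    A strongly negative change flags a variant that may weaken a predicted site; as the abstract notes, such predictions have poor specificity unless other data point to the relevant transcription factor.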

  14. The regions of sequence variation in caulimovirus gene VI.

    PubMed

    Sanger, M; Daubert, S; Goodman, R M

    1991-06-01

    The sequence of gene VI from figwort mosaic virus (FMV) clone x4 was determined and compared with that previously published for FMV clone DxS. Both clones originated from the same virus isolation, but the virus used to clone DxS was propagated extensively in a host of a different family prior to cloning whereas that used to clone x4 was not. Differences in the amino acid sequence inferred from the DNA sequences occurred in two clusters. An N-terminal conserved region preceded two regions of variation separated by a central conserved region. Variation in cauliflower mosaic virus (CaMV) gene VI sequences, all of which were derived from virus isolates from hosts from one host family, was similar to that seen in the FMV comparison, though the extent of variation was less. Alignment of gene VI domains from FMV and CaMV revealed regions of amino acid sequence identical in both viruses within the conserved regions. The similarity in the pattern of conserved and variable domains of these two viruses suggests common host-interactive functions in caulimovirus gene VI homologues, and possibly an analogy between caulimoviruses and certain animal viruses in the influence of the host on sequence variability of viral genes.

  15. Mapping copy number variation by population-scale genome sequencing.

    PubMed

    Mills, Ryan E; Walter, Klaudia; Stewart, Chip; Handsaker, Robert E; Chen, Ken; Alkan, Can; Abyzov, Alexej; Yoon, Seungtai Chris; Ye, Kai; Cheetham, R Keira; Chinwalla, Asif; Conrad, Donald F; Fu, Yutao; Grubert, Fabian; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Iakoucheva, Lilia M; Iqbal, Zamin; Kang, Shuli; Kidd, Jeffrey M; Konkel, Miriam K; Korn, Joshua; Khurana, Ekta; Kural, Deniz; Lam, Hugo Y K; Leng, Jing; Li, Ruiqiang; Li, Yingrui; Lin, Chang-Yun; Luo, Ruibang; Mu, Xinmeng Jasmine; Nemesh, James; Peckham, Heather E; Rausch, Tobias; Scally, Aylwyn; Shi, Xinghua; Stromberg, Michael P; Stütz, Adrian M; Urban, Alexander Eckehart; Walker, Jerilyn A; Wu, Jiantao; Zhang, Yujun; Zhang, Zhengdong D; Batzer, Mark A; Ding, Li; Marth, Gabor T; McVean, Gil; Sebat, Jonathan; Snyder, Michael; Wang, Jun; Ye, Kenny; Eichler, Evan E; Gerstein, Mark B; Hurles, Matthew E; Lee, Charles; McCarroll, Steven A; Korbel, Jan O

    2011-02-01

    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.

  16. A sequence-based variation map of zebrafish.

    PubMed

    Patowary, Ashok; Purkanti, Ramya; Singh, Meghna; Chauhan, Rajendra; Singh, Angom Ramcharan; Swarnkar, Mohit; Singh, Naresh; Pandey, Vikas; Torroja, Carlos; Clark, Matthew D; Kocher, Jean-Pierre; Clark, Karl J; Stemple, Derek L; Klee, Eric W; Ekker, Stephen C; Scaria, Vinod; Sivasubbu, Sridhar

    2013-03-01

    Zebrafish (Danio rerio) is a popular vertebrate model organism largely deployed using outbred laboratory animals. The nonisogenic nature of the zebrafish as a model system offers the opportunity to understand natural variations and their effect in modulating phenotype. In an effort to better characterize the range of natural variation in this model system and to complement the zebrafish reference genome project, the whole genome sequence of a wild zebrafish at 39-fold genome coverage was determined. Comparative analysis with the zebrafish reference genome revealed approximately 5.2 million single nucleotide variations and over 1.6 million insertion-deletion variations. This dataset thus represents a new catalog of genetic variations in the zebrafish genome. Further analysis revealed selective enrichment for variations in genes involved in immune function and response to the environment, suggesting genome-level adaptations to environmental niches. We also show that human disease gene orthologs in the sequenced wild zebrafish genome show a lower ratio of nonsynonymous to synonymous single nucleotide variations.

  17. Sequence variations in the FAD2 gene in seeded pumpkins.

    PubMed

    Ge, Y; Chang, Y; Xu, W L; Cui, C S; Qu, S P

    2015-12-21

    Seeded pumpkins are important economic crops; the seeds contain various unsaturated fatty acids, such as oleic acid and linoleic acid, which are crucial for human and animal nutrition. The fatty acid desaturase-2 (FAD2) gene encodes delta-12 desaturase, which converts oleic acid to linoleic acid. However, little is known about sequence variations in FAD2 in seeded pumpkins. Twenty-seven FAD2 clones from 27 accessions of Cucurbita moschata, Cucurbita maxima, Cucurbita pepo, and Cucurbita ficifolia were obtained (1152 bp in total; a single gene without introns). More than 90% nucleotide identity was detected among the 27 FAD2 clones. Nucleotide substitution, rather than nucleotide insertion and deletion, led to sequence polymorphism in the 27 FAD2 clones. Furthermore, the 27 selected FAD2 clones all encoded the FAD2 enzyme (delta-12 desaturase) with amino acid sequence identities from 91.7 to 100% for 384 amino acids. The same main-function domain between amino acids 47 and 329 was identified. The four species clustered separately based on differences in the sequences that were identified using the unweighted pair group method with arithmetic mean. Geographic origin and species were found to be closely related to sequence variation in FAD2.
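
    The unweighted pair group method with arithmetic mean (UPGMA) mentioned above repeatedly merges the two closest clusters and averages their distances to all other clusters, weighted by cluster size. A compact sketch follows; the distance matrix and labels are invented and do not come from the study, and the function returns only the tree topology (no branch lengths).

```python
def upgma(names, matrix):
    """Cluster labels with UPGMA; returns the tree as nested tuples.

    `matrix` is a symmetric list of lists of pairwise distances.
    """
    clusters = {i: (names[i], 1) for i in range(len(names))}   # id -> (tree, size)
    dist = {(i, j): matrix[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)            # closest pair of clusters
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        merged = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Size-weighted average keeps every original leaf equally weighted.
            merged[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(merged)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    return next(iter(clusters.values()))[0]

if __name__ == "__main__":
    labels = ["A", "B", "C", "D"]                 # hypothetical accessions
    distances = [                                 # invented distances
        [0, 2, 10, 10],
        [2, 0, 10, 10],
        [10, 10, 0, 4],
        [10, 10, 4, 0],
    ]
    print(upgma(labels, distances))
```

    In practice the distances would be derived from the aligned FAD2 sequences, for example as one minus the pairwise nucleotide identity.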

  18. Sequence variations in the FAD2 gene in seeded pumpkins.

    PubMed

    Ge, Y; Chang, Y; Xu, W L; Cui, C S; Qu, S P

    2015-01-01

    Seeded pumpkins are important economic crops; the seeds contain various unsaturated fatty acids, such as oleic acid and linoleic acid, which are crucial for human and animal nutrition. The fatty acid desaturase-2 (FAD2) gene encodes delta-12 desaturase, which converts oleic acid to linoleic acid. However, little is known about sequence variations in FAD2 in seeded pumpkins. Twenty-seven FAD2 clones from 27 accessions of Cucurbita moschata, Cucurbita maxima, Cucurbita pepo, and Cucurbita ficifolia were obtained (1152 bp in total; a single gene without introns). More than 90% nucleotide identity was detected among the 27 FAD2 clones. Nucleotide substitution, rather than nucleotide insertion and deletion, led to sequence polymorphism in the 27 FAD2 clones. Furthermore, the 27 selected FAD2 clones all encoded the FAD2 enzyme (delta-12 desaturase) with amino acid sequence identities from 91.7 to 100% for 384 amino acids. The same main-function domain between amino acids 47 and 329 was identified. The four species clustered separately based on differences in the sequences that were identified using the unweighted pair group method with arithmetic mean. Geographic origin and species were found to be closely related to sequence variation in FAD2. PMID:26782391

  19. Dissecting the relationship between protein structure and sequence variation

    NASA Astrophysics Data System (ADS)

    Shahmoradi, Amir; Wilke, Claus; Wilke Lab Team

    2015-03-01

    Over the past decade several independent works have shown that some structural properties of proteins are capable of predicting protein evolution. The strength and significance of these structure-sequence relations, however, appear to vary widely among different proteins, with absolute correlation strengths ranging from 0.1 to 0.8. Here we present the results from a comprehensive search for the potential biophysical and structural determinants of protein evolution by studying more than 200 structural and evolutionary properties in a dataset of 209 monomeric enzymes. We discuss the main protein characteristics responsible for the general patterns of protein evolution, and identify sequence divergence as the main determinant of the strengths of virtually all structure-evolution relationships, explaining ~10-30% of observed variation in sequence-structure relations. In addition to sequence divergence, we identify several protein structural properties that are moderately but significantly coupled with the strength of sequence-structure relations. In particular, proteins with more homogeneous backbone hydrogen bond energies, large fractions of helical secondary structures and low fractions of beta sheets tend to have the strongest sequence-structure relation. BEACON-NSF center for the study of evolution in action.

  20. Comprehensive characterization of complex structural variations in cancer by directly comparing genome sequence reads.

    PubMed

    Moncunill, Valentí; Gonzalez, Santi; Beà, Sílvia; Andrieux, Lise O; Salaverria, Itziar; Royo, Cristina; Martinez, Laura; Puiggròs, Montserrat; Segura-Wang, Maia; Stütz, Adrian M; Navarro, Alba; Royo, Romina; Gelpí, Josep L; Gut, Ivo G; López-Otín, Carlos; Orozco, Modesto; Korbel, Jan O; Campo, Elias; Puente, Xose S; Torrents, David

    2014-11-01

    The development of high-throughput sequencing technologies has advanced our understanding of cancer. However, characterizing somatic structural variants in tumor genomes is still challenging because current strategies depend on the initial alignment of reads to a reference genome. Here, we describe SMUFIN (somatic mutation finder), a single program that directly compares sequence reads from normal and tumor genomes to accurately identify and characterize a range of somatic sequence variation, from single-nucleotide variants (SNV) to large structural variants at base pair resolution. Performance tests on modeled tumor genomes showed average sensitivity of 92% and 74% for SNVs and structural variants, with specificities of 95% and 91%, respectively. Analyses of aggressive forms of solid and hematological tumors revealed that SMUFIN identifies breakpoints associated with chromothripsis and chromoplexy with high specificity. SMUFIN provides an integrated solution for the accurate, fast and comprehensive characterization of somatic sequence variation in cancer. PMID:25344728

  1. Flagellin gene sequence variation in the genus Pseudomonas.

    PubMed

    Bellingham, N F; Morgan, J A; Saunders, J R; Winstanley, C

    2001-07-01

    Flagellin gene (fliC) sequences from 18 strains of Pseudomonas sensu stricto representing 8 different species, and 9 representative fliC sequences from other members of the gamma sub-division of proteobacteria, were compared. Analysis was performed on N-terminal, C-terminal and whole fliC sequences. The fliC analyses confirmed the inferred relationship between P. mendocina, P. oleovorans and P. aeruginosa based on 16S rRNA sequence comparisons. In addition, the analyses indicated that P. putida PRS2000 was closely related to P. fluorescens SBW25 and P. fluorescens NCIMB 9046T, but suggested that P. putida PaW8 and P. putida PRS2000 were more closely related to other Pseudomonas spp. than they were to each other. There were a number of inconsistencies in inferred evolutionary relationships between strains, depending on the analysis performed. In particular, whole flagellin gene comparisons often differed from those obtained using N- and C-terminal sequences. However, there were also inconsistencies between the terminal region analyses, suggesting that phylogenetic relationships inferred on the basis of fliC sequence should be treated with caution. Although the central domain of fliC is highly variable between Pseudomonas strains, there was evidence of sequence similarities between the central domains of different Pseudomonas fliC sequences. This indicates the possibility of recombination in the central domain of fliC genes within Pseudomonas species, and between these genes and those from other bacteria. PMID:11518318

  2. Genetic sequence variations of BRCA1-interacting genes AURKA, BAP1, BARD1 and DHX9 in French Canadian families with high risk of breast cancer.

    PubMed

    Guénard, Frédéric; Labrie, Yvan; Ouellette, Geneviève; Beauparlant, Charles Joly; Durocher, Francine

    2009-03-01

    Breast cancer is a heterogeneous disease displaying some degree of familial clustering. Highly penetrant breast cancer susceptibility genes represent approximately 20-25% of the familial aggregation of breast cancer. A significant proportion of this familial aggregation of breast cancer is thus yet to be explained by other breast cancer susceptibility genes. Given the high susceptibility conferred by the two major breast cancer predisposition genes, BRCA1 and BRCA2, and the implication of these genes in many key cellular processes, assessment of genes encoding BRCA1-interacting proteins as plausible breast cancer candidate genes is attractive. In this study, four genes encoding BRCA1-interacting proteins were analyzed in a cohort of 96 breast cancer individuals from high-risk non-BRCA1/BRCA2 French Canadian families. Although no deleterious truncating germline mutations or aberrant spliced mRNA species were identified, a total of 10, 4, 11 and 6 variants were found in the AURKA, BAP1, BARD1 and DHX9 genes, respectively. The allele frequency of each variant was further ascertained in a cohort of 98 healthy, unrelated French Canadian women, and a difference in allele frequency was observed for one BARD1 variant based on single-marker analysis. Haplotype estimation and identification of haplotype blocks and tagging SNPs were then performed for each gene, providing a valuable tool for further searches of common disease-associated variants in these genes. Further analyses of these genes in larger cohorts are therefore warranted in the search for low-to-moderate penetrance breast cancer susceptibility alleles.

  3. STR allele sequence variation: Current knowledge and future issues.

    PubMed

    Gettings, Katherine Butler; Aponte, Rachel A; Vallone, Peter M; Butler, John M

    2015-09-01

    This article reviews what is currently known about short tandem repeat (STR) allelic sequence variation in and around the twenty-four loci most commonly used throughout the world to perform forensic DNA investigations. These STR loci include D1S1656, TPOX, D2S441, D2S1338, D3S1358, FGA, CSF1PO, D5S818, SE33, D6S1043, D7S820, D8S1179, D10S1248, TH01, vWA, D12S391, D13S317, Penta E, D16S539, D18S51, D19S433, D21S11, Penta D, and D22S1045. All known reported variant alleles are compiled along with genomic information available from GenBank, dbSNP, and the 1000 Genomes Project. Supplementary files are included which provide annotated reference sequences for each STR locus, characterize genomic variation around the STR repeat region, and compare alleles present in currently available STR kit allelic ladders. Looking to the future, STR allele nomenclature options are discussed as they relate to next generation sequencing efforts underway. PMID:26197946

  4. DYZ1 arrays show sequence variation between the monozygotic males

    PubMed Central

    2014-01-01

    Background: Monozygotic twins (MZT) are an important resource for genetic studies in the context of normal and diseased genomes. In the present study we used DYZ1, a satellite fraction present in the form of tandem arrays on the long arm of the human Y chromosome, as a tool to uncover sequence variations between monozygotic males. Results: We detected copy number variation and frequent insertions and deletions within the sequences of DYZ1 arrays amongst all three sets of twins used in the present study. MZT1b showed loss of 35 bp compared to that in 1a, whereas 2a showed loss of 31 bp compared to that in 2b. Similarly, 3b showed a 10 bp insertion compared to that in 3a. MZT1a germline DNA showed loss of 5 bp and 1b blood DNA showed loss of 26 bp compared to that of 1a blood and 1b germline DNA, respectively. Of the 69 restriction sites detected in DYZ1 arrays, MboII, BsrI, TspEI and TaqI sites showed frequent loss and/or gain amongst all 3 pairs studied. The MZT1 pair showed loss/gain of VspI, BsrDI, AgsI, PleI, TspDTI, TspEI, TfiI and TaqI restriction sites in both blood and germline DNA. All three sets of MZT showed differences in the number of DYZ1 copies. FISH signals reflected somatic mosaicism of the DYZ1 copies across the cells. Conclusions: DYZ1 showed both sequence and copy number variation between the MZT males. Sequence variation was also noticed between germline and blood DNA samples of the same individual, as observed in at least one set of samples. The result suggests that DYZ1 faithfully records all the genetic changes occurring after twinning, which may be ascribed to environmental factors. PMID:24495361

  5. Comparative RNA sequencing reveals substantial genetic variation in endangered primates

    PubMed Central

    Perry, George H.; Melsted, Páll; Marioni, John C.; Wang, Ying; Bainer, Russell; Pickrell, Joseph K.; Michelini, Katelyn; Zehr, Sarah; Yoder, Anne D.; Stephens, Matthew; Pritchard, Jonathan K.; Gilad, Yoav

    2012-01-01

    Comparative genomic studies in primates have yielded important insights into the evolutionary forces that shape genetic diversity and revealed the likely genetic basis for certain species-specific adaptations. To date, however, these studies have focused on only a small number of species. For the majority of nonhuman primates, including some of the most critically endangered, genome-level data are not yet available. In this study, we have taken the first steps toward addressing this gap by sequencing RNA from the livers of multiple individuals from each of 16 mammalian species, including humans and 11 nonhuman primates. Of the nonhuman primate species, five are lemurs and two are lorisoids, for which little or no genomic data were previously available. To analyze these data, we developed a method for de novo assembly and alignment of orthologous gene sequences across species. We assembled an average of 5721 gene sequences per species and characterized diversity and divergence of both gene sequences and gene expression levels. We identified patterns of variation that are consistent with the action of positive or directional selection, including an 18-fold enrichment of peroxisomal genes among genes whose regulation likely evolved under directional selection in the ancestral primate lineage. Importantly, we found no relationship between genetic diversity and endangered status, with the two most endangered species in our study, the black and white ruffed lemur and the Coquerel's sifaka, having the highest genetic diversity among all primates. Our observations imply that many endangered lemur populations still harbor considerable genetic variation. Timely efforts to conserve these species alongside their habitats have, therefore, strong potential to achieve long-term success. PMID:22207615

  6. Targeted capture enrichment and sequencing identifies extensive nucleotide variation in the turkey MHC-B.

    PubMed

    Reed, Kent M; Mendoza, Kristelle M; Settlage, Robert E

    2016-03-01

    Variation in the major histocompatibility complex (MHC) is increasingly associated with disease susceptibility and resistance in avian species of agricultural importance. This variation includes sequence polymorphisms but also structural differences (gene rearrangement) and copy number variation (CNV). The MHC has now been described for multiple galliform species including the best defined assemblies of the chicken (Gallus gallus) and domestic turkey (Meleagris gallopavo). Using this sequence resource, this study applied high-throughput sequencing to investigate MHC variation in turkeys of North America (NA turkeys). An MHC-specific SureSelect (Agilent) capture array was developed, and libraries were created for 14 turkeys representing domestic (commercial bred), heritage breed, and wild turkeys. In addition, a representative of the Ocellated turkey (M. ocellata) and chicken (G. gallus) was included to test cross-species applicability of the capture array allowing for identification of new species-specific polymorphisms. Libraries were hybridized to ∼12 K cRNA baits and the resulting pools were sequenced. On average, 98% of processed reads mapped to the turkey whole genome sequence and 53% to the MHC target. In addition to the MHC, capture hybridization recovered sequences corresponding to other MHC regions. Sequence alignment and de novo assembly indicated the presence of several additional BG genes in the turkey with evidence for CNV. Variant detection identified an average of 2245 polymorphisms per individual for the NA turkeys, 3012 for the Ocellated turkey, and 462 variants in the chicken (RJF-256). This study provides an extensive sequence resource for examining MHC variation and its relation to health of this agriculturally important group of birds.

  7. Variational formulation of high performance finite elements: Parametrized variational principles

    NASA Technical Reports Server (NTRS)

    Felippa, Carlos A.; Militello, Carmello

    1991-01-01

    High performance elements are simple finite elements constructed to deliver engineering accuracy with coarse arbitrary grids. This is part of a series on the variational basis of high-performance elements, with emphasis on those constructed with the free formulation (FF) and assumed natural strain (ANS) methods. Parametrized variational principles that provide a foundation for the FF and ANS methods, as well as for a combination of both are presented.

  8. The Quantification of Representative Sequences pipeline for amplicon sequencing: case study on within-population ITS1 sequence variation in a microparasite infecting Daphnia.

    PubMed

    González-Tortuero, E; Rusek, J; Petrusek, A; Gießler, S; Lyras, D; Grath, S; Castro-Monzón, F; Wolinska, J

    2015-11-01

    Next generation sequencing (NGS) platforms are replacing traditional molecular biology protocols like cloning and Sanger sequencing. However, accuracy of NGS platforms has rarely been measured when quantifying relative frequencies of genotypes or taxa within populations. Here we developed a new bioinformatic pipeline (QRS) that pools similar sequence variants and estimates their frequencies in NGS data sets from populations or communities. We tested whether the estimated frequency of representative sequences, generated by 454 amplicon sequencing, differs significantly from that obtained by Sanger sequencing of cloned PCR products. This was performed by analysing sequence variation of the highly variable first internal transcribed spacer (ITS1) of the ichthyosporean Caullerya mesnili, a microparasite of cladocerans of the genus Daphnia. This analysis also serves as a case example of the usage of this pipeline to study within-population variation. Additionally, a public Illumina data set was used to validate the pipeline on community-level data. Overall, there was a good correspondence in absolute frequencies of C. mesnili ITS1 sequences obtained from Sanger and 454 platforms. Furthermore, analyses of molecular variance (AMOVA) revealed that population structure of C. mesnili differs across lakes and years independently of the sequencing platform. Our results support not only the usefulness of amplicon sequencing data for studies of within-population structure but also the successful application of the QRS pipeline on Illumina-generated data. The QRS pipeline is freely available together with its documentation under GNU Public Licence version 3 at http://code.google.com/p/quantification-representative-sequences.

  9. Geochemical variations during the 2012 Emilia seismic sequence

    NASA Astrophysics Data System (ADS)

    Sciarra, Alessandra; Cantucci, Barbara; Galli, Gianfranco; Cinti, Daniele; Pizzino, Luca

    2015-04-01

    , apart from one sample, are not thermally anomalous. Stable isotopes of H and O point out the absence of mixing with connate waters, prolonged interaction with the host-rock at high temperature and/or heavy gas-water exchange at depth. Isotopic carbon composition emphasizes its organic (i.e. shallow) origin; only "La Canonica" site, the deepest well sampled in this study, shows a probable deep(er) provenance of dissolved carbon. Waters trend away from the atmospheric end-member composition, dissolving CO2 or CH4 depending on their redox state. Dissolved radon activity is very low, likely due to the particular hydrogeological setting of the study area (i.e. the presence of waters with long residence times in the considered aquifers). The results highlight a different behavior before and after the seismic events, shown also by the different carbon isotopic signature of CH4. These variations could be produced by an increase in bacterial (e.g. peat strata) and methanogenic fermentation processes in the first meters of the soil.

  10. High speed nucleic acid sequencing

    SciTech Connect

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  11. Intraspecific genetic variation in Paramecium revealed by mitochondrial cytochrome C oxidase I sequences.

    PubMed

    Barth, Dana; Krenek, Sascha; Fokin, Sergei I; Berendonk, Thomas U

    2006-01-01

    Studies of intraspecific genetic diversity of ciliates, such as population genetics and biogeography, are particularly hampered by the lack of suitable DNA markers. For example, sequences of the non-coding ribosomal internal transcribed spacer (ITS) regions are often too conserved for intraspecific analyses. We have therefore identified primers for the mitochondrial cytochrome c oxidase I (COI) gene and applied them for intraspecific investigations in Paramecium caudatum and Paramecium multimicronucleatum. Furthermore, we obtained sequences of the ITS regions from the same strains and carried out comparative sequence analyses of both data sets. The mitochondrial sequences revealed substantially higher variation in both Paramecium species, with intraspecific divergences up to 7% in P. caudatum and 9.5% in P. multimicronucleatum. Moreover, an initial survey of the population structure discovered different mitochondrial haplotypes of P. caudatum in one pond, thereby demonstrating the potential of this genetic marker for population genetic analyses. Our primers successfully amplified the COI gene of other Paramecium. This is the first report of intraspecific variation in free-living protozoans based on mitochondrial sequence data. Our results show that the high variation in mitochondrial DNA makes it a suitable marker for intraspecific and population genetic studies.
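
    The divergence percentages quoted above can be illustrated with a short sketch. The abstract does not state which distance measure was used, so the uncorrected pairwise distance (p-distance) below is an assumption, and the strain names and sequences are invented toy data rather than Paramecium COI sequences.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected pairwise distance: fraction of compared aligned sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip alignment gaps and ambiguous bases instead of counting them.
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def max_intraspecific_divergence(aligned: dict) -> float:
    """Largest p-distance over all pairs of strains from one species."""
    return max(p_distance(a, b) for a, b in combinations(aligned.values(), 2))


# Hypothetical aligned fragments, for illustration only.
toy_strains = {
    "strain_1": "ATGCATGCAT",
    "strain_2": "ATGCATGCAA",
    "strain_3": "ATGAATGCTA",
}

if __name__ == "__main__":
    print(f"max divergence: {max_intraspecific_divergence(toy_strains):.1%}")
```

    A model-corrected distance (for example Kimura two-parameter) would give slightly larger values at these divergence levels; the uncorrected version is shown only because it is the simplest to follow.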

  12. Qualifying high-throughput immune repertoire sequencing.

    PubMed

    Niklas, Norbert; Pröll, Johannes; Weinberger, Johannes; Zopf, Agnes; Wiesinger, Karin; Krismer, Konstantin; Bettelheim, Peter; Gabriel, Christian

    2014-01-01

    Diversity of B and T cell receptors, achieved by gene recombination and somatic hypermutation, allows the immune system to recognize and react in a targeted way to various threats. Next-generation sequencing for assessment of a cell's gene composition and variation makes deep analysis of one individual's immune spectrum feasible. An easy-to-apply but detailed analysis and visualization strategy is necessary to process all sequences generated. We performed sequencing utilizing the 454 system for CLL and control samples, utilized the IMGT database and applied the presented analysis tools. With the applied protocol, malignant clones are found and characterized, and mutational status compared to germline identity is elaborated in detail, showing that the CLL mutation status is not as monoclonal as generally thought. Moreover, this strategy is not solely applicable to the 454 sequencing system but can easily be transferred to any other next-generation sequencing platform. PMID:24607567

  13. Amino acid repeats cause extraordinary coding sequence variation in the social amoeba Dictyostelium discoideum.

    PubMed

    Scala, Clea; Tian, Xiangjun; Mehdiabadi, Natasha J; Smith, Margaret H; Saxer, Gerda; Stephens, Katie; Buzombo, Prince; Strassmann, Joan E; Queller, David C

    2012-01-01

    Protein sequences are normally the most conserved elements of genomes owing to purifying selection to maintain their functions. We document an extraordinary amount of within-species protein sequence variation in the model eukaryote Dictyostelium discoideum stemming from triplet DNA repeats coding for long strings of single amino acids. D. discoideum has a very large number of such strings, many of which are polyglutamine repeats, the same sequence that causes various neurological disorders in humans, such as Huntington's disease. We show here that D. discoideum coding repeat loci are highly variable among individuals, making D. discoideum a candidate for the most variable proteome. The coding repeat loci are not significantly less variable than similar non-coding triplet repeats. This pattern is consistent with these amino-acid repeats being largely non-functional sequences evolving primarily by mutation and drift. PMID:23029418

  14. Natural Allelic Variations in Highly Polyploidy Saccharum Complex

    PubMed Central

    Song, Jian; Yang, Xiping; Resende, Marcio F. R.; Neves, Leandro G.; Todd, James; Zhang, Jisen; Comstock, Jack C.; Wang, Jianping

    2016-01-01

    Sugarcane (Saccharum spp.) is an important sugar and biofuel crop with a highly polyploid and complex genome. The Saccharum complex, comprising the genus Saccharum and a few related genera, is an important genetic resource for sugarcane breeding. A large amount of natural variation exists within the Saccharum complex. Though understanding their allelic variation has been challenging, it is critical to dissect allelic structure and to identify the alleles controlling important traits in sugarcane. To characterize natural variations in the Saccharum complex, a target enrichment sequencing approach was used to assay 12 representative germplasm accessions. In total, 55,946 highly efficient probes were designed based on the sorghum genome and sugarcane unigene set, targeting a total of 6 Mb of the sugarcane genome. A pipeline specifically tailored for polyploid sequence variants and genotype calling was established. BWA-mem and the sorghum genome proved to be an acceptable aligner and reference for sugarcane target enrichment sequence analysis, respectively. Genetic variations including 1,166,066 non-redundant SNPs, 150,421 InDels, 919 gene copy number variations, and 1,257 gene presence/absence variations were detected. SNPs from three different callers (Samtools, Freebayes, and GATK) were compared and the validation rates were nearly 90%. Based on the SNP loci of each accession and their ploidy levels, 999,258 single dosage SNPs were identified and most loci were estimated to be largely homozygous. An average of 34,397 haplotype blocks for each accession was inferred. The highest divergence time among the Saccharum spp. was estimated as 1.2 million years ago (MYA). Saccharum spp. diverged from Erianthus and Sorghum approximately 5 and 6 MYA, respectively. The target enrichment sequencing approach provided an effective way to discover and catalog natural allelic variation in highly polyploid or heterozygous genomes. PMID:27375658
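
    Comparing SNP calls from several callers, as mentioned above, can be sketched with plain set operations. The abstract does not say how the validation rate was defined, so the metric below (share of one caller's SNPs that at least one other caller also reports) is an assumption, and the call sets are invented toy data, not output of Samtools, Freebayes or GATK.

```python
from typing import Dict, Set, Tuple

# A SNP is keyed by (chromosome, 1-based position, reference allele, alternate allele).
Snp = Tuple[str, int, str, str]


def concordance(calls: Dict[str, Set[Snp]]) -> Dict[str, float]:
    """For each caller, the fraction of its SNPs also reported by at least one other caller."""
    result = {}
    for name, snps in calls.items():
        others = set().union(*(s for n, s in calls.items() if n != name))
        result[name] = len(snps & others) / len(snps) if snps else 0.0
    return result


def shared_by_all(calls: Dict[str, Set[Snp]]) -> Set[Snp]:
    """SNPs reported by every caller."""
    sets = list(calls.values())
    return set.intersection(*sets) if sets else set()


# Hypothetical call sets, for illustration only.
a = ("chr1", 100, "A", "G")
b = ("chr1", 250, "C", "T")
c = ("chr2", 40, "G", "A")
d = ("chr2", 90, "T", "C")
e = ("chr3", 10, "A", "T")
toy_calls = {
    "caller_1": {a, b, c, d},
    "caller_2": {a, b, c},
    "caller_3": {a, b, e},
}

if __name__ == "__main__":
    for caller, rate in concordance(toy_calls).items():
        print(f"{caller}: {rate:.0%} of calls supported by another caller")
    print("shared by all callers:", sorted(shared_by_all(toy_calls)))
```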

  18. FRESCO: Referential compression of highly similar sequences.

    PubMed

    Wandelt, Sebastian; Leser, Ulf

    2013-01-01

    In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitude faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition, we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios way beyond state of the art, for instance, 4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware. PMID:24524158
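
    The idea of storing only the differences between an input and a reference can be shown with a naive sketch. This is not FRESCO's algorithm or file format: the k-mer index, the greedy longest-match rule, the `min_match` parameter and the tuple encoding are all assumptions chosen for brevity.

```python
def compress(reference: str, target: str, min_match: int = 4) -> list:
    """Encode target as ('M', start, length) copies from reference and ('L', text) literals."""
    # Index every k-mer of the reference by its start positions.
    index = {}
    for i in range(len(reference) - min_match + 1):
        index.setdefault(reference[i:i + min_match], []).append(i)

    ops = []
    literal = []
    pos = 0
    while pos < len(target):
        best_start, best_len = -1, 0
        # Try every reference position sharing the next k-mer and extend greedily.
        for start in index.get(target[pos:pos + min_match], []):
            length = min_match
            while (start + length < len(reference)
                   and pos + length < len(target)
                   and reference[start + length] == target[pos + length]):
                length += 1
            if length > best_len:
                best_start, best_len = start, length
        if best_len >= min_match:
            if literal:
                ops.append(("L", "".join(literal)))
                literal = []
            ops.append(("M", best_start, best_len))
            pos += best_len
        else:
            # No usable match: keep the character as a literal.
            literal.append(target[pos])
            pos += 1
    if literal:
        ops.append(("L", "".join(literal)))
    return ops


def decompress(reference: str, ops: list) -> str:
    """Rebuild the target from the reference and the operation list."""
    parts = []
    for op in ops:
        if op[0] == "M":
            _, start, length = op
            parts.append(reference[start:start + length])
        else:
            parts.append(op[1])
    return "".join(parts)


if __name__ == "__main__":
    ref = "ACGTACGTTTGACCA"
    tgt = "ACGTACGATTGACCA"  # differs from ref by one substitution
    encoded = compress(ref, tgt)
    print(encoded)
    assert decompress(ref, encoded) == tgt
```

    For two highly similar sequences the operation list stays short because long stretches collapse into single copy operations, which is the property referential schemes exploit.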

  19. Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

    PubMed Central

    Laehnemann, David; Borkhardt, Arndt

    2016-01-01

    Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159

  20. Analysis of DNA Sequence Variants Detected by High Throughput Sequencing

    PubMed Central

    Adams, David R; Sincan, Murat; Fajardo, Karin Fuentes; Mullikin, James C; Pierson, Tyler M; Toro, Camilo; Boerkoel, Cornelius F; Tifft, Cynthia J; Gahl, William A; Markello, Tom C

    2014-01-01

    The Undiagnosed Diseases Program at the National Institutes of Health uses High Throughput Sequencing (HTS) to diagnose rare and novel diseases. HTS techniques generate large numbers of DNA sequence variants, which must be analyzed and filtered to find candidates for disease causation. Despite the publication of an increasing number of successful exome-based projects, there has been little formal discussion of the analytic steps applied to HTS variant lists. We present the results of our experience with over 30 families for whom HTS was used in an attempt to find clinical diagnoses. For each family, exome sequence was augmented with high-density SNP-array data. We present a discussion of the theory and practical application of each analytic step and provide example data to illustrate our approach. The paper is designed to provide an analytic roadmap for variant analysis, thereby enabling a wide range of researchers and clinical genetics practitioners to perform direct analysis of HTS data for their patients and projects. PMID:22290882

  1. Sequence variation in the ribosomal DNA internal transcribed spacer of Tridacna crocea.

    PubMed

    Yu, E T; Juinio-Meñez, M A; Monje, V D

    2000-11-01

    DNA-based genetic markers are needed to augment existing allozyme markers in the assessment of genetic diversity of wild giant clam populations. The dearth of polymorphic mitochondrial DNA regions amplified from known universal polymerase chain reaction (PCR) primers has led us to search other regions of the genome for viable sources of DNA polymorphism. We have designed tridacnid-specific PCR primers for the amplification of internal transcribed spacer regions. Sequences of the first internal transcribed spacer segment (ITS-1) revealed very high polymorphism, showing 29% variation arising from base substitutions alone. Preliminary restriction analysis of the ITS regions using 8 restriction enzymes revealed cryptic changes in the DNA sequence. These mutations are promising as marker tools for differentiating geographically separated populations. Such variation in the ITS region can possibly be used for population genetic analysis.

  3. Sequence variation of koala retrovirus transmembrane protein p15E among koalas from different geographic regions.

    PubMed

    Ishida, Yasuko; McCallister, Chelsea; Nikolaidis, Nikolas; Tsangaras, Kyriakos; Helgen, Kristofer M; Greenwood, Alex D; Roca, Alfred L

    2015-01-15

    The koala retrovirus (KoRV), which is transitioning from an exogenous to an endogenous form, has been associated with high mortality in koalas. For other retroviruses, the envelope protein p15E has been considered a candidate for vaccine development. We therefore examined proviral sequence variation of KoRV p15E in a captive Queensland and three wild southern Australian koalas. We generated 163 sequences with intact open reading frames, which grouped into 39 distinct haplotypes. Sixteen distinct haplotypes comprising 139 of the sequences (85%) coded for the same polypeptide. Among the remaining 23 haplotypes, 22 were detected only once among the sequences, and each had 1 or 2 non-synonymous differences from the majority sequence. Several analyses suggested that p15E was under purifying selection. Important epitopes and domains were highly conserved across the p15E sequences and in previously reported exogenous KoRVs. Overall, these results support the potential use of p15E for KoRV vaccine development. PMID:25462343

  4. Sequence variation and haplotype structure at the human HFE locus.

    PubMed Central

    Toomajian, Christopher; Kreitman, Martin

    2002-01-01

    The HFE locus encodes an HLA class-I-type protein important in iron regulation and segregates replacement mutations that give rise to the most common form of genetic hemochromatosis. The high frequency of one disease-associated mutation, C282Y, and the nature of this disease have led some to suggest a selective advantage for this mutation. To investigate the context in which this mutation arose and gain a better understanding of HFE genetic variation, we surveyed nucleotide variability in 11.2 kb encompassing the HFE locus and experimentally determined haplotypes. We fully resequenced 60 chromosomes of African, Asian, or European ancestry as well as one chimpanzee, revealing 41 variable sites and a nucleotide diversity of 0.08%. This indicates that linkage to the HLA region has not substantially increased the level of HFE variation. Although several haplotypes are shared between populations, one haplotype predominates in Asia but is nearly absent elsewhere, causing higher than average genetic differentiation among the three major populations. Our samples show evidence of intragenic recombination, so the scarcity of recombination events within the C282Y allele class is consistent with selection increasing the frequency of a young allele. Otherwise, the pattern of variability in this region does not clearly indicate the action of positive selection at this or linked loci. PMID:12196404

  5. Ultra Deep Sequencing of a Baculovirus Population Reveals Widespread Genomic Variations

    PubMed Central

    Chateigner, Aurélien; Bézier, Annie; Labrousse, Carole; Jiolle, Davy; Barbe, Valérie; Herniou, Elisabeth A.

    2015-01-01

    Viruses rely on widespread genetic variation and large population size for adaptation. Large DNA virus populations are thought to harbor little variation though natural populations may be polymorphic. To measure the genetic variation present in a dsDNA virus population, we deep sequenced a natural strain of the baculovirus Autographa californica multiple nucleopolyhedrovirus. With 124,221X average genome coverage of our 133,926 bp long consensus, we could detect low frequency mutations (0.025%). K-means clustering was used to classify the mutations in four categories according to their frequency in the population. We found 60 high frequency non-synonymous mutations under balancing selection distributed in all functional classes. These mutants could alter viral adaptation dynamics, either through competitive or synergistic processes. Lastly, we developed a technique for the delimitation of large deletions in next generation sequencing data. We found that large deletions occur along the entire viral genome, with hotspots located in homologous repeat regions (hrs). Present in 25.4% of the genomes, these deletion mutants presumably require functional complementation to complete their infection cycle. They might thus have a large impact on the fitness of the baculovirus population. Altogether, we found a wide breadth of genomic variation in the baculovirus population, suggesting it has high adaptive potential. PMID:26198241
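    The abstract states that K-means clustering sorted mutations into four frequency classes, but it does not give the procedure. The following is a minimal sketch of one-dimensional k-means, assuming raw (untransformed) frequencies and a quantile-based initialisation. The names `kmeans_1d` and `assign` and the example frequencies are hypothetical and are not taken from the study.

```python
def assign(points, centers):
    """Group each point with its nearest center (1-D absolute distance)."""
    groups = [[] for _ in centers]
    for v in points:
        nearest = min(range(len(centers)), key=lambda c: abs(v - centers[c]))
        groups[nearest].append(v)
    return groups


def kmeans_1d(values, k, max_iter=100):
    """Cluster scalar values into k groups; returns (centers, groups)."""
    points = sorted(values)
    # Assumption: start from evenly spaced quantiles so the result is deterministic.
    centers = [points[int((i + 0.5) * len(points) / k)] for i in range(k)]
    for _ in range(max_iter):
        groups = assign(points, centers)
        # An empty group keeps its previous center.
        updated = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


# Hypothetical allele frequencies (fractions of reads), for illustration only.
freqs = [0.001, 0.002, 0.003, 0.05, 0.06, 0.30, 0.35, 0.90, 0.95]
centers, groups = kmeans_1d(freqs, k=4)
print([len(g) for g in groups])
```

    Frequencies spanning several orders of magnitude are often log-transformed before clustering; whether the authors did so is not stated in the abstract.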

  6. A map of human genome variation from population-scale sequencing.

    PubMed

    Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A

    2010-10-28

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.

  8. Population genetic structure of Indian shad, Tenualosa ilisha inferred from variation in mitochondrial DNA sequences.

    PubMed

    Behera, B K; Singh, N S; Paria, P; Sahoo, A K; Panda, D; Meena, D K; Das, P; Pakrashi, S; Biswas, D K; Sharma, A P

    2015-09-01

    Indian shad, Tenualosa ilisha, is a commercially important anadromous fish representing a major catch in the Indo-Pacific region. The present study evaluated a partial Cytochrome b (Cyt b) gene sequence of mtDNA in T. ilisha to determine genetic variation between fish of Bay of Bengal and Arabian Sea origin. Genomic DNA extracted from T. ilisha samples representing two distant rivers in the Indian subcontinent, the Bhagirathi (lower stretch of the Ganges) and the Tapi, was analyzed. Sequencing of a 307 bp mtDNA Cytochrome b gene fragment revealed the presence of 5 haplotypes, with high haplotype diversity (Hd) of 0.9048 with variance 0.103 and low nucleotide diversity (π) of 0.14301. Three population-specific haplotypes were observed in river Ganga and two haplotypes in river Tapi. A neighbour-joining tree based on Cytochrome b gene sequences of T. ilisha showed that populations of Bay of Bengal and Arabian Sea origin belonged to two distinct clusters. PMID:26521565
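    Haplotype diversity (Hd) is commonly computed with Nei's unbiased estimator, Hd = n/(n-1) * (1 - sum of squared haplotype frequencies). The sketch below assumes that estimator; the abstract does not say which software or formula the authors used. The function name and the sample are hypothetical, since per-haplotype counts are not given in the abstract.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    # Sum of squared relative frequencies of each distinct haplotype.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample: seven sequences falling into five haplotypes.
sample = ["A", "A", "B", "B", "C", "D", "E"]
print(round(haplotype_diversity(sample), 4))
```

    Hd approaches 1 when nearly every sampled sequence is a different haplotype, so small samples with several singletons can give high values.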

  9. Intrapatient sequence variation of the gag gene of human immunodeficiency virus type 1 plasma virions.

    PubMed Central

    Yoshimura, F K; Diem, K; Learn, G H; Riddell, S; Corey, L

    1996-01-01

    Because certain regions of the gag gene, such as p24, are highly conserved among human immunodeficiency virus (HIV) isolates, many therapeutic strategies have been directed at gag gene targets. Although intrapatient variation of segments of gag has been determined, little is known about the variability of the full-length gag gene for HIV isolated from a single individual. To evaluate intrapatient full-length gag variability, we derived the nucleotide sequences of at least 10 cDNA gag clones of virion RNA isolated from plasma for each of four asymptomatic HIV type 1-infected patients with relatively high CD4+ T-cell counts (300 to 450 cells per mm³). Mean values of intrapatient gag nucleotide variation obtained by pairwise comparisons ranged from 0.55 to 2.86%. For three subjects, this value was equivalent to that reported for intrapatient full-length env variation. The greatest range of intrapatient mean nucleotide variation for individual protein-coding regions was observed for p7. We did not detect any G-to-A hypermutation, as A-to-G and G-to-A transitions occurred at similar frequencies, accounting for 29 and 25%, respectively, of the changes. Mean variation values and phylogenetic analysis suggested that the extent of nucleotide variation correlated with the length of viral infection. Furthermore, no distinct subpopulations of quasispecies were detectable within an individual. The predicted amino acid sequences indicated that there were no regions within a gag protein that were comprised of clustered changes. PMID:8971017
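    The mean intrapatient variation reported above comes from pairwise comparisons of clones. A minimal sketch of that kind of calculation follows, assuming pre-aligned sequences of equal length and an uncorrected p-distance (proportion of differing sites). The abstract does not specify the distance measure, and the function names and example clones are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_percent(seqs):
    """Mean p-distance over all unordered pairs, as a percentage."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return 100.0 * sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical aligned clones from one patient (toy length of 8 nt).
clones = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]
print(mean_pairwise_percent(clones))
```

    In this sketch a gap character would count as an ordinary mismatch; real analyses usually handle gaps and ambiguous bases explicitly.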

  10. Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing.

    PubMed

    Ferreira, Pedro G; Oti, Martin; Barann, Matthias; Wieland, Thomas; Ezquina, Suzana; Friedländer, Marc R; Rivas, Manuel A; Esteve-Codina, Anna; Rosenstiel, Philip; Strom, Tim M; Lappalainen, Tuuli; Guigó, Roderic; Sammeth, Michael

    2016-01-01

    Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing (alternative splice sites, introns, and cleavage sites), which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts. PMID:27617755

  11. Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing

    NASA Astrophysics Data System (ADS)

    2016-09-01

    Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing—alternative splice sites, introns, and cleavage sites—which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts.

  12. Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing

    PubMed Central

    Ferreira, Pedro G.; Oti, Martin; Barann, Matthias; Wieland, Thomas; Ezquina, Suzana; Friedländer, Marc R.; Rivas, Manuel A.; Esteve-Codina, Anna; Estivill, Xavier; Guigó, Roderic; Dermitzakis, Emmanouil; Antonarakis, Stylianos; Meitinger, Thomas; Strom, Tim M; Palotie, Aarno; François Deleuze, Jean; Sudbrak, Ralf; Lerach, Hans; Gut, Ivo; Syvänen, Ann-Christine; Gyllensten, Ulf; Schreiber, Stefan; Rosenstiel, Philip; Brunner, Han; Veltman, Joris; Hoen, Peter A.C.T; Jan van Ommen, Gert; Carracedo, Angel; Brazma, Alvis; Flicek, Paul; Cambon-Thomsen, Anne; Mangion, Jonathan; Bentley, David; Hamosh, Ada; Rosenstiel, Philip; Strom, Tim M; Lappalainen, Tuuli; Guigó, Roderic; Sammeth, Michael

    2016-01-01

    Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing—alternative splice sites, introns, and cleavage sites—which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts. PMID:27617755

  13. Spatio-temporal Variations of Characteristic Repeating Earthquake Sequences along the Middle America Trench in Mexico

    NASA Astrophysics Data System (ADS)

    Dominguez, L. A.; Taira, T.; Hjorleifsdottir, V.; Santoyo, M. A.

    2015-12-01

    Repeating earthquake sequences are sets of events that are thought to rupture the same area on the plate interface and thus provide nearly identical waveforms. We systematically analyzed seismic records from 2001 through 2014 to identify repeating earthquakes with highly correlated waveforms occurring along the subduction zone of the Cocos plate. Using the correlation coefficient (cc) and spectral coherency (coh) of the vertical components as selection criteria, we found a set of 214 sequences whose waveforms satisfy cc≥95% and coh≥95%. Spatial clustering along the trench shows large variations in repeating earthquake activity. In particular, the rupture zone of the M8.1 1985 earthquake shows an almost complete absence of characteristic repeating earthquakes, whereas the Guerrero Gap zone and the segment of the trench close to the Guerrero-Oaxaca border show a significantly larger number of repeating earthquake sequences. Furthermore, temporal variations associated with stress changes due to major earthquakes show episodes of unlocking and healing of the interface. Understanding the different components that control the location and recurrence time of characteristic repeating sequences is a key factor in pinpointing areas where large megathrust earthquakes may nucleate and consequently in improving seismic hazard assessment.
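    The correlation-coefficient criterion above can be illustrated with a small sketch. It computes the peak normalized cross-correlation of two equal-length waveforms over all lags and compares it with a 0.95 threshold. The spectral coherency criterion is not implemented here, and the function names, the lag search, and the sample waveform are assumptions rather than the authors' processing chain.

```python
import math


def peak_normalized_cc(a, b):
    """Peak normalized cross-correlation of two equal-length waveforms."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("waveforms must be non-empty and of equal length")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    x = [v - mean_a for v in a]  # remove the mean before correlating
    y = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if denom == 0.0:
        return 0.0  # a flat trace carries no waveform information
    best = -1.0
    for lag in range(-(n - 1), n):
        # Sum over the samples where both shifted traces overlap.
        s = sum(x[i] * y[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        best = max(best, s / denom)
    return best


def is_repeating_pair(a, b, threshold=0.95):
    """True if the waveform pair meets the cc threshold (coherency not checked)."""
    return peak_normalized_cc(a, b) >= threshold


# Hypothetical vertical-component samples, for illustration only.
wave = [0.0, 1.0, 3.0, 1.0, 0.0, -1.0, -2.0, -1.0]
print(is_repeating_pair(wave, wave))
```

    Scaling a waveform by a positive constant leaves the coefficient unchanged, so the measure compares waveform shape rather than amplitude.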

  14. Viral detection by high-throughput sequencing.

    PubMed

    Motooka, Daisuke; Nakamura, Shota; Hagiwara, Katsuro; Nakaya, Takaaki

    2015-01-01

    We applied a high-throughput sequencing platform, Ion PGM, for viral detection in fecal samples from adult cows collected in Hokkaido, Japan. Random RT-PCR was performed to amplify RNA extracted from 0.25 ml of fecal specimens (N = 8), and more than 5 μg of cDNA was synthesized. Unbiased high-throughput sequencing using the 318 v2 semiconductor chip of these eight samples yielded 57-580 K (average: 270 K, after data analysis) reads in a single run. As a result, viral genome sequences were detected in each specimen. In addition to bacteriophage, mammal- and insect-derived viruses, partial genome sequences of plant, algal, and protozoal viruses were detected. Thus, this metagenomic analysis of fecal specimens could be useful to comprehensively understand viral populations of the intestine and food sources in animals. PMID:25287501

  16. The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity

    PubMed Central

    Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H.; Allen, Andrew S.; Goldstein, David B.

    2015-01-01

    Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance

  17. Glycoprotein Gene Sequence Variation in Rhesus Monkey Rhadinovirus

    PubMed Central

    Shin, Young C.; Jones, Leandro R.; Manrique, Julieta; Lauer, William; Carville, Angela; Mansfield, Keith G.; Desrosiers, Ronald C.

    2010-01-01

    Gene sequences for seven glycoproteins from 20 independent isolates of rhesus monkey rhadinovirus (RRV) and of the corresponding seven glycoprotein genes from nine strains of the Kaposi’s sarcoma-associated herpesvirus (KSHV) were obtained and analyzed. Phylogenetic analysis revealed two discrete groupings of RRV gH sequences, two discrete groupings of RRV gL sequences and two discrete groupings of RRV gB sequences. We called these phylogenetic groupings gHa, gHb, gLa, gLb, gBa and gBb. gHa was always paired with gLa and gHb was always paired with gLb for any individual RRV isolate. Since gH and gL are known to be interacting partners, these results suggest the need of matching sequence types for function of these cooperating proteins. gB phylogenetic grouping was not associated with gH/gL phylogenetic grouping. Our results demonstrate two distinct, distantly-related phylogenetic groupings of gH and gL of RRV despite a remarkable degree of sequence conservation within each individual phylogenetic group. PMID:20172576

  18. High-resolution mapping of protein sequence-function relationships.

    PubMed

    Fowler, Douglas M; Araya, Carlos L; Fleishman, Sarel J; Kellogg, Elizabeth H; Stephany, Jason J; Baker, David; Fields, Stanley

    2010-09-01

    We present a large-scale approach to investigate the functional consequences of sequence variation in a protein. The approach entails the display of hundreds of thousands of protein variants, moderate selection for activity and high-throughput DNA sequencing to quantify the performance of each variant. Using this strategy, we tracked the performance of >600,000 variants of a human WW domain after three and six rounds of selection by phage display for binding to its peptide ligand. Binding properties of these variants defined a high-resolution map of mutational preference across the WW domain; each position had unique features that could not be captured by a few representative mutations. Our approach could be applied to many in vitro or in vivo protein assays, providing a general means for understanding how protein function relates to sequence.

  19. Cytochrome b nucleotide sequence variation among the Atlantic Alcidae.

    PubMed

    Friesen, V L; Montevecchi, W A; Davidson, W S

    1993-01-01

    Analysis of cytochrome b nucleotide sequences of the six extant species of Atlantic alcids and a gull revealed an excess of adenines and cytosines and a deficit of guanines at silent sites on the coding strand. Phylogenetic analyses grouped the sequences of the common (Uria aalge) and Brünnich's (U. lomvia) guillemots, followed by the razorbill (Alca torda) and little auk (Alle alle). The black guillemot (Cepphus grylle) sequence formed a sister taxon, and the puffin (Fratercula arctica) fell outside the other alcids. Phylogenetic comparisons of substitutions indicated that mutabilities of bases did not differ, but that C was much more likely to be incorporated than was G. Imbalances in base composition appear to result from a strand bias in replication errors, which may result from selection on secondary RNA structure and/or the energetics of codon-anticodon interactions. PMID:7916741

  20. Discovery and statistical genotyping of copy-number variation from whole-exome sequencing depth.

    PubMed

    Fromer, Menachem; Moran, Jennifer L; Chambert, Kimberly; Banks, Eric; Bergen, Sarah E; Ruderfer, Douglas M; Handsaker, Robert E; McCarroll, Steven A; O'Donovan, Michael C; Owen, Michael J; Kirov, George; Sullivan, Patrick F; Hultman, Christina M; Sklar, Pamela; Purcell, Shaun M

    2012-10-01

    Sequencing of gene-coding regions (the exome) is increasingly used for studying human disease, for which copy-number variants (CNVs) are a critical genetic component. However, detecting copy number from exome sequencing is challenging because of the noncontiguous nature of the captured exons. This is compounded by the complex relationship between read depth and copy number; this results from biases in targeted genomic hybridization, sequence factors such as GC content, and batching of samples during collection and sequencing. We present a statistical tool (exome hidden Markov model [XHMM]) that uses principal-component analysis (PCA) to normalize exome read depth and a hidden Markov model (HMM) to discover exon-resolution CNV and genotype variation across samples. We evaluate performance on 90 schizophrenia trios and 1,017 case-control samples. XHMM detects a median of two rare (<1%) CNVs per individual (one deletion and one duplication) and has 79% sensitivity to similarly rare CNVs overlapping three or more exons discovered with microarrays. With sensitivity similar to state-of-the-art methods, XHMM achieves higher specificity by assigning quality metrics to the CNV calls to filter out bad ones, as well as to statistically genotype the discovered CNV in all individuals, yielding a trio call set with Mendelian-inheritance properties highly consistent with expectation. We also show that XHMM breakpoint quality scores enable researchers to explicitly search for novel classes of structural variation. For example, we apply XHMM to extract those CNVs that are highly likely to disrupt (delete or duplicate) only a portion of a gene. PMID:23040492

  1. Associations between DNA Sequence Variation and Variation in Expression of the Adh Gene in Natural Populations of Drosophila Melanogaster

    PubMed Central

    Laurie, C. C.; Bridgham, J. T.; Choudhary, M.

    1991-01-01

    A large part of the genetic variation in alcohol dehydrogenase (ADH) activity level in natural populations of Drosophila melanogaster is associated with segregation of an amino acid replacement polymorphism at nucleotide 1490, which generates a difference in electrophoretic mobility. Part of the allozymic difference in activity level is due to a catalytic efficiency difference, which is also caused by the amino acid replacement, and part is due to a difference in the concentration of ADH protein. A previous site-directed in vitro mutagenesis experiment clearly demonstrated that the amino acid replacement has no effect on the concentration of ADH protein, nor does a strongly associated silent polymorphism at nucleotide 1443. Here we analyze associations between polymorphisms within the Adh gene and variation in ADH protein level for a number of chromosomes derived from natural populations. A sequence length polymorphism within the first intron of the distal (adult) transcript, ∇1, is in strong linkage disequilibrium with the amino acid replacement. Among a sample of 46 isochromosomal lines analyzed, all but one of the 14 Fast lines have ∇1 and all but one of the 32 Slow lines lack ∇1. The exceptional Fast line has an unusually low level of ADH protein (typical of Slow lines) and the exceptional Slow line has an unusually high level (typical of Fast lines). These results suggest that the ∇1 polymorphism may be responsible for the average difference in ADH protein between the allozymic classes. A previous experiment localized the effect on ADH protein to a 2.3-kb restriction fragment. DNA sequences of this fragment from several alleles of each allozymic type indicate that no other polymorphisms within this region are as closely associated with the ADH protein level difference as the ∇1 polymorphism. PMID:1683848

  2. Variation in Symbiodinium ITS2 sequence assemblages among coral colonies.

    PubMed

    Stat, Michael; Bird, Christopher E; Pochon, Xavier; Chasqui, Luis; Chauka, Leonard J; Concepcion, Gregory T; Logan, Dan; Takabayashi, Misaki; Toonen, Robert J; Gates, Ruth D

    2011-01-01

    Endosymbiotic dinoflagellates in the genus Symbiodinium are fundamentally important to the biology of scleractinian corals, as well as to a variety of other marine organisms. The genus Symbiodinium is genetically and functionally diverse and the taxonomic nature of the union between Symbiodinium and corals is implicated as a key trait determining the environmental tolerance of the symbiosis. Surprisingly, the question of how Symbiodinium diversity partitions within a species across spatial scales of meters to kilometers has received little attention, but is important to understanding the intrinsic biological scope of a given coral population and adaptations to the local environment. Here we address this gap by describing the Symbiodinium ITS2 sequence assemblages recovered from colonies of the reef building coral Montipora capitata sampled across Kāne'ohe Bay, Hawai'i. A total of 52 corals were sampled in a nested design of Coral Colony(Site(Region)) reflecting spatial scales of meters to kilometers. A diversity of Symbiodinium ITS2 sequences was recovered with the majority of variance partitioning at the level of the Coral Colony. To confirm this result, the Symbiodinium ITS2 sequence diversity in six M. capitata colonies was analyzed in much greater depth with 35 to 55 clones per colony. The ITS2 sequences and quantitative composition recovered from these colonies varied significantly, indicating that each coral hosted a different assemblage of Symbiodinium. The diversity of Symbiodinium ITS2 sequence assemblages retrieved from individual colonies of M. capitata here highlights the problems inherent in interpreting multi-copy and intra-genomically variable molecular markers, and serves as a context for discussing the utility and biological relevance of assigning species names based on Symbiodinium ITS2 genotyping. PMID:21246044

  3. [Mitochondrial DNA sequence variations of Keriyan in the Taklamakan desert].

    PubMed

    Duan, Ran-Hui; Cui, Yin-Qiu; Zhou, Hui; Zhu, Hong

    2003-05-01

    The Keriyans live in the center of the Taklamakan desert of Xinjiang Province and have never intermarried with outsiders. How they came to settle there and from whom they descend are not clearly known. The mtDNA hypervariable segment I (HVS I) was sequenced in 75 Keriyans. Seventy-one unique HVS I types were identified, varying at 68 nucleotide positions. Nucleotide diversity and the mean pairwise differences of the Keriyans are intermediate between those reported for Eastern and Western populations. The Keriyans' low Tajima's D statistic and bell-shaped pairwise-difference distribution can be interpreted as the hallmark of an ancient population expansion. Phylogenetic analysis shows that Central Asian populations occupy a position intermediate between the Eastern and Western populations; moreover, the Keriyans show shorter genetic distances to the Xinjiang Uighur and to Uighur from other places than to other populations.
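
    The two summary statistics named in this abstract can be illustrated with a short sketch. It is not the authors' pipeline: it assumes pre-aligned, equal-length sequences, counts only simple mismatches, and the function names are hypothetical.

```python
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def mean_pairwise_differences(seqs):
    """Average number of differences over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (assumes every site is comparable)."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


# Toy alignment (made-up data, not from the study)
toy = ["ACGT", "ACGA", "TCGA"]
print(mean_pairwise_differences(toy))
print(nucleotide_diversity(toy))
```

    The histogram of the per-pair counts returned by pairwise_differences is the pairwise-difference (mismatch) distribution whose bell shape the abstract refers to.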

  4. Sequence variation of alcohol dehydrogenase (Adh) paralogs in cactophilic Drosophila.

    PubMed Central

    Matzkin, Luciano M; Eanes, Walter F

    2003-01-01

    This study focuses on the population genetics of alcohol dehydrogenase (Adh) in cactophilic Drosophila. Drosophila mojavensis and D. arizonae utilize cactus hosts, and each host contains a characteristic mixture of alcohol compounds. In these Drosophila species there are two functional Adh loci, an adult form (Adh-2) and a larval and ovarian form (Adh-1). Overall, the greater level of variation segregating in D. arizonae than in D. mojavensis suggests a larger population size for D. arizonae. There are markedly different patterns of variation between the paralogs across both species. A 16-bp intron haplotype segregates in both species at Adh-2, apparently the product of an ancient gene conversion event between the paralogs, which suggests that there is selection for the maintenance of the intron structure possibly for the maintenance of pre-mRNA structure. We observe a pattern of variation consistent with adaptive protein evolution in the D. mojavensis lineage at Adh-1, suggesting that the cactus host shift that occurred in the divergence of D. mojavensis from D. arizonae had an effect on the evolution of the larval expressed paralog. Contrary to previous work we estimate a recent time for both the divergence of D. mojavensis and D. arizonae (2.4 +/- 0.7 MY) and the age of the gene duplication (3.95 +/- 0.45 MY). PMID:12586706

  5. High Resolution Magnetostratigraphy Susceptibility (MS) and Gamma Radiation (GR) Measurements from Three Coeval Upper Cretaceous Stratigraphic Sequences in Colorado: Testing MS and GR Variations Arising from Detrital Components in Variably Weathered Marine Sedimentary Rocks (Invited)

    NASA Astrophysics Data System (ADS)

    Ellwood, B. B.; Tomkin, J. H.; Wang, W.

    2010-12-01

    We have measured the magnetic susceptibility (MS) and gamma radiation (GR) for three Upper Cretaceous marine sedimentary sequences that span the Cenomanian-Turonian (C-T) boundary exposed as part of the Western Interior Seaway in Central Colorado. The purpose of this study was threefold: (1) to evaluate the combined potential of MS and GR as a correlation tool using well-studied sequences that have been previously correlated based on high-resolution lithostratigraphy, (2) to evaluate the effect of differential weathering on MS and GR values, and (3) to evaluate how their relationship with each other changes. This work includes sampling of the moderately weathered Global Boundary Stratotype Section and Point (GSSP) for the C-T boundary that is exposed in a railroad cut near Pueblo, CO. A nearby (~1 km) coeval section in an old road cut, where weathering is pronounced, was also sampled, as was fresh material through the C-T boundary interval from a core drilled ~40 km to the west of Pueblo (the USGS#1 Portland Core). MS was measured in the laboratory at LSU on samples collected at ~5 cm intervals from each of these sequences. GR was measured in the field at ~5 cm intervals on the two outcrop sequences, using a portable GR spectrometer. In addition, GR was also measured on samples collected for MS measurement, using a laboratory-based germanium detector. It is argued that both MS and GR data sets are controlled by detrital fluxes into the marine environment, although the effect of weathering, if any, on these parameters when exposed in outcrop is not well documented. In addition, these parameters are controlled by different detrital components that may be derived from different sources, or be differentially concentrated within the marine system. Here we report the results of a number of experiments designed to evaluate how the MS and GR data sets co-vary, and to test their usefulness as correlation tools in stratigraphy. We have also examined the effects of

  6. HIV-1 sequence variation between isolates from mother-infant transmission pairs

    SciTech Connect

    Wike, C.M.; Daniels, M.R.; Furtado, M.; Wolinsky, M.; Korber, B.; Hutto, C.; Munoz, J.; Parks, W.; Saah, A.

    1991-01-01

    To examine the sequence diversity of human immunodeficiency virus type 1 (HIV-1) between known transmission sets, sequences from the V3 and V4-V5 region of the env gene from 4 mother-infant pairs were analyzed. The mean interpatient sequence variation between isolates from linked mother-infant pairs was comparable to the sequence diversity found between isolates from other close contacts. The mean intrapatient variation was significantly less in the infants' isolates than in the isolates from both their mothers and other characterized intrapatient sequence sets. In addition, a distinct and characteristic difference in the glycosylation pattern preceding the V3 loop was found between each linked transmission pair. These findings indicate that selection of specific genotypic variants, which may play a role in some direct transmission sets, and the duration of infection are important factors in the degree of diversity seen between the sequence sets.

  7. HIV-1 sequence variation between isolates from mother-infant transmission pairs

    SciTech Connect

    Wike, C.M.; Daniels, M.R.; Furtado, M.; Wolinsky, M.; Korber, B.; Hutto, C.; Munoz, J.; Parks, W.; Saah, A.

    1991-12-31

    To examine the sequence diversity of human immunodeficiency virus type 1 (HIV-1) between known transmission sets, sequences from the V3 and V4-V5 region of the env gene from 4 mother-infant pairs were analyzed. The mean interpatient sequence variation between isolates from linked mother-infant pairs was comparable to the sequence diversity found between isolates from other close contacts. The mean intrapatient variation was significantly less in the infants' isolates than in the isolates from both their mothers and other characterized intrapatient sequence sets. In addition, a distinct and characteristic difference in the glycosylation pattern preceding the V3 loop was found between each linked transmission pair. These findings indicate that selection of specific genotypic variants, which may play a role in some direct transmission sets, and the duration of infection are important factors in the degree of diversity seen between the sequence sets.

  8. The Autism Sequencing Consortium: Large scale, high throughput sequencing in autism spectrum disorders

    PubMed Central

    Buxbaum, J. D.; Daly, M. J.; Devlin, B.; Lehner, T.; Roeder, K.; State, M. W.

    2013-01-01

    Research during the past decade has seen significant progress toward a model for the genetic architecture of autism spectrum disorders (ASD), with gene discovery accelerating as the characterization of genomic variation has become increasingly comprehensive. At the same time this research has highlighted ongoing challenges. Here we address the enormous impact of high throughput sequencing (HTS) on ASD gene discovery, outline a consensus view for leveraging this technology, and describe a large multi-site collaboration developed to accomplish these goals. Similar approaches could prove effective for severe neurodevelopmental disorders more broadly. PMID:23259942

  9. Sequence variation at the major histocompatibility complex locus DQ beta in beluga whales (Delphinapterus leucas)

    PubMed

    Murray, B W; Malik, S; White, B N

    1995-07-01

    Genetic variation at the Major Histocompatibility Complex locus DQ beta was analyzed in 233 beluga whales (Delphinapterus leucas) from seven populations: St. Lawrence Estuary, eastern Beaufort Sea, eastern Chukchi Sea, western Hudson Bay, eastern Hudson Bay, southeastern Baffin Island, and High Arctic and in 12 narwhals (Monodon monoceros) sympatric with the High Arctic beluga population. Variation was assessed by amplification of the exon coding for the peptide binding region via the polymerase chain reaction, followed by either cloning and DNA sequencing or single-stranded conformation polymorphism analysis. Five alleles were found across the beluga populations and one in the narwhal. Pairwise comparisons of these alleles showed a 5:1 ratio of nonsynonymous to synonymous substitutions per site leading to eight amino acid differences, five of which were nonconservative substitutions, centered around positions previously shown to be important for peptide binding. Although the amount of allelic variation is low when compared with terrestrial mammals, the nature of the substitutions in the peptide binding sites indicates an important role for the DQ beta locus in the cellular immune response of beluga whales. Comparisons of allele frequencies among populations show the High Arctic population to be different (P ≤ .005) from the other beluga populations surveyed. In these other populations an allele, Dele-DQ beta*0101-2, was found in 98% of the animals, while in the High Arctic it was found in only 52% of the animals. Two other alleles were found at high frequencies in the High Arctic population, one being very similar to the single allele found in narwhal. PMID:7659014

  10. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    PubMed Central

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-01-01

    Mass spectrometry–based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications. PMID:27049631

  11. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    NASA Astrophysics Data System (ADS)

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-06-01

    Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.

  12. Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

    NASA Astrophysics Data System (ADS)

    Sheynkman, Gloria M.; Shortreed, Michael R.; Cesnik, Anthony J.; Smith, Lloyd M.

    2016-06-01

    Mass spectrometry–based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.

  13. Interspecific variations in adhesive protein sequences of Mytilus edulis, M. galloprovincialis, and M. trossulus.

    PubMed

    Inoue, K; Waite, J H; Matsuoka, M; Odo, S; Harayama, S

    1995-12-01

    Variation in the adhesive protein gene sequences of Mytilus edulis, M. galloprovincialis, and M. trossulus collected in Delaware, Kamaishi (Japan), and Alaska, respectively, was analyzed by the polymerase chain reaction (PCR) using two sets of oligonucleotide primers. The first set, Me 13 and Me 14, was designed to amplify the repetitive region. The length of the amplified fragments was highly variable, even among samples of the same species. Another set, Me 15 and Me 16, was designed to amplify a part of the nonrepetitive region. The length of the amplified fragments was uniform in each species and differed interspecifically; 180, 168, and 126 bp for M. edulis, M. trossulus, and M. galloprovincialis, respectively. The amplified sequence of M. trossulus resembled that of M. edulis. Mussels from other sites were also examined by PCR using Me 15 and Me 16. Wild mussels from Tromsö (Norway) and cultured mussels from Brittany (France) were identified as M. edulis. Cultured mussels from the Mediterranean coast of France and wild mussels from Shimizu (Japan) were identified as M. galloprovincialis. Some wild mussels from Hiura (Japan) were identified as a hybrid between M. galloprovincialis and M. trossulus. Thus, the length of this part (variable region) of the sequence is proposed as a diagnostic marker for these three morphologically similar species and their hybrids.

  14. Genome organization and variation in the 3'-partial sequence of garlic latent virus in China.

    PubMed

    Chen, Jiong; Zheng, Hongying; Chen, Jianping; Yang, Chongliang

    2002-08-01

    Ten different isolates of a carlavirus were detected by degenerate PCR from 12 garlic samples collected from 6 provinces in China, and the complete genome sequence of the Zhejiang isolate ZJ1 and the 3'-terminal sequences of 9 other isolates were determined. The RNA genome of isolate ZJ1 consisted of 8363 nt excluding the 3'-poly(A) tail, and the genome organization was similar to that of other carlaviruses, with 6 open reading frames encoding a replicase, TGB1, TGB2, TGB3, CP and NABP, respectively. Sequence comparisons showed that all 10 isolates were Garlic latent virus (GarLV). The variations in TGB2, TGB3 and NABP were more significant than those in the CP. High homology was also detected between those isolates and Shallot latent virus (ShLV). Phylogenetic analysis suggested that GarLV isolates from garlic can be divided into 4 main groups and that Chinese isolates belonged to each group. This is the first reported molecular analysis of members of the genus Carlavirus in China. PMID:18759032

  15. Mitochondrial control-region sequence variation in aboriginal Australians.

    PubMed Central

    van Holst Pellekaan, S; Frommer, M; Sved, J; Boettcher, B

    1998-01-01

    The mitochondrial D-loop hypervariable segment 1 (mt HVS1) between nucleotides 15997 and 16377 has been examined in aboriginal Australian people from the Darling River region of New South Wales (riverine) and from Yuendumu in central Australia (desert). Forty-seven unique HVS1 types were identified, varying at 49 nucleotide positions. Pairwise analysis by calculation of BEPPI (between population proportion index) reveals statistically significant structure in the populations, although some identical HVS1 types are seen in the two contrasting regions. mt HVS1 types may reflect more-ancient distributions than do linguistic diversity and other culturally distinguishing attributes. Comparison with sequences from five published global studies reveals that these Australians demonstrate greatest divergence from some Africans, least from Papua New Guinea highlanders, and only slightly more from some Pacific groups (Indonesian, Asian, Samoan, and coastal Papua New Guinea), although the HVS1 types vary at different nucleotide sites. Construction of a median network, displaying three main groups, suggests that several hypervariable nucleotide sites within the HVS1 are likely to have undergone mutation independently, making phylogenetic comparison with global samples by conventional methods difficult. Specific nucleotide-site variants are major separators in median networks constructed from Australian HVS1 types alone and for one global selection. The distribution of these, requiring extended study, suggests that they may be signatures of different groups of prehistoric colonizers into Australia, for which the time of colonization remains elusive. PMID:9463317

  16. Mitochondrial control-region sequence variation in aboriginal Australians.

    PubMed

    van Holst Pellekaan, S; Frommer, M; Sved, J; Boettcher, B

    1998-02-01

    The mitochondrial D-loop hypervariable segment 1 (mt HVS1) between nucleotides 15997 and 16377 has been examined in aboriginal Australian people from the Darling River region of New South Wales (riverine) and from Yuendumu in central Australia (desert). Forty-seven unique HVS1 types were identified, varying at 49 nucleotide positions. Pairwise analysis by calculation of BEPPI (between population proportion index) reveals statistically significant structure in the populations, although some identical HVS1 types are seen in the two contrasting regions. mt HVS1 types may reflect more-ancient distributions than do linguistic diversity and other culturally distinguishing attributes. Comparison with sequences from five published global studies reveals that these Australians demonstrate greatest divergence from some Africans, least from Papua New Guinea highlanders, and only slightly more from some Pacific groups (Indonesian, Asian, Samoan, and coastal Papua New Guinea), although the HVS1 types vary at different nucleotide sites. Construction of a median network, displaying three main groups, suggests that several hypervariable nucleotide sites within the HVS1 are likely to have undergone mutation independently, making phylogenetic comparison with global samples by conventional methods difficult. Specific nucleotide-site variants are major separators in median networks constructed from Australian HVS1 types alone and for one global selection. The distribution of these, requiring extended study, suggests that they may be signatures of different groups of prehistoric colonizers into Australia, for which the time of colonization remains elusive. PMID:9463317

  17. Haplotypes and Sequence Variation in the Ovine Adiponectin Gene (ADIPOQ).

    PubMed

    An, Qing-Ming; Zhou, Hui-Tong; Hu, Jiang; Luo, Yu-Zhu; Hickford, Jon G H

    2015-01-01

    The adiponectin gene (ADIPOQ) plays an important role in energy homeostasis. In this study five separate regions (regions 1 to 5) of ovine ADIPOQ were analysed using PCR-SSCP. Four different PCR-SSCP patterns (A₁-D₁, A₂-D₂) were detected in region-1 and region-2, respectively, with seven and six SNPs being revealed. In region-3, three different patterns (A₃-C₃) and three SNPs were observed. Two patterns (A₄-B₄, A₅-B₅) and two and one SNPs were observed in region-4 and region-5, respectively. In total, nineteen SNPs were detected, with five of them in the coding region and two (c.46T/C and c.515G/A) putatively resulting in amino acid changes (p.Tyr16His and p.Lys172Arg). In region-1, -2 and -3 of 316 sheep from eight New Zealand breeds, variants A₁, A₂ and A₃ were the most common, although variant frequencies differed in the eight breeds. Across region-1 and region-3, nine haplotypes were identified and haplotypes A₁-A₃, A₁-C₃, B₁-A₃ and B₁-C₃ were most common. These results indicate that the ADIPOQ gene is polymorphic and suggest that further analysis is required to see if the variation in the gene is associated with animal production traits. PMID:26610572

  18. Targeted Exome Sequencing Outcome Variations of Colorectal Tumors within and across Two Sequencing Platforms

    PubMed Central

    Ashktorab, Hassan; Azimi, Hamed; Nickerson, Michael L.; Bass, Sara; Varma, Sudhir; Brim, Hassan

    2016-01-01

    Background and Aim: Next generation sequencing (NGS) has quickly become the tool of choice for genome and exome data generation. The multitude of sequencing platforms, as well as the variability within each platform, needs to be assessed. In this paper we used two platforms (Ion Torrent and Illumina) to assess single nucleotide variants (SNVs) in colorectal cancer (CRC) specimens. Methods: CRC specimens (n = 13) collected from 6 CRC patients (cancer and matched normal) were used to establish the mutational profile using the Ion Torrent and Illumina sequencing platforms. We analyzed a set of formalin-fixed paraffin-embedded (FFPE) and fresh frozen (FF) samples on both platforms to assess the effect of sample nature (FFPE vs. FF) on sequencing outcome and to evaluate the similarities and differences of SNVs across the two platforms. In addition, duplicates of FF samples were sequenced on each platform to assess variability within platform. Results: The comparison of FF replicates to each other gave a concordance of 77% (± 15.3%) in Ion Torrent and 70% (± 3.7%) in Illumina. FFPE vs. FF replicates gave a concordance of 40% (± 32%) in Ion Torrent and 49% (± 19%) in Illumina. Cross-platform concordance averaged 75% (± 9.8%) for FFPE samples and 67% (± 32%) for FF samples, with an overall average of 70% (± 26.8%). Conclusion: Our data show significant variability within and across platforms. The number of detected variants also depends on the nature of the specimen (FF vs. FFPE). Validation of NGS-discovered mutations is necessary to rule out false-positive mutations. This validation might be performed either through a second NGS platform or through Sanger sequencing. PMID:27547838
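
    The abstract does not state how concordance was computed. As one plausible reading, the sketch below treats it as the share of variant calls found in both runs out of all calls found in either run (a Jaccard index). That definition, the function name, and the tuple layout (chromosome, position, reference allele, alternate allele) are assumptions made for illustration only.

```python
def concordance(calls_a, calls_b):
    """Fraction of variant calls shared by two runs, relative to their union.

    Assumed definition (Jaccard index); the study may have used another one.
    Each call is a hashable key such as (chrom, pos, ref, alt).
    """
    set_a, set_b = set(calls_a), set(calls_b)
    union = set_a | set_b
    if not union:
        raise ValueError("both call sets are empty; concordance is undefined")
    return len(set_a & set_b) / len(union)


# Toy replicate call sets (made-up data, not from the study)
replicate_1 = [("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "C", "T")]
replicate_2 = [("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr3", 10, "T", "G")]
print(f"{concordance(replicate_1, replicate_2):.0%}")
```

    Under this definition a call present in only one run lowers the score symmetrically, whichever run it came from.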

  19. High compression image and image sequence coding

    NASA Technical Reports Server (NTRS)

    Kunt, Murat

    1989-01-01

    The digital representation of an image requires a very large number of bits. This number is even larger for an image sequence. The goal of image coding is to reduce this number, as much as possible, and reconstruct a faithful duplicate of the original picture or image sequence. Early efforts in image coding, solely guided by information theory, led to a plethora of methods. The compression ratio reached a plateau around 10:1 a couple of years ago. Recent progress in the study of the brain mechanism of vision and scene analysis has opened new vistas in picture coding. Directional sensitivity of the neurones in the visual pathway combined with the separate processing of contours and textures has led to a new class of coding methods capable of achieving compression ratios as high as 100:1 for images and around 300:1 for image sequences. Recent progress on some of the main avenues of object-based methods is presented. These second generation techniques make use of contour-texture modeling, new results in neurophysiology and psychophysics and scene analysis.

  20. Copy number variation of individual cattle genomes using next-generation sequencing

    PubMed Central

    Bickhart, Derek M.; Hou, Yali; Schroeder, Steven G.; Alkan, Can; Cardone, Maria Francesca; Matukumalli, Lakshmi K.; Song, Jiuzhou; Schnabel, Robert D.; Ventura, Mario; Taylor, Jeremy F.; Garcia, Jose Fernando; Van Tassell, Curtis P.; Sonstegard, Tad S.; Eichler, Evan E.; Liu, George E.

    2012-01-01

    Copy number variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next-generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one Holstein, and one Hereford) and one indicine (Nelore) cattle. Within mapped chromosomal sequence, we identified 1265 CNV regions comprising ∼55.6-Mbp sequence—476 of which (∼38%) have not previously been reported. We validated this sequence-based CNV call set with array comparative genomic hybridization (aCGH), quantitative PCR (qPCR), and fluorescent in situ hybridization (FISH), achieving a validation rate of 82% and a false positive rate of 8%. We further estimated absolute copy numbers for genomic segments and annotated genes in each individual. Surveys of the top 25 most variable genes revealed that the Nelore individual had the lowest copy numbers in 13 cases (∼52%, χ2 test; P-value <0.05). In contrast, genes related to pathogen- and parasite-resistance, such as CATHL4 and ULBP17, were highly duplicated in the Nelore individual relative to the taurine cattle, while genes involved in lipid transport and metabolism, including APOL3 and FABP2, were highly duplicated in the beef breeds. These CNV regions also harbor genes like BPIFA2A (BSP30A) and WC1, suggesting that some CNVs may be associated with breed-specific differences in adaptation, health, and production traits. By providing the first individualized cattle CNV and segmental duplication maps and genome-wide gene copy number estimates, we enable future CNV studies into highly duplicated regions in the cattle genome. PMID:22300768
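
    The read-depth idea behind this record can be sketched in a few lines. This is a simplified illustration, not the authors' method: it assumes depth has already been counted per genomic window, that a set of control windows is known to be diploid, and that depth scales linearly with copy number. The function name is hypothetical.

```python
from statistics import median


def estimate_copy_number(window_depths, control_depths, ploidy=2):
    """Scale each window's read depth by the median depth of diploid controls.

    Assumption: read depth is proportional to copy number, so a window with
    twice the control depth carries twice the control ploidy.
    """
    baseline = median(control_depths)
    if baseline <= 0:
        raise ValueError("control regions must have positive median depth")
    return [ploidy * depth / baseline for depth in window_depths]


# Toy depths (made-up data): one normal, one duplicated, one hemizygous, one deleted window
controls = [30, 28, 32, 30, 31]
windows = [30, 60, 15, 0]
print(estimate_copy_number(windows, controls))
```

    Real read-depth callers also correct for GC content and mappability before this scaling step; those corrections are omitted here.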

  1. Extensive sequence variation in rice blast resistance gene Pi54 makes it broad spectrum in nature

    PubMed Central

    Thakur, Shallu; Singh, Pankaj K.; Das, Alok; Rathour, R.; Variar, M.; Prashanthi, S. K.; Singh, A. K.; Singh, U. D.; Chand, Duni; Singh, N. K.; Sharma, Tilak R.

    2015-01-01

    The rice blast resistance gene Pi54, cloned from the rice line Tetep, is effective against diverse isolates of Magnaporthe oryzae. In this study, we prospected the allelic variants of this dominant blast resistance gene from a set of 92 rice lines to determine the nucleotide diversity, pattern of molecular evolution, phylogenetic relationships and evolutionary dynamics, and to develop allele-specific markers. High-quality sequences were generated for homologs of the Pi54 gene. Comparative sequence analysis revealed InDels of variable sizes in all the alleles. Profiling of the selected SNP (single nucleotide polymorphism) and amino acid sites (N sites ≥ 10) exhibited constant frequency distribution of mutational and substitutional sites between the resistant and susceptible rice lines, respectively. A total of 50 new haplotypes based on nucleotide polymorphism were also identified. A unique haplotype (H_3) was found to be linked to all the resistant alleles isolated from indica rice lines. Unique leucine zipper and tyrosine sulfation sites were identified in the predicted Pi54 proteins. Selection signals were observed in the entire coding sequence of resistance alleles, as compared to the LRR domains for susceptible alleles. This is the first report of extensive variability of Pi54 alleles in different landraces and cultivated varieties, which possibly contributes to broad-spectrum resistance to Magnaporthe oryzae. The sequence variation in two consensus regions (163 and 144 bp) was used for the development of allele-specific DNA markers. Validated markers can be used for the selection and identification of better allele(s) and their introgression into commercial rice cultivars through marker-assisted selection. PMID:26052332

  2. Impact of genomic structural variation in Drosophila melanogaster based on population-scale sequencing

    PubMed Central

    Zichner, Thomas; Garfield, David A.; Rausch, Tobias; Stütz, Adrian M.; Cannavó, Enrico; Braun, Martina; Furlong, Eileen E.M.; Korbel, Jan O.

    2013-01-01

    Genomic structural variation (SV) is a major determinant for phenotypic variation. Although it has been extensively studied in humans, the nucleotide resolution structure of SVs within the widely used model organism Drosophila remains unknown. We report a highly accurate, densely validated map of unbalanced SVs comprising 8962 deletions and 916 tandem duplications in 39 lines derived from short-read DNA sequencing in a natural population (the “Drosophila melanogaster Genetic Reference Panel,” DGRP). Most SVs (>90%) were inferred at nucleotide resolution, and a large fraction was genotyped across all samples. Comprehensive analyses of SV formation mechanisms using the short-read data revealed an abundance of SVs formed by mobile element and nonhomologous end-joining-mediated rearrangements, and clustering of variants into SV hotspots. We further observed a strong depletion of SVs overlapping genes, which, along with population genetics analyses, suggests that these SVs are often deleterious. We inferred several gene fusion events also highlighting the potential role of SVs in the generation of novel protein products. Expression quantitative trait locus (eQTL) mapping revealed the functional impact of our high-resolution SV map, with quantifiable effects at >100 genic loci. Our map represents a resource for population-level studies of SVs in an important model organism. PMID:23222910

  3. Tough Coating Proteins: Subtle Sequence Variation Modulates Cohesion

    PubMed Central

    Das, Saurabh; Miller, Dusty R.; Kaufman, Yair; Martinez Rodriguez, Nadine R.; Pallaoro, Alessia; Harrington, Matthew J.; Gylys, Maryte; Israelachvili, Jacob N.; Waite, J. Herbert

    2015-01-01

    Mussel foot protein-1 (mfp-1) is an essential constituent of the protective cuticle covering all exposed portions of the byssus (plaque and the thread) that marine mussels use to attach to intertidal rocks. The reversible complexation of Fe3+ by the 3,4-dihydroxyphenylalanine (Dopa) side chains in mfp-1 in Mytilus californianus cuticle is responsible for its high extensibility (120%) as well as its stiffness (2 GPa) due to the formation of sacrificial bonds that help to dissipate energy and avoid accumulation of stresses in the material. We have investigated the interactions between Fe3+ and mfp-1 from two mussel species, M. californianus (Mc) and M. edulis (Me), using both surface sensitive and solution phase techniques. Our results show that although mfp-1 homologues from both species bind Fe3+, mfp-1 (Mc) contains Dopa with two distinct Fe3+-binding tendencies and prefers to form intramolecular complexes with Fe3+. In contrast, mfp-1 (Me) is better adapted to intermolecular Fe3+ binding by Dopa. Addition of Fe3+ did not significantly increase the cohesion energy between the mfp-1 (Mc) films at pH 5.5. However, iron appears to stabilize the cohesive bridging of mfp-1 (Mc) films at the physiologically relevant pH of 7.5, where most other mfps lose their ability to adhere reversibly. Understanding the molecular mechanisms underpinning the capacity of M. californianus cuticle to withstand twice the strain of M. edulis cuticle is important for engineering of tunable strain tolerant composite coatings for biomedical applications. PMID:25692318

  4. Variation in the prion protein sequence in Dutch goat breeds.

    PubMed

    Windig, J J; Hoving, R A H; Priem, J; Bossers, A; van Keulen, L J M; Langeveld, J P M

    2016-10-01

    Scrapie is a neurodegenerative disease occurring in goats and sheep. Several haplotypes of the prion protein increase resistance to scrapie infection and may be used in selective breeding to help eradicate scrapie. In this study, frequencies of the allelic variants of the PrP gene are determined for six goat breeds in the Netherlands. Overall frequencies in Dutch goats were determined from 768 brain tissue samples in 2005, 766 in 2008 and 300 in 2012, derived from random sampling for the national scrapie surveillance without knowledge of the breed. Breed specific frequencies were determined in the winter 2013/2014 by sampling 300 breeding animals from the main breeders of the different breeds. Detailed analysis of the scrapie-resistant K222 haplotype was carried out in 2014 for 220 Dutch Toggenburger goats and in 2015 for 942 goats from the Saanen derived White Goat breed. Nine haplotypes were identified in the Dutch breeds. Frequencies for non-wild type haplotypes were generally low. The exceptions were the K222 haplotype in the Dutch Toggenburger (29%) and the S146 haplotype in the Nubian and Boer breeds (respectively 7 and 31%). The frequency of the K222 haplotype in the Toggenburger was higher than for any other breed reported in the literature, while for the White Goat breed, at 3.1%, it was similar to frequencies of other Saanen or Saanen derived breeds. Further evidence was found for the existence of two M142 haplotypes, M142/S240 and M142/P240. Breeds vary in haplotype frequencies but frequencies of resistant genotypes are generally low and consequently selective breeding for scrapie resistance can only be slow but will benefit from animals identified in this study. The unexpectedly high frequency of the K222 haplotype in the Dutch Toggenburger underlines the need for conservation of rare breeds in order to conserve genetic diversity rare or absent in other breeds. PMID:26991480

  5. Detecting Alu insertions from high-throughput sequencing data

    PubMed Central

    David, Matei; Mustafa, Harun; Brudno, Michael

    2013-01-01

    High-throughput sequencing technologies have allowed for the cataloguing of variation in personal human genomes. In this manuscript, we present alu-detect, a tool that combines read-pair and split-read information to detect novel Alus and their precise breakpoints directly from either whole-genome or whole-exome sequencing data while also identifying insertions directly in the vicinity of existing Alus. To set the parameters of our method, we use simulation of a faux reference, which allows us to compute the precision and recall of various parameter settings using real sequencing data. Applying our method to 100 bp paired Illumina data from seven individuals, including two trios, we detected on average 1519 novel Alus per sample. Based on the faux-reference simulation, we estimate that our method has 97% precision and 85% recall. We identify 808 novel Alus not previously described in other studies. We also demonstrate the use of alu-detect to study the local sequence and global location preferences for novel Alu insertions. PMID:23921633
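
    The precision and recall figures quoted above come from comparing calls against insertions whose true locations are known. The bookkeeping can be sketched roughly as follows, assuming calls and known sites are (chromosome, position) pairs and that a call counts as correct when it lies within a small tolerance of a known site. The function names, the nearest-site matching rule and the 10 bp tolerance are illustrative assumptions, not taken from alu-detect.

```python
def evaluate_calls(calls, truth, tolerance=10):
    """Count true positives, false positives and false negatives.

    calls, truth: iterables of (chromosome, position) tuples.
    A call is a true positive if a not-yet-matched known site on the same
    chromosome lies within `tolerance` bp (hypothetical matching rule).
    Each known site can be matched by at most one call.
    """
    unmatched = set(truth)
    tp = fp = 0
    for chrom, pos in calls:
        candidates = [site for site in unmatched
                      if site[0] == chrom and abs(site[1] - pos) <= tolerance]
        if candidates:
            # Match the nearest known site and retire it.
            nearest = min(candidates, key=lambda site: abs(site[1] - pos))
            unmatched.remove(nearest)
            tp += 1
        else:
            fp += 1
    fn = len(unmatched)  # known sites that no call recovered
    return tp, fp, fn


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

    Sweeping a parameter setting and recomputing these two numbers against the same known sites is one way such a simulation could guide parameter choice; the toy inputs used to exercise this sketch are made up and are not data from the study.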

  6. Otopalatodigital syndrome type 2 in a male infant: A case report with a novel sequence variation

    PubMed Central

    Sankararaman, Senthilkumar; Kurepa, Dalibor; Shen, Yiping; Kakkilaya, Venkatakrishna; Ursin, Sussone; Chen, Harold

    2013-01-01

    We report a male infant with typical clinical, pathological and radiological features of otopalatodigital syndrome type 2 (OPD 2) with a novel sequence variation in the FLNA gene. His clinical manifestations include typical craniofacial features, cleft palate, hearing impairment, omphalocele, bowing of the long bones, absent fibulae and digital abnormalities consistent with OPD 2. Two hemizygous sequence variations in the FLNA gene were identified. The variation c.5290G>A/p.Ala1764Thr has been previously reported in a patient with periventricular nodular heterotopia, but subsequently it has been reported as a polymorphism. The other variation c.613T>C/p.Cys205Arg detected in the proband has not been previously reported and our analysis indicates that this is a novel disease-causing mutation for OPD2. PMID:27625837

  7. Otopalatodigital syndrome type 2 in a male infant: A case report with a novel sequence variation.

    PubMed

    Sankararaman, Senthilkumar; Kurepa, Dalibor; Shen, Yiping; Kakkilaya, Venkatakrishna; Ursin, Sussone; Chen, Harold

    2013-03-01

    We report a male infant with typical clinical, pathological and radiological features of otopalatodigital syndrome type 2 (OPD 2) with a novel sequence variation in the FLNA gene. His clinical manifestations include typical craniofacial features, cleft palate, hearing impairment, omphalocele, bowing of the long bones, absent fibulae and digital abnormalities consistent with OPD 2. Two hemizygous sequence variations in the FLNA gene were identified. The variation c.5290G>A/p.Ala1764Thr has been previously reported in a patient with periventricular nodular heterotopia, but subsequently it has been reported as a polymorphism. The other variation c.613T>C/p.Cys205Arg detected in the proband has not been previously reported and our analysis indicates that this is a novel disease-causing mutation for OPD2. PMID:27625837

  8. High-throughput variation detection and genotyping using microarrays.

    PubMed

    Cutler, D J; Zwick, M E; Carrasquillo, M M; Yohn, C T; Tobin, K P; Kashuk, C; Mathews, D J; Shah, N A; Eichler, E E; Warrington, J A; Chakravarti, A

    2001-11-01

    The genetic dissection of complex traits may ultimately require a large number of SNPs to be genotyped in multiple individuals who exhibit phenotypic variation in a trait of interest. Microarray technology can enable rapid genotyping of variation specific to study samples. To facilitate their use, we have developed an automated statistical method (ABACUS) to analyze microarray hybridization data and applied this method to Affymetrix Variation Detection Arrays (VDAs). ABACUS provides a quality score to individual genotypes, allowing investigators to focus their attention on sites that give accurate information. We have applied ABACUS to an experiment encompassing 32 autosomal and eight X-linked genomic regions, each consisting of approximately 50 kb of unique sequence spanning a 100-kb region, in 40 humans. At sufficiently high-quality scores, we are able to read approximately 80% of all sites. To assess the accuracy of SNP detection, 108 of 108 SNPs have been experimentally confirmed; an additional 371 SNPs have been confirmed electronically. To assess the accuracy of diploid genotypes at segregating autosomal sites, we confirmed 1515 of 1515 homozygous calls, and 420 of 423 (99.29%) heterozygotes. In replicate experiments, consisting of independent amplification of identical samples followed by hybridization to distinct microarrays of the same design, genotyping is highly repeatable. In an autosomal replicate experiment, 813,295 of 813,295 genotypes are called identically (including 351 heterozygotes); at an X-linked locus in males (haploid), 841,236 of 841,236 sites are called identically.

  10. Sequence variation of ribosomal internal transcribed spacers (ITS) in commercially important Phytoseiidae mites.

    PubMed

    Navajas, M; Lagnel, J; Fauvel, G; de Moraes, G

    1999-11-01

    Preliminary work is needed to assess the usefulness of different markers at different taxonomic scales when a new group is analyzed, such as the commercially important Phytoseiidae mites. We investigate here the level of sequence variation of the nuclear ribosomal spacers ITS 1 and 2 and the 5.8S gene in six species of Phytoseiidae: Neoseiulus californicus, N. fallacis, Euseius concordis, Metaseiulus occidentalis, Typhlodromus pyri and Phytoseiulus persimilis. As expected, the 5.8S gene (148 base pairs) is markedly conserved and displays little variation in between-genera comparisons. ITS1 and ITS2 show contrasting patterns: while the ITS2 is short (80-89 bp) and shows little variation, the ITS1 is longer (303-404 bp) and is very variable in sequence. This fact compromises reliable nucleotide homologies when comparing the genera. The comparison of ITS1 sequence similarity at the species level might be useful for species identification; however, the value of ITS in taxonomic studies does not extend to the level of the family. The intraspecific variations of ITS were investigated in three species: N. californicus, N. fallacis and E. concordis. The first species has identical ITS1 sequences and the last two display low polymorphism (2 nucleotide substitutions). The ITS2 and 5.8S sequences were identical in all three subspecies comparisons. PMID:10668860

  11. Nucleotide sequence variation of GLABRA1 contributing to phenotypic variation of leaf hairiness in Brassicaceae vegetables.

    PubMed

    Li, Feng; Zou, Zhongwei; Yong, Hui-Yee; Kitashiba, Hiroyasu; Nishio, Takeshi

    2013-05-01

    GLABRA1 (GL1) belongs to the group of R2R3-MYB transcription factors and is known to be essential for trichome initiation in Arabidopsis. In our previous study, we identified a GL1 ortholog in Brassica rapa as a candidate for the gene controlling leaf hairiness by QTL analysis and suggested that a 5-bp deletion (B-allele) and a 2-bp deletion (D-allele) in the exon 3 of BrGL1 and a non-synonymous SNP (C-allele) in the second nucleotide of exon 3 possibly cause leaf hairlessness. In this study, we transformed a B. rapa line having the B-allele with the A-allele (wild type) or the C-allele of BrGL1 under the control of the CaMV 35S promoter. The transgenic plants with the A-allele showed dense coverage of seedling tissues including stems, young leaves and hypocotyls with trichomes, whereas the phenotypes of those with the C-allele were unchanged. In order to obtain more information about allelic variation of GL1 in different plant lineages and its correlation with leaf hairiness, two GL1 homologs, i.e., RsGL1a and RsGL1b, in Raphanus sativus were analyzed. Allelic variation of RsGL1a between a hairless line and a hairy line was completely associated with hairiness in their BC1F1 population. Comparison of the full-length of RsGL1a in the hairless and hairy lines showed great variation of nucleotides in the 3' end, which might be essential for its function and expression.

  12. Exome sequencing and arrayCGH detection of gene sequence and copy number variation between ILS and ISS mouse strains.

    PubMed

    Dumas, Laura; Dickens, C Michael; Anderson, Nathan; Davis, Jonathan; Bennett, Beth; Radcliffe, Richard A; Sikela, James M

    2014-06-01

    It has been well documented that genetic factors can influence predisposition to develop alcoholism. While the underlying genomic changes may be of several types, two of the most common and disease associated are copy number variations (CNVs) and sequence alterations of protein coding regions. The goal of this study was to identify CNVs and single-nucleotide polymorphisms that occur in gene coding regions that may play a role in influencing the risk of an individual developing alcoholism. Toward this end, two mouse strains were used that have been selectively bred based on their differential sensitivity to alcohol: the Inbred long sleep (ILS) and Inbred short sleep (ISS) mouse strains. Differences in initial response to alcohol have been linked to risk for alcoholism, and the ILS/ISS strains are used to investigate the genetics of initial sensitivity to alcohol. Array comparative genomic hybridization (arrayCGH) and exome sequencing were conducted to identify CNVs and gene coding sequence differences, respectively, between ILS and ISS mice. Mouse arrayCGH was performed using catalog Agilent 1 × 244 k mouse arrays. Subsequently, exome sequencing was carried out using an Illumina HiSeq 2000 instrument. ArrayCGH detected 74 CNVs that were strain-specific (38 ILS/36 ISS), including several ISS-specific deletions that contained genes implicated in brain function and neurotransmitter release. Among several interesting coding variations detected by exome sequencing was the gain of a premature stop codon in the alpha-amylase 2B (AMY2B) gene specifically in the ILS strain. In total, exome sequencing detected 2,597 and 1,768 strain-specific exonic gene variants in the ILS and ISS mice, respectively. This study represents the most comprehensive and detailed genomic comparison of ILS and ISS mouse strains to date. The two complementary genome-wide approaches identified strain-specific CNVs and gene coding sequence variations that should provide strong candidates to

  13. Molecular indicators for palaeoenvironmental change in a Messinian evaporitic sequence (Vena del Gesso, Italy). II: High-resolution variations in abundances and 13C contents of free and sulphur-bound carbon skeletons in a single marl bed.

    PubMed

    Kenig, F; Damsté, J S; Frewin, N L; Hayes, J M; De Leeuw, J W

    1995-06-01

    The extractable organic matter of 10 immature samples from a marl bed of one evaporitic cycle of the Vena del Gesso sediments (Gessoso-solfifera Fm., Messinian, Italy) was analyzed quantitatively for free hydrocarbons and organic sulphur compounds. Nickel boride was used as a desulphurizing agent to recover sulphur-bound lipids from the polar and asphaltene fractions. Carbon isotopic compositions (delta vs PDB) of free hydrocarbons and of S-bound hydrocarbons were also measured. Relationships between these carbon skeletons, precursor biolipids, and the organisms producing them could then be examined. Concentrations of S-bound lipids and free hydrocarbons and their delta values were plotted vs depth in the marl bed and the profiles were interpreted in terms of variations in source organisms, 13C contents of the carbon source, and environmentally induced changes in isotopic fractionation. The overall range of delta values measured was 24.7‰, from -11.6‰ for a component derived from green sulphur bacteria (Chlorobiaceae) to -36.3‰ for a lipid derived from purple sulphur bacteria (Chromatiaceae). Deconvolution of mixtures of components deriving from multiple sources (green and purple sulphur bacteria, coccolithophorids, microalgae and higher plants) was sometimes possible because both quantitative and isotopic data were available and because either the free or S-bound pool sometimes appeared to contain material from a single source. Several free n-alkanes and S-bound lipids appeared to be specific products of upper-water-column primary producers (i.e. algae and cyanobacteria). Others derived from anaerobic photoautotrophs and from heterotrophic protozoa (ciliates), which apparently fed partly on Chlorobiaceae. Four groups of n-alkanes produced by algae or cyanobacteria were also recognized based on systematic variations of abundance and isotopic composition with depth. For hydrocarbons probably derived from microalgae, isotopic variations are well correlated with

  14. Molecular indicators for palaeoenvironmental change in a Messinian evaporitic sequence (Vena del Gesso, Italy). II: High-resolution variations in abundances and 13C contents of free and sulphur-bound carbon skeletons in a single marl bed

    NASA Technical Reports Server (NTRS)

    Kenig, F.; Damste, J. S.; Frewin, N. L.; Hayes, J. M.; De Leeuw, J. W.

    1995-01-01

    The extractable organic matter of 10 immature samples from a marl bed of one evaporitic cycle of the Vena del Gesso sediments (Gessoso-solfifera Fm., Messinian, Italy) was analyzed quantitatively for free hydrocarbons and organic sulphur compounds. Nickel boride was used as a desulphurizing agent to recover sulphur-bound lipids from the polar and asphaltene fractions. Carbon isotopic compositions (delta vs PDB) of free hydrocarbons and of S-bound hydrocarbons were also measured. Relationships between these carbon skeletons, precursor biolipids, and the organisms producing them could then be examined. Concentrations of S-bound lipids and free hydrocarbons and their delta values were plotted vs depth in the marl bed and the profiles were interpreted in terms of variations in source organisms, 13C contents of the carbon source, and environmentally induced changes in isotopic fractionation. The overall range of delta values measured was 24.7‰, from -11.6‰ for a component derived from green sulphur bacteria (Chlorobiaceae) to -36.3‰ for a lipid derived from purple sulphur bacteria (Chromatiaceae). Deconvolution of mixtures of components deriving from multiple sources (green and purple sulphur bacteria, coccolithophorids, microalgae and higher plants) was sometimes possible because both quantitative and isotopic data were available and because either the free or S-bound pool sometimes appeared to contain material from a single source. Several free n-alkanes and S-bound lipids appeared to be specific products of upper-water-column primary producers (i.e. algae and cyanobacteria). Others derived from anaerobic photoautotrophs and from heterotrophic protozoa (ciliates), which apparently fed partly on Chlorobiaceae. Four groups of n-alkanes produced by algae or cyanobacteria were also recognized based on systematic variations of abundance and isotopic composition with depth. For hydrocarbons probably derived from microalgae, isotopic variations are well correlated with

  15. Variation in conserved non-coding sequences on chromosome 5q and susceptibility to asthma and atopy

    SciTech Connect

    Donfack, Joseph; Schneider, Daniel H.; Tan, Zheng; Kurz, Thorsten; Dubchak, Inna; Frazer, Kelly A.; Ober, Carole

    2005-09-10

    Background: Evolutionarily conserved sequences likely have biological function. Methods: To determine whether variation in conserved sequences in non-coding DNA contributes to risk for human disease, we studied six conserved non-coding elements in the Th2 cytokine cluster on human chromosome 5q31 in a large Hutterite pedigree and in samples of outbred European American and African American asthma cases and controls. Results: Among six conserved non-coding elements (>100 bp, >70 percent identity; human-mouse comparison), we identified one single nucleotide polymorphism (SNP) in each of two conserved elements and six SNPs in the flanking regions of three conserved elements. We genotyped our samples for four of these SNPs and an additional three SNPs each in the IL13 and IL4 genes. While there was only modest evidence for association with single SNPs in the Hutterite and European American samples (P<0.05), there were highly significant associations in European Americans between asthma and haplotypes comprised of SNPs in the IL4 gene (P<0.001), including a SNP in a conserved non-coding element. Furthermore, variation in the IL13 gene was strongly associated with total IgE (P = 0.00022) and allergic sensitization to mold allergens (P = 0.00076) in the Hutterites, and more modestly associated with sensitization to molds in the European Americans and African Americans (P<0.01). Conclusion: These results indicate that there is overall little variation in the conserved non-coding elements on 5q31, but variation in IL4 and IL13, including possibly one SNP in a conserved element, influence asthma and atopic phenotypes in diverse populations.

  16. A worldwide survey of genome sequence variation provides insight into the evolutionary history of the honeybee Apis mellifera.

    PubMed

    Wallberg, Andreas; Han, Fan; Wellhagen, Gustaf; Dahle, Bjørn; Kawata, Masakado; Haddad, Nizar; Simões, Zilá Luz Paulino; Allsopp, Mike H; Kandemir, Irfan; De la Rúa, Pilar; Pirk, Christian W; Webster, Matthew T

    2014-10-01

    The honeybee Apis mellifera has major ecological and economic importance. We analyze patterns of genetic variation at 8.3 million SNPs, identified by sequencing 140 honeybee genomes from a worldwide sample of 14 populations at a combined total depth of 634×. These data provide insight into the evolutionary history and genetic basis of local adaptation in this species. We find evidence that population sizes have fluctuated greatly, mirroring historical fluctuations in climate, although contemporary populations have high genetic diversity, indicating the absence of domestication bottlenecks. Levels of genetic variation are strongly shaped by natural selection and are highly correlated with patterns of gene expression and DNA methylation. We identify genomic signatures of local adaptation, which are enriched in genes expressed in workers and in immune system- and sperm motility-related genes that might underlie geographic variation in reproduction, dispersal and disease resistance. This study provides a framework for future investigations into responses to pathogens and climate change in honeybees.

  17. Sequence variation in the androgen receptor gene is not a common determinant of male sexual orientation

    SciTech Connect

    Macke, J.P.; Nathans, J.; King, V.L.; Hu, N.; Hu, S.; Hamer, D.; Bailey, M.; Brown, T.

    1993-10-01

    To test the hypothesis that DNA sequence variation in the androgen receptor gene plays a causal role in the development of male sexual orientation, the authors have (1) measured the degree of concordance of androgen receptor alleles in 36 pairs of homosexual brothers, (2) compared the lengths of polyglutamine and polyglycine tracts in the amino-terminal domain of the androgen receptor in a sample of 197 homosexual males and 213 unselected subjects, and (3) screened the entire androgen receptor coding region for sequence variation by PCR and denaturing gradient-gel electrophoresis (DGGE) and/or single-strand conformation polymorphism analysis in 20 homosexual males with homosexual or bisexual brothers and one homosexual male with no homosexual brothers, and screened the amino-terminal domain of the receptor for sequence variation in an additional 44 homosexual males, 37 of whom had one or more first- or second-degree male relatives who were either homosexual or bisexual. These analyses show that (1) homosexual brothers are as likely to be discordant as concordant for androgen receptor alleles; (2) there are no large-scale differences between the distributions of polyglycine or polyglutamine tract lengths in the homosexual and control groups; and (3) coding region sequence variation is not commonly found within the androgen receptor gene of homosexual men. The DGGE screen identified two rare amino acid substitutions, ser205-to-arg and glu793-to-asp, the biological significance of which is unknown. 32 refs., 2 figs., 2 tabs.

  18. Whole-genome sequencing reveals the diversity of cattle copy number variations and multicopy genes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Structural and functional impacts of copy number variations (CNVs) on livestock genomes are not yet well understood. We identified 1853 CNV regions using population-scale sequencing data generated from 75 cattle representing 8 breeds (Angus, Brahman, Gir, Holstein, Jersey, Limousin, Nelore, Romagnol...

  19. Mitochondrial COI sequences in mites: evidence for variations in base composition.

    PubMed

    Navajas, M; Fournier, D; Lagnel, J; Gutierrez, J; Boursot, P

    1996-11-01

    Studies of mitochondrial DNA sequences in a variety of animals have shown important differences between phyla, including differences in the genetic codes used, and varying constraints on base composition. In that respect, little is known of mites, an important and diversified group. We sequenced a portion (340 nt) of the cytochrome oxidase subunit I (COI) encoding gene in twenty species of phytophagous mites belonging to nine genera of the two families Tetranychidae and Tenuipalpidae. The mitochondrial genetic code used in mites appeared to be the same as in insects. As is generally also the case in insects, the mite sequences were very rich in A + T (75% on average), especially at the third codon position (94%). However, important variations of base composition were observed among mite species, one of them showing as little as 69% A + T. Variations of base composition occur mostly through synonymous transitions, and do not have detectable effects on polypeptide evolution in this group. PMID:8933179
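
    The base-composition figures in this abstract (overall A + T and A + T at the third codon position) are simple per-position tallies over an in-frame coding sequence. A minimal sketch of that tally follows, assuming the sequence is already in frame (or that the offset is supplied) and that ambiguous bases are simply skipped. The function name and these conventions are illustrative assumptions, not the authors' procedure.

```python
def at_content_by_codon_position(seq, frame=0):
    """Return (overall A+T fraction, [fraction at codon positions 1, 2, 3]).

    seq:   coding DNA sequence (string).
    frame: offset of the first complete codon (assumed known).
    Trailing bases that do not fill a codon are dropped, and characters
    other than A, C, G, T are skipped.
    """
    seq = seq.upper()[frame:]
    seq = seq[:len(seq) - len(seq) % 3]  # keep whole codons only
    at_counts = [0, 0, 0]
    totals = [0, 0, 0]
    for index, base in enumerate(seq):
        if base not in "ACGT":
            continue  # ambiguous base: not counted either way
        position = index % 3  # 0, 1, 2 -> codon positions 1, 2, 3
        totals[position] += 1
        if base in "AT":
            at_counts[position] += 1
    per_position = [count / total if total else 0.0
                    for count, total in zip(at_counts, totals)]
    grand_total = sum(totals)
    overall = sum(at_counts) / grand_total if grand_total else 0.0
    return overall, per_position
```

    Applied to each species' COI fragment in turn, a tally of this kind would give the per-species values that the abstract summarizes as averages.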

  20. Phylogenetic and functional analysis of sequence variation of human papillomavirus type 31 E6 and E7 oncoproteins.

    PubMed

    Ferenczi, Annamária; Gyöngyösi, Eszter; Szalmás, Anita; László, Brigitta; Kónya, József; Veress, György

    2016-09-01

    High-risk human papillomaviruses (HPV) are the causative agents of cervical and other anogenital cancers as well as a subset of head and neck cancers. The E6 and E7 oncoproteins of HPV contribute to oncogenesis by associating with the tumour suppressor proteins p53 and pRb, respectively. For HPV types 16 and 18, intratypic sequence variation was shown to have biological and clinical significance. The functional significance of sequence variation among HPV 31 variants was studied less intensively. HPV 31 variants belonging to different variant lineages were found to have differences in persistence and in the ability to cause high grade cervical intraepithelial neoplasia. In the present study, we started to explore the functional effects of natural sequence variation of HPV 31 E6 and E7 oncoproteins. The E6 variants were tested for their effects on p53 protein stability and transcriptional activity, while the E7 variants were tested for their effects on pRb protein level and also on the transcriptional activity of E2F transcription factors. HPV 31 E7 variants displayed uniform effects on pRb stability and also on the activity of E2F transcription factors. HPV 31 E6 variants had remarkable differences in the ability to inhibit the trans-activation function of p53 but not in the ability to induce the in vivo degradation of p53. Our results indicate that natural sequence variation of the HPV 31 E6 protein may be involved in the observed differences in the oncogenic potential between HPV 31 variants. PMID:27197052

  1. LOESS correction for length variation in gene set-based genomic sequence analysis

    PubMed Central

    Aboukhalil, Anton; Bulyk, Martha L.

    2012-01-01

    Motivation: Sequence analysis algorithms are often applied to sets of DNA, RNA or protein sequences to identify common or distinguishing features. Controlling for sequence length variation is critical to properly score sequence features and identify true biological signals rather than length-dependent artifacts. Results: Several cis-regulatory module discovery algorithms exhibit a substantial dependence between DNA sequence score and sequence length. Our newly developed LOESS method is flexible in capturing diverse score-length relationships and is more effective in correcting DNA sequence scores for length-dependent artifacts, compared with four other approaches. Application of this method to genes co-expressed during Drosophila melanogaster embryonic mesoderm development or neural development scored by the Lever motif analysis algorithm resulted in successful recovery of their biologically validated cis-regulatory codes. The LOESS length-correction method is broadly applicable, and may be useful not only for more accurate inference of cis-regulatory codes, but also for detection of other types of patterns in biological sequences. Availability: Source code and compiled code are available from http://thebrain.bwh.harvard.edu/LM_LOESS/ Contact: mlbulyk@receptor.med.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22492312
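
    The idea of a LOESS length correction can be illustrated as: fit a smooth trend of score against sequence length, then remove that trend so that the remaining signal no longer depends on length. The sketch below is a generic local linear regression with tricube weights, written in plain Python. It is not the authors' LM_LOESS code, and the function names, the `span` default and the choice to report residuals (score minus fitted trend) are assumptions made for illustration.

```python
def tricube(u):
    """Tricube kernel: 1 at u = 0, falling smoothly to 0 at u >= 1."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_fit(x, y, span=0.5):
    """Fit a local linear regression at every x[i] and return fitted values.

    span: fraction of the points used as the neighbourhood of each fit.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    k = min(n, max(2, int(round(span * n))))  # neighbourhood size
    fitted = []
    for x0 in x:
        dists = [abs(xi - x0) for xi in x]
        h = sorted(dists)[k - 1]  # bandwidth: distance to k-th nearest point
        if h == 0.0:
            weights = [1.0 if d == 0.0 else 0.0 for d in dists]
        else:
            weights = [tricube(d / h) for d in dists]
        sw = sum(weights)  # > 0 because the point itself has weight 1
        mx = sum(w * xi for w, xi in zip(weights, x)) / sw
        my = sum(w * yi for w, yi in zip(weights, y)) / sw
        sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
        sxy = sum(w * (xi - mx) * (yi - my)
                  for w, xi, yi in zip(weights, x, y))
        slope = sxy / sxx if sxx > 0.0 else 0.0  # flat fit if x is degenerate
        fitted.append(my + slope * (x0 - mx))
    return fitted


def length_corrected_scores(lengths, scores, span=0.5):
    """Residual of each score after removing the fitted length trend."""
    trend = loess_fit(lengths, scores, span)
    return [s - t for s, t in zip(scores, trend)]
```

    With this convention, a sequence whose score is exactly what its length predicts gets a corrected score near zero, and sequences are then compared by how far they sit above or below the trend. A smaller `span` follows the score-length relationship more closely at the cost of a noisier fit.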

  2. AFLP and DNA sequence variation in an Andean domesticate, pepino (Solanum muricatum, Solanaceae): implications for evolution and domestication.

    PubMed

    Blanca, José M; Prohens, Jaime; Anderson, Gregory J; Zuriaga, Elena; Cañizares, Joaquín; Nuez, Fernando

    2007-07-01

    The pepino (Solanum muricatum) is a vegetatively propagated, domesticated native of the Andes, where it grows with wild relatives. We used AFLPs and a 1-kb sequence of the 3-methylcrotonyl-CoA carboxylase gene to study variation of 27 accessions of S. muricatum and 35 collections of 10 species of wild relatives (Solanum section Basarthrum). A total of 298 AFLP fragments and 29 DNA sequence haplotypes were detected. Cluster and principal coordinate analyses and other genetic parameters estimated from both types of markers show that S. muricatum is closely related to the species from one of the series (Caripensia) of section Basarthrum and that >90% of the variation of the cultigen is also represented in that series. Pepino is highly diverse, either because it is not monophyletic or it has been subjected to regular introgression with wild species, or both. Although a continuous distribution of the genetic variation occurred within the cultivated species, three genetic clusters were recognized. Cluster 1 is mostly centered in Ecuador, cluster 2 in Ecuador and Peru, and cluster 3 in Colombia and Ecuador. Cluster 3 also includes all modern cultivars studied. These results and other evidence suggest that northern Ecuador/southern Colombia is the main center of pepino diversity and the center of origin. The high genetic variation of this cultigen indicates that domestication does not always produce a genetic bottleneck.

  3. Sequence variations of the locus-specific 5' untranslated regions of SLA class I genes and the development of a comprehensive genomic DNA-based high-resolution typing method for SLA-2.

    PubMed

    Choi, H; Le, M T; Lee, H; Choi, M-K; Cho, H-S; Nagasundarapandian, S; Kwon, O-J; Kim, J-H; Seo, K; Park, J-K; Lee, J-H; Ho, C-S; Park, C

    2015-10-01

    The genetic diversity of the major histocompatibility complex (MHC) class I molecules of pigs has not been well characterized. Therefore, the influence of MHC genetic diversity on the immune-related traits of pigs, including disease resistance and other MHC-dependent traits, is not well understood. Here, we attempted to develop an efficient method for systemic analysis of the polymorphisms in the epitope-binding region of swine leukocyte antigens (SLA) class I genes. We performed a comparative analysis of the last 92 bp of the 5' untranslated region (UTR) to the beginning of exon 4 of six SLA classical class I-related genes, SLA-1, -2, -3, -4, -5, and -9, from 36 different sequences. Based on this information, we developed a genomic polymerase chain reaction (PCR) and direct sequencing-based comprehensive typing method for SLA-2. We successfully typed SLA-2 from 400 pigs and 8 cell lines, consisting of 9 different pig breeds, and identified 49 SLA-2 alleles, including 31 previously reported alleles and 18 new alleles. We observed differences in the composition of SLA-2 alleles among different breeds. Our method can be used to study other SLA class I loci and to deepen our knowledge of MHC class I genes in pigs.

  4. Identification of staphylococcal species based on variations in protein sequences (mass spectrometry) and DNA sequence (sodA microarray).

    PubMed

    Kooken, Jennifer; Fox, Karen; Fox, Alvin; Altomare, Diego; Creek, Kim; Wunschel, David; Pajares-Merino, Sara; Martínez-Ballesteros, Ilargi; Garaizar, Javier; Oyarzabal, Omar; Samadpour, Mansour

    2014-02-01

    This report is among the first using sequence variation in newly discovered protein markers for staphylococcal (or indeed any other bacterial) speciation. Variation, at the DNA sequence level, in the sodA gene (commonly used for staphylococcal speciation) provided excellent correlation. Relatedness among strains was also assessed by protein profiling using microcapillary electrophoresis and pulsed field electrophoresis. A total of 64 strains were analyzed, including reference strains representing the 11 staphylococcal species most commonly isolated from man (Staphylococcus aureus and 10 coagulase negative species [CoNS]). Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI TOF MS) and liquid chromatography-electrospray ionization tandem mass spectrometry (LC ESI MS/MS) were used for peptide analysis of proteins isolated from gel bands. Comparison of experimental spectra of unknowns versus spectra of peptides derived from reference strains allowed bacterial identification after MALDI TOF MS analysis. After LC-MS/MS analysis of gel bands, bacterial speciation was performed by comparing experimental spectra versus virtual spectra using the software X!Tandem. Finally, LC-MS/MS was performed on whole proteomes, with data analysis also employing X!Tandem. Aconitate hydratase and oxoglutarate dehydrogenase served as marker proteins on focused analysis after gel separation. Alternatively, on full proteomics analysis, elongation factor Tu generally provided the highest confidence in staphylococcal speciation.

  5. Nucleotide sequence variation of chitin synthase genes among ectomycorrhizal fungi and its potential use in taxonomy.

    PubMed Central

    Mehmann, B; Brunner, I; Braus, G H

    1994-01-01

    DNA sequences of single-copy genes coding for chitin synthases (UDP-N-acetyl-D-glucosamine:chitin 4-beta-N-acetylglucosaminyltransferase; EC 2.4.1.16) were used to characterize ectomycorrhizal fungi. Degenerate primers deduced from short, completely conserved amino acid stretches flanking a region of about 200 amino acids of zymogenic chitin synthases allowed the amplification of DNA fragments of several members of this gene family. Different DNA band patterns were obtained from basidiomycetes because of variation in the number and length of amplified fragments. Cloning and sequencing of the most prominent DNA fragments revealed that these differences were due to various introns at conserved positions. The presence of introns in basidiomycetous fungi therefore has a potential use in identification of genera by analyzing PCR-generated DNA fragment patterns. Analyses of the nucleotide sequences of cloned fragments revealed variations in nucleotide sequences from 4 to 45%. By comparison of the deduced amino acid sequences, the majority of the DNA fragments were identified as members of genes for chitin synthase class II. The deduced amino acid sequences from species of the same genus differed only in one amino acid residue, whereas identity between the amino acid sequences of ascomycetous and basidiomycetous fungi within the same taxonomic class was found to be approximately 43 to 66%. Phylogenetic analysis of the amino acid sequence of class II chitin synthase-encoding gene fragments by using parsimony confirmed the current taxonomic groupings. In addition, our data revealed a fourth class of putative zymogenic chitin synthases. PMID:7944356

  6. Worldwide DNA sequence variation in a 10-kilobase noncoding region on human chromosome 22.

    PubMed

    Zhao, Z; Jin, L; Fu, Y X; Ramsay, M; Jenkins, T; Leskinen, E; Pamilo, P; Trexler, M; Patthy, L; Jorde, L B; Ramos-Onsins, S; Yu, N; Li, W H

    2000-10-10

    Human DNA sequence variation data are useful for studying the origin, evolution, and demographic history of modern humans and the mechanisms of maintenance of genetic variability in human populations, and for detecting linkage association of disease. Here, we report worldwide variation data from an approximately 10-kilobase noncoding autosomal region. We identified 75 variant sites in 64 humans (128 sequences) and 463 variant sites among the human, chimpanzee, and orangutan sequences. Statistical tests suggested that the region is selectively neutral. The average nucleotide diversity (pi) across the region was 0.088% among all of the human sequences obtained, 0.085% among African sequences, and 0.082% among non-African sequences, supporting the view of a low nucleotide diversity (approximately 0.1%) in humans. The comparable pi value in non-Africans to that in Africans indicates no severe bottleneck during the evolution of modern non-Africans; however, the possibility of a mild bottleneck cannot be excluded because non-Africans showed considerably fewer variants than Africans. The present and two previous large data sets all show a strong excess of low frequency variants in comparison to that expected from an equilibrium population, indicating a relatively recent population expansion. The mutation rate was estimated to be 1.15 × 10^-9 per nucleotide per year. Estimates of the long-term effective population size Ne by various statistical methods were similar to those in other studies. The age of the most recent common ancestor was estimated to be approximately 1.29 million years ago among all of the sequences obtained and approximately 634,000 years ago among the non-African sequences, providing the first evidence from a noncoding autosomal region for ancient human histories, even among non-Africans.
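
    The nucleotide diversity (pi) reported above is the average number of pairwise differences per site. A minimal sketch of that calculation follows, assuming gap-free aligned sequences of equal length; the function name and the toy sequences are illustrative assumptions, not data or code from the study.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site (pi) for aligned sequences.

    Assumes gap-free sequences of equal length (an illustrative simplification).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites in every pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)


# Toy alignment (hypothetical): four sequences of 10 sites each.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC"]
pi = nucleotide_diversity(toy)
print(f"pi = {pi:.3f}")
```

    A value of 0.088% corresponds to pi = 0.00088 on this scale, that is, fewer than one difference per thousand sites between two randomly chosen sequences.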

  7. BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations.

    PubMed

    Wang, Junbai; Batmanov, Kirill

    2015-12-01

    Sequence variations in regulatory DNA regions are known to cause functionally important consequences for gene expression. DNA sequence variations may have an essential role in determining phenotypes and may be linked to disease; however, their identification through analysis of massive genome-wide sequencing data is a great challenge. In this work, a new computational pipeline, a Bayesian method for protein-DNA interaction with binding affinity ranking (BayesPI-BAR), is proposed for quantifying the effect of sequence variations on protein binding. BayesPI-BAR uses biophysical modeling of protein-DNA interactions to predict single nucleotide polymorphisms (SNPs) that cause significant changes in the binding affinity of a regulatory region for transcription factors (TFs). The method includes two new parameters (TF chemical potentials or protein concentrations and direct TF binding targets) that are neglected by previous methods. The new method is verified on 67 known human regulatory SNPs, of which 47 (70%) have predicted true TFs ranked in the top 10. Importantly, the performance of BayesPI-BAR, which uses principal component analysis to integrate multiple predictions from various TF chemical potentials, is found to be better than that of existing programs, such as sTRAP and is-rSNP, when evaluated on the same SNPs. BayesPI-BAR is a publicly available tool and is able to carry out parallelized computation, which helps to investigate a large number of TFs or SNPs and to detect disease-associated regulatory sequence variations in the sea of genome-wide noncoding regions. PMID:26202972

  8. Genome-wide Mycobacterium tuberculosis variation (GMTV) database: a new tool for integrating sequence variations and epidemiology

    PubMed Central

    2014-01-01

    Background: Tuberculosis (TB) poses a worldwide threat due to advancing multidrug-resistant strains and deadly co-infections with Human immunodeficiency virus. Today large amounts of Mycobacterium tuberculosis whole genome sequencing data are being assessed broadly and yet there exists no comprehensive online resource that connects M. tuberculosis genome variants with geographic origin, with drug resistance or with clinical outcome. Description: Here we describe a broadly inclusive unifying Genome-wide Mycobacterium tuberculosis Variation (GMTV) database (http://mtb.dobzhanskycenter.org) that catalogues genome variations of M. tuberculosis strains collected across Russia. GMTV contains a broad spectrum of data derived from different sources and related to M. tuberculosis molecular biology, epidemiology, TB clinical outcome, year and place of isolation, and drug resistance profiles, and displays the variants across the genome using a dedicated genome browser. The GMTV database, which includes 1084 genomes and over 69,000 SNP or Indel variants, can be queried about M. tuberculosis genome variation and putative associations with drug resistance, geographical origin, and clinical stages and outcomes. Conclusions: Implementation of GMTV tracks the pattern of changes of M. tuberculosis strains in different geographical areas, facilitates disease gene discoveries associated with drug resistance or different clinical sequelae, and automates comparative genomic analyses among M. tuberculosis strains. PMID:24767249

  9. High-Throughput Sequencing in Mitochondrial DNA Research

    PubMed Central

    Ye, Fei; Samuels, David C.; Clark, Travis; Guo, Yan

    2014-01-01

    Next-generation sequencing, also known as high-throughput sequencing, has greatly enhanced researchers’ ability to conduct biomedical research on all levels. Mitochondrial research has also benefitted greatly from high-throughput sequencing; sequencing technology now allows for screening of all 16,569 base pairs of the mitochondrial genome simultaneously for SNPs and low level heteroplasmy and, in some cases, the estimation of mitochondrial DNA copy number. It is important to realize the full potential of high-throughput sequencing for the advancement of mitochondrial research. To this end, we review how high-throughput sequencing has impacted mitochondrial research in the categories of SNPs, low level heteroplasmy, copy number, and structural variants. We also discuss the different types of mitochondrial DNA sequencing and their pros and cons. Based on previous studies conducted by various groups, we provide strategies for processing mitochondrial DNA sequencing data, including assembly, variant calling, and quality control. PMID:24859348

  10. High-throughput sequencing in mitochondrial DNA research.

    PubMed

    Ye, Fei; Samuels, David C; Clark, Travis; Guo, Yan

    2014-07-01

    Next-generation sequencing, also known as high-throughput sequencing, has greatly enhanced researchers' ability to conduct biomedical research on all levels. Mitochondrial research has also benefitted greatly from high-throughput sequencing; sequencing technology now allows for screening of all 16,569 base pairs of the mitochondrial genome simultaneously for SNPs and low level heteroplasmy and, in some cases, the estimation of mitochondrial DNA copy number. It is important to realize the full potential of high-throughput sequencing for the advancement of mitochondrial research. To this end, we review how high-throughput sequencing has impacted mitochondrial research in the categories of SNPs, low level heteroplasmy, copy number, and structural variants. We also discuss the different types of mitochondrial DNA sequencing and their pros and cons. Based on previous studies conducted by various groups, we provide strategies for processing mitochondrial DNA sequencing data, including assembly, variant calling, and quality control.

  11. Advances in high throughput DNA sequence data compression.

    PubMed

    Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz

    2016-06-01

    Advances in high throughput sequencing technologies and reduction in cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges such as storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genome and reads compression. Algorithms are categorized as referential or reference free. Experimental results and comparative analysis of various methods for data compression are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted. PMID:26846812

  12. Sequence variation within the KIV-2 copy number polymorphism of the human LPA gene in African, Asian, and European populations.

    PubMed

    Noureen, Asma; Fresser, Friedrich; Utermann, Gerd; Schmidt, Konrad

    2015-01-01

    Amazingly little sequence variation is reported for the kringle IV-2 copy number variation (KIV-2 CNV) in the human LPA gene. Apart from whole genome sequencing projects, this region has only been analyzed in some detail in samples of European populations. We have performed a systematic resequencing study of the exonic and flanking intron regions within the KIV-2 CNV in 90 alleles from Asian, European, and four different African populations. Alleles have been separated according to their CNV length by pulsed field gel electrophoresis prior to unbiased specific PCR amplification of the target regions. These amplicons covered all KIV-2 copies of an individual allele simultaneously. In addition, cloned amplicons from genomic DNA of an African individual were sequenced. Our data suggest that sequence variation in this genomic region may be higher than previously appreciated. Detection probability of variants appeared to depend on the KIV-2 copy number of the analyzed DNA and on the proportion of copies carrying the variant. Asians had a high frequency of so-called KIV-2 type B and type C (together 70% of alleles), which differ by three or two synonymous substitutions respectively from the reference type A. This is most likely explained by the strong bottleneck suggested to have occurred when modern humans migrated to East Asia. A higher frequency of variable sites was detected in the Africans. In particular, two previously unreported splice site variants were found. One was associated with non-detectable Lp(a). The other was observed at high population frequencies (10% to 40%). Like the KIV-2 type B and C variants, this latter variant was also found in a high proportion of KIV-2 repeats in the affected alleles and in alleles differing in copy numbers. Our findings may have implications for the interpretation of SNP analyses in other repetitive loci of the human genome.

  13. SoftSearch: Integration of Multiple Sequence Features to Identify Breakpoints of Structural Variations

    PubMed Central

    Hart, Steven N.; Sarangi, Vivekananda; Moore, Raymond; Baheti, Saurabh; Bhavsar, Jaysheel D.; Couch, Fergus J.; Kocher, Jean-Pierre A.

    2013-01-01

    Background: Structural variation (SV) represents a significant, yet poorly understood contribution to an individual’s genetic makeup. Advanced next-generation sequencing technologies are widely used to discover such variations, but there is no single detection tool that is considered a community standard. In an attempt to fulfil this need, we developed an algorithm, SoftSearch, for discovering structural variant breakpoints in Illumina paired-end next-generation sequencing data. SoftSearch combines multiple strategies for detecting SV including split-read, discordant read-pair, and unmated pairs. Co-localized split-reads and discordant read pairs are used to refine the breakpoints. Results: We developed and validated SoftSearch using real and synthetic datasets. SoftSearch’s key features are 1) no requirement for secondary (or exhaustive primary) alignment, 2) portability into established sequencing workflows, and 3) applicability to any DNA-sequencing experiment (e.g. whole genome, exome, custom capture, etc.). SoftSearch identifies breakpoints from a small number of soft-clipped bases from split reads and a few discordant read-pairs which on their own would not be sufficient to make an SV call. Conclusions: We show that SoftSearch can identify more true SVs by combining multiple sequence features. SoftSearch was able to call clinically relevant SVs in the BRCA2 gene not reported by other tools while offering significantly improved overall performance. PMID:24358278
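
    Soft-clipped bases, mentioned above as one breakpoint signal, are recorded in the CIGAR string of a SAM alignment as `S` operations at either end of a read. The sketch below shows one way to count them; it is not SoftSearch's implementation, and the function names, the `min_clip` threshold and the example reads are hypothetical.

```python
import re

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (left, right) soft-clipped base counts for a CIGAR string."""
    # Hard clips (H) may sit outside soft clips, so drop them first.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return 0, 0
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


def clipped_reads(reads, min_clip=5):
    """Names of reads with at least min_clip soft-clipped bases at either end.

    min_clip is a hypothetical threshold chosen for illustration only.
    """
    return [name for name, cigar in reads if max(soft_clips(cigar)) >= min_clip]


# Hypothetical (read name, CIGAR) pairs.
reads = [("r1", "100M"), ("r2", "20S80M"), ("r3", "5H10S70M15S"), ("r4", "97M3S")]
hits = clipped_reads(reads)
print(hits)
```

    In a breakpoint caller, reads selected this way would still need to be co-localized with other evidence, such as discordant read pairs, before any call is made.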

  14. Sequence-Based Pronunciation Variation Modeling for Spontaneous ASR Using a Noisy Channel Approach

    NASA Astrophysics Data System (ADS)

    Hofmann, Hansjörg; Sakti, Sakriani; Hori, Chiori; Kashioka, Hideki; Nakamura, Satoshi; Minker, Wolfgang

    The performance of English automatic speech recognition systems decreases when recognizing spontaneous speech, mainly due to multiple pronunciation variants in the utterances. Previous approaches address this problem by modeling the alteration of the pronunciation on a phoneme-to-phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence have not yet been considered. In this article, the sequence-based pronunciation variation is modeled using a noisy channel approach where the spontaneous phoneme sequence is considered as a “noisy” string and the goal is to recover the “clean” string of the word sequence. Hereby, the whole word sequence and its effect on the alteration of the phonemes will be taken into consideration. Moreover, the system not only learns the phoneme transformation but also the mapping from the phoneme to the word directly. In this study, first the phonemes will be recognized with the present recognition system and afterwards the pronunciation variation model based on the noisy channel approach will map from the phoneme to the word level. Two well-known natural language processing approaches are adopted and derived from the noisy channel model theory: joint-sequence models and statistical machine translation. Both of them are applied and various experiments are conducted using microphone and telephone recordings of spontaneous speech.
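
    The noisy channel formulation described above can be written in its generic textbook form. The symbols are assumptions introduced here, not notation from the article: \Phi is the recognized ("noisy") phoneme sequence and W is a candidate ("clean") word sequence.

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid \Phi)
        \;=\; \arg\max_{W} \frac{P(\Phi \mid W)\,P(W)}{P(\Phi)}
        \;=\; \arg\max_{W} P(\Phi \mid W)\,P(W)
```

    The last step drops P(\Phi) because it does not depend on W. In this reading, P(\Phi \mid W) plays the role of the pronunciation variation (channel) model and P(W) that of the language model.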

  15. Using evolutionary sequence variation to make inferences about protein structure and function

    NASA Astrophysics Data System (ADS)

    Colwell, Lucy

    2015-03-01

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. The explosive growth in the number of available protein sequences raises the possibility of using the natural variation present in homologous protein sequences to infer these constraints and thus identify residues that control different protein phenotypes. Because in many cases phenotypic changes are controlled by more than one amino acid, the mutations that separate one phenotype from another may not be independent, requiring us to understand the correlation structure of the data. To address this we build a maximum entropy probability model for the protein sequence. The parameters of the inferred model are constrained by the statistics of a large sequence alignment. Pairs of sequence positions with the strongest interactions accurately predict contacts in protein tertiary structure, enabling all atom structural models to be constructed. We describe development of a theoretical inference framework that enables the relationship between the amount of available input data and the reliability of structural predictions to be better understood.
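
    A maximum entropy model constrained by single-site and pairwise alignment statistics is commonly written in the following pairwise form. The abstract does not state which statistics were used, so the pairwise form and all symbols below are assumptions for illustration.

```latex
P(a_1, \dots, a_L) \;=\; \frac{1}{Z}
  \exp\!\Big( \sum_{i=1}^{L} h_i(a_i) \;+\; \sum_{1 \le i < j \le L} J_{ij}(a_i, a_j) \Big)
```

    Here a_i is the amino acid at alignment position i, L is the alignment length, h_i are single-site fields, J_{ij} are pairwise couplings, and Z is the normalizing constant. Under this assumed form, the position pairs with the strongest couplings J_{ij} are the ones that would be read as predicted contacts.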

  16. Sequence variation and differential splicing of the midgut cadherin gene in Trichoplusia ni.

    PubMed

    Zhang, Xin; Kain, Wendy; Wang, Ping

    2013-08-01

    The insect midgut cadherin serves as an important receptor for the Cry toxins from Bacillus thuringiensis (Bt). Variation of the cadherin in insect populations provides a genetic potential for development of cadherin-based Bt resistance in insect populations. Sequence analysis of the cadherin from the cabbage looper, Trichoplusia ni, together with cadherins from 18 other lepidopterans showed a similar phylogenetic relationship of the cadherins to the phylogeny of Lepidoptera. The midgut cadherin in three laboratory populations of T. ni exhibited high variability, although the resistance to Bt toxin Cry1Ac in the T. ni strain is not genetically associated with cadherin gene mutations. A total of 142 single nucleotide polymorphisms (SNPs) were identified in the cadherin cDNAs from the T. ni strains, including 20 missense mutations. In addition, insertion and deletion polymorphisms (indels) were also identified in the cadherin alleles in T. ni. More interestingly, the results from this study reveal that differential splicing of mRNA also occurs in the cadherin gene expression. Therefore, variation of the midgut cadherin in insects may not only be caused by cadherin gene mutations, but could also result from alternative splicing of its mRNA regulated by factors acting in trans. Analysis of cadherin gene alleles in F2, F3 and F4 progenies from the cross between the Cry1Ac resistant and the susceptible strain after consecutive selections with Cry1Ac for three generations showed that selection with Cry1Ac did not result in an increase of frequencies of the cadherin alleles originated from the resistant strain. PMID:23743444

  17. Low-frequency normal modes that describe allosteric transitions in biological nanomachines are robust to sequence variations.

    PubMed

    Zheng, Wenjun; Brooks, Bernard R; Thirumalai, D

    2006-05-16

    By representing the high-resolution crystal structures of a number of enzymes using the elastic network model, it has been shown that only a few low-frequency normal modes are needed to describe the large-scale domain movements that are triggered by ligand binding. Here we explore a link between the nearly invariant nature of the modes that describe functional dynamics at the mesoscopic level and the large evolutionary sequence variations at the residue level. By using a structural perturbation method (SPM), which probes the residue-specific response to perturbations (or mutations), we identify a sparse network of strongly conserved residues that transmit allosteric signals in three structurally unrelated biological nanomachines, namely, DNA polymerase, myosin motor, and the Escherichia coli chaperonin. Based on the response of every mode to perturbations, which are generated by interchanging specific sequence pairs in a multiple sequence alignment, we show that the functionally relevant low-frequency modes are most robust to sequence variations. Our work shows that robustness of dynamical modes at the mesoscopic level is encoded in the structure through a sparse network of residues that transmit allosteric signals.

  18. Genome-wide copy number variation in the bovine genome detected using low coverage sequence of popular beef breeds

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genomic structural variations are an important source of genetic diversity. Copy number variations (CNVs), gains and losses of large regions of genomic sequence between individuals of a species, are known to be associated with both diseases and phenotypic traits. Deeply sequenced genomes are often u...

  19. Analysis of sequence variations in several human genes using phosphoramidite bond DNA fragmentation and chip-based MALDI-TOF.

    PubMed

    Smylie, Kevin J; Cantor, Charles R; Denissenko, Mikhail F

    2004-01-01

    The challenge in the postgenome era is to measure sequence variations over large genomic regions in numerous patient samples. This massive amount of work can only be completed if more accurate, cost-effective, and high-throughput solutions become available. Here we describe a novel DNA fragmentation approach for single nucleotide polymorphism (SNP) discovery and sequence validation. The base-specific cleavage is achieved by creating primer extension products, in which acid-labile phosphoramidite (P-N) bonds replace the 5' phosphodiester bonds of newly incorporated pyrimidine nucleotides. Sequence variations are detected by hydrolysis of this acid-labile bond and MALDI-TOF analysis of the resulting fragments. In this study, we developed a robust protocol for P-N-bond fragmentation and investigated additional ways to improve its sensitivity and reproducibility. We also present the analysis of several human genomic targets ranging from 100-450 bp in length. By using a semiautomated sample processing protocol, we investigated an array of SNPs within a 240-bp segment of the NFKBIA gene in 48 human DNA samples. We identified and measured frequencies for the two common SNPs in the 3'UTR of NFKBIA (separated by 123 bp) and then confirmed these values in an independent genotyping experiment. The calculated allele frequencies in white and African American groups differed significantly, yet both fit Hardy-Weinberg expectations. This demonstrates the utility and effectiveness of PN-bond DNA fragmentation and subsequent MALDI-TOF MS analysis for the high-throughput discovery and measurement of sequence variations in fragments up to 0.5 kb in length in multiple human blood DNA samples.
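
    The Hardy-Weinberg fit mentioned above is typically checked by comparing observed genotype counts with those expected from the estimated allele frequency. The sketch below uses a chi-square goodness-of-fit statistic; the abstract does not say which test the authors applied, and the genotype counts are hypothetical.

```python
def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Return (p, chi2): frequency of allele A and the chi-square statistic
    comparing observed genotype counts with Hardy-Weinberg expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, chi2


# Hypothetical genotype counts for 48 samples (not data from the study).
p_fit, chi2_fit = hardy_weinberg_chi2(12, 24, 12)  # matches expectations exactly
p_dev, chi2_dev = hardy_weinberg_chi2(20, 8, 20)   # strong heterozygote deficit

CRITICAL_1DF_05 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05
print(chi2_fit < CRITICAL_1DF_05, chi2_dev > CRITICAL_1DF_05)
```

    For a biallelic SNP the test has one degree of freedom, so a statistic below 3.841 is consistent with Hardy-Weinberg proportions at the 5% level.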

  20. Sequence variation at the major histocompatibility complex DRB loci in beluga (Delphinapterus leucas) and narwhal (Monodon monoceros).

    PubMed

    Murray, B W; White, B N

    1998-09-01

    The variation at loci with similarity to DRB class II major histocompatibility complex loci was assessed in 313 beluga collected from 13 sampling locations across North America, and 11 narwhal collected in the Canadian high Arctic. Variation was assessed by amplification of exon 2, which codes for the peptide binding region, via the polymerase chain reaction, followed by either cloning and DNA sequencing or single-stranded conformation polymorphism analysis. Two DRB loci were identified in beluga: DRB1, a polymorphic locus, and, DRB2, a monomorphic locus. Eight alleles representing five distinct lineages (based on sequence similarity) were found at the beluga DRB1 locus. Although the relative number of alleles is low when compared with terrestrial mammals, the amino acid variation found among the lineages is moderate. At the DRB1 locus, the average number of nonsynonymous substitutions per site is greater than the average number of synonymous substitutions per site (0.0806 : 0.0207, respectively; P<0.01). Most of the 31 amino acid substitutions do not conserve the physiochemical properties of the residue, and 21 of these are located at positions implicated as forming pockets responsible for the selective binding of foreign peptide side chains. Only DRB1 variation was examined in 11 narwhal, revealing a low amount of variation. These data are consistent with an important role for the DRB1 locus in the cellular immune response of beluga. In addition, the ratio of nonsynonymous to synonymous substitutions is similar to that among primate alleles, arguing against a reduction in the balancing selection pressure in the marine environment. Two hypotheses may explain the modest amount of Mhc variation when compared with terrestrial mammals: small population sizes at speciation or a reduced neutral substitution rate in cetaceans. PMID:9716643

  1. Identification of genetic variations of a Chinese family with paramyotonia congenita via whole exome sequencing.

    PubMed

    Li, Jinxin; Huang, Qinghai; Ge, Liang; Xu, Jing; Shi, Xingjuan; Xie, Wei; Liu, Xiang; Liu, Xiangdong

    2015-06-01

    Paramyotonia congenita (PC) is a rare autosomal dominant neuromuscular disorder characterized by juvenile onset and development of cold-induced myotonia after repeated activities. The disease is mostly caused by genetic mutations of the sodium channel, voltage-gated, type IV, alpha subunit (SCN4A) gene. This study intended to systematically identify the causative genetic variations of a Chinese Han PC family. Seven members of this PC family, including four patients and three healthy controls, were selected for whole exome sequencing (WES) using the Illumina HiSeq platform. Sequence variations were identified using the SoftGenetics program. The mutation R1448C of SCN4A was found to be the only causative mutation. This study applied WES technology to sequence multiple members of a large PC family and was the first to systematically confirm that the genetic change in SCN4A is the only causative variation in this PC family and the SCN4A mutation is sufficient to lead to PC.

  2. GeneSV - an Approach to Help Characterize Possible Variations in Genomic and Protein Sequences.

    PubMed

    Zemla, Adam; Kostova, Tanya; Gorchakov, Rodion; Volkova, Evgeniya; Beasley, David W C; Cardosa, Jane; Weaver, Scott C; Vasilakis, Nikos; Naraghi-Arani, Pejman

    2014-01-01

    A computational approach for identification and assessment of genomic sequence variability (GeneSV) is described. For a given nucleotide sequence, GeneSV collects information about the permissible nucleotide variability (changes that potentially preserve function) observed in corresponding regions in genomic sequences, and combines it with conservation/variability results from protein sequence and structure-based analyses of evaluated protein coding regions. GeneSV was used to predict effects (functional vs. non-functional) of 37 amino acid substitutions on the NS5 polymerase (RdRp) of dengue virus type 2 (DENV-2), 36 of which are not observed in any publicly available DENV-2 sequence. Thirty-two novel mutants with single amino acid substitutions in the RdRp were generated using a DENV-2 reverse genetics system. In 81% (26 of 32) of predictions tested, GeneSV correctly predicted viability of introduced mutations. In 4 of 5 (80%) mutants with double amino acid substitutions proximal in structure to one another, GeneSV was also correct in its predictions. Predictive capabilities of the developed system were illustrated on dengue RNA virus, but the general approach described in the manuscript for characterizing real or theoretically possible variations in genomic and protein sequences can be applied to any organism. PMID:24453480

  3. Investigation of the population structure of Legionella pneumophila by analysis of tandem repeat copy number and internal sequence variation.

    PubMed

    Visca, Paolo; D'Arezzo, Silvia; Ramisse, Françoise; Gelfand, Yevgeniy; Benson, Gary; Vergnaud, Gilles; Fry, Norman K; Pourcel, Christine

    2011-09-01

    The population structure of the species Legionella pneumophila was investigated by multilocus variable number of tandem repeats (VNTR) analysis (MLVA) and sequencing of three VNTRs (Lpms01, Lpms04 and Lpms13) in selected strains. Of 150 isolates of diverse origins, 136 (86 %) were distributed into eight large MLVA clonal complexes (VACCs) and the rest were either unique or formed small clusters of up to two MLVA genotypes. In spite of the lower degree of genome-wide linkage disequilibrium of the MLVA loci compared with sequence-based typing, the clustering achieved by the two methods was highly congruent. The detailed analysis of VNTR Lpms04 alleles showed a very complex organization, with five different repeat unit lengths and a high level of internal variation. Within each MLVA-defined VACC, Lpms04 was endowed with a common recognizable pattern with some interesting exceptions. Evidence of recombination events was suggested by analysis of internal repeat variations at the two additional VNTR loci, Lpms01 and Lpms13. Sequence analysis of L. pneumophila VNTR locus Lpms04 alone provides a first-line assay for allocation of a new isolate within the L. pneumophila population structure and for epidemiological studies.

  4. mit-o-matic: a comprehensive computational pipeline for clinical evaluation of mitochondrial variations from next-generation sequencing datasets.

    PubMed

    Vellarikkal, Shamsudheen Karuthedath; Dhiman, Heena; Joshi, Kandarp; Hasija, Yasha; Sivasubbu, Sridhar; Scaria, Vinod

    2015-04-01

    The human mitochondrial genome has been reported to have a very high mutation rate compared with the nuclear genome. A large number of mitochondrial mutations show significant phenotypic association and are involved in a broad spectrum of diseases. In recent years, there has been remarkable progress in the understanding of mitochondrial genetics. The availability of next-generation sequencing (NGS) technologies has not only reduced sequencing cost by orders of magnitude but has also provided good-quality mitochondrial genome sequences with high coverage, thereby enabling the decoding of a number of human mitochondrial diseases. In this study, we report a computational and experimental pipeline to decipher human mitochondrial DNA variations and examine them for their clinical correlation. As a proof of principle, we also present a clinical study of a patient with Leigh disease and confirm maternal inheritance of the causative allele. The pipeline is made available as a user-friendly online tool to annotate variants and find haplogroup, disease association, and heteroplasmic sites. The "mit-o-matic" computational pipeline represents a comprehensive cloud-based tool for clinical evaluation of mitochondrial genomic variations from NGS datasets. The tool is freely available at http://genome.igib.res.in/mitomatic/.

  6. Genetic variation in and spatial structure of natural populations of Dipterocarpus alatus (Dipterocarpaceae) determined using single sequence repeat markers.

    PubMed

    Tam, N M; Duy, V D; Duc, N M; Giap, V D; Xuan, B T T

    2014-01-01

    Dipterocarpus alatus (Dipterocarpaceae) is widely distributed in lowland forests in central and southern Vietnam, Cambodia, Laos, Myanmar, Philippines, Thailand, and India. Due to over-exploitation and habitat destruction, the species is now threatened. The genetic variation within and among populations of D. alatus was investigated on the basis of 9 microsatellite (single sequence repeat, SSR) loci. In all, 268 sampled trees from 10 populations in central and southern Vietnam were analyzed in this study. The SSR data showed a high genetic variability within populations with an average of HO = 0.209 and HE = 0.239. Genetic differentiation among populations was high (FST = 0.266), indicating limited gene flow (Nm = 0.69). Analysis of molecular variance showed that most genetic variation was within populations (74.96%). This study highlights the importance of conserving the genetic resources of D. alatus species. PMID:25078594
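
    The gene-flow figure quoted above is consistent with Wright's island-model approximation, Nm ≈ (1 − FST) / (4 FST). The abstract does not say which estimator was used, so the sketch below only assumes that relation; the function name is hypothetical.

```python
def migrants_per_generation(fst: float) -> float:
    """Wright's island-model approximation: Nm = (1 - FST) / (4 * FST).

    Assumption: this is the estimator behind the reported Nm; the abstract
    itself does not name one.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)


# FST value taken from the abstract.
nm = migrants_per_generation(0.266)
print(round(nm, 2))  # 0.69
```

    Under this approximation, values of Nm below 1 are commonly read as gene flow too weak to counter drift, which matches the abstract's description of limited gene flow.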

  7. Population subdivision in Europe's great bustard inferred from mitochondrial and nuclear DNA sequence variation.

    PubMed

    Pitra, C; Lieckfeldt, D; Alonso, J C

    2000-08-01

    A continent-wide survey of sequence variation in mitochondrial (mt) and nuclear (n) DNA of the endangered great bustard (Otis tarda) was conducted to assess the extent of phylogeographic structure in a morphologically monotypic bird. DNA sequence variation in a combined 809-bp segment of the mtDNA genome from 66 individuals from the last six breeding regions showed relatively low levels of intraspecific sequence diversity (π = 0.32%) but significant differences in the regional distribution of 11 haplotypes (ΦST = 0.49). Despite their exceptional potential for dispersal, a complete and long-term historical separation between the populations from the Iberian Peninsula (Spain) and mainland Europe (Hungary, Slovakia, Germany, and Russia) was demonstrated. Divergence between populations based on a 3-bp insertion-deletion polymorphism within the intron region of the nuclear CHD-Z gene was geographically concordant with the primary subdivision identified within the mtDNA sequences. Inferred aspects of phylogeography were used to formulate conservation recommendations for this endangered species.

  8. Magnetic susceptibility variations in Loess sequences and their relationship to astronomical forcing

    NASA Technical Reports Server (NTRS)

    Verosub, Kenneth L.; Singer, Michael J.

    1992-01-01

    The long, well-exposed and often continuous sequences of loess found throughout the world are generally thought to provide an excellent opportunity for studying long-term, large-scale environmental change during the last few million years. In recent years, the most fruitful loess studies have been those involving the deposits of the loess in China. One of the most intriguing results of that work has been the discovery of an apparent correlation between variations in the magnetic susceptibility of the loess sequence and the oxygen isotope record of the deep sea. This correlation implies that magnetic susceptibility variations are being driven by astronomical parameters. However, the basic data have been interpreted in various ways by different authors, most of whom assumed that the magnetic minerals in the loess have not been affected by post-depositional processes. Using a chemical extraction procedure that allows us to separate the contribution of secondary pedogenic magnetic minerals from primary inherited magnetic minerals, we have found that the magnetic susceptibility of the Chinese paleosols is largely due to a pedogenic component which is present to a lesser degree in the loess. We have also found that the smaller inherited component of the magnetic susceptibility is about the same in the paleosols and the loess. These results demonstrate the need for additional study of the processes that create magnetic susceptibility variations in order to interpret properly the role of astronomical forcing in producing these variations.

  9. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase.

    PubMed Central

    Clark, A G; Weiss, K M; Nickerson, D A; Taylor, S L; Buchanan, A; Stengård, J; Salomaa, V; Vartiainen, E; Perola, M; Boerwinkle, E; Sing, C F

    1998-01-01

    Allelic variation in 9.7 kb of genomic DNA sequence from the human lipoprotein lipase gene (LPL) was scored in 71 healthy individuals (142 chromosomes) from three populations: African Americans (24) from Jackson, MS; Finns (24) from North Karelia, Finland; and non-Hispanic Whites (23) from Rochester, MN. The sequences had a total of 88 variable sites, with a nucleotide diversity (site-specific heterozygosity) of .002+/-.001 across this 9.7-kb region. The frequency spectrum of nucleotide variation exhibited a slight excess of heterozygosity, but, in general, the data fit expectations of the infinite-sites model of mutation and genetic drift. Allele-specific PCR helped resolve linkage phases, and a total of 88 distinct haplotypes were identified. For 1,410 (64%) of the 2,211 site pairs, all four possible gametes were present in these haplotypes, reflecting a rich history of past recombination. Despite the strong evidence for recombination, extensive linkage disequilibrium was observed. The number of haplotypes generally is much greater than the number expected under the infinite-sites model, but there was sufficient multisite linkage disequilibrium to reveal two major clades, which appear to be very old. Variation in this region of LPL may depart from the variation expected under a simple, neutral model, owing to complex historical patterns of population founding, drift, selection, and recombination. These data suggest that the design and interpretation of disease-association studies may not be as straightforward as often is assumed. PMID:9683608
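
    The site-pair count above refers to the four-gamete test: for two biallelic sites, finding all four allele combinations among the haplotypes implies recombination (or recurrent mutation) somewhere in their history. The sketch below is a minimal illustration on toy data, not the authors' pipeline; the 0/1 encoding and the function name are assumptions. As an aside, 2,211 is the number of pairs that can be formed from 67 sites.

```python
from itertools import combinations
from math import comb


def four_gamete_pairs(haplotypes):
    """Count site pairs at which all four gametes are observed.

    haplotypes: equal-length strings of '0'/'1', one per chromosome
    (hypothetical encoding of biallelic sites).
    Returns (pairs_with_four_gametes, total_pairs).
    """
    n_sites = len(haplotypes[0])
    hits = 0
    total = 0
    for i, j in combinations(range(n_sites), 2):
        total += 1
        # Distinct two-site allele combinations seen across all haplotypes.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            hits += 1
    return hits, total


toy = ["000", "010", "100", "111"]
print(four_gamete_pairs(toy))  # (1, 3)
print(comb(67, 2))             # 2211
```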

  10. High levels of variation in Salix lignocellulose genes revealed using poplar genomic resources

    PubMed Central

    2013-01-01

    Background: Little is known about the levels of variation in lignin or other wood-related genes in Salix, a genus that is being increasingly used for biomass and biofuel production. The lignin biosynthesis pathway is well characterized in a number of species, including the model tree Populus. We aimed to transfer the genomic resources already available in Populus to its sister genus Salix to assess levels of variation within genes involved in wood formation. Results: Amplification trials for 27 gene regions were undertaken in 40 Salix taxa. Twelve of these regions were sequenced. Alignment searches of the resulting sequences against reference databases, combined with phylogenetic analyses, showed the close similarity of these Salix sequences to Populus, confirming homology of the primer regions and indicating a high level of conservation within the wood formation genes. However, all sequences were found to vary considerably among Salix species, mainly as SNPs with a smaller number of insertions-deletions. Between 25 and 176 SNPs per kbp per gene region (in predicted exons) were discovered within Salix. Conclusions: The variation found is sizeable but not unexpected as it is based on interspecific and not intraspecific comparison; it is comparable to interspecific variation in Populus. The characterisation of genetic variation is a key process in pre-breeding and for the conservation and exploitation of genetic resources in Salix. This study characterises the variation in several lignocellulose gene markers for such purposes. PMID:23924375

  11. Simple sequence repeat variations expedite phage divergence: Mechanisms of indels and gene mutations.

    PubMed

    Lin, Tiao-Yin

    2016-07-01

    Phages are the most abundant biological entities and influence prokaryotic communities on Earth. Comparing closely related genomes sheds light on molecular events shaping phage evolution. Simple sequence repeat (SSR) variations impart over half of the genomic changes between T7M and T3, indicating an important role of SSRs in accelerating phage genetic divergence. Differences in coding and noncoding regions of phages infecting different hosts, coliphages T7M and T3, Yersinia phage ϕYeO3-12, and Salmonella phage ϕSG-JL2, frequently arise from SSR variations. Such variations modify noncoding and coding regions; the latter efficiently changes multiple amino acids, thereby hastening protein evolution. Four classes of events are found to drive SSR variations: insertion/deletion of SSR units, expansion/contraction of SSRs without alteration of genome length, changes of repeat motifs, and generation/loss of repeats. The categorization demonstrates the ways SSRs mutate in genomes during phage evolution. Indels are common constituents of genome variations and human diseases, yet, how they occur without preexisting repeat sequence is less understood. Non-repeat-unit-based misalignment-elongation (NRUBME) is proposed to be one mechanism for indels without adjacent repeats. NRUBME or consecutive NRUBME may also change repeat motifs or generate new repeats. NRUBME invoking a non-Watson-Crick base pair explains insertions that initiate mononucleotide repeats. Furthermore, NRUBME successfully interprets many inexplicable human di- to tetranucleotide repeat generations. This study provides the first evidence of SSR variations expediting phage divergence, and enables insights into the events and mechanisms of genome evolution. NRUBME allows us to emulate natural evolution to design indels for various applications.

  12. Distinct intraspecific variations of garlic (Allium sativum L.) revealed by the exon-intron sequences of the alliinase gene.

    PubMed

    Endo, Aki; Imai, Yukiko; Nakamura, Mizuho; Yanagisawa, Eri; Taguchi, Takaaki; Torii, Kosuke; Okumura, Hidenobu; Ichinose, Koji

    2014-04-01

    Garlic (Allium sativum L.) has been used worldwide as a food and for medicinal purposes since early times. Garlic cultivars exhibit considerable morphological diversity despite the fact that they are mostly sterile and are grown only by vegetative propagation of cloves. Considerable recombination occurs in garlic genomes, including the genes involved in secondary metabolites. We examined the genomic DNAs (gDNAs) from garlic, encoding alliinase, a key enzyme involved in organosulfur metabolism in Allium plants. The 1.7-kb gDNA fragments, covering three exons (2, 3, and 4) and all four introns, were amplified from total DNAs prepared from garlic samples produced in Asia and Europe, leading to 73 sequences in total: Japan (JPN), China (CHN), India (IND), Spain (ESP), and France (FRA). The exon sequences were highly conserved among all the sequences, probably reflecting the fully functional alliinase associated with the flavor quality. Distinct intraspecific variations were detected for all four intron sequences, leading to the haplotype classifications. A close relationship between JPN and CHN was observed for all four introns, whereas IND showed a more divergent distribution. ESP and FRA afforded clearly different variants compared with those from Asian sequences. The present study provides information that could be useful in the development of an additional molecular marker for garlic authentication and quality control.

  13. Population clustering based on copy number variations detected from next generation sequencing data

    PubMed Central

    Duan, Junbo; Zhang, Ji-Gang; Wan, Mingxi; Deng, Hong-Wen; Wang, Yu-Ping

    2015-01-01

    Copy number variations (CNVs) can be used as significant biomarkers, and next generation sequencing (NGS) provides high-resolution detection of these CNVs. However, how to extract features from CNVs and apply them to genomic studies such as population clustering has become a major challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into a source matrix and a weight matrix with non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs from each sample. Therefore, using NMF of CNVs one can differentiate samples from different ethnic groups, i.e. perform population clustering. To validate the approach, we applied it to the analysis of both simulation data and two real data sets from the 1000 Genomes Project. The results on simulation data demonstrate that the proposed method can recover the true common CNVs with high quality. The results of the first real data analysis show that the proposed method can cluster two family trios with different ancestries into two ethnic groups, and the results of the second real data analysis show that the proposed method can be applied to whole-genome data with a large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering. PMID:25152046
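
    The decomposition step described above can be sketched with the classic multiplicative-update NMF of Lee and Seung. This is a generic illustration under stated assumptions, not the authors' implementation: the matrix orientation (rows are CNV regions, columns are samples), the toy data, and all function names are hypothetical, and a real analysis would normally use a numerical library.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (features x samples) as W (source) times H (weight)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy feature matrix: 4 CNV regions x 4 samples, two groups of two samples.
V = [[2, 2, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 3, 3],
     [0, 0, 1, 1]]
W, H = nmf(V, rank=2)
# Assign each sample (column of H) to the component with the largest weight.
labels = [max(range(2), key=lambda k: H[k][j]) for j in range(len(V[0]))]
print(labels)
```

    Reading the cluster label off the largest entry in each column of the weight matrix is one simple convention; the abstract does not specify how the authors turn weights into group assignments.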

  14. Cis-regulatory sequence variation and association with Mycoplasma load in natural populations of the house finch (Carpodacus mexicanus)

    PubMed Central

    Backström, Niclas; Shipilina, Daria; Blom, Mozes P K; Edwards, Scott V

    2013-01-01

    Characterization of the genetic basis of fitness traits in natural populations is important for understanding how organisms adapt to the changing environment and to novel events, such as epizootics. However, candidate fitness-influencing loci, such as regulatory regions, are usually unavailable in nonmodel species. Here, we analyze sequence data from targeted resequencing of the cis-regulatory regions of three candidate genes for disease resistance (CD74, HSP90α, and LCP1) in populations of the house finch (Carpodacus mexicanus) historically exposed (Alabama) and naïve (Arizona) to Mycoplasma gallisepticum. Our study, the first to quantify variation in regulatory regions in wild birds, reveals that the upstream regions of CD74 and HSP90α are GC-rich, with the former exhibiting unusually low sequence variation for this species. We identified two SNPs, located in a GC-rich region immediately upstream of an inferred promoter site in the gene HSP90α, that were significantly associated with Mycoplasma pathogen load in the two populations. The SNPs are closely linked and situated in potential regulatory sequences: one in a binding site for the transcription factor nuclear NFYα and the other in a dinucleotide microsatellite ((GC)6). The genotype associated with pathogen load in the putative NFYα binding site was significantly overrepresented in the Alabama birds. However, we did not see strong effects of selection at this SNP, perhaps because selection has acted on standing genetic variation over an extremely short time in a highly recombining region. Our study is a useful starting point to explore functional relationships between sequence polymorphisms, gene expression, and phenotypic traits, such as pathogen resistance that affect fitness in the wild. PMID:23532859

  15. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    PubMed Central

    2013-01-01

    Background: Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Results: Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (< 50 bp) of IAPV. These coverage gaps occurred across sequencing runs and were virtually unchanged when reads were re-mapped with greater permissiveness (up to 8% divergence), suggesting a recurrent sequencing artifact rather than strain divergence. Consensus sequences of DWV for each sample showed little phylogenetic divergence, low nucleotide diversity, and strongly negative values of Fu and Li’s D statistic, suggesting a recent population bottleneck and/or purifying selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li’s D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low

  16. High-throughput sequencing and vaccine design.

    PubMed

    Luciani, F

    2016-04-01

    Next-generation sequencing (NGS) technologies have reshaped genome research. The resulting increase in sequencing depth and resolution has led to an unprecedented level of genomic detail and thus an increasing awareness of the complexity of animal, human and pathogen genomes. This has resulted in new approaches to vaccine research. On the one hand, the increase in genome complexity challenges our ability to study and understand pathogen biology and pathogen-host interactions. On the other hand, the increase in genomic data also provides key information for developing and designing improved vaccines against pathogens that were previously extremely difficult to deal with, such as rapidly mutating RNA viruses or bacteria that have complex interactions with the host immune system. This review describes how the broad application of NGS technologies to genome research is affecting vaccine research. It focuses on implications for the field of viral genomics, and includes recent animal and human studies.

  17. High throughput system for DNA sequencing

    NASA Astrophysics Data System (ADS)

    El-Difrawy, Sameh A.; Lam, Roger; Aborn, James H.; Novotny, Mark; Gismondi, Elizabeth A.; Matsudaira, Paul; Mckenna, Brian K.; O'Neil, Thomas; Streechon, Philip; Ehrlich, Daniel J.

    2005-07-01

    A 768-lane DNA sequencing system based on micromachined plates has been designed as a near-term successor to 96-lane capillary arrays. Electrophoretic separations are implemented in large-format (25 cm by 50 cm) microfabricated devices with the objective of proving realistic read length, parallelism, and the scaled sample requirements for long-read de novo sequencing. Two 384-lane plates are alternately cycled between electrophoresis and regeneration via a robotic pipettor and switching optical system. The instrument minimizes the DNA sample requirement to "1/32×" Sanger chemistry, equal to typical genome center operation, and a 16-fold improvement in scaling relative to previous microfabricated devices. The 40-cm-long channels permit an increase in read length (several hundred base pairs) relative to previous multichannel microfabricated devices. These advances directly address the cost and automation model for adaptation of the technology.

  18. Estimating Protistan Diversity Using High-Throughput Sequencing.

    PubMed

    Hu, Sarah K; Liu, Zhenfeng; Lie, Alle A Y; Countway, Peter D; Kim, Diane Y; Jones, Adriane C; Gast, Rebecca J; Cary, S Craig; Sherr, Evelyn B; Sherr, Barry F; Caron, David A

    2015-01-01

    Sequencing hypervariable regions from the 18S rRNA gene is commonly employed to characterize protistan biodiversity, yet there are concerns that short reads do not provide the same taxonomic resolution as full-length sequences. A total of 7,432 full-length sequences were used to perform an in silico analysis of how sequences of various lengths and target regions impact downstream ecological interpretations. Sequences that were longer than 400 nucleotides and included the V4 hypervariable region generated results similar to those derived from full-length 18S rRNA gene sequences. Present high-throughput sequencing capabilities are approaching protistan diversity estimation comparable to whole gene sequences.

  19. Minifish shows high genetic variation in mtDNA size.

    PubMed

    Chen, X-W; Li, Q-L; Hu, X-J; Yuan, Y-M; Wen, M; Peng, L-Y; Liu, S-J; Hong, Y-H

    2014-01-01

    The genus Paedocypris is a newly described taxon of minifish species that are characterized by extensive chromosome evolution and one of the smallest known vertebrate nuclear genomes. Paedocypris features a tiny adult size, a short generation time, low fecundity and fragmented tropical habitats, which are factors that favor rapid speciation. Most recently, we have revealed that P. progenetica (Pp), the type species of the genus Paedocypris, has an unusual mtDNA bearing - within its D-loop - a tandem array of a 34-bp repeat sequence called the minifish repeat, which shows compromised replication efficiency in vitro. Here we report that Pp exhibits high genetic variation in mtDNA size. The efficiency of D-loop amplification was found to depend upon primers. Interestingly, Pp individuals of one and the same population differed drastically in mtDNA size resulting from varying copy numbers of the minifish repeat. We conclude that minifish has a high mutation rate and perhaps represents a rapidly evolving taxon of vertebrates.

  20. Complete plastid genome sequence of Primula sinensis (Primulaceae): structure comparison, sequence variation and evidence for accD transfer to nucleus.

    PubMed

    Liu, Tong-Jian; Zhang, Cai-Yun; Yan, Hai-Fei; Zhang, Lu; Ge, Xue-Jun; Hao, Gang

    2016-01-01

    The species-rich genus Primula L. is a typical plant group in which to study genetic variance between species at different levels of relationship. Chloroplast genome sequences are used as an information resource for quantifying this difference and reconstructing evolutionary history. In this study, we report the complete chloroplast genome sequence of Primula sinensis and compare it with other related species. The chloroplast genome showed a typical circular quadripartite structure, 150,859 bp in length with a GC content of 37.2%. Two inverted repeat (IR) regions (25,535 bp each) were separated by a large single-copy region (82,064 bp) and a small single-copy region (17,725 bp). The genome consists of 112 genes, including 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Among them, seven coding genes, seven tRNA genes and four rRNA genes have two copies due to their locations in the IR regions. The accD and infA genes, which lack intact open reading frames (ORFs), were identified as pseudogenes. SSR and sequence variation analyses were also performed on the plastome of Primula sinensis in comparison with another available plastome, that of P. poissonii. The four most variable regions, rpl36-rps8, rps16-trnQ, trnH-psbA and ndhC-trnV, were identified. Phylogenetic relationship estimates using three sub-datasets extracted from a matrix of 57 protein-coding gene sequences gave identical results that were consistent with previous studies. A transcript found in the P. sinensis transcriptome showed high similarity to the plastid accD functional region, and a putative plastid transit peptide was identified at its N-terminal region. The result strongly suggested that plastid accD has been functionally transferred to the nucleus in P. sinensis.
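
    The region sizes reported above can be checked against the quadripartite layout, in which the total length is the large single-copy region plus the small single-copy region plus two copies of the inverted repeat. The short sketch below only restates the abstract's numbers; the function name is hypothetical.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Plastome length: LSC + SSC + two copies of the inverted repeat (IR)."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as given in the abstract.
total = quadripartite_length(lsc=82_064, ssc=17_725, ir=25_535)
print(total)  # 150859

# Gene categories from the abstract: protein-coding, tRNA, rRNA.
print(78 + 30 + 4)  # 112
```

    The sum matches the reported 150,859 bp only if the 25,535 bp figure is the length of each inverted repeat, not of both combined.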

  1. Complete plastid genome sequence of Primula sinensis (Primulaceae): structure comparison, sequence variation and evidence for accD transfer to nucleus

    PubMed Central

    Liu, Tong-Jian; Zhang, Cai-Yun; Yan, Hai-Fei; Zhang, Lu

    2016-01-01

    The species-rich genus Primula L. is a typical plant group in which to study genetic variance between species at different levels of relationship. Chloroplast genome sequences are used as an information resource for quantifying this difference and reconstructing evolutionary history. In this study, we report the complete chloroplast genome sequence of Primula sinensis and compare it with other related species. The chloroplast genome showed a typical circular quadripartite structure, 150,859 bp in length with a GC content of 37.2%. Two inverted repeat (IR) regions (25,535 bp each) were separated by a large single-copy region (82,064 bp) and a small single-copy region (17,725 bp). The genome consists of 112 genes, including 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Among them, seven coding genes, seven tRNA genes and four rRNA genes have two copies due to their locations in the IR regions. The accD and infA genes, which lack intact open reading frames (ORFs), were identified as pseudogenes. SSR and sequence variation analyses were also performed on the plastome of Primula sinensis in comparison with another available plastome, that of P. poissonii. The four most variable regions, rpl36–rps8, rps16–trnQ, trnH–psbA and ndhC–trnV, were identified. Phylogenetic relationship estimates using three sub-datasets extracted from a matrix of 57 protein-coding gene sequences gave identical results that were consistent with previous studies. A transcript found in the P. sinensis transcriptome showed high similarity to the plastid accD functional region, and a putative plastid transit peptide was identified at its N-terminal region. The result strongly suggested that plastid accD has been functionally transferred to the nucleus in P. sinensis. PMID:27375965

  2. Whole-Genome Sequencing Reveals Diverse Models of Structural Variations in Esophageal Squamous Cell Carcinoma.

    PubMed

    Cheng, Caixia; Zhou, Yong; Li, Hongyi; Xiong, Teng; Li, Shuaicheng; Bi, Yanghui; Kong, Pengzhou; Wang, Fang; Cui, Heyang; Li, Yaoping; Fang, Xiaodong; Yan, Ting; Li, Yike; Wang, Juan; Yang, Bin; Zhang, Ling; Jia, Zhiwu; Song, Bin; Hu, Xiaoling; Yang, Jie; Qiu, Haile; Zhang, Gehong; Liu, Jing; Xu, Enwei; Shi, Ruyi; Zhang, Yanyan; Liu, Haiyan; He, Chanting; Zhao, Zhenxiang; Qian, Yu; Rong, Ruizhou; Han, Zhiwei; Zhang, Yanlin; Luo, Wen; Wang, Jiaqian; Peng, Shaoliang; Yang, Xukui; Li, Xiangchun; Li, Lin; Fang, Hu; Liu, Xingmin; Ma, Li; Chen, Yunqing; Guo, Shiping; Chen, Xing; Xi, Yanfeng; Li, Guodong; Liang, Jianfang; Yang, Xiaofeng; Guo, Jiansheng; Jia, JunMei; Li, Qingshan; Cheng, Xiaolong; Zhan, Qimin; Cui, Yongping

    2016-02-01

    Comprehensive identification of somatic structural variations (SVs) and understanding their mutational mechanisms in cancer might contribute to understanding biological differences and help to identify new therapeutic targets. Unfortunately, complex SVs across the whole genome and the mutational mechanisms underlying esophageal squamous cell carcinoma (ESCC) remain largely uncharacterized. To define a comprehensive catalog of somatic SVs, affected target genes, and their underlying mechanisms in ESCC, we re-analyzed whole-genome sequencing (WGS) data from 31 ESCCs using the Meerkat algorithm to predict somatic SVs and Patchwork to determine copy-number changes. We found deletions and translocations with NHEJ and alt-EJ signatures as the dominant SV types, and 16% of deletions were complex deletions. SVs frequently led to disruption of cancer-associated genes (e.g., CDKN2A and NOTCH1) with different mutational mechanisms. Moreover, chromothripsis, kataegis, and breakage-fusion-bridge (BFB) were identified as contributing to locally mis-arranged chromosomes that occurred in 55% of ESCCs. These genomic catastrophes led to amplification of oncogenes through chromothripsis-derived double-minute chromosome formation (e.g., FGFR1 and LETM2) or BFB-affected chromosomes (e.g., CCND1, EGFR, ERBB2, MMPs, and MYC), with approximately 30% of ESCCs harboring BFB-derived CCND1 amplification. Furthermore, analyses of copy-number alterations revealed a high frequency of whole-genome duplication (WGD) and recurrent focal amplification of CDCA7, which might act as a potential oncogene in ESCC. Our findings reveal molecular defects such as chromothripsis and BFB in malignant transformation of ESCCs and demonstrate diverse models of SV-derived target genes in ESCCs. These genome-wide SV profiles and their underlying mechanisms have preventive, diagnostic, and therapeutic implications for ESCCs. PMID:26833333

  3. Whole-Genome Sequencing Reveals Diverse Models of Structural Variations in Esophageal Squamous Cell Carcinoma

    PubMed Central

    Cheng, Caixia; Zhou, Yong; Li, Hongyi; Xiong, Teng; Li, Shuaicheng; Bi, Yanghui; Kong, Pengzhou; Wang, Fang; Cui, Heyang; Li, Yaoping; Fang, Xiaodong; Yan, Ting; Li, Yike; Wang, Juan; Yang, Bin; Zhang, Ling; Jia, Zhiwu; Song, Bin; Hu, Xiaoling; Yang, Jie; Qiu, Haile; Zhang, Gehong; Liu, Jing; Xu, Enwei; Shi, Ruyi; Zhang, Yanyan; Liu, Haiyan; He, Chanting; Zhao, Zhenxiang; Qian, Yu; Rong, Ruizhou; Han, Zhiwei; Zhang, Yanlin; Luo, Wen; Wang, Jiaqian; Peng, Shaoliang; Yang, Xukui; Li, Xiangchun; Li, Lin; Fang, Hu; Liu, Xingmin; Ma, Li; Chen, Yunqing; Guo, Shiping; Chen, Xing; Xi, Yanfeng; Li, Guodong; Liang, Jianfang; Yang, Xiaofeng; Guo, Jiansheng; Jia, JunMei; Li, Qingshan; Cheng, Xiaolong; Zhan, Qimin; Cui, Yongping

    2016-01-01

    Comprehensive identification of somatic structural variations (SVs) and understanding their mutational mechanisms in cancer might contribute to understanding biological differences and help to identify new therapeutic targets. Unfortunately, the characterization of complex SVs across the whole genome and the mutational mechanisms underlying esophageal squamous cell carcinoma (ESCC) remain largely unclear. To define a comprehensive catalog of somatic SVs, affected target genes, and their underlying mechanisms in ESCC, we re-analyzed whole-genome sequencing (WGS) data from 31 ESCCs using the Meerkat algorithm to predict somatic SVs and Patchwork to determine copy-number changes. We found deletions and translocations with NHEJ and alt-EJ signatures as the dominant SV types, and 16% of deletions were complex deletions. SVs frequently led to disruption of cancer-associated genes (e.g., CDKN2A and NOTCH1) with different mutational mechanisms. Moreover, chromothripsis, kataegis, and breakage-fusion-bridge (BFB) were identified as contributing to locally mis-arranged chromosomes that occurred in 55% of ESCCs. These genomic catastrophes led to amplification of oncogenes through chromothripsis-derived double-minute chromosome formation (e.g., FGFR1 and LETM2) or BFB-affected chromosomes (e.g., CCND1, EGFR, ERBB2, MMPs, and MYC), with approximately 30% of ESCCs harboring BFB-derived CCND1 amplification. Furthermore, analyses of copy-number alterations reveal a high frequency of whole-genome duplication (WGD) and recurrent focal amplification of CDCA7 that might act as a potential oncogene in ESCC. Our findings reveal molecular defects such as chromothripsis and BFB in malignant transformation of ESCCs and demonstrate diverse models of SV-derived target genes in ESCCs. These genome-wide SV profiles and their underlying mechanisms provide preventive, diagnostic, and therapeutic implications for ESCCs. PMID:26833333

  4. Capturing genomic signatures of DNA sequence variation using a standard anonymous microarray platform

    PubMed Central

    Cannon, C. H.; Kua, C. S.; Lobenhofer, E. K.; Hurban, P.

    2006-01-01

    Comparative genomics, using the model organism approach, has provided powerful insights into the structure and evolution of whole genomes. Unfortunately, only a small fraction of Earth's biodiversity will have its genome sequenced in the foreseeable future. Most wild organisms have radically different life histories and evolutionary genomics than current model systems. A novel technique is needed to expand comparative genomics to a wider range of organisms. Here, we describe a novel approach using an anonymous DNA microarray platform that gathers genomic samples of sequence variation from any organism. Oligonucleotide probe sequences placed on a custom 44 K array were 25 bp long and designed using a simple set of criteria to maximize their complexity and dispersion in sequence probability space. Using whole genomic samples from three known genomes (mouse, rat and human) and one unknown (Gonystylus bancanus), we demonstrate and validate its power, reliability, transitivity and sensitivity. Using two separate statistical analyses, a large number of genomic ‘indicator’ probes were discovered. The construction of a genomic signature database based upon this technique would allow virtual comparisons, and simple queries could generate optimal subsets of markers to be used in large-scale assays, using simple downstream techniques. Biologists from a wide range of fields, studying almost any organism, could efficiently perform genomic comparisons, at potentially any phylogenetic level, after performing a small number of standardized DNA microarray hybridizations. Possibilities for refining and expanding the approach are discussed. PMID:17000641

  5. Standards for Sequencing Viral Genomes in the Era of High-Throughput Sequencing

    PubMed Central

    Beitzel, Brett; Chain, Patrick S. G.; Davenport, Matthew G.; Donaldson, Eric; Frieman, Matthew; Kugelman, Jeffrey; Kuhn, Jens H.; O’Rear, Jules; Sabeti, Pardis C.; Wentworth, David E.; Wiley, Michael R.; Yu, Guo-Yun; Sozhamannan, Shanmuga; Bradburne, Christopher

    2014-01-01

    Thanks to high-throughput sequencing technologies, genome sequencing has become a common component in nearly all aspects of viral research; thus, we are experiencing an explosion in both the number of available genome sequences and the number of institutions producing such data. However, there are currently no common standards used to convey the quality, and therefore utility, of these various genome sequences. Here, we propose five “standard” categories that encompass all stages of viral genome finishing, and we define them using simple criteria that are agnostic to the technology used for sequencing. We also provide genome finishing recommendations for various downstream applications, keeping in mind the cost-benefit trade-offs associated with different levels of finishing. Our goal is to define a common vocabulary that will allow comparison of genome quality across different research groups, sequencing platforms, and assembly techniques. PMID:24939889

  6. Standards for sequencing viral genomes in the era of high-throughput sequencing.

    PubMed

    Ladner, Jason T; Beitzel, Brett; Chain, Patrick S G; Davenport, Matthew G; Donaldson, Eric F; Frieman, Matthew; Kugelman, Jeffrey R; Kuhn, Jens H; O'Rear, Jules; Sabeti, Pardis C; Wentworth, David E; Wiley, Michael R; Yu, Guo-Yun; Sozhamannan, Shanmuga; Bradburne, Christopher; Palacios, Gustavo

    2014-01-01

    Thanks to high-throughput sequencing technologies, genome sequencing has become a common component in nearly all aspects of viral research; thus, we are experiencing an explosion in both the number of available genome sequences and the number of institutions producing such data. However, there are currently no common standards used to convey the quality, and therefore utility, of these various genome sequences. Here, we propose five "standard" categories that encompass all stages of viral genome finishing, and we define them using simple criteria that are agnostic to the technology used for sequencing. We also provide genome finishing recommendations for various downstream applications, keeping in mind the cost-benefit trade-offs associated with different levels of finishing. Our goal is to define a common vocabulary that will allow comparison of genome quality across different research groups, sequencing platforms, and assembly techniques.

  7. A quantitative trait locus for variation in dopamine metabolism mapped in a primate model using reference sequences from related species

    PubMed Central

    Freimer, Nelson B.; Service, Susan K.; Ophoff, Roel A.; Jasinska, Anna J.; McKee, Kevin; Villeneuve, Amelie; Belisle, Alexandre; Bailey, Julia N.; Breidenthal, Sherry E.; Jorgensen, Matthew J.; Mann, J. John; Cantor, Rita M.; Dewar, Ken; Fairbanks, Lynn A.

    2007-01-01

    Non-human primates (NHP) provide crucial research models. Their strong similarities to humans make them particularly valuable for understanding complex behavioral traits and brain structure and function. We report here the genetic mapping of an NHP nervous system biologic trait, the cerebrospinal fluid (CSF) concentration of the dopamine metabolite homovanillic acid (HVA), in an extended inbred vervet monkey (Chlorocebus aethiops sabaeus) pedigree. CSF HVA is an index of CNS dopamine activity, which is hypothesized to contribute substantially to behavioral variations in NHP and humans. For quantitative trait locus (QTL) mapping, we carried out a two-stage procedure. We first scanned the genome using a first-generation genetic map of short tandem repeat markers. Subsequently, using >100 SNPs within the most promising region identified by the genome scan, we mapped a QTL for CSF HVA at a genome-wide level of significance (peak logarithm of odds score >4) to a narrow well delineated interval (<10 Mb). The SNP discovery exploited conserved segments between human and rhesus macaque reference genome sequences. Our findings demonstrate the potential of using existing primate reference genome sequences for designing high-resolution genetic analyses applicable across a wide range of NHP species, including the many for which full genome sequences are not yet available. Leveraging genomic information from sequenced to nonsequenced species should enable the utilization of the full range of NHP diversity in behavior and disease susceptibility to determine the genetic basis of specific biological and behavioral traits. PMID:17884980

  8. Clinical Interpretation of Variants from Next-Generation Sequencing: The 2016 Scientific Meeting of the Human Genome Variation Society.

    PubMed

    Oetting, William S; Brookes, Anthony J; Béroud, Christophe; Taschner, Peter E

    2016-10-01

    The 2016 scientific meeting of the Human Genome Variation Society (HGVS; http://www.hgvs.org) was held on the 20th of May in Barcelona, Spain, with the theme of "Clinical Interpretation of Variants from Next-Generation Sequencing."

  9. Management of High-Throughput DNA Sequencing Projects: Alpheus.

    PubMed

    Miller, Neil A; Kingsmore, Stephen F; Farmer, Andrew; Langley, Raymond J; Mudge, Joann; Crow, John A; Gonzalez, Alvaro J; Schilkey, Faye D; Kim, Ryan J; van Velkinburgh, Jennifer; May, Gregory D; Black, C Forrest; Myers, M Kathy; Utsey, John P; Frost, Nicholas S; Sugarbaker, David J; Bueno, Raphael; Gullans, Stephen R; Baxter, Susan M; Day, Steve W; Retzel, Ernest F

    2008-12-26

    High-throughput DNA sequencing has enabled systems biology to begin to address areas in health, agricultural and basic biological research. Concomitant with the opportunities is an absolute necessity to manage significant volumes of high-dimensional and inter-related data and analysis. Alpheus is an analysis pipeline, database and visualization software for use with massively parallel DNA sequencing technologies that feature multi-gigabase throughput characterized by relatively short reads, such as Illumina-Solexa (sequencing-by-synthesis), Roche-454 (pyrosequencing) and Applied Biosystem's SOLiD (sequencing-by-ligation). Alpheus enables alignment to reference sequence(s), detection of variants and enumeration of sequence abundance, including expression levels in transcriptome sequence. Alpheus is able to detect several types of variants, including non-synonymous and synonymous single nucleotide polymorphisms (SNPs), insertions/deletions (indels), premature stop codons, and splice isoforms. Variant detection is aided by the ability to filter variant calls based on consistency, expected allele frequency, sequence quality, coverage, and variant type in order to minimize false positives while maximizing the identification of true positives. Alpheus also enables comparisons of genes with variants between cases and controls or bulk segregant pools. Sequence-based differential expression comparisons can be developed, with data export to SAS JMP Genomics for statistical analysis. PMID:20151039

  10. Direct multiplex sequencing (DMPS)--a novel method for targeted high-throughput sequencing of ancient and highly degraded DNA.

    PubMed

    Stiller, Mathias; Knapp, Michael; Stenzel, Udo; Hofreiter, Michael; Meyer, Matthias

    2009-10-01

    Although the emergence of high-throughput sequencing technologies has enabled whole-genome sequencing from extinct organisms, little progress has been made in accelerating targeted sequencing from highly degraded DNA. Here, we present a novel and highly sensitive method for targeted sequencing of ancient and degraded DNA, which couples multiplex PCR directly with sample barcoding and high-throughput sequencing. Using this approach, we obtained a 96% complete mitochondrial genome data set from 31 cave bear (Ursus spelaeus) samples using only two 454 Life Sciences (Roche) GS FLX runs. In contrast to previous studies relying only on short sequence fragments, the overlapping portion of our data comprises almost 10 kb of replicated mitochondrial genome sequence, allowing for the unambiguous differentiation of three major cave bear clades. Our method opens up the opportunity to simultaneously generate many kilobases of overlapping sequence data from large sets of difficult samples, such as museum specimens, medical collections, or forensic samples. Embedded in our approach, we present a new protocol for the construction of barcoded sequencing libraries, which is compatible with all current high-throughput technologies and can be performed entirely in plate setup.

  11. Direct multiplex sequencing (DMPS)--a novel method for targeted high-throughput sequencing of ancient and highly degraded DNA.

    PubMed

    Stiller, Mathias; Knapp, Michael; Stenzel, Udo; Hofreiter, Michael; Meyer, Matthias

    2009-10-01

    Although the emergence of high-throughput sequencing technologies has enabled whole-genome sequencing from extinct organisms, little progress has been made in accelerating targeted sequencing from highly degraded DNA. Here, we present a novel and highly sensitive method for targeted sequencing of ancient and degraded DNA, which couples multiplex PCR directly with sample barcoding and high-throughput sequencing. Using this approach, we obtained a 96% complete mitochondrial genome data set from 31 cave bear (Ursus spelaeus) samples using only two 454 Life Sciences (Roche) GS FLX runs. In contrast to previous studies relying only on short sequence fragments, the overlapping portion of our data comprises almost 10 kb of replicated mitochondrial genome sequence, allowing for the unambiguous differentiation of three major cave bear clades. Our method opens up the opportunity to simultaneously generate many kilobases of overlapping sequence data from large sets of difficult samples, such as museum specimens, medical collections, or forensic samples. Embedded in our approach, we present a new protocol for the construction of barcoded sequencing libraries, which is compatible with all current high-throughput technologies and can be performed entirely in plate setup. PMID:19635845

  12. A worldwide survey of genome sequence variation provides insight into the evolutionary history of the honeybee Apis mellifera.

    PubMed

    Wallberg, Andreas; Han, Fan; Wellhagen, Gustaf; Dahle, Bjørn; Kawata, Masakado; Haddad, Nizar; Simões, Zilá Luz Paulino; Allsopp, Mike H; Kandemir, Irfan; De la Rúa, Pilar; Pirk, Christian W; Webster, Matthew T

    2014-10-01

    The honeybee Apis mellifera has major ecological and economic importance. We analyze patterns of genetic variation at 8.3 million SNPs, identified by sequencing 140 honeybee genomes from a worldwide sample of 14 populations at a combined total depth of 634×. These data provide insight into the evolutionary history and genetic basis of local adaptation in this species. We find evidence that population sizes have fluctuated greatly, mirroring historical fluctuations in climate, although contemporary populations have high genetic diversity, indicating the absence of domestication bottlenecks. Levels of genetic variation are strongly shaped by natural selection and are highly correlated with patterns of gene expression and DNA methylation. We identify genomic signatures of local adaptation, which are enriched in genes expressed in workers and in immune system- and sperm motility-related genes that might underlie geographic variation in reproduction, dispersal and disease resistance. This study provides a framework for future investigations into responses to pathogens and climate change in honeybees. PMID:25151355

  13. Three-Year Sequence for High School Mathematics, Course III.

    ERIC Educational Resources Information Center

    Buchman, Paul; Buchman, Aaron

    This curriculum guide covers Course III of a three-year sequence for high school mathematics in New York which was intended to provide an alternative to the regular Regents sequence of ninth-, tenth-, and eleventh-grade mathematics. A listing of scope and content is provided along with suggested time allotments. Topics covered are: complex…

  14. The sequence of learning cycle activities in high school chemistry

    NASA Astrophysics Data System (ADS)

    Abraham, Michael R.; Renner, John W.

    The sequence of the three phases of two high school learning cycles in chemistry was altered in order to: (1) give insights into the factors which account for the success of the learning cycle, (2) serve as an indirect test of the association between Piaget's theory and the learning cycle, and (3) compare the learning cycle with traditional instruction. Each of the six sequences (one normal and five altered) was studied with content and attitude measures. The outcomes of the study supported the contention that the normal learning cycle sequence is the optimum sequence for achievement of content knowledge.

  15. High-speed multiple sequence alignment on a reconfigurable platform.

    PubMed

    Oliver, Tim; Schmidt, Bertil; Maskell, Douglas; Nathan, Darran; Clemens, Ralf

    2006-01-01

    Progressive alignment is a widely used approach to compute multiple sequence alignments (MSAs). However, aligning several hundred sequences by popular progressive alignment tools requires hours on sequential computers. Due to the rapid growth of sequence databases biologists have to compute MSAs in a far shorter time. In this paper we present a new approach to MSA on reconfigurable hardware platforms to gain high performance at low cost. We have constructed a linear systolic array to perform pairwise sequence distance computations using dynamic programming. This results in an implementation with significant runtime savings on a standard FPGA.
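
    The pairwise distance stage described in the abstract above can be illustrated in software. The following is a minimal sketch, assuming plain Levenshtein edit distance as the dynamic-programming recurrence; the scoring scheme and the systolic-array design actually used by the authors are not given in this record, and the function names are hypothetical.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping one row at a time.

    Assumption: unit costs for substitution, insertion and deletion.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def distance_matrix(seqs: List[str]) -> List[List[int]]:
    """Symmetric matrix of all pairwise distances, the input a progressive
    aligner would use to build its guide tree."""
    n = len(seqs)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = edit_distance(seqs[i], seqs[j])
    return d
```

    Each cell depends only on its left, upper and upper-left neighbours, so all cells on one anti-diagonal are independent of each other; that property is what makes this kind of recurrence a natural fit for a linear systolic array.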

  16. Variational formulation of high-performance finite elements - Parametrized variational principles

    NASA Technical Reports Server (NTRS)

    Felippa, C. A.; Militello, C.

    1990-01-01

    High-performance (HP) elements are simple finite elements constructed to deliver engineering accuracy with coarse arbitrary grids. This paper is part of a series on the variational basis of HP elements, with emphasis on those constructed with the free formulation (FF) and assumed natural strain (ANS) methods. The present paper studies parametrized variational principles that provide a foundation for the FF and ANS methods, as well as for a combination of both methods.

  17. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...

  18. Limited Variation in BK Virus T-Cell Epitopes Revealed by Next-Generation Sequencing.

    PubMed

    Sahoo, Malaya K; Tan, Susanna K; Chen, Sharon F; Kapusinszky, Beatrix; Concepcion, Katherine R; Kjelson, Lynn; Mallempati, Kalyan; Farina, Heidi M; Fernández-Viña, Marcelo; Tyan, Dolly; Grimm, Paul C; Anderson, Matthew W; Concepcion, Waldo; Pinsky, Benjamin A

    2015-10-01

    BK virus (BKV) infection causing end-organ disease remains a formidable challenge to the hematopoietic cell transplant (HCT) and kidney transplant fields. As BKV-specific treatments are limited, immunologic-based therapies may be a promising and novel therapeutic option for transplant recipients with persistent BKV infection. Here, we describe a whole-genome, deep-sequencing methodology and bioinformatics pipeline that identify BKV variants across the genome and at BKV-specific HLA-A2-, HLA-B0702-, and HLA-B08-restricted CD8 T-cell epitopes. BKV whole genomes were amplified using long-range PCR with four inverse primer sets, and fragmentation libraries were sequenced on the Ion Torrent Personal Genome Machine (PGM). An error model and variant-calling algorithm were developed to accurately identify rare variants. A total of 65 samples from 18 pediatric HCT and kidney recipients with quantifiable BKV DNAemia underwent whole-genome sequencing. Limited genetic variation was observed. The median number of amino acid variants identified per sample was 8 (range, 2 to 37; interquartile range, 10), with the majority of variants (77%) detected at a frequency of <5%. When normalized for length, there was no statistical difference in the median number of variants across all genes. Similarly, the predominant virus population within samples harbored T-cell epitopes similar to the reference BKV strain that was matched for the BKV genotype. Despite the conservation of epitopes, low-level variants in T-cell epitopes were detected in 77.7% (14/18) of patients. Understanding epitope variation across the whole genome provides insight into the virus-immune interface and may help guide the development of protocols for novel immunologic-based therapies.

  19. Limited Variation in BK Virus T-Cell Epitopes Revealed by Next-Generation Sequencing

    PubMed Central

    Sahoo, Malaya K.; Tan, Susanna K.; Chen, Sharon F.; Kapusinszky, Beatrix; Concepcion, Katherine R.; Kjelson, Lynn; Mallempati, Kalyan; Farina, Heidi M.; Fernández-Viña, Marcelo; Tyan, Dolly; Grimm, Paul C.; Anderson, Matthew W.; Concepcion, Waldo

    2015-01-01

    BK virus (BKV) infection causing end-organ disease remains a formidable challenge to the hematopoietic cell transplant (HCT) and kidney transplant fields. As BKV-specific treatments are limited, immunologic-based therapies may be a promising and novel therapeutic option for transplant recipients with persistent BKV infection. Here, we describe a whole-genome, deep-sequencing methodology and bioinformatics pipeline that identify BKV variants across the genome and at BKV-specific HLA-A2-, HLA-B0702-, and HLA-B08-restricted CD8 T-cell epitopes. BKV whole genomes were amplified using long-range PCR with four inverse primer sets, and fragmentation libraries were sequenced on the Ion Torrent Personal Genome Machine (PGM). An error model and variant-calling algorithm were developed to accurately identify rare variants. A total of 65 samples from 18 pediatric HCT and kidney recipients with quantifiable BKV DNAemia underwent whole-genome sequencing. Limited genetic variation was observed. The median number of amino acid variants identified per sample was 8 (range, 2 to 37; interquartile range, 10), with the majority of variants (77%) detected at a frequency of <5%. When normalized for length, there was no statistical difference in the median number of variants across all genes. Similarly, the predominant virus population within samples harbored T-cell epitopes similar to the reference BKV strain that was matched for the BKV genotype. Despite the conservation of epitopes, low-level variants in T-cell epitopes were detected in 77.7% (14/18) of patients. Understanding epitope variation across the whole genome provides insight into the virus-immune interface and may help guide the development of protocols for novel immunologic-based therapies. PMID:26202116

  20. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat

    PubMed Central

    Teng, Huajing; Zhang, Yaohua; Shi, Chengmin; Mao, Fengbiao; Hou, Lingling; Guo, Hongling; Sun, Zhongsheng; Zhang, Jianxu

    2016-01-01

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches. PMID:27172215

  1. Deleted copy number variation of Hanwoo and Holstein using next generation sequencing at the population level

    PubMed Central

    2014-01-01

    Background: Copy number variation (CNV), a source of genetic diversity in mammals, has been shown to underlie biological functions related to production traits. Nevertheless, few studies have examined CNVs using next-generation sequencing at the population level. Results: Illumina NGS data were obtained for ten Holstein (dairy) cattle and 22 Hanwoo (beef) cattle. The sequence coverage for each of the 32 animals varied from 13.58-fold to almost 20-fold. We detected a total of 6,811 deleted CNVs across the analyzed individuals (average length = 2732.2 bp), corresponding to 0.74% of the cattle genome (18.6 Mbp of variable sequence). By examining the overlap between CNV deletion regions and genes, we selected 30 genes with the highest deletion scores. These genes were found to be related to the nervous system, more specifically to nervous transmission, neuron motion, and neurogenesis. We regarded these genes as having been affected by the domestication process. Further analysis of the CNV genotyping information revealed 94 putative selected CNVs and 954 breed-specific CNVs. Conclusions: This study provides useful information for assessing the impact of CNVs on cattle traits using NGS at the population level. PMID:24673797
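
    The summary figures in the abstract above can be cross-checked with simple arithmetic. The sketch below assumes that the 18.6 Mbp of variable sequence is the number of deleted CNVs multiplied by their average length, and that the 0.74% figure is relative to the reference assembly size; the function name and the derived genome size are illustrative and are not stated in the record.

```python
def cnv_summary(n_cnvs: int, mean_len_bp: float, genome_fraction: float):
    """Return (total variable sequence in Mbp, implied genome size in Gbp).

    Assumption: total variable sequence = count x mean length, and
    genome_fraction = total variable sequence / genome size.
    """
    total_bp = n_cnvs * mean_len_bp
    total_mbp = total_bp / 1e6
    implied_genome_gbp = total_bp / genome_fraction / 1e9
    return total_mbp, implied_genome_gbp


# Figures quoted in the abstract: 6,811 deletions, 2732.2 bp mean, 0.74%.
total_mbp, genome_gbp = cnv_summary(6811, 2732.2, 0.0074)
print(f"{total_mbp:.1f} Mbp variable sequence, ~{genome_gbp:.2f} Gbp implied genome")
```

    Under these assumptions the count and mean length reproduce the quoted 18.6 Mbp, and the implied genome size comes out at roughly 2.5 Gbp.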

  2. Cytochrome Oxidase I (COI) sequence conservation and variation patterns in the yellowfin and longtail tunas.

    PubMed

    Kunal, Swaraj Priyaranjan; Kumar, Girish

    2013-01-01

    Tunas are a commercially important fishery worldwide. There are at least 13 species of tuna belonging to three genera, of which the genus Thunnus has the most, with eight species. On the basis of their distribution, they can be characterised as oceanic, such as Thunnus albacares (yellowfin tuna), or coastal, such as Thunnus tonggol (longtail tuna). Although these two are different species, morphological differentiation can only be seen in mature individuals; hence misidentification may result in erroneous data sets, which ultimately affect conservation strategies. The mitochondrial DNA cytochrome c oxidase subunit 1 (COI) gene is one of the most popular markers for population genetic and phylogeographic studies across the animal kingdom. The present study aims to examine sequence conservation and variation in mitochondrial Cytochrome Oxidase I (COI) between these two species of tuna. COI sequence analysis of yellowfin and longtail tunas revealed the close relationship between them within the genus Thunnus. The present study is the first direct comparison of mitochondrial COI sequences of these two tuna species. PMID:23649742

  3. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat.

    PubMed

    Teng, Huajing; Zhang, Yaohua; Shi, Chengmin; Mao, Fengbiao; Hou, Lingling; Guo, Hongling; Sun, Zhongsheng; Zhang, Jianxu

    2016-07-07

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches.

  4. Whole Genome Sequencing demonstrates that Geographic Variation of Escherichia coli O157 Genotypes Dominates Host Association

    PubMed Central

    Strachan, Norval J. C.; Rotariu, Ovidiu; Lopes, Bruno; MacRae, Marion; Fairley, Susan; Laing, Chad; Gannon, Victor; Allison, Lesley J.; Hanson, Mary F.; Dallman, Tim; Ashton, Philip; Franz, Eelco; van Hoek, Angela H. A. M.; French, Nigel P.; George, Tessy; Biggs, Patrick J.; Forbes, Ken J.

    2015-01-01

    Genetic variation in an infectious disease pathogen can be driven by ecological niche dissimilarities arising from different host species and different geographical locations. Whole genome sequencing was used to compare E. coli O157 isolates from host reservoirs (cattle and sheep) from Scotland and to compare genetic variation of isolates (human, animal, environmental/food) obtained from Scotland, New Zealand, Netherlands, Canada and the USA. Nei’s genetic distance calculated from core genome single nucleotide polymorphisms (SNPs) demonstrated that the animal isolates were from the same population. Investigation of the Shiga toxin bacteriophage and their insertion sites (SBI typing) revealed that cattle and sheep isolates had statistically indistinguishable rarefaction profiles, diversity and genotypes. In contrast, isolates from different countries exhibited significant differences in Nei’s genetic distance and SBI typing. Hence, after successful international transmission, which has occurred on multiple occasions, local genetic variation occurs, resulting in a global patchwork of continental and trans-continental phylogeographic clades. These findings are important for three reasons: first, understanding transmission and evolution of infectious diseases associated with multiple host reservoirs and multi-geographic locations; second, highlighting the relevance of the sheep reservoir when considering farm based interventions; and third, improving our understanding of why human disease incidence varies across the world. PMID:26442781

  5. Whole Genome Sequencing demonstrates that Geographic Variation of Escherichia coli O157 Genotypes Dominates Host Association.

    PubMed

    Strachan, Norval J C; Rotariu, Ovidiu; Lopes, Bruno; MacRae, Marion; Fairley, Susan; Laing, Chad; Gannon, Victor; Allison, Lesley J; Hanson, Mary F; Dallman, Tim; Ashton, Philip; Franz, Eelco; van Hoek, Angela H A M; French, Nigel P; George, Tessy; Biggs, Patrick J; Forbes, Ken J

    2015-10-07

    Genetic variation in an infectious disease pathogen can be driven by ecological niche dissimilarities arising from different host species and different geographical locations. Whole genome sequencing was used to compare E. coli O157 isolates from host reservoirs (cattle and sheep) from Scotland and to compare genetic variation of isolates (human, animal, environmental/food) obtained from Scotland, New Zealand, Netherlands, Canada and the USA. Nei's genetic distance calculated from core genome single nucleotide polymorphisms (SNPs) demonstrated that the animal isolates were from the same population. Investigation of the Shiga toxin bacteriophage and their insertion sites (SBI typing) revealed that cattle and sheep isolates had statistically indistinguishable rarefaction profiles, diversity and genotypes. In contrast, isolates from different countries exhibited significant differences in Nei's genetic distance and SBI typing. Hence, after successful international transmission, which has occurred on multiple occasions, local genetic variation occurs, resulting in a global patchwork of continental and trans-continental phylogeographic clades. These findings are important for three reasons: first, understanding transmission and evolution of infectious diseases associated with multiple host reservoirs and multi-geographic locations; second, highlighting the relevance of the sheep reservoir when considering farm based interventions; and third, improving our understanding of why human disease incidence varies across the world.
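    The abstract does not say which form of Nei's genetic distance was computed. As a minimal sketch, assuming Nei's (1972) standard distance D = -ln(I) calculated from per-SNP allele frequencies in two populations, the calculation could look like the following. The function name and the input layout (one dict of allele frequencies per SNP) are illustrative assumptions, not the authors' pipeline.

```python
import math

def nei_standard_distance(freqs_x, freqs_y):
    """Nei's (1972) standard genetic distance between two populations.

    freqs_x, freqs_y: lists with one dict per locus (SNP), mapping
    allele -> frequency in that population. Hypothetical input layout.
    """
    jx = jy = jxy = 0.0
    for px, py in zip(freqs_x, freqs_y):
        alleles = set(px) | set(py)
        jx += sum(px.get(a, 0.0) ** 2 for a in alleles)   # homozygosity in X
        jy += sum(py.get(a, 0.0) ** 2 for a in alleles)   # homozygosity in Y
        jxy += sum(px.get(a, 0.0) * py.get(a, 0.0) for a in alleles)
    # The per-locus averaging cancels in the ratio, so sums are sufficient.
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)

# Toy frequencies for two SNPs (invented for illustration only).
pop_a = [{"A": 1.0}, {"C": 0.5, "T": 0.5}]
pop_b = [{"A": 0.5, "G": 0.5}, {"C": 1.0}]
d_same = nei_standard_distance(pop_a, pop_a)
d_diff = nei_standard_distance(pop_a, pop_b)
```

    Identical frequency profiles give a distance of zero, and the distance grows as the populations share fewer alleles. The function raises an error if the two populations share no alleles at all, because the identity is then zero.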

  6. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis

    PubMed Central

    Santana-Quintero, Luis; Dingerdissen, Hayley; Thierry-Mieg, Jean; Mazumder, Raja; Simonyan, Vahan

    2014-01-01

    Due to the size of Next-Generation Sequencing data, the computational challenge of sequence alignment has been vast. Inexact alignments can take up to 90% of total CPU time in bioinformatics pipelines. High-performance Integrated Virtual Environment (HIVE), a cloud-based environment optimized for storage and analysis of extra-large data, presents an algorithmic solution: the HIVE-hexagon DNA sequence aligner. HIVE-hexagon implements novel approaches to exploit both characteristics of sequence space and CPU, RAM and Input/Output (I/O) architecture to quickly compute accurate alignments. Key components of HIVE-hexagon include non-redundification and sorting of sequences; floating diagonals of linearized dynamic programming matrices; and consideration of cross-similarity to minimize computations. Availability: https://hive.biochemistry.gwu.edu/hive/ PMID:24918764
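    The first of the listed components, non-redundification and sorting of sequences, can be illustrated generically. The sketch below is not HIVE-hexagon's implementation, which the abstract does not describe. It only shows the general idea of collapsing identical reads so that each distinct sequence is aligned once and its multiplicity is carried along. The function name is hypothetical.

```python
from collections import Counter

def collapse_reads(reads):
    """Collapse identical reads and return (sequence, count) pairs,
    sorted lexicographically by sequence."""
    counts = Counter(reads)
    return sorted(counts.items())

# Toy reads, invented for illustration.
reads = ["ACGT", "AAGT", "ACGT", "ACGT", "TTGA"]
unique = collapse_reads(reads)
```

    With this toy input, five reads reduce to three distinct sequences, so an aligner would perform three alignments instead of five and weight each result by its count.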

  7. Fad7 gene identification and fatty acids phenotypic variation in an olive collection by EcoTILLING and sequencing approaches.

    PubMed

    Sabetta, Wilma; Blanco, Antonio; Zelasco, Samanta; Lombardo, Luca; Perri, Enzo; Mangini, Giacomo; Montemurro, Cinzia

    2013-08-01

    The ω-3 fatty acid desaturases (FADs) are enzymes, localized in the plastid or in the endoplasmic reticulum, that catalyze the conversion of linoleic acid to α-linolenic acid. In this research we report the genotypic and phenotypic variation of Italian Olea europaea L. germplasm for fatty acid composition. The phenotypic oil characterization was followed by the molecular analysis of the plastidial-type ω-3 FAD gene (fad7) (EC 1.14.19), whose full-length sequence is identified here in cultivar Leccino. The gene consisted of 2635 bp with 8 exons and 5'- and 3'-UTRs of 336 and 282 bp respectively, and showed a high level of heterozygosity (1/110 bp). The natural allelic variation was investigated both by a LiCOR EcoTILLING assay and by direct sequencing of the PCR product. Only three haplotypes were identified among the 96 analysed cultivars, highlighting the strong degree of conservation of this gene. PMID:23685785

  8. DNA Sequence and Expression Variation of Hop (Humulus lupulus) Valerophenone Synthase (VPS), a Key Gene in Bitter Acid Biosynthesis

    PubMed Central

    Castro, Consuelo B.; Whittock, Lucy D.; Whittock, Simon P.; Leggett, Grey; Koutoulis, Anthony

    2008-01-01

    Background The hop plant (Humulus lupulus) is a source of many secondary metabolites, with bitter acids essential in the beer brewing industry and others having potential applications for human health. This study investigated variation in DNA sequence and gene expression of valerophenone synthase (VPS), a key gene in the bitter acid biosynthesis pathway of hop. Methods Sequence variation was studied in 12 varieties, and expression was analysed in four of the 12 varieties in a series across the development of the hop cone. Results Nine single nucleotide polymorphisms (SNPs) were detected in VPS, seven of which were synonymous. The two non-synonymous polymorphisms did not appear to be related to typical bitter acid profiles of the varieties studied. However, real-time quantitative reverse-transcription polymerase chain reaction (qRT-PCR) analysis of VPS expression during hop cone development showed a clear link with the bitter acid content. The highest levels of VPS expression were observed in two triploid varieties, ‘Symphony’ and ‘Ember’, which typically have high bitter acid levels. Conclusions In all hop varieties studied, VPS expression was lowest in the leaves and an increase in expression was consistently observed during the early stages of cone development. PMID:18519445

  9. Global sequence variation in the histidine-rich proteins 2 and 3 of Plasmodium falciparum: implications for the performance of malaria rapid diagnostic tests

    PubMed Central

    2010-01-01

    Background Accurate diagnosis is essential for prompt and appropriate treatment of malaria. While rapid diagnostic tests (RDTs) offer great potential to improve malaria diagnosis, the sensitivity of RDTs has been reported to be highly variable. One possible factor contributing to variable test performance is the diversity of parasite antigens. This is of particular concern for Plasmodium falciparum histidine-rich protein 2 (PfHRP2)-detecting RDTs since PfHRP2 has been reported to be highly variable in isolates of the Asia-Pacific region. Methods The pfhrp2 exon 2 fragment from 458 isolates of P. falciparum collected from 38 countries was amplified and sequenced. For a subset of 80 isolates, the exon 2 fragment of histidine-rich protein 3 (pfhrp3) was also amplified and sequenced. DNA sequence and statistical analysis of the variation observed in these genes was conducted. The potential impact of the pfhrp2 variation on RDT detection rates was examined by analysing the relationship between sequence characteristics of this gene and the results of the WHO product testing of malaria RDTs: Round 1 (2008), for 34 PfHRP2-detecting RDTs. Results Sequence analysis revealed extensive variations in the number and arrangement of various repeats encoded by the genes in parasite populations world-wide. However, no statistically robust correlation between gene structure and RDT detection rate for P. falciparum parasites at 200 parasites per microlitre was identified. Conclusions The results suggest that despite extreme sequence variation, diversity of PfHRP2 does not appear to be a major cause of RDT sensitivity variation. PMID:20470441

  10. [Genetic variation of Manchurian pheasant (Phasianus colchicus pallasi Rotshild, 1903) inferred from mitochondrial DNA control region sequences].

    PubMed

    Kozyrenko, M M; Fisenko, P V; Zhuravlev, Iu N

    2009-04-01

    Sequence variation of the mitochondrial DNA control region was studied in Manchurian pheasants (Phasianus colchicus pallasi Rotshild, 1903) representing three geographic populations from the southern part of the Russian Far East. Extremely low population genetic differentiation (F(ST) = 0.0003) pointed to very high gene exchange between the populations. The combination of high haplotype diversity (0.884 to 0.913), low nucleotide diversity (0.0016 to 0.0022), low R2 values (0.1235 to 0.1337), certain patterns of pairwise-difference distributions, and the absence of phylogenetic structure suggested that the phylogenetic history of Ph. c. pallasi included passing through a bottleneck with further expansion in the postglacial period. According to the data obtained, it was suggested that differentiation between the mitochondrial lineages started approximately 100 000 years ago.
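    The haplotype and nucleotide diversity statistics quoted above are standard summary measures. As a minimal sketch, assuming aligned sequences of equal length without gaps, Nei's haplotype diversity h = n/(n-1) × (1 - Σp²) and nucleotide diversity π (mean pairwise differences per site) can be computed as below. The function names and the toy sequences are illustrative assumptions and are unrelated to the pheasant data.

```python
from collections import Counter
from itertools import combinations

def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype freqs)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))

def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site over all sequence pairs."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)

# Toy alignment of four sequences, invented for illustration.
seqs = ["AAAA", "AAAT", "AAAT", "AATT"]
h = haplotype_diversity(seqs)
pi = nucleotide_diversity(seqs)
```

    High h together with low π, as reported in the abstract, means that many distinct haplotypes exist but differ from one another at only a few sites, a pattern often read as a signature of recent population expansion.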

  11. Polarimetric Variations of Binary Stars. V. Pre-Main-Sequence Spectroscopic Binaries Located in Ophiuchus and Scorpius

    NASA Astrophysics Data System (ADS)

    Manset, N.; Bastien, P.

    2003-06-01

    We present polarimetric observations of seven pre-main-sequence (PMS) spectroscopic binaries located in the ρ Ophiuchus and Upper Scorpius star-forming regions (SFRs). The average observed polarizations at 7660 Å are between 0.5% and 3.5%. After estimates of the interstellar polarization are removed, all binaries have an intrinsic polarization above 0.4%, even though most of them do not present other evidence of circumstellar dust. Two binaries, NTTS 162814-2427 and NTTS 162819-2423S, present high levels of intrinsic polarization between 1.5% and 2.1%, in agreement with the fact that other observations (photometry, spectroscopy) indicate the presence of circumstellar dust. Tests reveal that all seven PMS binaries have a statistically variable or possibly variable polarization. Combining these results with our previous sample of binaries located in the Taurus, Auriga, and Orion SFRs, 68% of the binaries have an intrinsic polarization above 0.5%, and 90% of the binaries are polarimetrically variable or possibly variable. NTTS 160814-1857, 162814-2427, and 162819-2423S are clearly polarimetrically variable. The first two also exhibit phase-locked variations over ~10 and ~40 orbits, respectively. Statistically, NTTS 160905-1859 is possibly variable, but it shows periodic variations not detected by the statistical tests; those variations are not phase-locked and are only present for short intervals of time. The amplitudes of the variations reach a few tenths of a percent, greater than for the previously studied PMS binaries located in the Taurus, Orion, and Auriga SFRs. The high-eccentricity system NTTS 162814-2427 shows single-periodic variations, in agreement with our previous numerical simulations. We compare the observations with some of our numerical simulations and also show that an analysis of the periodic polarimetric variations with the Brown, McLean, & Emslie (BME) formalism to find the orbital inclination is for the moment premature: nonperiodic events

  12. Atomic force microscopy of crystalline insulins: the influence of sequence variation on crystallization and interfacial structure.

    PubMed Central

    Yip, C M; Brader, M L; DeFelippis, M R; Ward, M D

    1998-01-01

    The self-association of proteins is influenced by amino acid sequence, molecular conformation, and the presence of molecular additives. In the presence of phenolic additives, LysB28ProB29 insulin, in which the C-terminal prolyl and lysyl residues of wild-type human insulin have been inverted, can be crystallized into forms resembling those of wild-type insulins in which the protein exists as zinc-complexed hexamers organized into well-defined layers. We describe herein tapping-mode atomic force microscopy (TMAFM) studies of single crystals of rhombohedral (R3) LysB28ProB29 that reveal the influence of sequence variation on hexamer-hexamer association at the surface of actively growing crystals. Molecular scale lattice images of these crystals were acquired in situ under growth conditions, enabling simultaneous identification of the rhombohedral LysB28ProB29 crystal form, its orientation, and its dynamic growth characteristics. The ability to obtain crystallographic parameters on multiple crystal faces with TMAFM confirmed that bovine and porcine insulins grown under these conditions crystallized into the same space group as LysB28ProB29 (R3), enabling direct comparison of crystal growth behavior and the influence of sequence variation. Real-time TMAFM revealed hexamer vacancies on the (001) terraces of LysB28ProB29, and more rounded dislocation noses and larger terrace widths for actively growing screw dislocations compared to wild-type bovine and porcine insulin crystals under identical conditions. This behavior is consistent with weaker interhexamer attachment energies for LysB28ProB29 at active growth sites. Comparison of the single crystal x-ray structures of wild-type insulins and LysB28ProB29 suggests that differences in protein conformation at the hexamer-hexamer interface and accompanying changes in interhexamer bonding are responsible for this behavior. These studies demonstrate that subtle changes in molecular conformation due to a single sequence

  13. Recurrent Coding Sequence Variation Explains Only A Small Fraction of the Genetic Architecture of Colorectal Cancer.

    PubMed

    Timofeeva, Maria N; Kinnersley, Ben; Farrington, Susan M; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J; Harris, Sarah E; Northwood, Emma L; Barrett, Jennifer H; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G; Houlston, Richard S

    2015-01-01

    Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10^-7), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts: rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10^-7); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10^-7 and OR = 1.09, P = 7.4 × 10^-8); rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10^-9), all reaching exome-wide significance levels. Gene based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10^-6). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10^-4) and DNA mismatch repair genes (P = 6.1 × 10^-4) consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and the PCDHG gene cluster and several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC. PMID:26553438

  14. Recurrent Coding Sequence Variation Explains Only A Small Fraction of the Genetic Architecture of Colorectal Cancer

    PubMed Central

    Timofeeva, Maria N.; Kinnersley, Ben; Farrington, Susan M.; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J.; Harris, Sarah E.; Northwood, Emma L.; Barrett, Jennifer H.; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J.; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D. Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G.; Houlston, Richard S.

    2015-01-01

    Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10^-7), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts: rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10^-7); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10^-7 and OR = 1.09, P = 7.4 × 10^-8); rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10^-9), all reaching exome-wide significance levels. Gene based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10^-6). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10^-4) and DNA mismatch repair genes (P = 6.1 × 10^-4) consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and the PCDHG gene cluster and several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC. PMID:26553438

  15. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants

    PubMed Central

    Gundry, Michael; Vijg, Jan

    2011-01-01

    DNA mutations are the source of genetic variation within populations. The majority of mutations with observable effects are deleterious. In humans mutations in the germ line can cause genetic disease. In somatic cells multiple rounds of mutations and selection lead to cancer. The study of genetic variation has progressed rapidly since the completion of the draft sequence of the human genome. Recent advances in sequencing technology, most importantly the introduction of massively parallel sequencing (MPS), have resulted in more than a hundred-fold reduction in the time and cost required for sequencing nucleic acids. These improvements have greatly expanded the use of sequencing as a practical tool for mutation analysis. While in the past the high cost of sequencing limited mutation analysis to selectable markers or small forward mutation targets assumed to be representative for the genome overall, current platforms allow whole genome sequencing for less than $5,000. This has already given rise to direct estimates of germline mutation rates in multiple organisms including humans by comparing whole genome sequences between parents and offspring. Here we present a brief history of the field of mutation research, with a focus on classical tools for the measurement of mutation rates. We then review MPS, how it is currently applied and the new insight into human and animal mutation frequencies and spectra that has been obtained from whole genome sequencing. While great progress has been made, we note that the single most important limitation of current MPS approaches for mutation analysis is the inability to address low-abundance mutations that turn somatic tissues into mosaics of cells. Such mutations are at the basis of intra-tumor heterogeneity, with important implications for clinical diagnosis, and could also contribute to somatic diseases other than cancer, including aging. Some possible approaches to gain access to low-abundance mutations are discussed, with a

  16. Association Between Absolute Neutrophil Count and Variation at TCIRG1: The NHLBI Exome Sequencing Project.

    PubMed

    Rosenthal, Elisabeth A; Makaryan, Vahagn; Burt, Amber A; Crosslin, David R; Kim, Daniel Seung; Smith, Joshua D; Nickerson, Deborah A; Reiner, Alex P; Rich, Stephen S; Jackson, Rebecca D; Ganesh, Santhi K; Polfus, Linda M; Qi, Lihong; Dale, David C; Jarvik, Gail P

    2016-09-01

    Neutrophils are a key component of innate immunity. Individuals with low neutrophil count are susceptible to frequent infections. Linkage and association between congenital neutropenia and a single rare missense variant in TCIRG1 have been reported in a single family. Here, we report on nine rare missense variants at evolutionarily conserved sites in TCIRG1 that are associated with lower absolute neutrophil count (ANC; p = 0.005) in 1,058 participants from three cohorts: Atherosclerosis Risk in Communities (ARIC), Coronary Artery Risk Development in Young Adults (CARDIA), and Jackson Heart Study (JHS) of the NHLBI Grand Opportunity Exome Sequencing Project (GO ESP). These results validate the effects of TCIRG1 coding variation on ANC and suggest that this gene may be associated with a spectrum of mild to severe effects on ANC. PMID:27229898

  17. Plant simple sequence repeats: distribution, variation, and effects on gene expression.

    PubMed

    Sharopova, Natalya

    2008-02-01

    Genome-wide simple sequence repeat (SSR) information was analyzed together with functional annotations of Arabidopsis genes and public gene expression data for Arabidopsis and rice. Analysis of more than 15,000 Arabidopsis and more than 16,000 rice SSRs indicated that SSRs may affect the expression of hundreds of genes. Data from experiments on DNA methylation, histone acetylation, and transcript turnover suggest that SSRs may affect gene expression at transcriptional and posttranscriptional levels. Members of some functional groups were shown to be enriched with SSRs and often contained similar but non-homologous repeats within the same gene regions. In addition, the distribution of perfect and imperfect SSRs in some Arabidopsis, maize, and rice genes was used to demonstrate how two-level control of SSR variation may contribute to protein evolution.

  18. A cladistic model of ACE sequence variation with implications for myocardial infarction, Alzheimer disease and obesity.

    PubMed

    Katzov, Hagit; Bennet, Anna M; Kehoe, Patrick; Wiman, Björn; Gatz, Margaret; Blennow, Kaj; Lenhard, Boris; Pedersen, Nancy L; de Faire, Ulf; Prince, Jonathan A

    2004-11-01

    Sequence variation in ACE, which encodes angiotensin I converting enzyme, contributes to a large proportion of variability in plasma ACE levels, but the extent to which this impacts upon human disease is unresolved. Most efforts to associate ACE with other heritable traits have involved a single Alu insertion/deletion polymorphism, despite the probable existence of other functional sequence variants with effects that may not be consistently detectable by solely typing the Alu indel. Here, utilizing single nucleotide polymorphisms (SNPs) that differentiate major ACE clades in European populations, we demonstrate a number of significant phenotype associations across more than 4000 Swedish individuals. In a systematic analysis of metabolic phenotypes, effects were detected upon several traits, including fasting plasma glucose levels, insulin levels and measures of obesity (P-values ranging from 0.046 to 8.4 × 10^-6). Extending cladistic models to the study of myocardial infarction and Alzheimer disease, significant associations were observed with greater effect sizes than those typically obtained in large-scale meta-analyses based on the Alu indel. Population frequencies of ACE genotypes were also found to change with age, congruent with previous data suggesting effects upon longevity. Clade models consistently outperformed those based upon single markers, reinforcing the importance of taking into consideration the possible confounding effects of allelic heterogeneity in this genomic region. Utilizing computational tools, potential functional variants are highlighted that may underlie phenotypic variability, which is discussed along with the broader implications these results may have for studies attempting to link variation in ACE to human disease.

  19. Solution of high frequency variations of ERP from VLBI observations

    NASA Astrophysics Data System (ADS)

    Zhang, B.; Li, J. L.; Wang, G. L.; Zhao, M.

    2005-01-01

    In the astrometric and geodetic VLBI data analysis software CALC/SOLVE, the high frequency variations of the Earth Rotation Parameters (ERP) are determined by a constrained continuous piecewise linear model. The ERP rate between two epoch nodes is constrained to be smaller than a set limit, and the ERP is forced to be continuous at epoch nodes. Observation analysis shows that when the data points are not very dense, the constraint and the continuity requirement help to improve the stability of the solution, but also degrade the independence of ERP solutions at epoch nodes. By using the Userpartial entry of CALC/SOLVE, a direct solution module for the high frequency variations of ERP is realized without any constraint on the rate or any continuity requirement at nodes. Reduction of real observations shows that the direct solution mode is feasible. In the solution of high frequency variations of ERP from VLBI observations with long period coverage, the model errors of the precession and nutation (celestial pole offset) should be taken into consideration. A corresponding module is realized and global solutions of the high frequency variation of ERP are successfully performed on the VLBI observations from 1979 to 2003. Comparison of the solutions shows that the precision of the parameters is clearly improved when the pole offsets are taken into consideration. In the solution of high frequency variation of ERP from VLBI observations, the direct solution mode with the consideration of the pole offsets is accordingly recommended.

  20. Patchwork sequencing of tomato San Marzano and Vesuviano varieties highlights genome-wide variations

    PubMed Central

    2014-01-01

    Background Investigation of tomato genetic resources is crucial for improved evolutionary and genetic studies as well as for tomato breeding strategies. Traditional Vesuviano and San Marzano varieties grown in the Campania region (Southern Italy) are famous for their remarkable fruit quality. Owing to their economic and social importance, it is crucial to understand the genetic basis of their unique traits. Results Here, we present the draft genome sequences of the tomato varieties Vesuviano and San Marzano. A 40x genome coverage was obtained from a hybrid assembly of Illumina paired-end reads that combines de novo assembly with iterative mapping to the reference S. lycopersicum genome (SL2.40). Insertions, deletions and SNP variants were carefully measured. When assessed on the basis of the reference annotation, 30% of protein-coding genes are predicted to have variants in both varieties. Gene copy number and gene location were assessed by mapping mRNA transcripts, showing a closer relationship of San Marzano with the reference genome. Distinctive variations in key genes and transcription/regulation factors related to fruit quality have been revealed for both cultivars. Conclusions This work highlighted the relationships between the varieties and important variants in key fruit processes, which are useful to dissect the path from sequence variant to phenotype. PMID:24548308

  1. Serine Hydroxymethyltransferase 1 and 2: Gene Sequence Variation and Functional Genomic Characterization

    PubMed Central

    Hebbring, Scott J.; Chai, Yubo; Ji, Yuan; Abo, Ryan P.; Jenkins, Gregory D.; Fridley, Brooke; Zhang, Jianping; Eckloff, Bruce W.; Wieben, Eric D.; Weinshilboum, Richard M.

    2012-01-01

    Serine hydroxymethyltransferase (SHMT) catalyzes the transfer of a beta carbon from serine to tetrahydrofolate (THF) to form glycine and 5,10-methylene-THF. This reaction plays an important role in neurotransmitter synthesis and metabolism. We set out to resequence SHMT1 and SHMT2, followed by functional genomic studies. We identified 87 and 60 polymorphisms in SHMT1 and SHMT2, respectively. We observed no significant functional effect of the 13 nonsynonymous SNPs in these genes, either on catalytic activity or protein quantity. We imputed additional variants across the two genes using “1000 Genomes” data, and identified 14 variants that were significantly associated (p-value < 1.0E-10) with SHMT1 mRNA expression in lymphoblastoid cell lines. Many of these SNPs were also significantly correlated with basal SHMT1 protein expression in 268 human liver biopsy samples. Reporter gene assays suggested that the SHMT1 promoter SNP, rs669340, contributed to this variation. Finally, SHMT1 and SHMT2 expression were significantly correlated with those of other Folate and Methionine Cycle genes at both the mRNA and protein levels. These experiments represent a comprehensive study of SHMT1 and SHMT2 gene sequence variation and its functional implications. In addition, we obtained preliminary indications that these genes may be co-regulated with other Folate and Methionine Cycle genes. PMID:22220685

  2. Serine hydroxymethyltransferase 1 and 2: gene sequence variation and functional genomic characterization.

    PubMed

    Hebbring, Scott J; Chai, Yubo; Ji, Yuan; Abo, Ryan P; Jenkins, Gregory D; Fridley, Brooke; Zhang, Jianping; Eckloff, Bruce W; Wieben, Eric D; Weinshilboum, Richard M

    2012-03-01

    Serine hydroxymethyltransferase (SHMT) catalyzes the transfer of a β-carbon from serine to tetrahydrofolate to form glycine and 5,10-methylene-tetrahydrofolate. This reaction plays an important role in neurotransmitter synthesis and metabolism. We set out to resequence SHMT1 and SHMT2, followed by functional genomic studies. We identified 87 and 60 polymorphisms in SHMT1 and SHMT2, respectively. We observed no significant functional effect of the 13 non-synonymous single-nucleotide polymorphism (SNPs) in these genes, either on catalytic activity or protein quantity. We imputed additional variants across the two genes using '1000 Genomes' data, and identified 14 variants that were significantly associated (p<1.0E-10) with SHMT1 messenger RNA expression in lymphoblastoid cell lines. Many of these SNPs were also significantly correlated with basal SHMT1 protein expression in 268 human liver biopsy samples. Reporter gene assays suggested that the SHMT1 promoter SNP, rs669340, contributed to this variation. Finally, SHMT1 and SHMT2 expression were significantly correlated with those of other Folate and Methionine Cycle genes at both the messenger RNA and protein levels. These experiments represent a comprehensive study of SHMT1 and SHMT2 gene sequence variation and its functional implications. In addition, we obtained preliminary indications that these genes may be co-regulated with other Folate and Methionine Cycle genes.

  3. Detection and implication of significant temporal b-value variation during earthquake sequences

    NASA Astrophysics Data System (ADS)

    Gulia, Laura; Tormann, Thessa; Schorlemmer, Danijel; Wiemer, Stefan

    2016-04-01

    Earthquakes tend to cluster in space and time and periods of increased seismic activity are also periods of increased seismic hazard. Forecasting models currently used in statistical seismology and in Operational Earthquake Forecasting (e.g. ETAS) consider the spatial and temporal changes in the activity rates whilst the spatio-temporal changes in the earthquake size distribution, the b-value, are not included. Laboratory experiments on rock samples show an increasing relative proportion of larger events as the system approaches failure, and a sudden reversal of this trend after the main event. The increasing fraction of larger events during the stress increase period can be mathematically represented by a systematic b-value decrease, while the b-value increases immediately following the stress release. We investigate whether these lab-scale observations also apply to natural earthquake sequences and can help to improve our understanding of the physical processes generating damaging earthquakes. A number of large events nucleated in low b-value regions and spatial b-value variations have been extensively documented in the past. Detecting temporal b-value evolution with confidence is more difficult, one reason being the very different scales that have been suggested for a precursory drop in b-value, from a few days to decadal scale gradients. We demonstrate with the results of detailed case studies of the 2009 M6.3 L'Aquila and 2011 M9 Tohoku earthquakes that significant and meaningful temporal b-value variability can be detected throughout the sequences, which e.g. suggests that foreshock probabilities are not generic but subject to significant spatio-temporal variability. Such potential conclusions require and motivate the systematic study of many sequences to investigate whether general patterns exist that might eventually be useful for time-dependent or even real-time seismic hazard assessment.
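    The b-value discussed above is the slope of the Gutenberg-Richter frequency-magnitude relation, log10 N = a - bM. The abstract does not state which estimator was used. As a minimal sketch, assuming the widely used Aki maximum-likelihood estimate with Utsu's correction for magnitude binning, b = log10(e) / (mean(M) - (Mc - ΔM/2)), one could write the following. The function name, the completeness magnitude `mc` and the bin width `dm` are assumptions supplied by the user, and the toy magnitudes are invented.

```python
import math

def b_value_aki(mags, mc, dm=0.1):
    """Maximum-likelihood b-value (Aki estimator with Utsu's binning correction).

    mags: catalogue magnitudes; mc: completeness magnitude; dm: magnitude bin width.
    Events below mc are discarded before the mean is taken.
    """
    selected = [m for m in mags if m >= mc]
    mean_mag = sum(selected) / len(selected)
    return math.log10(math.e) / (mean_mag - (mc - dm / 2.0))

# Toy catalogue, invented for illustration; the M1.0 event lies below mc and is ignored.
mags = [1.0, 2.0, 2.5, 3.0]
b = b_value_aki(mags, mc=2.0, dm=0.1)
```

    A lower b means a larger share of big events in the window, so tracking this estimate through time in sliding windows is one simple way to look for the temporal variation the abstract describes. The estimate is sensitive to the choice of `mc`, which has to be determined separately.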

  4. Effect of laying sequence on egg mercury in captive zebra finches: an interpretation considering individual variation.

    PubMed

    Ou, Langbo; Varian-Ramos, Claire W; Cristol, Daniel A

    2015-08-01

    Bird eggs are used widely as noninvasive bioindicators for environmental mercury availability. Previous studies, however, have found varying relationships between laying sequence and egg mercury concentrations. Some studies have reported that the mercury concentration was higher in first-laid eggs or declined across the laying sequence, whereas in other studies mercury concentration was not related to egg order. Approximately 300 eggs (61 clutches) were collected from captive zebra finches dosed throughout their reproductive lives with methylmercury (0.3 μg/g, 0.6 μg/g, 1.2 μg/g, or 2.4 μg/g wet wt in diet); the total mercury concentration (mean ± standard deviation [SD] dry wt basis) of their eggs was 7.03 ± 1.38 μg/g, 14.15 ± 2.52 μg/g, 26.85 ± 5.85 μg/g, and 49.76 ± 10.37 μg/g, respectively (equivalent to fresh wt egg mercury concentrations of 1.24 μg/g, 2.50 μg/g, 4.74 μg/g, and 8.79 μg/g). The authors observed a significant decrease in the mercury concentration of successive eggs when compared with the first egg and notable variation between clutches within treatments. The mercury level of individual females within and among treatments did not alter this relationship. Based on the results, sampling of a single egg in each clutch from any position in the laying sequence is sufficient for purposes of population risk assessment, but it is not recommended as a proxy for individual female exposure or as an estimate of average mercury level within the clutch.
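
    The dry-weight and fresh-weight concentrations quoted above imply a nearly constant conversion factor. The short sketch below recovers it from the four reported pairs. The interpretation of that factor as the solids fraction of the egg (one minus moisture) is an inference from the numbers, not a value stated in the abstract.

```python
# Reported mean egg mercury (ug/g) for the four dose groups.
dry_weight = [7.03, 14.15, 26.85, 49.76]
fresh_weight = [1.24, 2.50, 4.74, 8.79]

# fresh = dry * solids_fraction, so each ratio estimates the solids fraction.
ratios = [fresh / dry for fresh, dry in zip(fresh_weight, dry_weight)]
solids_fraction = sum(ratios) / len(ratios)

# Implied moisture content of the eggs (inferred, not reported).
moisture_percent = 100.0 * (1.0 - solids_fraction)
print(f"solids fraction ~ {solids_fraction:.3f}, moisture ~ {moisture_percent:.1f}%")
```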

  5. Spatial and Temporal Stress Drop Variations of the 2011 Tohoku Earthquake Sequence

    NASA Astrophysics Data System (ADS)

    Miyake, H.

    2013-12-01

    The 2011 Tohoku earthquake sequence consists of foreshocks, a mainshock, aftershocks, and repeating earthquakes. Quantifying spatial and temporal stress drop variations is important for understanding M9-class megathrust earthquakes. The variability and the spatial and temporal pattern of stress drop are basic information for rupture dynamics and are also useful for source modeling. As pointed out in the ground motion prediction equations of Campbell and Bozorgnia [2008, Earthquake Spectra], mainshock-aftershock pairs often show a significant decrease in stress drop. Here we focus on strong motion records from before and after the Tohoku earthquake and analyze source spectral ratios considering azimuth and distance dependency [Miyake et al., 2001, GRL]. Because station locations are limited to land, spatial and temporal stress drop variations are estimated by adjusting shifts from the omega-squared source spectral model. The adjustment is based on stochastic Green's function simulations of source spectra considering azimuth and distance dependency. Because we assume the same Green's functions for the event pairs at each station, both the propagation path and site amplification effects cancel out. Precise studies of spatial and temporal stress drop variations have been performed previously [e.g., Allmann and Shearer, 2007, JGR]; this study targets the relation between stress drop and the progression of slow slip prior to the Tohoku earthquake reported by Kato et al. [2012, Science], as well as plate structures. Acknowledgement: This study is partly supported by ERI Joint Research (2013-B-05). We used the JMA unified earthquake catalogue and K-NET, KiK-net, and F-net data provided by NIED.
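
    To make the spectral-ratio idea concrete, here is a minimal sketch of the omega-squared source model and the ratio of two such spectra. It only illustrates why a shared path and site term drops out of the ratio. It is not the adjustment procedure described in the abstract, and the function names and parameters are hypothetical.

```python
def omega_squared_spectrum(freq_hz, seismic_moment, corner_freq_hz):
    """Omega-squared source displacement spectrum (flat below the corner,
    falling off as f**-2 above it)."""
    return seismic_moment / (1.0 + (freq_hz / corner_freq_hz) ** 2)

def spectral_ratio(freq_hz, moment_a, fc_a, moment_b, fc_b):
    """Ratio of two source spectra for an event pair at one station.

    Any path or site factor common to both events would multiply the
    numerator and denominator equally, so it is omitted here.
    """
    return (omega_squared_spectrum(freq_hz, moment_a, fc_a)
            / omega_squared_spectrum(freq_hz, moment_b, fc_b))
```

    At low frequency the ratio tends to the moment ratio, and at high frequency to the moment ratio scaled by the squared corner-frequency ratio, so the two plateaus constrain the corner frequencies from which stress drop is usually derived.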

  6. Library preparation for highly accurate population sequencing of RNA viruses

    PubMed Central

    Acevedo, Ashley; Andino, Raul

    2015-01-01

    Circular resequencing (CirSeq) is a novel technique for efficient and highly accurate next-generation sequencing (NGS) of RNA virus populations. The foundation of this approach is the circularization of fragmented viral RNAs, which are then redundantly encoded into tandem repeats by ‘rolling-circle’ reverse transcription. When sequenced, the redundant copies within each read are aligned to derive a consensus sequence of their initial RNA template. This process yields sequencing data with error rates far below the variant frequencies observed for RNA viruses, facilitating ultra-rare variant detection and accurate measurement of low-frequency variants. Although library preparation takes ~5 d, the high-quality data generated by CirSeq simplifies downstream data analysis, making this approach substantially more tractable for experimentalists. PMID:24967624
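
    The consensus step described above can be sketched as a per-position majority vote across the tandem copies inside one read. This is a simplified illustration under strong assumptions: the repeat length is known, the copies are already in register, and base qualities are ignored. None of that is taken from the CirSeq software itself, and the function name is hypothetical.

```python
from collections import Counter

def repeat_consensus(read, repeat_len):
    """Majority-vote consensus over tandem copies within a single read."""
    copies = [read[i:i + repeat_len]
              for i in range(0, len(read) - repeat_len + 1, repeat_len)]
    if len(copies) < 2:
        raise ValueError("need at least two complete copies to build a consensus")
    consensus = []
    for column in zip(*copies):
        base, count = Counter(column).most_common(1)[0]
        # Keep a base only if a strict majority of copies agree on it.
        consensus.append(base if count > len(copies) / 2 else "N")
    return "".join(consensus)
```

    With three copies, an error present in one copy is outvoted by the other two, which is the intuition behind the low error rates reported for this approach.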

  7. High-throughput DNA sequencing: a genomic data manufacturing process.

    PubMed

    Huang, G M

    1999-01-01

    The progress trends in automated DNA sequencing operation are reviewed. Technological development in sequencing instruments, enzymatic chemistry and robotic stations has resulted in ever-increasing capacity of sequence data production. This progress leads to a higher demand on laboratory information management and data quality assessment. High-throughput laboratories face the challenge of organizational management, as well as technology management. Engineering principles of process control should be adopted in this biological data manufacturing procedure. While various systems attempt to provide solutions to automate different parts of, or even the entire process, new technical advances will continue to change the paradigm and provide new challenges.

  8. Genome reassembly with high-throughput sequencing data

    PubMed Central

    2013-01-01

    Motivation: Recent studies in genomics have highlighted the significance of structural variation in determining individual variation. Current methods for identifying structural variation, however, are predominantly focused on either assembling whole genomes from scratch, or identifying the relatively small changes between a genome and a reference sequence. While significant progress has been made in recent years on both de novo assembly and resequencing (read mapping) methods, few attempts have been made to bridge the gap between them. Results: In this paper, we present a computational method for incorporating a reference sequence into an assembly algorithm. We propose a novel graph construction that builds upon the well-known de Bruijn graph to incorporate the reference, and describe a simple algorithm, based on iterative message passing, which uses this information to significantly improve assembly results. We validate our method by applying it to a series of 5 Mb simulation genomes derived from both mammalian and bacterial references. The results of applying our method to this simulation data are presented along with a discussion of the benefits and drawbacks of this technique. PMID:23368744
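
    For readers unfamiliar with the underlying data structure, the sketch below builds a plain de Bruijn graph from reads, with edge weights counting how often each k-mer was seen. It does not reproduce the reference-aware construction or the message-passing step of the paper. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def kmer_edges(sequence, k):
    """Yield (prefix, suffix) node pairs, one per k-mer in the sequence."""
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        yield kmer[:-1], kmer[1:]

def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer node to a Counter of successor nodes."""
    graph = defaultdict(Counter)
    for read in reads:
        for prefix, suffix in kmer_edges(read, k):
            graph[prefix][suffix] += 1
    return graph
```

    Running `kmer_edges` over a reference sequence with the same k yields edges that could be compared against the read graph, which is one simple way to see where reads and reference disagree.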

  9. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing

    PubMed Central

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M. G.; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes

    2015-01-01

    Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of <100 viral copies. Furthermore, distantly related proviral sequences may be enriched by orders of magnitude, enabling discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral sequences in clinical samples. We used this method to conduct an investigation for novel retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material. PMID:26285800

  10. Multiple Cis-Acting Sequences Contribute to Evolved Regulatory Variation for Drosophila Adh Genes

    PubMed Central

    Fang, X. M.; Brennan, M. D.

    1992-01-01

    Drosophila affinidisjuncta and Drosophila hawaiiensis are closely related species that display distinct tissue-specific expression patterns for their homologous alcohol dehydrogenase genes (Adh genes). In Drosophila melanogaster transformants, both genes are expressed at high levels in the larval and adult fat bodies, but the D. affinidisjuncta gene is expressed 10-50-fold more strongly in the larval and adult midguts and Malpighian tubules. The present study reports the mapping of cis-acting sequences contributing to the regulatory differences between these two genes in transformants. Chimeric genes were constructed and introduced into the germ line of D. melanogaster. Stage- and tissue-specific expression patterns were determined by measuring steady-state RNA levels in larvae and adults. Three portions of the promoter region make distinct contributions to the tissue-specific regulatory differences between the native genes. Sequences immediately upstream of the distal promoter have a strong effect in the adult Malpighian tubules, while sequences between the two promoters are relatively important in the larval Malpighian tubules. A third gene segment, immediately upstream of the proximal promoter, influences levels of the proximal Adh transcript in all tissues and developmental stages examined, and largely accounts for the regulatory difference in the larval and adult midguts. However, these as well as other sequences make smaller contributions to various aspects of the tissue-specific regulatory differences. In addition, some chimeric genes display aberrant RNA levels for the whole organism, suggesting close physical association between sequences involved in tissue-specific regulatory differences and those important for Adh expression in the larval and adult fat bodies. PMID:1644276

  11. Detection and quantitation of single nucleotide polymorphisms, DNA sequence variations, DNA mutations, DNA damage and DNA mismatches

    DOEpatents

    McCutchen-Maloney, Sandra L.

    2002-01-01

    DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding, which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera, which gives a decrease in signal.

  12. High-Throughput Sequencing of Complete Mitochondrial Genomes.

    PubMed

    Briscoe, Andrew George; Hopkins, Kevin Peter; Waeschenbach, Andrea

    2016-01-01

    Next-generation sequencing has revolutionized mitogenomics, turning a cottage industry into a high throughput process. This chapter outlines methodologies used to sequence, assemble, and annotate mitogenomes of non-model organisms using Illumina sequencing technology, utilizing either long-range PCR amplicons or gDNA as starting template. Instructions are given on how to extract DNA, conduct long-range PCR amplifications, generate short Sanger barcode tag sequences, prepare equimolar sample pools, construct and assess quality library preparations, assemble Illumina reads using either seeded reference mapping or de novo assembly, and annotate mitogenomes in the absence of an automated pipeline. Notes and recommendations, derived from our own experience, are given throughout this chapter. PMID:27460369

  13. Santorini mutation detection meeting 2011: rapid advance in sequencing technology poses challenges for interpretation of genetic variations.

    PubMed

    Stavrou, Eleana F; Goriely, Anne

    2012-10-01

    The 11th International Symposium on Mutations in the Genome was held on 6-10 June, 2011, in Santorini, Greece. Meeting participants described novel detection technologies, rapid advances in whole genome and whole-exome sequencing, but also highlighted the urgent need for the development of sequence variation databases and the clinical interpretation of the genomic data. This report summarizes some of the major themes presented during the meeting.

  14. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA.

    PubMed

    Alquezar-Planas, David E; Fordyce, Sarah L

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure that the data produced is optimal. Although much of the procedure can be followed directly from the manufacturer's protocols, the key differences lie in the library preparation steps. This chapter presents an optimized protocol for the sequencing of fossil remains and museum specimens, commonly referred to as "ancient DNA," using the Roche GS FLX 454 platform.

  15. Roche genome sequencer FLX based high-throughput sequencing of ancient DNA.

    PubMed

    Alquezar-Planas, David E; Fordyce, Sarah L

    2012-01-01

    Since the development of so-called "next generation" high-throughput sequencing in 2005, this technology has been applied to a variety of fields. Such applications include disease studies, evolutionary investigations, and ancient DNA. Each application requires a specialized protocol to ensure that the data produced is optimal. Although much of the procedure can be followed directly from the manufacturer's protocols, the key differences lie in the library preparation steps. This chapter presents an optimized protocol for the sequencing of fossil remains and museum specimens, commonly referred to as "ancient DNA," using the Roche GS FLX 454 platform. PMID:22665278

  16. Detecting rare structural variation in evolving microbial populations from new sequence junctions using breseq

    PubMed Central

    Deatherage, Daniel E.; Traverse, Charles C.; Wolf, Lindsey N.; Barrick, Jeffrey E.

    2014-01-01

    New mutations leading to structural variation (SV) in genomes (in the form of mobile element insertions, large deletions, gene duplications, and other chromosomal rearrangements) can play a key role in microbial evolution. Yet, SV is considerably more difficult to predict from short-read genome resequencing data than single-nucleotide substitutions and indels (SN), so it is not yet routinely identified in studies that profile population-level genetic diversity over time in evolution experiments. We implemented an algorithm for detecting polymorphic SV as part of the breseq computational pipeline. This procedure examines split-read alignments, in which the two ends of a single sequencing read match disjoint locations in the reference genome, in order to detect structural variants and estimate their frequencies within a sample. We tested our algorithm using simulated Escherichia coli data and then applied it to 500- and 1000-generation population samples from the Lenski E. coli long-term evolution experiment (LTEE). Knowledge of genes that are targets of selection in the LTEE and mutations present in previously analyzed clonal isolates allowed us to evaluate the accuracy of our procedure. Overall, SV accounted for ~25% of the genetic diversity found in these samples. By profiling rare SV, we were able to identify many cases where alternative mutations in key genes transiently competed within a single population. We also found, unexpectedly, that mutations in two genes that rose to prominence at these early time points always went extinct in the long term. Because it is not limited by the base-calling error rate of the sequencing technology, our approach for identifying rare SV in whole-population samples may have a lower detection limit than similar predictions of SNs in these data sets. We anticipate that this functionality of breseq will be useful for providing a more complete picture of genome dynamics during evolution experiments with haploid microorganisms.

  17. Structural variation detection using next-generation sequencing data: A comparative technical review.

    PubMed

    Guan, Peiyong; Sung, Wing-Kin

    2016-06-01

    Structural variations (SVs) are mutations in the genome of at least fifty nucleotides in size. They contribute to the phenotypic differences among healthy individuals and can cause severe diseases, even cancers, by breaking or linking genes. Thus, it is crucial to systematically profile SVs in the genome. In the past decade, many next-generation sequencing (NGS)-based SV detection methods have been proposed due to the significant cost reduction of NGS experiments and their ability to unbiasedly detect SVs to the base-pair resolution. These SV detection methods vary in both sensitivity and specificity, since they use different SV-property-dependent and library-property-dependent features. As a result, predictions from different SV callers are often inconsistent. In addition, noise in the data (both platform-specific sequencing error and artificial chimeric reads) impedes the specificity of SV detection. Poorly characterized regions in the human genome (e.g., repeat regions) greatly impact read mapping and in turn affect the SV calling accuracy. Calling of complex SVs requires specialized SV callers. Apart from accuracy, the processing speed of an SV caller is another factor deciding its usability. Knowing the pros and cons of different SV calling techniques and the objectives of the biological study are essential for biologists and bioinformaticians to make informed decisions. This paper describes different components in the SV calling pipeline and reviews the techniques used by existing SV callers. Through a simulation study, we also demonstrate that library properties, especially insert size, greatly impact the sensitivity of different SV callers. We hope the community can benefit from this work both in designing new SV calling methods and in selecting the appropriate SV caller for specific biological studies. PMID:26845461
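
    One reason insert size matters is that many callers use read pairs whose apparent insert size departs from the library distribution as an SV signal. The following is a generic sketch of that read-pair idea, not any specific caller from the review. The function name and the threshold of three scaled median absolute deviations are assumptions.

```python
import statistics

def flag_discordant_pairs(insert_sizes, n_scaled_mad=3.0):
    """Return (index, size) for pairs far from the library's typical insert size.

    Median and MAD are used instead of mean and SD so that the outliers
    being searched for do not inflate the spread estimate.
    """
    median = statistics.median(insert_sizes)
    mad = statistics.median([abs(size - median) for size in insert_sizes])
    spread = 1.4826 * mad  # scales MAD to an SD-like value for normal data
    if spread == 0:
        raise ValueError("insert sizes show no spread; cannot set a threshold")
    return [(i, size) for i, size in enumerate(insert_sizes)
            if abs(size - median) > n_scaled_mad * spread]
```

    A library with a tight insert-size distribution gives a small spread, so smaller events cross the threshold, which is consistent with the sensitivity effect the review reports.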

  18. Natural allelic variations in highly polyploidy Saccharum complex

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Sugarcane (Saccharum spp.), an important sugar and biofuel crop, is highly polyploid with a complex genome. A large amount of natural phenotypic variation exists in sugarcane germplasm. Understanding its allelic variance has been challenging but is a critical foundation for discovery of the genomic seq...

  19. Comprehensive analysis of genetic variations in strictly-defined Leber congenital amaurosis with whole-exome sequencing in Chinese

    PubMed Central

    Wang, Shi-Yuan; Zhang, Qi; Zhang, Xiang; Zhao, Pei-Quan

    2016-01-01

    AIM: To make a comprehensive analysis of the potential pathogenic genes related to Leber congenital amaurosis (LCA) in Chinese patients. METHODS: LCA subjects and their families were retrospectively collected from 2013 to 2015. First, whole-exome sequencing was performed in patients who had undergone gene mutation screening with nothing found; homozygous sites were then selected, candidate sites were annotated, and pathogenicity analysis was conducted using software including Sorting Tolerant from Intolerant (SIFT), Polyphen-2, Mutation assessor, Condel, and Functional Analysis through Hidden Markov Models (FATHMM). Furthermore, Gene Ontology function and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses of pathogenic genes were performed, followed by co-segregation analysis using Fisher's exact test. Sanger sequencing was used to validate single-nucleotide variations (SNVs). Expanded verification was performed in the remaining patients. RESULTS: In total, 51 LCA families with 53 patients and 24 family members were recruited. A total of 104 SNVs (66 LCA-related genes and 15 co-segregated genes) were submitted for expanded verification. Homozygous mutations of KRT12 and CYP1A1 were simultaneously observed in 3 families. Enrichment analysis showed that the potential pathogenic genes were mainly enriched in functions related to cell adhesion, biological adhesion, retinoid metabolic process, and eye development. Additionally, WFS1 and STAU2 had the highest homozygous frequencies. CONCLUSION: LCA is a highly heterogeneous disease. Mutations in KRT12, CYP1A1, WFS1, and STAU2 may be involved in the development of LCA. PMID:27672588

  1. Nucleotide sequence variation of the VP7 gene of two G3-type rotaviruses isolated from dogs.

    PubMed

    Martella, V; Pratelli, A; Greco, G; Gentile, M; Fiorente, P; Tempesta, M; Buonavoglia, C

    2001-04-01

    The sequence of the VP7 gene of two rotaviruses isolated from dogs in southern Italy was determined and the inferred amino acid sequence was compared with that of other rotavirus strains. There was very high nucleotide and amino acid identity between canine strain RV198/95 and other canine strains, and to the human strain HCR3A. Strain RV52/96, however, was found to have about 95% identity to the G3 serotype canine strains K9, A79-10 and CU-1 and 96% identity to strain RV198/95 and to the simian strain RRV. Therefore both of the canine strains belong to the G3 serotype. Nevertheless, detailed analysis of the VP7 variable regions revealed that RV52/96 possesses amino acid substitutions uncommon to the other canine isolates. In addition, strain RV52/96 exhibited a nucleotide divergence greater than 16% from all the other canine strains studied; however, it revealed the closest identity (90.4%) to the simian strain RRV. With only a few exceptions, phylogenetic analysis allowed clear differentiation of the G3 rotaviruses on the basis of the species of origin. The nucleotide and amino acid variations observed in strain RV52/96 could account for the existence of a canine rotavirus G3 sub-type. PMID:11226570

  2. Genetic variation and evolutionary demography of Fenneropenaeus chinensis populations, as revealed by the analysis of mitochondrial control region sequences.

    PubMed

    Kong, Xiao Yu; Li, Yu Long; Shi, Wei; Kong, Jie

    2010-04-01

    Genetic variation and evolutionary demography of the shrimp Fenneropenaeus chinensis were investigated using sequence data of the complete mitochondrial control region (CR). Fragments of 993 bp of the CR were sequenced for 93 individuals from five localities over most of the species' range in the Yellow Sea and the Bohai Sea. There were 84 variable sites defining 68 haplotypes. Haplotype diversity levels were very high (0.95 ± 0.03-0.99 ± 0.02) in F. chinensis populations, whereas those of nucleotide diversity were moderate to low (0.66 ± 0.36%-0.84 ± 0.46%). Analysis of molecular variance and conventional population statistics (F(ST)) revealed no significant genetic structure throughout the range of F. chinensis. The mismatch distribution showed significant fluctuations and a shallow coalescence of mtDNA genealogies, which agreed with the estimated demographic parameters and neutrality tests in implying important past population size fluctuations or range expansion. Isolation with Migration (IM) coalescence results suggest that the F. chinensis populations distributed along the coasts of northern China and the Korean Peninsula (about 1000 km apart) diverged recently, with an estimated split time of 12,800 (7,400-18,600) years ago.
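
    The two summary statistics quoted above can be computed as follows. This is a minimal sketch using the standard definitions (Nei's haplotype diversity, and nucleotide diversity as the mean pairwise proportion of differing sites). It assumes gap-free aligned sequences of equal length and does not attempt the standard errors reported in the abstract.

```python
from collections import Counter
from itertools import combinations

def haplotype_diversity(haplotypes):
    """Nei's h = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))

def nucleotide_diversity(sequences):
    """Mean proportion of differing sites over all pairs of sequences."""
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) / length for s, t in pairs)
    return total / len(pairs)
```

    High haplotype diversity together with low nucleotide diversity, as reported here, means that most individuals carry distinct haplotypes that differ from one another at only a few sites.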

  3. Minimum entropy decomposition: Unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences

    PubMed Central

    Eren, A Murat; Morrison, Hilary G; Lescault, Pamela J; Reveillaud, Julie; Vineis, Joseph H; Sogin, Mitchell L

    2015-01-01

    Molecular microbial ecology investigations often employ large marker gene datasets, for example, ribosomal RNAs, to represent the occurrence of single-cell genomes in microbial communities. Massively parallel DNA sequencing technologies enable extensive surveys of marker gene libraries that sometimes include nearly identical sequences. Computational approaches that rely on pairwise sequence alignments for similarity assessment and de novo clustering with de facto similarity thresholds to partition high-throughput sequencing datasets constrain fine-scale resolution descriptions of microbial communities. Minimum Entropy Decomposition (MED) provides a computationally efficient means to partition marker gene datasets into ‘MED nodes’, which represent homogeneous operational taxonomic units. By employing Shannon entropy, MED uses only the information-rich nucleotide positions across reads and iteratively partitions large datasets while omitting stochastic variation. When applied to analyses of microbiomes from two deep-sea cryptic sponges Hexadella dedritifera and Hexadella cf. dedritifera, MED resolved a key Gammaproteobacteria cluster into multiple MED nodes that are specific to different sponges, and revealed that these closely related sympatric sponge species maintain distinct microbial communities. MED analysis of a previously published human oral microbiome dataset also revealed that taxa separated by less than 1% sequence variation distributed to distinct niches in the oral cavity. The information theory-guided decomposition process behind the MED algorithm enables sensitive discrimination of closely related organisms in marker gene amplicon datasets without relying on extensive computational heuristics and user supervision. PMID:25325381
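
    The entropy-guided partitioning can be illustrated with a single decomposition step: compute Shannon entropy for each column of the aligned reads, then split the reads on the most variable column. This sketch is not the MED implementation. It omits the iteration, entropy thresholds and noise handling described by the authors, and the use of log base 2 and the function names are assumptions.

```python
import math
from collections import Counter, defaultdict

def column_entropies(aligned_reads):
    """Shannon entropy (bits) of each column in equal-length aligned reads."""
    entropies = []
    for column in zip(*aligned_reads):
        total = len(column)
        entropy = 0.0
        for count in Counter(column).values():
            p = count / total
            entropy -= p * math.log2(p)
        entropies.append(entropy)
    return entropies

def split_on_highest_entropy(aligned_reads):
    """Partition reads by their base at the single most variable position."""
    entropies = column_entropies(aligned_reads)
    position = max(range(len(entropies)), key=entropies.__getitem__)
    groups = defaultdict(list)
    for read in aligned_reads:
        groups[read[position]].append(read)
    return position, dict(groups)
```

    Columns where every read agrees have zero entropy and never drive a split, which is how uninformative positions are ignored.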

  4. Whole genome sequencing of emerging multidrug resistant Candida auris isolates in India demonstrates low genetic variation.

    PubMed

    Sharma, C; Kumar, N; Pandey, R; Meis, J F; Chowdhary, A

    2016-09-01

    Candida auris is an emerging multidrug resistant yeast that causes nosocomial fungaemia and deep-seated infections. Notably, the emergence of this yeast is alarming as it exhibits resistance to azoles, amphotericin B and caspofungin, which may lead to clinical failure in patients. The multigene phylogeny and amplified fragment length polymorphism typing methods report the C. auris population as clonal. Here, using whole genome sequencing analysis, we decipher for the first time that C. auris strains from four Indian hospitals were highly related, suggesting clonal transmission. Further, all C. auris isolates originated from cases of fungaemia and were resistant to fluconazole (MIC >64 mg/L).

  5. Metadata-driven comparative analysis tool for sequences (meta-CATS): an automated process for identifying significant sequence variations that correlate with virus attributes.

    PubMed

    Pickett, B E; Liu, M; Sadat, E L; Squires, R B; Noronha, J M; He, S; Jen, W; Zaremba, S; Gu, Z; Zhou, L; Larsen, C N; Bosch, I; Gehrke, L; McGee, M; Klem, E B; Scheuermann, R H

    2013-12-01

    The Virus Pathogen Resource (ViPR; www.viprbrc.org) and Influenza Research Database (IRD; www.fludb.org) have developed a metadata-driven Comparative Analysis Tool for Sequences (meta-CATS), which performs statistical comparative analyses of nucleotide and amino acid sequence data to identify correlations between sequence variations and virus attributes (metadata). Meta-CATS guides users through: selecting a set of nucleotide or protein sequences; dividing them into multiple groups based on any associated metadata attribute (e.g. isolation location, host species); performing a statistical test at each aligned position; and identifying all residues that significantly differ between the groups. As proofs of concept, we have used meta-CATS to identify sequence biomarkers associated with dengue viruses isolated from different hemispheres, and to identify variations in the NS1 protein that are unique to each of the 4 dengue serotypes. Meta-CATS is made freely available to virology researchers to identify genotype-phenotype correlations for development of improved vaccines, diagnostics, and therapeutics.
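
    The per-position workflow described above (group the sequences by a metadata attribute, then test each aligned position) can be sketched as below. The abstract says only that a statistical test is performed. The Pearson chi-square statistic used here is an assumption chosen for illustration, and the function name and input layout are hypothetical rather than the meta-CATS interface. Converting the statistic to a p-value is left out.

```python
from collections import Counter

def chi_square_by_position(groups):
    """Chi-square statistic per alignment column for residue-by-group counts.

    groups maps a metadata value (e.g. a host species) to a non-empty list
    of equal-length aligned sequences. Returns (position, statistic, dof).
    """
    names = list(groups)
    length = len(groups[names[0]][0])
    grand_total = sum(len(groups[name]) for name in names)
    results = []
    for pos in range(length):
        counts = {name: Counter(seq[pos] for seq in groups[name]) for name in names}
        residues = sorted(set().union(*counts.values()))
        statistic = 0.0
        for residue in residues:
            column_total = sum(counts[name][residue] for name in names)
            for name in names:
                expected = len(groups[name]) * column_total / grand_total
                statistic += (counts[name][residue] - expected) ** 2 / expected
        dof = (len(names) - 1) * (len(residues) - 1)
        results.append((pos, statistic, dof))
    return results
```

    Positions with zero degrees of freedom are invariant across all groups and cannot discriminate between them.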

  6. Metadata-driven Comparative Analysis Tool for Sequences (meta-CATS): an Automated Process for Identifying Significant Sequence Variations that Correlate with Virus Attributes

    PubMed Central

    Pickett, BE; Liu, M; Sadat, EL; Squires, RB; Noronha, JM; He, S; Jen, W; Zaremba, S; Gu, Z; Zhou, L; Larsen, CN; Bosch, I; Gehrke, L; McGee, M; Klem, EB; Scheuermann, RH

    2016-01-01

    The Virus Pathogen Resource (ViPR; www.viprbrc.org) and Influenza Research Database (IRD; www.fludb.org) have developed a metadata-driven Comparative Analysis Tool for Sequences (meta-CATS), which performs statistical comparative analyses of nucleotide and amino acid sequence data to identify correlations between sequence variations and virus attributes (metadata). Meta-CATS guides users through: selecting a set of nucleotide or protein sequences; dividing them into multiple groups based on any associated metadata attribute (e.g. isolation location, host species); performing a statistical test at each aligned position; and identifying all residues that significantly differ between the groups. As proofs of concept, we have used meta-CATS to identify sequence biomarkers associated with dengue viruses isolated from different hemispheres, and to identify variations in the NS1 protein that are unique to each of the 4 dengue serotypes. Meta-CATS is made freely available to virology researchers to identify genotype-phenotype correlations for development of improved vaccines, diagnostics, and therapeutics. PMID:24210098

  7. No variation and low synonymous substitution rates in coral mtDNA despite high nuclear variation

    PubMed Central

    Hellberg, Michael E

    2006-01-01

    Background The mitochondrial DNA (mtDNA) of most animals evolves more rapidly than nuclear DNA, and often shows higher levels of intraspecific polymorphism and population subdivision. The mtDNA of anthozoans (corals, sea fans, and their kin), by contrast, appears to evolve slowly. Slow mtDNA evolution has been reported for several anthozoans, however this slow pace has been difficult to put in phylogenetic context without parallel surveys of nuclear variation or calibrated rates of synonymous substitution that could permit quantitative rate comparisons across taxa. Here, I survey variation in the coding region of a mitochondrial gene from a coral species (Balanophyllia elegans) known to possess high levels of nuclear gene variation, and estimate synonymous rates of mtDNA substitution by comparison to another coral (Tubastrea coccinea). Results The mtDNA surveyed (630 bp of cytochrome oxidase subunit I) was invariant among individuals sampled from 18 populations spanning 3000 km of the range of B. elegans, despite high levels of variation and population subdivision for allozymes over these same populations. The synonymous substitution rate between B. elegans and T. coccinea (0.05%/site/10^6 years) is similar to that in most plants, but 50–100 times lower than rates typical for most animals. In addition, while substitutions to mtDNA in most animals exhibit a strong bias toward transitions, mtDNA from these corals does not. Conclusion Slow rates of mitochondrial nucleotide substitution result in low levels of intraspecific mtDNA variation in corals, even when nuclear loci vary. Slow mtDNA evolution appears to be the basal condition among eukaryotes. mtDNA substitution rates switch from slow to fast abruptly and unidirectionally. This switch may stem from the loss of just one or a few mitochondrion-specific DNA repair or replication genes. PMID:16542456

  8. Diversity and Variation of Bacterial Community Revealed by MiSeq Sequencing in Chinese Dark Teas

    PubMed Central

    Fu, Jianyu; Lv, Haipeng; Chen, Feng

    2016-01-01

    Chinese dark teas (CDTs) are now among the popular tea beverages worldwide due to their unique health benefits. Because the production of CDTs involves fermentation that is characterized by the effect of microbes, microorganisms are believed to play critical roles in the determination of the chemical characteristics of CDTs. Some dominant fungi have been identified from CDTs. In contrast, little, if anything, is known about the composition of bacterial community in CDTs. This study was set to investigate the diversity and variation of bacterial community in four major types of CDTs from China. First, the composition of the bacterial community of CDTs was determined using MiSeq sequencing. From the four typical CDTs, a total of 238 genera that belong to 128 families of bacteria were detected, including most of the families of beneficial bacteria known to be associated with fermented food. While different types of CDTs had generally distinct bacterial structures, the two types of brick teas produced from adjacent regions displayed strong similarity in bacterial composition, suggesting that the producing environment and processing condition perhaps together influence bacterial succession in CDTs. The global characterization of bacterial communities in CDTs is an essential first step for us to understand their function in fermentation and their potential impact on human health. Such knowledge will be important guidance for improving the production of CDTs with higher quality and elevated health benefits. PMID:27690376

  9. Genetic variation of Sargassum horneri populations detected by inter-simple sequence repeats.

    PubMed

    Ren, J R; Yang, R; He, Y Y; Sun, Q H

    2015-01-30

    The seaweed Sargassum horneri is an important brown alga in the marine environment, and it is an important raw material in the alginate industry. Unfortunately, the fixed resource that was originally reported has now declined or disappeared, and increased floating populations have been reported in recent years. We sampled a floating population and 4 fixed cultivated populations of S. horneri along the coast of Zhejiang, China. Inter-simple sequence repeat (ISSR) markers were applied in this research to analyze the genetic variation between floating populations and fixed cultivated populations of S. horneri. In total, 220 loci were amplified with 23 ISSR primers. The percentage of polymorphic loci within each population ranged from 53.64 to 95.45%. The highest diversity was observed in population 3, which was the local species that was suspension cultured in the lab and then fixed cultivated in the Nanji Islands before sampling. The lowest diversity was obtained in the floating population 4. The genetic distances among the 5 S. horneri populations ranged from 0.0819 to 0.2889, and the distance tendency confirmed the genetic diversity. The results suggest that the floating population had the lowest genetic diversity and could not be joined into the cluster branch of the fixed cultivated populations.

  10. Interspecific and intrapopulation variation in mitochondrial ribosomal DNA sequences of Mytilus spp. (Bivalvia: Mollusca).

    PubMed

    Geller, J B; Carlton, J T; Powers, D A

    1993-02-01

    A 560-base pair portion of the mitochondrial 16S ribosomal DNA (16S rDNA) from three morphologically similar mussels, Mytilus edulis, M. galloprovincialis, and M. trossulus, was amplified with the polymerase chain reaction, and 349 base pairs were sequenced. These data showed that this gene in M. edulis and M. galloprovincialis has not diverged; however, the north Pacific mussel, M. trossulus, showed fixed differences from M. edulis and M. galloprovincialis at 5 nucleotide positions. Furthermore, the population of M. trossulus at Tillamook Bay, Oregon, was found to contain two very divergent 16S rDNA genotypes that differ at 37 nucleotide positions. Thus, intraspecific variation in this gene in M. trossulus is greater than that seen interspecifically in M. edulis and M. galloprovincialis. Despite this large difference, in the absence of evidence of genetic isolation between these groups of M. trossulus, no taxonomic changes are proposed. These data are consistent with a north Pacific origin of the genus with subsequent dispersal to the Atlantic Ocean across the Arctic Sea, giving rise to M. edulis in northern Europe and subsequently M. galloprovincialis in southern Europe and the Mediterranean Sea.

  11. Role of promoter DNA sequence variations on the binding of EGR1 transcription factor.

    PubMed

    Mikles, David C; Schuchardt, Brett J; Bhat, Vikas; McDonald, Caleb B; Farooq, Amjad

    2014-05-01

    In response to a wide variety of stimuli such as growth factors and hormones, EGR1 transcription factor is rapidly induced and immediately exerts downstream effects central to the maintenance of cellular homeostasis. Herein, our biophysical analysis reveals that DNA sequence variations within the target gene promoters tightly modulate the energetics of binding of EGR1 and that nucleotide substitutions at certain positions are much more detrimental to EGR1-DNA interaction than others. Importantly, the reduction in binding affinity poorly correlates with the loss of enthalpy and gain of entropy-a trend indicative of a complex interplay between underlying thermodynamic factors due to the differential role of water solvent upon nucleotide substitution. We also provide a rationale for the physical basis of the effect of nucleotide substitutions on the EGR1-DNA interaction at atomic level. Taken together, our study bears important implications on understanding the molecular determinants of a key protein-DNA interaction at the cross-roads of human health and disease.

  12. Structural variation discovery in the cancer genome using next generation sequencing: Computational solutions and perspectives

    PubMed Central

    Liu, Biao; Conroy, Jeffrey M.; Morrison, Carl D.; Odunsi, Adekunle O.; Qin, Maochun; Wei, Lei; Trump, Donald L.; Johnson, Candace S.; Liu, Song; Wang, Jianmin

    2015-01-01

    Somatic Structural Variations (SVs) are a complex collection of chromosomal mutations that could directly contribute to carcinogenesis. Next Generation Sequencing (NGS) technology has emerged as the primary means of interrogating the SVs of the cancer genome in recent investigations. Sophisticated computational methods are required to accurately identify the SV events and delineate their breakpoints from the massive amounts of reads generated by a NGS experiment. In this review, we provide an overview of current analytic tools used for SV detection in NGS-based cancer studies. We summarize the features of common SV groups and the primary types of NGS signatures that can be used in SV detection methods. We discuss the principles and key similarities and differences of existing computational programs and comment on unresolved issues related to this research field. The aim of this article is to provide a practical guide of relevant concepts, computational methods, software tools and important factors for analyzing and interpreting NGS data for the detection of SVs in the cancer genome. PMID:25849937

  13. Sequence and expression variation in SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1): homeolog evolution in Indian Brassicas.

    PubMed

    Sri, Tanu; Mayee, Pratiksha; Singh, Anandita

    2015-09-01

    Whole genome sequence analyses allow unravelling such evolutionary consequences of meso-triplication event in Brassicaceae (∼14-20 million years ago (MYA)) as differential gene fractionation and diversification in homeologous sub-genomes. This study presents a simple gene-centric approach involving microsynteny and natural genetic variation analysis for understanding SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1) homeolog evolution in Brassica. Analysis of microsynteny in Brassica rapa homeologous regions containing SOC1 revealed differential gene fractionation correlating to reported fractionation status of sub-genomes of origin, viz. least fractionated (LF), moderately fractionated 1 (MF1) and most fractionated (MF2), respectively. Screening 18 cultivars of 6 Brassica species led to the identification of 8 genomic and 27 transcript variants of SOC1, including splice-forms. Co-occurrence of both interrupted and intronless SOC1 genes was detected in few Brassica species. In silico analysis characterised Brassica SOC1 as MADS intervening, K-box, C-terminal (MIKC(C)) transcription factor, with highly conserved MADS and I domains relative to K-box and C-terminal domain. Phylogenetic analyses and multiple sequence alignments depicting shared pattern of silent/non-silent mutations assigned Brassica SOC1 homologs into groups based on shared diploid base genome. In addition, a sub-genome structure in uncharacterised Brassica genomes was inferred. Expression analysis of putative MF2 and LF (Brassica diploid base genome A (AA)) sub-genome-specific SOC1 homeologs of Brassica juncea revealed near identical expression pattern. However, MF2-specific homeolog exhibited significantly higher expression implying regulatory diversification. In conclusion, evidence for polyploidy-induced sequence and regulatory evolution in Brassica SOC1 is being presented wherein differential homeolog expression is implied in functional diversification.

  14. Sequence and expression variation in SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1): homeolog evolution in Indian Brassicas.

    PubMed

    Sri, Tanu; Mayee, Pratiksha; Singh, Anandita

    2015-09-01

    Whole genome sequence analyses allow unravelling such evolutionary consequences of meso-triplication event in Brassicaceae (∼14-20 million years ago (MYA)) as differential gene fractionation and diversification in homeologous sub-genomes. This study presents a simple gene-centric approach involving microsynteny and natural genetic variation analysis for understanding SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1) homeolog evolution in Brassica. Analysis of microsynteny in Brassica rapa homeologous regions containing SOC1 revealed differential gene fractionation correlating to reported fractionation status of sub-genomes of origin, viz. least fractionated (LF), moderately fractionated 1 (MF1) and most fractionated (MF2), respectively. Screening 18 cultivars of 6 Brassica species led to the identification of 8 genomic and 27 transcript variants of SOC1, including splice-forms. Co-occurrence of both interrupted and intronless SOC1 genes was detected in few Brassica species. In silico analysis characterised Brassica SOC1 as MADS intervening, K-box, C-terminal (MIKC(C)) transcription factor, with highly conserved MADS and I domains relative to K-box and C-terminal domain. Phylogenetic analyses and multiple sequence alignments depicting shared pattern of silent/non-silent mutations assigned Brassica SOC1 homologs into groups based on shared diploid base genome. In addition, a sub-genome structure in uncharacterised Brassica genomes was inferred. Expression analysis of putative MF2 and LF (Brassica diploid base genome A (AA)) sub-genome-specific SOC1 homeologs of Brassica juncea revealed near identical expression pattern. However, MF2-specific homeolog exhibited significantly higher expression implying regulatory diversification. In conclusion, evidence for polyploidy-induced sequence and regulatory evolution in Brassica SOC1 is being presented wherein differential homeolog expression is implied in functional diversification. PMID:26276216

  15. High-Throughput Sequencing and Rare Genetic Diseases

    PubMed Central

    Makrythanasis, P.; Antonarakis, S.E.

    2012-01-01

    High-throughput sequencing has drastically changed the research of genes responsible for genetic disorders and is now gradually introduced as an additional genetic diagnostic testing in clinical practice. The current debates on the emerging technical, medical and ethical issues as well as the potential optimum use of the available technology are discussed. PMID:23293577

  16. Binary interactions with high accretion rates onto main sequence stars

    NASA Astrophysics Data System (ADS)

    Shiber, Sagiv; Schreier, Ron; Soker, Noam

    2016-07-01

    Energetic outflows from main sequence stars accreting mass at very high rates might account for the powering of some eruptive objects, such as merging main sequence stars, major eruptions of luminous blue variables, e.g., the Great Eruption of Eta Carinae, and other intermediate luminosity optical transients (ILOTs; red novae; red transients). These powerful outflows could potentially also supply the extra energy required in the common envelope process and in the grazing envelope evolution of binary systems. We propose that a massive outflow/jets mediated by magnetic fields might remove energy and angular momentum from the accretion disk to allow such high accretion rate flows. By examining the possible activity of the magnetic fields of accretion disks, we conclude that indeed main sequence stars might accrete mass at very high rates, up to ≈ 10^-2 M⊙ yr^-1 for solar type stars, and up to ≈ 1 M⊙ yr^-1 for very massive stars. We speculate that magnetic fields amplified in such extreme conditions might lead to the formation of massive bipolar outflows that can remove most of the disk's energy and angular momentum. It is this energy and angular momentum removal that allows the very high mass accretion rate onto main sequence stars.

  17. Binary interactions with high accretion rates onto main sequence stars

    NASA Astrophysics Data System (ADS)

    Shiber, Sagiv; Schreier, Ron; Soker, Noam

    2016-07-01

    Energetic outflows from main sequence stars accreting mass at very high rates might account for the powering of some eruptive objects, such as merging main sequence stars, major eruptions of luminous blue variables, e.g., the Great Eruption of Eta Carinae, and other intermediate luminosity optical transients (ILOTs; red novae; red transients). These powerful outflows could potentially also supply the extra energy required in the common envelope process and in the grazing envelope evolution of binary systems. We propose that a massive outflow/jets mediated by magnetic fields might remove energy and angular momentum from the accretion disk to allow such high accretion rate flows. By examining the possible activity of the magnetic fields of accretion disks, we conclude that indeed main sequence stars might accrete mass at very high rates, up to ≈ 10^-2 M⊙ yr^-1 for solar type stars, and up to ≈ 1 M⊙ yr^-1 for very massive stars. We speculate that magnetic fields amplified in such extreme conditions might lead to the formation of massive bipolar outflows that can remove most of the disk's energy and angular momentum. It is this energy and angular momentum removal that allows the very high mass accretion rate onto main sequence stars.

  18. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Athavale, Ajay [Monsanto]

    2016-07-12

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  19. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Athavale, Ajay

    2012-06-01

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  20. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    PubMed

    Rehm, Charlotte; Wurmthaler, Lena A; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S

    2015-01-01

    In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However, a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria. PMID:26695179

  1. Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.

    PubMed Central

    Rehm, Charlotte; Wurmthaler, Lena A.; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S.

    2015-01-01

    In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1–5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6–9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes shows putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However, a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria. PMID:26695179
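
    As a rough illustration of the kind of in silico scan this abstract describes, the sketch below finds runs of back-to-back GGGNATC units and counts the units in each run. It is not the authors' pipeline: the function name find_tandem_repeats and the toy sequence are hypothetical, and the scan covers only the strand it is given (the reverse complement would need a second pass).

```python
import re

# One repeat unit: GGG, any nucleotide, then ATC (the GGGNATC motif).
UNIT = "GGG[ACGT]ATC"
UNIT_LENGTH = 7
# Two or more consecutive units; the abstract reports runs of 2 up to 26 units.
TANDEM = re.compile(f"(?:{UNIT}){{2,}}")


def find_tandem_repeats(sequence):
    """Return (start, unit_count) for each run of >= 2 adjacent GGGNATC units.

    start is the 0-based offset of the run in the given sequence.
    """
    return [
        (match.start(), len(match.group()) // UNIT_LENGTH)
        for match in TANDEM.finditer(sequence.upper())
    ]


# Toy sequence: a four-unit run, a spacer, then a two-unit run.
toy_sequence = "TT" + "GGGAATC" * 4 + "CCCC" + "GGGTATCGGGCATC" + "A"
repeats = find_tandem_repeats(toy_sequence)  # [(2, 4), (34, 2)]
```

    A histogram of the unit_count values over a whole genome would be one way to look for the preference for four units that the abstract reports.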

  2. Optical transitions in highly charged californium ions with high sensitivity to variation of the fine-structure constant.

    PubMed

    Berengut, J C; Dzuba, V A; Flambaum, V V; Ong, A

    2012-08-17

    We study electronic transitions in highly charged Cf ions that are within the frequency range of optical lasers and have very high sensitivity to potential variations in the fine-structure constant, α. The transitions are in the optical range despite the large ionization energies because they lie on the level crossing of the 5f and 6p valence orbitals in the thallium isoelectronic sequence. Cf(16+) is a particularly rich ion, having several narrow lines with properties that minimize certain systematic effects. Cf(16+) has very large nuclear charge and large ionization energy, resulting in the largest α sensitivity seen in atomic systems. The lines include positive and negative shifters.

  3. Optical Transitions in Highly Charged Californium Ions with High Sensitivity to Variation of the Fine-Structure Constant

    NASA Astrophysics Data System (ADS)

    Berengut, J. C.; Dzuba, V. A.; Flambaum, V. V.; Ong, A.

    2012-08-01

    We study electronic transitions in highly charged Cf ions that are within the frequency range of optical lasers and have very high sensitivity to potential variations in the fine-structure constant, α. The transitions are in the optical range despite the large ionization energies because they lie on the level crossing of the 5f and 6p valence orbitals in the thallium isoelectronic sequence. Cf(16+) is a particularly rich ion, having several narrow lines with properties that minimize certain systematic effects. Cf(16+) has very large nuclear charge and large ionization energy, resulting in the largest α sensitivity seen in atomic systems. The lines include positive and negative shifters.

  4. Microsatellites and 16S sequences corroborate phenotypic evidence of trans-Andean variation in the parasitoid Microctonus hyperodae (Hymenoptera: Braconidae).

    PubMed

    Winder, L M; Phillips, C B; Lenney-Williams, C; Cane, R P; Paterson, K; Vink, C J; Goldson, S L

    2005-08-01

    Eight geographical populations of the parasitoid Microctonus hyperodae Loan were collected in South America (Argentina, Brazil, Chile and Uruguay) and released in New Zealand for biological control of the weevil Listronotus bonariensis (Kuschel), a pest of pasture grasses and cereals. DNA sequencing (16S, COI, 28S, ITS1, beta-tubulin), RAPD, AFLP, microsatellite, SSCP and RFLP analyses were used to seek markers for discriminating between the South American populations. All of the South American populations were more homogeneous than expected. However, variation in microsatellites and 16S gene sequences corroborated morphological, allozyme and other phenotypic evidence of trans-Andes variation between the populations. The Chilean populations were the most genetically variable, while the variation present on the eastern side of the Andes mountains was a subset of that observed in Chile.

  5. Salmonella serotype determination utilizing high-throughput genome sequencing data.

    PubMed

    Zhang, Shaokang; Yin, Yanlong; Jones, Marcus B; Zhang, Zhenzhen; Deatherage Kaiser, Brooke L; Dinsmore, Blake A; Fitzgerald, Collette; Fields, Patricia I; Deng, Xiangyu

    2015-05-01

    Serotyping forms the basis of national and international surveillance networks for Salmonella, one of the most prevalent foodborne pathogens worldwide (1-3). Public health microbiology is currently being transformed by whole-genome sequencing (WGS), which opens the door to serotype determination using WGS data. SeqSero (www.denglab.info/SeqSero) is a novel Web-based tool for determining Salmonella serotypes using high-throughput genome sequencing data. SeqSero is based on curated databases of Salmonella serotype determinants (rfb gene cluster, fliC and fljB alleles) and is predicted to determine serotype rapidly and accurately for nearly the full spectrum of Salmonella serotypes (more than 2,300 serotypes), from both raw sequencing reads and genome assemblies. The performance of SeqSero was evaluated by testing (i) raw reads from genomes of 308 Salmonella isolates of known serotype; (ii) raw reads from genomes of 3,306 Salmonella isolates sequenced and made publicly available by GenomeTrakr, a U.S. national monitoring network operated by the Food and Drug Administration; and (iii) 354 other publicly available draft or complete Salmonella genomes. We also demonstrated Salmonella serotype determination from raw sequencing reads of fecal metagenomes from mice orally infected with this pathogen. SeqSero can help to maintain the well-established utility of Salmonella serotyping when integrated into a platform of WGS-based pathogen subtyping and characterization.

  6. Color differences among feral pigeons (Columba livia) are not attributable to sequence variation in the coding region of the melanocortin-1 receptor gene (MC1R)

    PubMed Central

    2013-01-01

    Background Genetic variation at the melanocortin-1 receptor (MC1R) gene is correlated with melanin color variation in many birds. Feral pigeons (Columba livia) show two major melanin-based colorations: a red coloration due to pheomelanic pigment and a black coloration due to eumelanic pigment. Furthermore, within each color type, feral pigeons display continuous variation in the amount of melanin pigment present in the feathers, with individuals varying from pure white to a full dark melanic color. Coloration is highly heritable and it has been suggested that it is under natural or sexual selection, or both. Our objective was to investigate whether MC1R allelic variants are associated with plumage color in feral pigeons. Findings We sequenced 888 bp of the coding sequence of MC1R among pigeons varying both in the type, eumelanin or pheomelanin, and the amount of melanin in their feathers. We detected 10 non-synonymous substitutions and 2 synonymous substitutions, but none of them was associated with a plumage type. It remains possible that non-synonymous substitutions that influence coloration are present in the short MC1R fragment that we did not sequence, but this seems unlikely because we analyzed the entire functionally important region of the gene. Conclusions Our results show that color differences among feral pigeons are probably not attributable to amino acid variation at the MC1R locus. Therefore, variation in regulatory regions of MC1R or variation in other genes may be responsible for the color polymorphism of feral pigeons. PMID:23915680

  7. Antibody responses to hepatitis C virus hypervariable region 1: evidence for cross-reactivity and immune-mediated sequence variation.

    PubMed

    Mondelli, M U; Cerino, A; Lisa, A; Brambilla, S; Segagni, L; Cividini, A; Bissolati, M; Missale, G; Bellati, G; Meola, A; Bruniercole, B; Nicosia, A; Galfrè, G; Silini, E

    1999-08-01

    Sequence heterogeneity of hepatitis C virus (HCV) is unevenly distributed along the genome, and maximal variation is confined to a short sequence of the HCV second envelope glycoprotein (E2), designated hypervariable region 1 (HVR1), whose biological function is still undefined. We prospectively studied serological responses to synthetic oligopeptides derived from HVR1 sequences of patients with acute and chronic HCV infection obtained at baseline and after a defined follow-up period. Extensive serological cross-reactivity for unrelated HVR1 peptides was observed in the majority of the patients. Antibody response was restricted to the IgG1 isotype and was focused on the carboxyterminal end of the HVR1 region. Cross-reactive antibodies could be readily elicited following immunization of mice with multiple antigenic peptides carrying HVR1 sequences derived from our patients. The vigor and heterogeneity of cross-reactive antibody responses were significantly higher in patients with chronic hepatitis compared with those with acute hepatitis and in patients infected with HCV type 2 compared with patients infected with other viral genotypes (predominantly type 1), which suggest that higher time-related HVR1 sequence diversification previously described for type 2 may result from immune selection. The finding of a statistically significant correlation between HVR1 sequence variation, and intensity, and cross-reactivity of humoral immune responses provided stronger evidence in support of the contention that HCV variant selection is driven by the host's immune pressure.

  8. Modelling Human Regulatory Variation in Mouse: Finding the Function in Genome-Wide Association Studies and Whole-Genome Sequencing

    PubMed Central

    Schmouth, Jean-François; Bonaguro, Russell J.; Corso-Diaz, Ximena; Simpson, Elizabeth M.

    2012-01-01

    An increasing body of literature from genome-wide association studies and human whole-genome sequencing highlights the identification of large numbers of candidate regulatory variants of potential therapeutic interest in numerous diseases. Our relatively poor understanding of the functions of non-coding genomic sequence, and the slow and laborious process of experimental validation of the functional significance of human regulatory variants, limits our ability to fully benefit from this information in our efforts to comprehend human disease. Humanized mouse models (HuMMs), in which human genes are introduced into the mouse, suggest an approach to this problem. In the past, HuMMs have been used successfully to study human disease variants; e.g., the complex genetic condition arising from Down syndrome, common monogenic disorders such as Huntington disease and β-thalassemia, and cancer susceptibility genes such as BRCA1. In this commentary, we highlight a novel method for high-throughput single-copy site-specific generation of HuMMs entitled High-throughput Human Genes on the X Chromosome (HuGX). This method can be applied to most human genes for which a bacterial artificial chromosome (BAC) construct can be derived and a mouse-null allele exists. This strategy comprises (1) the use of recombineering technology to create a human variant–harbouring BAC, (2) knock-in of this BAC into the mouse genome using Hprt docking technology, and (3) allele comparison by interspecies complementation. We demonstrate the throughput of the HuGX method by generating a series of seven different alleles for the human NR2E1 gene at Hprt. In future challenges, we consider the current limitations of experimental approaches and call for a concerted effort by the genetics community, for both human and mouse, to solve the challenge of the functional analysis of human regulatory variation. PMID:22396661

  9. Sequence Polymorphisms at the REDUCED DORMANCY5 Pseudophosphatase Underlie Natural Variation in Arabidopsis Dormancy

    PubMed Central

    Xiang, Yong; Song, Baoxing; Née, Guillaume; Kramer, Katharina; Soppe, Wim J.J.

    2016-01-01

    Seed dormancy controls the timing of germination, which regulates the adaptation of plants to their environment and influences agricultural production. The time of germination is under strong natural selection and shows variation within species due to local adaptation. The identification of genes underlying dormancy quantitative trait loci is a major scientific challenge, which is relevant for agricultural and ecological goals. In this study, we describe the identification of the DELAY OF GERMINATION18 (DOG18) quantitative trait locus, which was identified as a factor in natural variation for seed dormancy in Arabidopsis (Arabidopsis thaliana). DOG18 encodes a member of the clade A of the type 2C protein phosphatases family, which we previously identified as the REDUCED DORMANCY5 (RDO5) gene. DOG18/RDO5 shows a relatively high frequency of loss-of-function alleles in natural accessions restricted to northwestern Europe. The loss of dormancy in these loss-of-function alleles can be compensated for by genetic factors like DOG1 and DOG6, and by environmental factors such as low temperature. RDO5 does not have detectable phosphatase activity. Analysis of the phosphoproteome in dry and imbibed seeds revealed a general decrease in protein phosphorylation during seed imbibition that is enhanced in the rdo5 mutant. We conclude that RDO5 acts as a pseudophosphatase that inhibits dephosphorylation during seed imbibition. PMID:27288362

  10. Identification of conserved genomic regions and variation therein amongst Cetartiodactyla species using next generation sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Background Next Generation Sequencing has created an opportunity to genetically characterize an individual both inexpensively and comprehensively. In earlier work produced in our collaboration [1], it was demonstrated that, for animals without a reference genome, their Next Generation Sequence data ...

  11. High-Frequency Variation of Purine Biosynthesis Genes Is a Mechanism of Success in Campylobacter jejuni

    PubMed Central

    Cameron, Andrew; Huynh, Steven; Scott, Nichollas E.; Frirdich, Emilisa; Apel, Dmitry; Foster, Leonard J.; Parker, Craig T.

    2015-01-01

    Phenotypic variation is prevalent in the zoonotic pathogen Campylobacter jejuni, the leading agent of enterocolitis in the developed world. Heterogeneity enhances the survival and adaptive malleability of bacterial populations because variable phenotypes may allow some cells to be protected against future stress. Exposure to hyperosmotic stress previously revealed prevalent differences in growth between C. jejuni strain 81-176 colonies due to resistant or sensitive phenotypes, and these isolated colonies continued to produce progeny with differential phenotypes. In this study, whole-genome sequencing of isolated colonies identified allelic variants of two purine biosynthesis genes, purF and apt, encoding phosphoribosyltransferases that utilize a shared substrate. Genetic analyses determined that purF was essential for fitness, while apt was critical. Traditional and high-depth amplicon-sequencing analyses confirmed extensive intrapopulation genetic variation of purF and apt that resulted in viable strains bearing alleles with in-frame insertion duplications, deletions, or missense polymorphisms. Different purF and apt alleles were associated with various stress survival capabilities under several niche-relevant conditions and contributed to differential intracellular survival in an epithelial cell infection model. Amplicon sequencing revealed that intracellular survival selected for stress-fit purF and apt alleles, as did exposure to oxygen and hyperosmotic stress. Putative protein recognition direct repeat sequences were identified in purF and apt, and a DNA-protein affinity screen captured a predicted exonuclease that promoted the global spontaneous mutation rate. This work illustrates the adaptive properties of high-frequency genetic variation in two housekeeping genes, which influences C. jejuni survival under stress and promotes its success as a pathogen. PMID:26419875

  12. Sequence Variation in Superoxide Dismutase Gene of Toxoplasma gondii among Various Isolates from Different Hosts and Geographical Regions.

    PubMed

    Wang, Shuai; Cao, Aiping; Li, Xun; Zhao, Qunli; Liu, Yuan; Cong, Hua; He, Shenyi; Zhou, Huaiyu

    2015-06-01

    Toxoplasma gondii, an obligate intracellular protozoan parasite of the phylum Apicomplexa, can infect all warm-blooded vertebrates, including humans, livestock, and marine mammals. The aim of this study was to investigate whether superoxide dismutase (SOD) of T. gondii can be used as a new marker for genetic study or a potential vaccine candidate. The partial genome region of the SOD gene was amplified and sequenced from 10 different T. gondii isolates from different parts of the world, and all the sequences were examined by PCR-RFLP, sequence analysis, and phylogenetic reconstruction. The results showed that partial SOD gene sequences ranged from 1,702 bp to 1,712 bp and that A + T contents varied from 50.1% to 51.1% among all examined isolates. Sequence alignment analysis identified a total of 43 variable nucleotide positions and showed 97.5% sequence similarity of the SOD gene among all examined isolates. Phylogenetic analysis revealed that these SOD sequences were not an effective molecular marker for differential identification of T. gondii strains. The research demonstrated the existence of low sequence variation in the SOD gene among T. gondii strains of different genotypes from different hosts and geographical regions. PMID:26174817
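
    The similarity figure in the record above can be sanity-checked with simple arithmetic. The sketch below is illustrative only: it assumes similarity is computed as the fraction of invariant positions over the alignment length, which the abstract does not state, and the function name `percent_similarity` is hypothetical.

```python
def percent_similarity(variable_sites: int, alignment_length: int) -> float:
    """Percent of alignment positions that are invariant.

    Assumption (not stated in the abstract): similarity is defined as
    1 - variable_sites / alignment_length.
    """
    if alignment_length <= 0:
        raise ValueError("alignment_length must be positive")
    if not 0 <= variable_sites <= alignment_length:
        raise ValueError("variable_sites must lie within the alignment")
    return 100.0 * (1.0 - variable_sites / alignment_length)


if __name__ == "__main__":
    # The abstract reports 43 variable positions and sequence lengths of
    # 1,702 to 1,712 bp; try both ends of that range.
    for length in (1702, 1712):
        print(length, round(percent_similarity(43, length), 1))
```

    Under this assumption, both ends of the reported length range round to the 97.5% quoted in the abstract.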

  13. Fin whale MDH-1 and MPI allozyme variation is not reflected in the corresponding DNA sequences

    PubMed Central

    Olsen, Morten Tange; Pampoulie, Christophe; Daníelsdóttir, Anna K; Lidh, Emmelie; Bérubé, Martine; Víkingsson, Gísli A; Palsbøll, Per J

    2014-01-01

    The appeal of genetic inference methods to assess population genetic structure and guide management efforts is grounded in the correlation between the genetic similarity and gene flow among populations. Effects of such gene flow are typically genomewide; however, some loci may appear as outliers, displaying above or below average genetic divergence relative to the genomewide level. Above-average population genetic divergence may be due to divergent selection as a result of local adaptation. Consequently, substantial efforts have been directed toward such outlying loci in order to identify traits subject to local adaptation. Here, we report the results of an investigation into the molecular basis of the substantial degree of genetic divergence previously reported at allozyme loci among North Atlantic fin whale (Balaenoptera physalus) populations. We sequenced the exons encoding the two most divergent allozyme loci (MDH-1 and MPI) and failed to detect any nonsynonymous substitutions. Following extensive error checking and analysis of additional bioinformatic and morphological data, we hypothesize that the observed allozyme polymorphisms may reflect phenotypic plasticity at the cellular level, perhaps as a response to nutritional stress. While such plasticity is intriguing in itself, and of fundamental evolutionary interest, our key finding is that the observed allozyme variation does not appear to be a result of genetic drift, migration, or selection on the MDH-1 and MPI exons themselves, stressing the importance of interpreting allozyme data with caution. As for North Atlantic fin whale population structure, our findings support the low levels of differentiation found in previous analyses of DNA nucleotide loci. PMID:24963377

  14. Heteroplasmy, length and sequence variation in the mtDNA control regions of three percid fish species (Perca fluviatilis, Acerina cernua, Stizostedion lucioperca).

    PubMed Central

    Nesbø, C L; Arab, M O; Jakobsen, K S

    1998-01-01

    The nucleotide sequence of the control region and flanking tRNA genes of perch (Perca fluviatilis) mtDNA was determined. The organization of this region is similar to that of other vertebrates. A tandem array of 10-bp repeats, associated with length variation and heteroplasmy was observed in the 5' end. While the location of the array corresponds to that reported in other species, the length of the repeated unit is shorter than previously observed for tandem repeats in this region. The repeated sequence was highly similar to the Mt5 element which has been shown to specifically bind a putative D-loop DNA termination protein. Of 149 perch analyzed, 74% showed length variation heteroplasmy. Single-cell PCR on oocytes suggested that the high level of heteroplasmy is passively maintained by maternal transmission. The array was also observed in the two other percid species, ruffe (Acerina cernua) and zander (Stizostedion lucioperca). The array and the associated length variation heteroplasmy are therefore likely to be general features of percid mtDNAs. Among the perch repeats, the mutation pattern is consistent with unidirectional slippage, and statistical analyses supported the notion that the various haplotypes are associated with different levels of heteroplasmy. The variation in array length among and within species is ascribed to differences in predicted stability of secondary structures made between repeat units. PMID:9560404

  15. Climate changes associated with high-amplitude Sq geomagnetic variations

    NASA Astrophysics Data System (ADS)

    Rabeh, Taha; Carvalho, Joao; Khalil, Ahmed; El-Aal, Esmat; El-Hemaly, Ibrahim

    2011-10-01

    When the solar irradiance propagates between the outer magnetospheric regions and the ionosphere, dynamic processes of the magnetosphere-ionosphere-thermosphere system are affected at the lower end of their paths by the interaction of radiation with the neutral troposphere. The main target of this work is to investigate the relationship between the diurnal magnetic field variations resulting from solar activities and the variation in the troposphere temperature. Meteorological and geomagnetic data acquired from different observatories located in Egypt, Portugal and Slovakia were analyzed on long-term and daily time scales. The long-term results show that there is a close relationship between the diurnal Sq magnetic field variations and the tropospheric temperature. The rate of temperature increase at mid-latitude areas is higher than at high-latitude areas. During the period of investigation, it is found that the troposphere temperature has increased by about 0.033 °C/year at Helwan, Egypt, 0.03 °C/year at Coimbra, Portugal, and 0.028 °C/year in Hurbanovo/Stará Lesná, Slovakia. The Sq geomagnetic variations depend on the intensity of the electric currents generated by the effect of solar radiation in the ionosphere.

  16. Climate Changes Associated with High Amplitude Sq Geomagnetic Variations

    NASA Astrophysics Data System (ADS)

    Rabeh, Taha; Khalil, Ahmed; Abdel All, Esmat

    2010-05-01

    The Earth's climate has been changing since the ancient geologic epochs. When the solar irradiance propagates between the outer magnetospheric regions and the ionosphere, mediate dynamic processes of the magnetosphere-ionosphere-thermosphere system are affected at the lower end of their paths by the interaction of the radiation with the neutral atmosphere. The ionosphere-thermosphere interactions play an important role in explaining the relationship between the magnetic field and the changes in the atmospheric temperature. The main target of this work is to investigate the relationship between the diurnal magnetic field variations resulting from solar activities and the variation in the Earth's temperature. The meteorological and geomagnetic data acquired from different observatories around the globe were analyzed. Results for three different locations, in Egypt, Portugal and Slovakia, are presented on long-term and daily time scales. The results show that, for long periods, there is a close relationship between the diurnal Sq magnetic field variations and the atmospheric temperature. The rate of temperature increase at mid-latitude areas is higher than at high-latitude areas. During the period of investigation, it is found that the temperature increased by about 0.033 °C/year at Helwan, Egypt, 0.03 °C/year at Coimbra, Portugal, and 0.028 °C/year in Hurbanovo/Stará Lesná, Slovakia. The Sq geomagnetic variations depend on the intensity of the electric currents generated by the effect of solar radiation in the ionosphere.

  17. Whole genome sequencing of emerging multidrug resistant Candida auris isolates in India demonstrates low genetic variation.

    PubMed

    Sharma, C; Kumar, N; Pandey, R; Meis, J F; Chowdhary, A

    2016-09-01

    Candida auris is an emerging multidrug resistant yeast that causes nosocomial fungaemia and deep-seated infections. Notably, the emergence of this yeast is alarming as it exhibits resistance to azoles, amphotericin B and caspofungin, which may lead to clinical failure in patients. The multigene phylogeny and amplified fragment length polymorphism typing methods report the C. auris population as clonal. Here, using whole genome sequencing analysis, we decipher for the first time that C. auris strains from four Indian hospitals were highly related, suggesting clonal transmission. Further, all C. auris isolates originated from cases of fungaemia and were resistant to fluconazole (MIC >64 mg/L). PMID:27617098

  18. Genetic variation analysis of Mugil cephalus in China sea based on mitochondrial COI gene sequences.

    PubMed

    Sun, Peng; Shi, Zhao-hong; Yin, Fei; Peng, Shi-ming

    2012-04-01

    In this study, genetic diversity and population genetic structure of flathead grey mullet, Mugil cephalus, among four China Sea populations were investigated by COI sequences. All the populations studied had high values of haplotype and nucleotide diversity, except for the Yellow Sea population. In the phylogenetic tree, these haplotypes clustered in two groups, one for the populations from the Bohai and East China seas, and the other from the Yellow and South China seas. Analysis of molecular variance indicated that the northern populations (Bohai and East China) had lower genetic divergence (0.0725, P > 0.05) than that of the southern population (South China) (0.4530-0.6827, P < 0.001), suggesting that two distinct genetic groups exist in Chinese waters. Tests of neutral evolution and mismatch distribution indicated that no historical demographic expansion occurred in these populations. The results provide new information for genetic assessment, fishery management, and conservation of this species.

  19. Identification of Genes Responsible for Natural Variation in Volatile Content Using Next-Generation Sequencing Technology.

    PubMed

    Amaya, Iraida; Pillet, Jeremy; Folta, Kevin M

    2016-01-01

    Identification of the genes controlling the variation of key traits remains a challenge for plant researchers and represents a goal for the development of functional markers and their implementation in marker-assisted crop breeding. As an example we describe the identification of volatile organic compounds (VOCs) that segregate as a single locus or major quantitative trait loci (QTL) in strawberry F1 segregating populations. Next, we describe a fast and efficient method for RNA extraction in strawberry that yields high-quality RNA for downstream RNA-seq analysis. Finally, two alternative methods for analysis of global transcript expression in contrasting lines will be described in order to identify the candidate gene and genes with differential expression using RNA-seq.

  20. High-Resolution Specificity from DNA Sequencing Highlights Alternative Modes of Lac Repressor Binding

    PubMed Central

    Zuo, Zheng; Stormo, Gary D.

    2014-01-01

    Knowing the specificity of transcription factors is critical to understanding regulatory networks in cells. The lac repressor–operator system has been studied for many years, but not with high-throughput methods capable of determining specificity comprehensively. Details of its binding interaction and its selection of an asymmetric binding site have been controversial. We employed a new method to accurately determine relative binding affinities to thousands of sequences simultaneously, requiring only sequencing of bound and unbound fractions. An analysis of 2560 different DNA sequence variants, including both base changes and variations in operator length, provides a detailed view of lac repressor sequence specificity. We find that the protein can bind with nearly equal affinities to operators of three different lengths, but the sequence preference changes depending on the length, demonstrating alternative modes of interaction between the protein and DNA. The wild-type operator has an odd length, causing the two monomers to bind in alternative modes, making the asymmetric operator the preferred binding site. We tested two other members of the LacI/GalR protein family and find that neither can bind with high affinity to sites with alternative lengths or shows evidence of alternative binding modes. A further comparison with known and predicted motifs suggests that the lac repressor may be unique in this ability and that this may contribute to its selection. PMID:25209146

  1. A High Resolution Genetic Map Anchoring Scaffolds of the Sequenced Watermelon Genome

    PubMed Central

    Kou, Qinghe; Jiang, Jiao; Guo, Shaogui; Zhang, Haiying; Hou, Wenju; Zou, Xiaohua; Sun, Honghe; Gong, Guoyi; Levi, Amnon; Xu, Yong

    2012-01-01

    As part of our ongoing efforts to sequence and map the watermelon (Citrullus spp.) genome, we have constructed a high density genetic linkage map. The map positioned 234 watermelon genome sequence scaffolds (an average size of 1.41 Mb) that cover about 330 Mb and account for 93.5% of the 353 Mb of the assembled genomic sequences of the elite Chinese watermelon line 97103 (Citrullus lanatus var. lanatus). The genetic map was constructed using an F8 population of 103 recombinant inbred lines (RILs). The RILs are derived from a cross between the line 97103 and the United States Plant Introduction (PI) 296341-FR (C. lanatus var. citroides) that contains resistance to fusarium wilt (races 0, 1, and 2). The genetic map consists of eleven linkage groups that include 698 simple sequence repeat (SSR), 219 insertion-deletion (InDel) and 36 structure variation (SV) markers and spans ∼800 cM with a mean marker interval of 0.8 cM. Using fluorescent in situ hybridization (FISH) with 11 BACs that produced chromosome-specific signals, we have depicted watermelon chromosomes that correspond to the eleven linkage groups constructed in this study. The high resolution genetic map developed here should be a useful platform for the assembly of the watermelon genome, for the development of sequence-based markers used in breeding programs, and for the identification of genes associated with important agricultural traits. PMID:22247776
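
    The summary figures quoted in the record above are mutually consistent, as a quick calculation suggests. The sketch below only re-derives numbers from values stated in the abstract; the constant names and the `summarize_map` helper are hypothetical, and the two ways of computing the mean marker spacing (per marker, or per interval within linkage groups) are assumptions, since the abstract does not say which was used.

```python
# Figures quoted in the abstract.
SCAFFOLDS = 234
MEAN_SCAFFOLD_MB = 1.41
ANCHORED_MB = 330
ASSEMBLED_MB = 353
MARKERS = {"SSR": 698, "InDel": 219, "SV": 36}
MAP_LENGTH_CM = 800      # reported as approximately 800 cM
LINKAGE_GROUPS = 11


def summarize_map() -> dict:
    """Re-derive the map summary statistics from the quoted figures."""
    total_markers = sum(MARKERS.values())
    # Each linkage group with n markers has n - 1 intervals (assumption:
    # every marker occupies a distinct position).
    intervals = total_markers - LINKAGE_GROUPS
    return {
        "anchored_mb_from_scaffolds": SCAFFOLDS * MEAN_SCAFFOLD_MB,
        "anchored_percent": 100.0 * ANCHORED_MB / ASSEMBLED_MB,
        "total_markers": total_markers,
        "cm_per_marker": MAP_LENGTH_CM / total_markers,
        "cm_per_interval": MAP_LENGTH_CM / intervals,
    }


if __name__ == "__main__":
    for name, value in summarize_map().items():
        print(f"{name}: {value:.2f}")
```

    Either spacing definition rounds to the 0.8 cM given in the abstract, and 234 scaffolds at a mean of 1.41 Mb come to roughly the 330 Mb reported as anchored.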

  2. High mitochondrial sequence diversity in linguistic isolates of the Alps.

    PubMed Central

    Stenico, M.; Nigro, L.; Bertorelle, G.; Calafell, F.; Capitanio, M.; Corrain, C.; Barbujani, G.

    1996-01-01

    Segment I of the control region of mtDNA (360 bases) was sequenced in seven samples, each of 10 individuals inhabiting villages in the eastern Italian Alps (South Tyrol and Trentino). Three linguistic groups, German, Italian, and Ladin, were represented by two samples each; the seventh sample comes from an isolated group of German origin, the Mocheni, who are linguistically distinct and geographically separated from the bulk of the German speakers. Seventy-four polymorphic sites were identified, defining 63 different haplotypes. Mocheni and Ladin speakers tend to form two clusters in the evolutionary trees inferred from sequences. Analysis of molecular variance shows significant differentiation within samples, among them, and among linguistic groups. Genetic differences between the Ladins and the other groups are not much smaller than between Europeans and some Africans; variation is large within groups, as well, with the exception of only the Mocheni. In the evolutionary trees where the four alpine groups are compared with other European populations, Mocheni and especially Ladins appear as clear outliers. Romansch-speaking Swiss, who are linguistically related to Ladins, are not genetically similar to them, for this segment of DNA. Because the time elapsed since colonization of the Alps (≤ 12,000 years) is short in mutational terms, the only model accounting for the observed relationships between mtDNA variation and linguistic identity seems one in which a population ancestral to Ladin speakers was already differentiated long before the Alps were settled and the current linguistic affiliations were established. For the Mocheni, the results are consistent with a simpler episode of allele loss, from an original genetic pool common to the ancestors of the current German speakers. PMID:8940282

  3. Transcriptome sequencing of a highly salt tolerant mangrove species Sonneratia alba using Illumina platform.

    PubMed

    Chen, Sufang; Zhou, Renchao; Huang, Yelin; Zhang, Meng; Yang, Guili; Zhong, Cairong; Shi, Suhua

    2011-06-01

    Mangroves are critical and threatened marine resources, yet few transcriptomic and genomic data are available in public databases. The transcriptome of a highly salt tolerant mangrove species, Sonneratia alba, was sequenced using the Illumina Genome Analyzer in this study. Over 15 million 75-bp paired-end reads were assembled into 30,628 unique sequences with an average length of 581 bp. Of them, 2358 SSRs were detected, with di-nucleotide repeats (59.2%) and tri-nucleotide repeats (37.7%) being the most common. Analysis of codon usage bias based on 20,945 coding sequences indicated that genes of S. alba were less biased than those of some microorganisms and Drosophila and that codon usage variation in S. alba was due primarily to compositional mutation bias, while translational selection has a relatively weak effect. Genome-wide gene ontology (GO) assignments showed that S. alba shared a similar GO slim classification with Arabidopsis thaliana. High percentages of sequences assigned to GO slim category 'mitochondrion' and four KEGG pathways, such as carbohydrates and secondary metabolites metabolism, may contribute to salt adaptation of S. alba. In addition, 1266 unique sequences matched to 273 known salt responsive genes (gene families) in other species were screened as candidates for salt tolerance of S. alba, and some of these genes showed fairly high coverage depth. Finally, we identified four genes with signals of strong diversifying selection (Ka/Ks > 1) by comparing the transcriptome sequences of S. alba with 249 known ESTs from its congener S. caseolaris. This study demonstrated a successful application of the Illumina platform to de novo assembly of the transcriptome of a non-model organism. Abundant SSR markers, salt responsive genes and four genes with signatures of natural selection obtained from S. alba provide abundant sequence sources for future genetic diversity, salt adaptation and speciation studies. PMID:21620334

  4. Mitochondrial DNA variation and phylogenetic relationships among five tuna species based on sequencing of D-loop region.

    PubMed

    Kumar, Girish; Kocour, Martin; Kunal, Swaraj Priyaranjan

    2016-05-01

    In order to assess the DNA sequence variation and phylogenetic relationship among five tuna species (Auxis thazard, Euthynnus affinis, Katsuwonus pelamis, Thunnus tonggol, and T. albacares) out of all four tuna genera, partial sequences of the mitochondrial DNA (mtDNA) D-loop region were analyzed. The estimate of intra-specific sequence variation in the studied species was low, ranging from 0.027 to 0.080 [Kimura's two parameter distance (K2P)], whereas values of inter-specific variation ranged from 0.049 to 0.491. The longtail tuna (T. tonggol) and yellowfin tuna (T. albacares) were found to share a close relationship (K2P = 0.049), while skipjack tuna (K. pelamis) was the most divergent species studied. Phylogenetic analysis using Maximum-Likelihood (ML) and Neighbor-Joining (NJ) methods supported the monophyletic origin of Thunnus species. Similarly, the phylogeny of Auxis and Euthynnus species substantiates their monophyly. However, results showed a distinct origin of K. pelamis from genus Thunnus as well as Auxis and Euthynnus. Thus, the mtDNA D-loop region sequence data support the polyphyletic origin of tuna species. PMID:25329285
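
    The Kimura two-parameter (K2P) distance quoted in the record above has a standard closed form, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The following is a minimal sketch of that textbook formula, not the authors' pipeline; the function name `k2p_distance` and the choice to skip gaps and ambiguous bases are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are
    skipped (an illustrative choice, not taken from the abstract).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0.0 or term2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))


if __name__ == "__main__":
    # One transition in ten sites.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

    Identical sequences give a distance of zero, and the correction grows faster than the raw proportion of differences as sequences diverge, which is why the inter-specific values in the abstract can approach 0.5.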

  5. Ultra-deep Illumina sequencing accurately identifies MHC class IIb alleles and provides evidence for copy number variation in the guppy (Poecilia reticulata).

    PubMed

    Lighten, Jackie; van Oosterhout, Cock; Paterson, Ian G; McMullan, Mark; Bentzen, Paul

    2014-07-01

    We address the bioinformatic issue of accurately separating amplified genes of the major histocompatibility complex (MHC) from artefacts generated during high-throughput sequencing workflows. We fit observed ultra-deep sequencing depths (hundreds to thousands of sequences per amplicon) of allelic variants to expectations from genetic models of copy number variation (CNV). We provide a simple, accurate and repeatable method for genotyping multigene families, evaluating our method via analyses of 209 b of MHC class IIb exon 2 in guppies (Poecilia reticulata). Genotype repeatability for resequenced individuals (N = 49) was high (100%) within the same sequencing run. However, repeatability dropped to 83.7% between independent runs, either because of lower mean amplicon sequencing depth in the initial run or random PCR effects. This highlights the importance of fully independent replicates. Significant improvements in genotyping accuracy were made by greatly reducing type I genotyping error (i.e. accepting an artefact as a true allele), which may occur when using low-depth allele validation thresholds used by previous methods. Only a small amount (4.9%) of type II error (i.e. rejecting a genuine allele as an artefact) was detected through fully independent sequencing runs. We observed 1-6 alleles per individual, and evidence of sharing of alleles across loci. Variation in the total number of MHC class II loci among individuals, both among and within populations was also observed, and some genotypes appeared to be partially hemizygous; total allelic dosage added up to an odd number of allelic copies. Collectively, observations provide evidence of MHC CNV and its complex basis in natural populations.

  6. The tryptophan repressor sequence is highly conserved among the Enterobacteriaceae.

    PubMed Central

    Arvidson, D N; Arvidson, C G; Lawson, C L; Miner, J; Adams, C; Youderian, P

    1994-01-01

    Tryptophan biosynthesis in Escherichia coli is regulated by the product of the trpR gene, the tryptophan (Trp) repressor. Trp aporepressor binds the corepressor, L-tryptophan, to form a holorepressor complex, which binds trp operator DNA tightly, and inhibits transcription of the tryptophan biosynthetic operon. The conservation of trp operator sequences among enteric Gram-negative bacteria suggests that trpR genes from other bacterial species can be cloned by complementation in E. coli. To clone trpR homologues, a deletion of the E. coli trpR gene, delta trpR504, was made on a plasmid by site-directed mutagenesis, then crossed onto the E. coli genome. Plasmid clones of the trpR genes of Enterobacter aerogenes and Enterobacter cloacae were isolated by complementation of the delta trpR504 allele, scored as the ability to repress beta-galactosidase synthesis from a prophage-borne trpE-lacZ gene fusion. The predicted amino acid sequences of four enteric TrpR proteins show differences, clustered on the backside of the folded repressor, opposite the DNA-binding helix-turn-helix substructures. These differences are predicted to have little effect on the interactions of the aporepressor with tryptophan, holorepressor with operator DNA, or tandemly bound holorepressor dimers with one another. Although there is some variation observed at the dimer interface, interactions predicted to stabilize the interface are conserved. The phylogenetic relationships revealed by the TrpR amino acid sequence alignment agree with the results of others. PMID:8208606

  7. Phylogenetic Relationships and Genetic Variation in Longidorus and Xiphinema Species (Nematoda: Longidoridae) Using ITS1 Sequences of Nuclear Ribosomal DNA.

    PubMed

    Ye, Weimin; Szalanski, Allen L; Robbins, R T

    2004-03-01

    Genetic analyses using DNA sequences of nuclear ribosomal DNA ITS1 were conducted to determine the extent of genetic variation within and among Longidorus and Xiphinema species. DNA sequences were obtained from samples collected from Arkansas, California and Australia as well as 4 Xiphinema DNA sequences from GenBank. The sequences of the ITS1 region including the 3' end of the 18S rDNA gene and the 5' end of the 5.8S rDNA gene ranged from 1020 bp to 1244 bp for the 9 Longidorus species, and from 870 bp to 1354 bp for the 7 Xiphinema species. Nucleotide frequencies were: A = 25.5%, C = 21.0%, G = 26.4%, and T = 27.1%. Genetic variation between the two genera had a maximum divergence of 38.6% between X. chambersi and L. crassus. Genetic variation among Xiphinema species ranged from 3.8% between X. diversicaudatum and X. bakeri to 29.9% between X. chambersi and X. italiae. Within Longidorus, genetic variation ranged from 8.9% between L. crassus and L. grandis to 32.4% between L. fragilis and L. diadecturus. Intraspecific genetic variation in X. americanum sensu lato ranged from 0.3% to 1.9%, while it was 0.8% in L. diadecturus and ranged from 0.6% to 10.9% in L. biformis. Identical sequences were obtained between the two populations of L. grandis, and between the two populations of X. bakeri. Phylogenetic analyses based on the ITS1 DNA sequence data were conducted on each genus separately using both maximum parsimony and maximum likelihood analysis. Among the Longidorus taxa, 4 subgroups are supported: L. grandis, L. crassus, and L. elongatus are in one cluster; L. biformis and L. paralongicaudatus are in a second cluster; L. fragilis and L. breviannulatus are in a third cluster; and L. diadecturus is in a fourth cluster. Among the Xiphinema taxa, 3 subgroups are supported: X. americanum with X. chambersi, X. bakeri with X. diversicaudatum, and X. italiae and X. vuittenezi forming a sister group with X. index. The relationships observed in this study

  8. Characterization and Sequence Variation in the rDNA Region of Six Nematode Species of the Genus Longidorus (Nematoda)

    PubMed Central

    De Luca, F.; Reyes, A.; Grunder, J.; Kunz, P.; Agostinelli, A.; De Giorgi, C.; Lamberti, F.

    2004-01-01

    Total DNA was isolated from individual nematodes of the species Longidorus helveticus, L. macrosoma, L. arthensis, L. profundorum, L. elongatus, and L. raskii collected in Switzerland. The ITS region and D1-D2 expansion segments of the 26S rDNA were amplified and cloned. The sequences obtained were aligned in order to investigate sequence diversity and to infer the phylogenetic relationships among the six Longidorus species. D1-D2 sequences were more conserved than the ITS sequences that varied widely in primary structure and length, and no consensus was observed. Phylogenetic analyses using the neighbor-joining, maximum parsimony and maximum likelihood methods were performed with three different sequence data sets: ITS1-ITS2, 5.8S-D1-D2, and combining ITS1-ITS2+5.8S-D1-D2 sequences. All multiple alignments yielded similar basic trees supporting the existence of the six species established using morphological characters. These sequence data also provided evidence that the different regions of the rDNA are characterized by different evolution rates and by different factors associated with the generation of extreme size variation. PMID:19262800

  9. Individual variation of human S1P₁ coding sequence leads to heterogeneity in receptor function and drug interactions.

    PubMed

    Obinata, Hideru; Gutkind, Sarah; Stitham, Jeremiah; Okuno, Toshiaki; Yokomizo, Takehiko; Hwa, John; Hla, Timothy

    2014-12-01

    Sphingosine 1-phosphate receptor 1 (S1P₁), an abundantly-expressed G protein-coupled receptor which regulates key vascular and immune responses, is a therapeutic target in autoimmune diseases. Fingolimod/Gilenya (FTY720), an oral medication for relapsing-remitting multiple sclerosis, targets S1P₁ receptors on immune and neural cells to suppress neuroinflammation. However, suppression of endothelial S1P₁ receptors is associated with cardiac and vascular adverse effects. Here we report the genetic variations of the S1P₁ coding region from exon sequencing of >12,000 individuals and their functional consequences. We conducted functional analyses of 14 nonsynonymous single nucleotide polymorphisms (SNPs) of the S1PR1 gene. One SNP mutant (Arg¹²⁰ to Pro) failed to transmit sphingosine 1-phosphate (S1P)-induced intracellular signals such as calcium increase and activation of p44/42 MAPK and Akt. Two other mutants (Ile⁴⁵ to Thr and Gly³⁰⁵ to Cys) showed normal intracellular signals but impaired S1P-induced endocytosis, which made the receptor resistant to FTY720-induced degradation. Another SNP mutant (Arg¹³ to Gly) demonstrated protection from coronary artery disease in a high cardiovascular risk population. Individuals with this mutation showed a significantly lower percentage of multi-vessel coronary obstruction in a risk factor-matched case-control study. This study suggests that individual genetic variations of S1P₁ can influence receptor function and, therefore, infer differential disease risks and interaction with S1P₁-targeted therapeutics. PMID:25293589

  10. High-Throughput Sequencing of a South American Amerindian

    PubMed Central

    Almeida, Renan; Alencar, Dayse O.; Barbosa, Maria Silvanira; Gusmão, Leonor; Silva, Wilson A.; de Souza, Sandro J.; Silva, Artur; Ribeiro-dos-Santos, Ândrea; Darnet, Sylvain; Santos, Sidney

    2013-01-01

    The emergence of next-generation sequencing technologies allowed access to the vast amounts of information that are contained in the human genome. This information has contributed to the understanding of individual and population-based variability and improved the understanding of the evolutionary history of different human groups. However, the genome of a representative of the Amerindian populations had not been previously sequenced. Thus, the genome of an individual from a South American tribe was completely sequenced to further the understanding of the genetic variability of Amerindians. A total of 36.8 giga base pairs (Gbp) were sequenced and aligned with the human genome. These Gbp corresponded to 95.92% of the human genome with an estimated miscall rate of 0.0035 per sequenced bp. The data obtained from the alignment were used for SNP (single-nucleotide polymorphism) and INDEL (insertion-deletion) calling, which resulted in the identification of 502,017 polymorphisms, of which 32,275 were potentially new high-confidence SNPs and 33,795 new INDELs, specific to South Native American populations. The authenticity of the sample as a member of the South Native American populations was confirmed through the analysis of the uniparental (maternal and paternal) lineages. The autosomal comparison distinguished the investigated sample from other continental populations and revealed a close relation to the Eastern Asian populations and Aboriginal Australians. Although the findings did not rule out the classical model of the settlement of the Americas, they brought new insights into the understanding of human population history. The present study indicates a remarkable genetic variability in human populations that must still be identified and contributes to the understanding of the genetic variability of South Native American populations and of human population history. PMID:24386182

  11. Variations in T Cell Transcription Factor Sequence and Expression Associated with Resistance to the Sheep Nematode Teladorsagia circumcincta.

    PubMed

    Wilkie, Hazel; Gossner, Anton; Bishop, Stephen; Hopkins, John

    2016-01-01

    This study used selected lambs that varied in their resistance to the gastrointestinal parasite Teladorsagia circumcincta. Infection over 12 weeks identified susceptible (high adult worm count, AWC; high fecal egg count, FEC; low body weight, BW; low IgA) and resistant sheep (no/low AWC and FEC, high BW and high IgA). Resistance is mediated largely by a Th2 response and IgA and IgE antibodies, and is a heritable characteristic. The polarization of T cells and the development of appropriate immune responses is controlled by the master regulators, T-bet (TBX21), GATA-3 (GATA3), RORγt (RORC2) and RORα (RORA); and several inflammatory diseases of humans and mice are associated with allelic or transcript variants of these transcription factors. This study tested the hypothesis that resistance of sheep to T. circumcincta is associated with variations in the structure, sequence or expression levels of individual master regulator transcripts. We have identified and sequenced one variant of sheep TBX21, two variants of GATA3 and RORC2 and five variants of RORA from lymph node mRNA. Relative RT-qPCR analysis showed that TBX21, GATA3 and RORC2 were not significantly differentially-expressed between the nine most resistant (AWC, 0; FEC, 0) and the nine most susceptible sheep (AWC, mean 6078; FEC, mean 350). Absolute RT-qPCR on all 45 animals identified RORAv5 as being significantly differentially-expressed (p = 0.038) between resistant, intermediate and susceptible groups; RORAv2 was not differentially-expressed (p = 0.77). Spearman’s rank analysis showed that RORAv5 transcript copy number was significantly negatively correlated with parameters of susceptibility, AWC and FEC; and was positively correlated with BW. RORAv2 was not correlated with AWC, FEC or BW but was significantly negatively correlated with IgA antibody levels [corrected]. This study identifies the full length RORA variant (RORAv5) as important in controlling the protective immune

  12. Advances, practice, and clinical perspectives in high-throughput sequencing.

    PubMed

    Park, S-J; Saito-Adachi, M; Komiyama, Y; Nakai, K

    2016-07-01

    Remarkable advances in high-throughput sequencing technologies have fundamentally changed our understanding of the genetic and epigenetic molecular bases underlying human health and diseases. As these technologies continue to revolutionize molecular biology leading to fresh perspectives, it is imperative to thoroughly consider the enormous excitement surrounding the technologies by highlighting the characteristics of platforms and their global trends as well as potential benefits and limitations. To date, with a variety of platforms, the technologies provide an impressive range of applications, including sequencing of whole genomes and transcriptomes, identifying of genome modifications, and profiling of protein interactions. Because these applications produce a flood of data, simultaneous development of bioinformatics tools is required to efficiently deal with the big data and to comprehensively analyze them. This review covers the major achievements and performances of the high-throughput sequencing and further summarizes the characteristics of their applications along with introducing applicable bioinformatics tools. Moreover, a step-by-step procedure for a practical transcriptome analysis is described employing an analytical pipeline. Clinical perspectives with special consideration to human oral health and diseases are also covered. PMID:26602181

  13. Validation of high throughput sequencing and microbial forensics applications

    PubMed Central

    2014-01-01

    High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results allow analysis and interpretation, significantly influencing the course and/or focus of an investigation, and can impact the response of the government to an attack having individual, political, economic or military consequences. Interpretation of the results of microbial forensic analyses relies on understanding the performance and limitations of HTS methods, including analytical processes, assays and data interpretation. The utility of HTS must be defined carefully within established operating conditions and tolerances. Validation is essential in the development and implementation of microbial forensics methods used for formulating investigative leads attribution. HTS strategies vary, requiring guiding principles for HTS system validation. Three initial aspects of HTS, irrespective of chemistry, instrumentation or software are: 1) sample preparation, 2) sequencing, and 3) data analysis. Criteria that should be considered for HTS validation for microbial forensics are presented here. Validation should be defined in terms of specific application and the criteria described here comprise a foundation for investigators to establish, validate and implement HTS as a tool in microbial forensics, enhancing public safety and national security. PMID:25101166

  14. High resolution sequence stratigraphic concepts applied to geostatistical modeling

    SciTech Connect

    Desaubliaux, G.; De Lestang, A.P.; Eschard, R.

    1995-08-01

    Lithofacies simulations using a high-resolution 3D grid make it possible to enhance the geometries of internal heterogeneities of reservoirs. In this study, the simulated series was the Ness formation, part of the Brent reservoir in the Dunbar field located in the Viking graben of the North Sea. Simulation results have been used to build the reservoir layering supporting the 3D grid used for reservoir engineering, and also as a frame to study the effects of secondary diagenetic processes on petrophysical properties. The method used is based on a geostatistical study and integrates the following data: (1) a geological model using sequence stratigraphic concepts to define lithofacies sequences and associated bounding surfaces; (2) well data (cores and logs) used as a database for geostatistical analysis and simulations; (3) seismic data, where a 3D seismic survey has been used to define the internal surfaces bounding the units; and (4) outcrop data, where the Mesa Verde formation (Colorado, USA) has been used as an outcrop analog to calibrate geostatistical parameters for the simulations (horizontal range of the variograms). This study illustrates the capacity to use high-resolution sequence stratigraphic concepts to improve the simulations of reservoirs when the lack of subsurface information reduces the accuracy of geostatistical analysis.

  15. Molecular characterization and mitochondrial sequence variation in two sympatric species of Proechimys (Rodentia: Echimyidae) in French Guiana.

    PubMed

    Steiner; Sourrouille; Catzeflis

    2000-12-01

    Variations in mitochondrial DNA characters were used to characterize two morphologically similar and sympatric species of Neotropical terrestrial rodents of the genus Proechimys (Mammalia: Echimyidae). We sequenced both cytochrome b (1140 bp) and part of the control region (445 bp) from four individuals of P. cuvieri and five of P. cayennensis from French Guiana, which allowed us to depict intra- and inter-specific patterns of variation. The phylogenetic relationships between the nine sequences support the monophyly of each species, and illustrate that more polymorphism might exist within P. cuvieri than within P. cayennensis. By developing species-specific primers to amplify a fragment of the cytochrome b gene, we were able to identify 50 individuals of Proechimys spp. caught in two localities of French Guiana. In both sites of primary rainforests, we showed that the two species live in syntopy, and this observation emphasizes the need to document ecological differences which should exist in order to diminish inter-specific competition.

  16. Allelic sequence variation of the HLA-DQ loci: relationship to serology and to insulin-dependent diabetes susceptibility.

    PubMed Central

    Horn, G T; Bugawan, T L; Long, C M; Erlich, H A

    1988-01-01

    Analysis of sequence variation in the polymorphic second exon of the major histocompatibility complex genes HLA-DQ alpha and -DQ beta has revealed 8 allelic variants at the alpha locus and 13 variants at the beta locus. Correlation of sequence variation with serologic typing suggests that the DQw2, DQw3, and DQ(blank) types are determined by the DQ beta subunit, while the DQw1 specificity is determined by DQ alpha. The nature of the amino acid at position 57 in the DQ beta subunit is correlated with susceptibility to insulin-dependent diabetes mellitus. This region of the DQ beta chain contains shared peptides with Epstein-Barr virus and rubella virus. PMID:2842756

  17. Gene-pool variation in caledonian and European Scots pine (Pinus sylvestris L.) revealed by chloroplast simple-sequence repeats.

    PubMed

    Provan, J; Soranzo, N; Wilson, N J; McNicol, J W; Forrest, G I; Cottrell, J; Powell, W

    1998-09-22

    We have used polymorphic chloroplast simple-sequence repeats to analyse levels of genetic variation within and between seven native Scottish and eight mainland European populations of Scots pine (Pinus sylvestris L.). Diversity levels for the Scottish populations based on haplotype frequency were far in excess of those previously obtained using monoterpenes and isozymes and confirmed lower levels of genetic variation within the derelict population at Glen Falloch. The diversity levels were higher than those reported in similar studies in other Pinus species. An analysis of molecular variance (AMOVA) showed that small (3.24-8.81%) but significant (p ≤ 0.001) portions of the variation existed between the populations and that there was no significant difference between the Scottish and the mainland European populations. Evidence of population substructure was found in the Rannoch population, which exhibited two subgroups. Finally, one of the loci studied exhibited an allele distribution uncharacteristic of the stepwise mutation model of evolution of simple-sequence repeats, and sequencing of the PCR products revealed that this was due to a duplication rather than slippage in the repeat region. An examination of the distribution of this mutation suggests that it may have occurred fairly recently in the Wester Ross region or that it may be evidence of a refugial population.
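
    As an illustration of what a diversity level "based on haplotype frequency" can look like, the sketch below computes Nei's unbiased haplotype (gene) diversity, H = n/(n-1) * (1 - sum of squared haplotype frequencies). The abstract does not state which estimator the authors used, so the choice of estimator, the function name and the toy haplotype labels are all assumptions made for this example.

```python
from collections import Counter

def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of haplotype labels.

    Assumption: this estimator is chosen for illustration only; the
    abstract above does not say which diversity measure was applied.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies in the sample
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    # Small-sample correction n / (n - 1)
    return n / (n - 1) * (1.0 - sum_sq)

# Hypothetical population sample: four trees, three distinct haplotypes
toy_population = ["hapA", "hapA", "hapB", "hapC"]
h = haplotype_diversity(toy_population)
```

    A value of 0 means every sampled individual carries the same haplotype, and a value of 1 means every individual carries a different one.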

  18. High Resolution Global Electrical Conductivity Variations in the Earth's Mantle

    NASA Astrophysics Data System (ADS)

    Kelbert, A.; Sun, J.; Egbert, G. D.

    2013-12-01

    Electrical conductivity of the Earth's mantle is a valuable constraint on the water content and melting processes. In Kelbert et al. (2009), we obtained the first global inverse model of electrical conductivity in the mantle capable of providing constraints on the lateral variations in mantle water content. However, in doing so we had to compromise on the problem complexity by using the historically very primitive ionospheric and magnetospheric source assumptions. In particular, possible model contamination by the auroral current systems had greatly restricted our use of available data. We have now addressed this problem by inverting for the external sources along with the electrical conductivity variations. In this study, we still focus primarily on long period data that are dominated by quasi-zonal source fields. The improved understanding of the ionospheric sources allows us to invert the magnetic fields directly, without a correction for the source and/or the use of transfer functions. It allows us to extend the period range of available data to 1.2 days - 102 days, achieving better sensitivity to the upper mantle and transition zone structures. Finally, once the source effects in the data are accounted for, a much larger subset of observatories may be used in the electrical conductivity inversion. Here, we use full magnetic fields at 207 geomagnetic observatories, which include mid-latitude, equatorial and high latitude data. Observatory hourly means from the years 1958-2010 are employed. The improved quality and spatial distribution of the data set, as well as the high resolution modeling and inversion using degree and order 40 spherical harmonics mapped to a 2x2 degree lateral grid, all contribute to the much improved resolution of our models, representing a conceptual step forward in global electromagnetic sounding. We present a fully three-dimensional, global electrical conductivity model of the Earth's mantle as inferred from ground geomagnetic

  19. Conservation of the C-type lectin fold for massive sequence variation in a Treponema diversity-generating retroelement.

    PubMed

    Le Coq, Johanne; Ghosh, Partho

    2011-08-30

    Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (∼16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented potential variability of TvpA of approximately 10²⁰. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation.

  20. Conservation of the C-type lectin fold for massive sequence variation in a Treponema diversity-generating retroelement

    SciTech Connect

    Le Coq, Johanne; Ghosh, Partho

    2012-06-19

    Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (∼16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented potential variability of TvpA of approximately 10²⁰. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation.

  1. Conservation of the C-type lectin fold for massive sequence variation in a Treponema diversity-generating retroelement

    PubMed Central

    Le Coq, Johanne; Ghosh, Partho

    2011-01-01

    Anticipatory ligand binding through massive protein sequence variation is rare in biological systems, having been observed only in the vertebrate adaptive immune response and in a phage diversity-generating retroelement (DGR). Earlier work has demonstrated that the prototypical DGR variable protein, major tropism determinant (Mtd), meets the demands of anticipatory ligand binding by novel means through the C-type lectin (CLec) fold. However, because of the low sequence identity among DGR variable proteins, it has remained unclear whether the CLec fold is a general solution for DGRs. We have addressed this problem by determining the structure of a second DGR variable protein, TvpA, from the pathogenic oral spirochete Treponema denticola. Despite its weak sequence identity to Mtd (∼16%), TvpA was found to also have a CLec fold, with predicted variable residues exposed in a ligand-binding site. However, this site in TvpA was markedly more variable than the one in Mtd, reflecting the unprecedented potential variability of TvpA of approximately 10²⁰. In addition, similarity between TvpA and Mtd with formylglycine-generating enzymes was detected. These results provide strong evidence for the conservation of the formylglycine-generating enzyme-type CLec fold among DGRs as a means of accommodating massive sequence variation. PMID:21873231

  2. The Effects of Sequence Variation on Genome-wide NRF2 Binding—New Target Genes and Regulatory SNPs

    PubMed Central

    Kuosmanen, Suvi M.; Viitala, Sari; Laitinen, Tuomo; Peräkylä, Mikael; Pölönen, Petri; Kansanen, Emilia; Leinonen, Hanna; Raju, Suresh; Wienecke-Baldacchino, Anke; Närvänen, Ale; Poso, Antti; Heinäniemi, Merja; Heikkinen, Sami; Levonen, Anna-Liisa

    2016-01-01

    Transcription factor binding specificity is crucial for proper target gene regulation. Motif discovery algorithms identify the main features of the binding patterns, but the accuracy on the lower affinity sites is often poor. Nuclear factor E2-related factor 2 (NRF2) is a ubiquitous redox-activated transcription factor having a key protective role against endogenous and exogenous oxidant and electrophile stress. Herein, we decipher the effects of sequence variation on the DNA binding sequence of NRF2, in order to identify both genome-wide binding sites for NRF2 and disease-associated regulatory SNPs (rSNPs) with drastic effects on NRF2 binding. Interactions between NRF2 and DNA were studied using molecular modelling, and NRF2 chromatin immunoprecipitation-sequence datasets together with protein binding microarray measurements were utilized to study binding sequence variation in detail. The binding model thus generated was used to identify genome-wide binding sites for NRF2, and genomic binding sites with rSNPs that have strong effects on NRF2 binding and reside on active regulatory elements in human cells. As a proof of concept, miR-126–3p and -5p were identified as NRF2 target microRNAs, and a rSNP (rs113067944) residing on NRF2 target gene (Ferritin, light polypeptide, FTL) promoter was experimentally verified to decrease NRF2 binding and result in decreased transcriptional activity. PMID:26826707

  3. DNA sequence variation of wild barley Hordeum spontaneum (L.) across environmental gradients in Israel.

    PubMed

    Bedada, G; Westerbergh, A; Nevo, E; Korol, A; Schmid, K J

    2014-06-01

    Wild barley Hordeum spontaneum (L.) shows a wide geographic distribution and ecological diversity. A key question concerns the spatial scale at which genetic differentiation occurs and to what extent it is driven by natural selection. The Levant region exhibits a strong ecological gradient along the North-South axis, with numerous small canyons in an East-West direction and with small-scale environmental gradients on the opposing North- and South-facing slopes. We sequenced 34 short genomic regions in 54 accessions of wild barley collected throughout Israel and from the opposing slopes of two canyons. The nucleotide diversity of the total sample is 0.0042, which is about two-thirds of a sample from the whole species range (0.0060). Thirty accessions collected at 'Evolution Canyon' (EC) at Nahal Oren, close to Haifa, have a nucleotide diversity of 0.0036, and therefore harbor a large proportion of the genetic diversity. There is a high level of genetic clustering throughout Israel and within EC, which roughly differentiates the slopes. Accessions from the hot and dry South-facing slope have significantly reduced genetic diversity and are genetically more distinct from accessions from the North-facing slope, which are more similar to accessions from other regions in Northern Israel. Statistical population models indicate that wild barley within the EC consist of three separate genetic clusters with substantial gene flow. The data indicate a high level of population structure at large and small geographic scales that shows isolation-by-distance, and is also consistent with ongoing natural selection contributing to genetic differentiation at a small geographic scale.
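
    The nucleotide diversity values quoted above (for example 0.0042 for the total sample) are averages of pairwise differences per site. The sketch below shows one common way to compute such a value from aligned sequences; it is a minimal illustration, not the authors' pipeline, and the function names, the handling of gaps and ambiguous bases, and the toy alignment are assumptions.

```python
from itertools import combinations

VALID_BASES = set("ACGT")

def pairwise_difference_rate(seq_a, seq_b):
    """Fraction of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped
    (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sites = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            sites += 1
            if a != b:
                diffs += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    return diffs / sites

def nucleotide_diversity(aligned_seqs):
    """Mean pairwise difference rate over all pairs of sequences."""
    pairs = list(combinations(aligned_seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_difference_rate(a, b) for a, b in pairs) / len(pairs)

# Hypothetical toy alignment of three accessions, ten sites each
toy_alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi = nucleotide_diversity(toy_alignment)
```

    Averaging per-pair rates keeps the example short; published estimates typically also account for sample size and for sites missing in only some accessions.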

  4. A De Novo Genome Sequence Assembly of the Arabidopsis thaliana Accession Niederzenz-1 Displays Presence/Absence Variation and Strong Synteny

    PubMed Central

    Pucker, Boas; Holtgräwe, Daniela; Rosleff Sörensen, Thomas; Stracke, Ralf; Viehöver, Prisca

    2016-01-01

    Arabidopsis thaliana is the most important model organism for fundamental plant biology. The genome diversity of different accessions of this species has been intensively studied, for example in the 1001 genome project which led to the identification of many single nucleotide polymorphisms (SNPs) and small insertions and deletions (InDels). In addition, presence/absence variation (PAV), copy number variation (CNV) and mobile genetic elements contribute to genomic differences between A. thaliana accessions. To address larger genome rearrangements between the A. thaliana reference accession Columbia-0 (Col-0) and another accession of about average distance to Col-0, we created a de novo next generation sequencing (NGS)-based assembly from the accession Niederzenz-1 (Nd-1). The result was evaluated with respect to assembly strategy and synteny to Col-0. We provide a high quality genome sequence of the A. thaliana accession (Nd-1, LXSY01000000). The assembly displays an N50 of 0.590 Mbp and covers 99% of the Col-0 reference sequence. Scaffolds from the de novo assembly were positioned on the basis of sequence similarity to the reference. Errors in this automatic scaffold anchoring were manually corrected based on analyzing reciprocal best BLAST hits (RBHs) of genes. Comparison of the final Nd-1 assembly to the reference revealed duplications and deletions (PAV). We identified 826 insertions and 746 deletions in Nd-1. Randomly selected candidates of PAV were experimentally validated. Our Nd-1 de novo assembly allowed reliable identification of larger genic and intergenic variants, which was difficult or error-prone by short read mapping approaches alone. While overall sequence similarity as well as synteny is very high, we detected short and larger (affecting more than 100 bp) differences between Col-0 and Nd-1 based on bi-directional comparisons. The de novo assembly provided here and additional assemblies that will certainly be published in the future will allow to

  5. BLAZAR ANTI-SEQUENCE OF SPECTRAL VARIATION WITHIN INDIVIDUAL BLAZARS: CASES FOR MRK 501 AND 3C 279

    SciTech Connect

    Zhang, Jin; Zhang, Shuang-Nan; Liang, En-Wei

    2013-04-10

    The jet properties of Mrk 501 and 3C 279 are derived by fitting broadband spectral energy distributions (SEDs) with lepton models. The derived γ_b (the break Lorentz factor of the electron distribution) is 10⁴-10⁶ for Mrk 501 and 200 ∼ 600 for 3C 279. The magnetic field strength (B) of Mrk 501 is usually one order of magnitude lower than that of 3C 279, but their Doppler factors (δ) are comparable. A spectral variation feature where the peak luminosity is correlated with the peak frequency, which is opposite from the blazar sequence, is observed in the two sources. We find that (1) the peak luminosities of the two bumps in the SEDs for Mrk 501 depend on γ_b in both the observer and co-moving frames, but they are not correlated with B and δ and (2) the luminosity variation of 3C 279 is dominated by the external Compton (EC) peak and its peak luminosity is correlated with γ_b and δ, but anti-correlated with B. These results suggest that γ_b may govern the spectral variation of Mrk 501 and δ and B would be responsible for the spectral variation of 3C 279. The narrow distribution of γ_b and the correlation of γ_b and B in 3C 279 would be due to the cooling from the EC process and the strong magnetic field. Based on our brief discussion, we propose that this spectral variation feature may originate from the instability of the corona but not from the variation of the accretion rate as does the blazar sequence.

  6. High-throughput sequencing: a roadmap toward community ecology.

    PubMed

    Poisot, Timothée; Péquin, Bérangère; Gravel, Dominique

    2013-04-01

    High-throughput sequencing is becoming increasingly important in microbial ecology, yet it is surprisingly under-used to generate or test biogeographic hypotheses. In this contribution, we highlight how adding these methods to the ecologist toolbox will allow the detection of new patterns, and will help our understanding of the structure and dynamics of diversity. Starting with a review of ecological questions that can be addressed, we move on to the technical and analytical issues that will benefit from an increased collaboration between different disciplines. PMID:23610649

  7. Prevalence of Marek's disease virus in different chicken populations in Iraq and indicative virulence based on sequence variation in the ecoRI-q (meq) gene.

    PubMed

    Wajid, Salih J; Katz, Margaret E; Renz, Katrin G; Walkden-Brown, Stephen W

    2013-06-01

    limited meq gene sequence variation, that all sequenced samples had a short meq with two four-proline repeats, and that this is consistent with a high level of virulence.

  8. BM-SNP: A Bayesian Model for SNP Calling Using High Throughput Sequencing Data.

    PubMed

    Xu, Yanxun; Zheng, Xiaofeng; Yuan, Yuan; Estecio, Marcos R; Issa, Jean-Pierre; Qiu, Peng; Ji, Yuan; Liang, Shoudan

    2014-01-01

    A single-nucleotide polymorphism (SNP) is a single base change in the DNA sequence and is the most common polymorphism. Detection and annotation of SNPs are among the central topics in biomedical research as SNPs are believed to play important roles in the manifestation of phenotypic events, such as disease susceptibility. To take full advantage of the next-generation sequencing (NGS) technology, we propose a Bayesian approach, BM-SNP, to identify SNPs based on the posterior inference using NGS data. In particular, BM-SNP computes the posterior probability of nucleotide variation at each covered genomic position using the contents and frequency of the mapped short reads. The position with a high posterior probability of nucleotide variation is flagged as a potential SNP. We apply BM-SNP to NGS data from two cell lines, and the results show a high ratio of overlap (>95 percent) with the dbSNP database. Compared with MAQ, BM-SNP identifies more SNPs that are in dbSNP, with higher quality. The SNPs that are called only by BM-SNP but not in dbSNP may serve as new discoveries. The proposed BM-SNP method integrates information from multiple aspects of NGS data, and therefore achieves high detection power. BM-SNP is fast, capable of processing whole genome data at 20-fold average coverage in a short amount of time. PMID:26357041
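
    The general idea of flagging a position by its posterior probability of nucleotide variation can be sketched with a deliberately simple three-genotype model. This is not the BM-SNP model, whose details are not given in the abstract; the binomial likelihood, the fixed sequencing error rate, the prior split and all names below are assumptions chosen only to make the Bayes-rule step concrete.

```python
from math import comb

def snp_posterior(ref_count, alt_count, error_rate=0.01, prior_variant=0.001):
    """Posterior probability that a site is variant, given read counts.

    Toy model (an assumption, not BM-SNP): each read shows the alternate
    base with probability error_rate (homozygous reference), 0.5
    (heterozygous) or 1 - error_rate (homozygous alternate).
    """
    n = ref_count + alt_count
    p_alt_read = {"hom_ref": error_rate, "het": 0.5, "hom_alt": 1.0 - error_rate}
    # Prior mass for variation is split evenly between the two variant genotypes
    prior = {
        "hom_ref": 1.0 - prior_variant,
        "het": prior_variant / 2.0,
        "hom_alt": prior_variant / 2.0,
    }
    # Unnormalised posterior: binomial likelihood of the counts times the prior
    weight = {
        g: comb(n, alt_count) * p ** alt_count * (1.0 - p) ** ref_count * prior[g]
        for g, p in p_alt_read.items()
    }
    total = sum(weight.values())
    return (weight["het"] + weight["hom_alt"]) / total

# Hypothetical read pile-ups: (reference-supporting reads, alternate reads)
clean_site = snp_posterior(20, 0)
balanced_site = snp_posterior(10, 10)
```

    In this toy model a site covered by 20 reference reads gets a posterior near zero, while an even split of 10 and 10 reads gets a posterior near one, so a threshold on the posterior separates candidate SNPs from likely sequencing errors.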

  9. Low-Cost, High-Throughput Sequencing of DNA Assemblies Using a Highly Multiplexed Nextera Process.

    PubMed

    Shapland, Elaine B; Holmes, Victor; Reeves, Christopher D; Sorokin, Elena; Durot, Maxime; Platt, Darren; Allen, Christopher; Dean, Jed; Serber, Zach; Newman, Jack; Chandran, Sunil

    2015-07-17

    In recent years, next-generation sequencing (NGS) technology has greatly reduced the cost of sequencing whole genomes, whereas the cost of sequence verification of plasmids via Sanger sequencing has remained high. Consequently, industrial-scale strain engineers either limit the number of designs or take short cuts in quality control. Here, we show that over 4000 plasmids can be completely sequenced in one Illumina MiSeq run for less than $3 each (15× coverage), which is a 20-fold reduction over using Sanger sequencing (2× coverage). We reduced the volume of the Nextera tagmentation reaction by 100-fold and developed an automated workflow to prepare thousands of samples for sequencing. We also developed software to track the samples and associated sequence data and to rapidly identify correctly assembled constructs having the fewest defects. As DNA synthesis and assembly become a centralized commodity, this NGS quality control (QC) process will be essential to groups operating high-throughput pipelines for DNA construction.

  10. Reprint of "Identification of staphylococcal species based on variations in protein sequences (mass spectrometry) and DNA sequence (sodA microarray)".

    PubMed

    Kooken, Jennifer; Fox, Karen; Fox, Alvin; Altomare, Diego; Creek, Kim; Wunschel, David; Pajares-Merino, Sara; Martínez-Ballesteros, Ilargi; Garaizar, Javier; Oyarzabal, Omar; Samadpour, Mansour

    2014-01-01

    This report is among the first using sequence variation in newly discovered protein markers for staphylococcal (or indeed any other bacterial) speciation. Variation, at the DNA sequence level, in the sodA gene (commonly used for staphylococcal speciation) provided excellent correlation. Relatedness among strains was also assessed by protein profiling using microcapillary electrophoresis and pulsed field electrophoresis. A total of 64 strains were analyzed including reference strains representing the 11 staphylococcal species most commonly isolated from man (Staphylococcus aureus and 10 coagulase negative species [CoNS]). Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI TOF MS) and liquid chromatography-electrospray ionization tandem mass spectrometry (LC ESI MS/MS) were used for peptide analysis of proteins isolated from gel bands. Comparison of experimental spectra of unknowns versus spectra of peptides derived from reference strains allowed bacterial identification after MALDI TOF MS analysis. After LC-MS/MS analysis of gel bands, bacterial speciation was performed by comparing experimental spectra versus virtual spectra using the software X!Tandem. Finally, LC-MS/MS was performed on whole proteomes, with data analysis also employing X!Tandem. Aconitate hydratase and oxoglutarate dehydrogenase served as marker proteins on focused analysis after gel separation. Alternatively, on full proteomics analysis, elongation factor Tu generally provided the highest confidence in staphylococcal speciation.

  11. Structural mechanisms underlying sequence-dependent variations in GAG affinities of decorin binding protein A, a Borrelia burgdorferi adhesin.

    PubMed

    Morgan, Ashli M; Wang, Xu

    2015-05-01

    Decorin-binding protein A (DBPA) is an important surface adhesin of the bacterium Borrelia burgdorferi, the causative agent of Lyme disease. DBPA facilitates the bacteria's colonization of human tissue by adhering to glycosaminoglycan (GAG), a sulfated polysaccharide. Interestingly, DBPA sequence variation among different strains of Borrelia spirochetes is high, resulting in significant differences in their GAG affinities. However, the structural mechanisms contributing to these differences are unknown. We determined the solution structures of DBPAs from strain N40 of B. burgdorferi and strain PBr of Borrelia garinii, two DBPA variants whose GAG affinities deviate significantly from strain B31, the best characterized version of DBPA. Our structures revealed that significant differences exist between PBr DBPA and B31/N40 DBPAs. In particular, the C-terminus of PBr DBPA, unlike C-termini from B31 and N40 DBPAs, is positioned away from the GAG-binding pocket and the linker between helices one and two of PBr DBPA is highly structured and retracted from the GAG-binding pocket. The repositioning of the C-terminus allowed the formation of an extra GAG-binding epitope in PBr DBPA and the retracted linker gave GAG ligands more access to the GAG-binding epitopes than other DBPAs. Characterization of GAG ligands' interactions with wild-type (WT) PBr and mutants confirmed the importance of the second major GAG-binding epitope and established the fact that the two epitopes are independent of one another and the new epitope is as important to GAG binding as the traditional epitope.

  12. Optimization of shRNA inhibitors by variation of the terminal loop sequence.

    PubMed

    Schopman, Nick C T; Liu, Ying Poi; Konstantinova, Pavlina; ter Brake, Olivier; Berkhout, Ben

    2010-05-01

    Gene silencing by RNA interference (RNAi) can be achieved by intracellular expression of a short hairpin RNA (shRNA) that is processed into the effective small interfering RNA (siRNA) inhibitor by the RNAi machinery. Previous studies indicate that shRNA molecules do not always reflect the activity of corresponding synthetic siRNAs that attack the same target sequence. One obvious difference between these two effector molecules is the hairpin loop of the shRNA. Most studies use the original shRNA design of the pSuper system, but no extensive study regarding optimization of the shRNA loop sequence has been performed. We tested the impact of different hairpin loop sequences, varying in size and structure, on the activity of a set of shRNAs targeting HIV-1. We were able to transform weak inhibitors into intermediate or even strong shRNA inhibitors by replacing the loop sequence. We demonstrate that the efficacy of these optimized shRNA inhibitors is improved significantly in different cell types due to increased siRNA production. These results indicate that the loop sequence is an essential part of the shRNA design. The optimized shRNA loop sequence is generally applicable for RNAi knockdown studies, and will allow us to develop a more potent gene therapy against HIV-1. PMID:20188764

  13. Characterizing novel endogenous retroviruses from genetic variation inferred from short sequence reads

    PubMed Central

    Mourier, Tobias; Mollerup, Sarah; Vinner, Lasse; Hansen, Thomas Arn; Kjartansdóttir, Kristín Rós; Guldberg Frøslev, Tobias; Snogdal Boutrup, Torsten; Nielsen, Lars Peter; Willerslev, Eske; Hansen, Anders J.

    2015-01-01

    From Illumina sequencing of DNA from brain and liver tissue from the lion, Panthera leo, and tumor samples from the pike-perch, Sander lucioperca, we obtained two assembled sequence contigs with similarity to known retroviruses. Phylogenetic analyses suggest that the pike-perch retrovirus belongs to the epsilonretroviruses, and the lion retrovirus to the gammaretroviruses. To determine if these novel retroviral sequences originate from an endogenous retrovirus or from a recently integrated exogenous retrovirus, we assessed the genetic diversity of the parental sequences from which the short Illumina reads are derived. First, we showed by simulations that we can robustly infer the level of genetic diversity from short sequence reads. Second, we find that the measures of nucleotide diversity inferred from our retroviral sequences significantly exceed the level observed from Human Immunodeficiency Virus infections, prompting us to conclude that the novel retroviruses are both of endogenous origin. Through further simulations, we rule out the possibility that the observed elevated levels of nucleotide diversity are the result of co-infection with two closely related exogenous retroviruses. PMID:26493184

  14. High Frequency Variations in Earth Orientation Derived From GNSS Observations

    NASA Astrophysics Data System (ADS)

    Weber, R.; Englich, S.; Snajdrova, K.; Boehm, J.

    2006-12-01

    Current observations gained by the space geodetic techniques, especially VLBI, GPS and SLR, allow for the determination of Earth Orientation Parameters (EOPs - polar motion, UT1/LOD, nutation offsets) with unprecedented accuracy and temporal resolution. This presentation focuses on contributions to the EOP recovery provided by satellite navigation systems (primarily GPS). The IGS (International GNSS Service), for example, currently provides daily polar motion with an accuracy of better than 0.1 mas and LOD estimates with an accuracy of a few microseconds. To study more rapid variations in polar motion and LOD we established, as a first step, a high-resolution (hourly) ERP time series from GPS observation data of the IGS network covering the period from the beginning of 2005 until March 2006. The calculations were carried out by means of the Bernese GPS Software V5.0 considering observations from a subset of 79 fairly stable stations out of the IGb00 reference frame sites. From these ERP time series the amplitudes of the major diurnal and semidiurnal variations caused by ocean tides are estimated. After correcting the series for ocean tides, the remaining geodetically observed excitation is compared with variations of atmospheric excitation (AAM). To study the sensitivity of the estimates with respect to the applied mapping function we applied both the widely used NMF (Niell Mapping Function) and the VMF1 (Vienna Mapping Function 1). In addition, based on computations covering two months in 2005, the potential improvement due to the use of additional GLONASS data will be discussed. Finally, satellite techniques are also able to provide nutation offset rates with respect to the most recent nutation model. Based on GPS observations from 2005 we established nutation rate time series and subsequently derived the amplitudes of several nutation waves with periods less than 30 days. The results are compared to VLBI estimates processed by means of the OCCAM 6.1 software.

  15. [Mitochondrial DNA sequence variation, demographic history, and population structure of Amur sturgeon Acipenser schrenckii Brandt, 1869].

    PubMed

    Shedko, S V; Miroshnichenko, I L; Nemkova, G A; Koshelev, V N; Shedko, M B

    2015-02-01

    The variability of the mtDNA control region (D-loop) was examined in Amur sturgeon endemic to the Amur River. This species is also classified as critically endangered by the IUCN Red List of Threatened Species. Sequencing of 796- to 812-bp fragments of the D-loop in 112 sturgeon collected in the Lower Amur revealed 73 different genotypes. The sample was characterized by a high level of haplotypic (0.976) and nucleotide (0.0194) diversity. The identified haplotypes split into two well-defined monophyletic groups, BG (n = 39) and SM (n = 34), differing (HKY distance) on average by 3.41% of nucleotide positions upon an average level of intragroup differences of 0.54 and 1.23%, respectively. Moreover, the haplotypes of the SM group differed by the presence of a 13-14 bp deletion. Most of the samples (66 out of 112) carried BG haplotypes. Overall, the pattern of pairwise nucleotide differences and the results of neutrality tests, as well as the results of tests for compliance with the model of sudden demographic expansion or with the model of exponential growth, pointed to a past significant increase in the number of Amur sturgeon, which was most clearly manifested in the analysis of data on the BG haplogroup. The constructed Bayesian skyline plots showed that this growth began about 18 to 16 thousand years ago. At present, the effective size of the strongly reduced (due to overharvesting) population of Amur sturgeon may be equal to or even lower than it was before the beginning of this growth during the Last Glacial Maximum. The presence in the mitochondrial gene pool of Amur sturgeon of two haplogroups, their unequal evolutionary dynamics, and, judging by scanty data, their unequal representation in the Russian and Chinese parts of the Amur River basin point to the possible existence of at least two distinct populations of Amur sturgeon in the past. PMID:25966586
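    The two diversity statistics quoted in this record can be sketched as follows, assuming the standard Nei estimators: haplotype diversity h = n/(n-1) * (1 - sum of squared haplotype frequencies), and nucleotide diversity as the mean proportion of differing sites over all sequence pairs. The abstract does not state which estimators or software the authors used, and the four short sequences below are invented toy data, not sturgeon D-loop sequences.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity for a sample of aligned sequences."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of sequences.

    Assumes equal-length, gap-free alignments (a simplification).
    """
    pairs = list(combinations(seqs, 2))
    per_pair = [
        sum(a != b for a, b in zip(s, t)) / len(s) for s, t in pairs
    ]
    return sum(per_pair) / len(pairs)


# Hypothetical toy alignment: three haplotypes among four individuals.
sample = ["ACGT", "ACGT", "ACGA", "TCGA"]
h = haplotype_diversity(sample)
pi = nucleotide_diversity(sample)
```

    A value of h close to 1, as reported above, means that two randomly drawn individuals almost always carry different haplotypes, whereas the nucleotide diversity measures how far apart those haplotypes are per site.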

  17. Widespread Sequence Variations in VAMP1 across Vertebrates Suggest a Potential Selective Pressure from Botulinum Neurotoxins

    PubMed Central

    Peng, Lisheng; Adler, Michael; Demogines, Ann; Borrell, Andrew; Liu, Huisheng; Tao, Liang; Tepp, William H.; Zhang, Su-Chun; Johnson, Eric A.; Sawyer, Sara L.; Dong, Min

    2014-01-01

    Botulinum neurotoxins (BoNT/A-G), the most potent toxins known, act by cleaving three SNARE proteins required for synaptic vesicle exocytosis. Previous studies on BoNTs have generally utilized the major SNARE homologues expressed in brain (VAMP2, syntaxin 1, and SNAP-25). However, BoNTs target peripheral motor neurons and cause death by paralyzing respiratory muscles such as the diaphragm. Here we report that VAMP1, but not VAMP2, is the SNARE homologue predominantly expressed in adult rodent diaphragm motor nerve terminals and in differentiated human motor neurons. In contrast to the highly conserved VAMP2, BoNT-resistant variations in VAMP1 are widespread across vertebrates. In particular, we identified a polymorphism at position 48 of VAMP1 in rats, which renders VAMP1 either resistant (I48) or sensitive (M48) to BoNT/D. Taking advantage of this finding, we showed that rat diaphragms with I48 in VAMP1 are insensitive to BoNT/D compared to rat diaphragms with M48 in VAMP1. This unique intra-species comparison establishes VAMP1 as a physiological toxin target in diaphragm motor nerve terminals, and demonstrates that the resistance of VAMP1 to BoNTs can underlie the insensitivity of a species to members of BoNTs. Consistently, human VAMP1 contains I48, which may explain why humans are insensitive to BoNT/D. Finally, we report that residue 48 of VAMP1 varies frequently between M and I across seventeen closely related primate species, suggesting a potential selective pressure from members of BoNTs for resistance in vertebrates. PMID:25010769

  18. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    PubMed

    Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup

    2016-01-01

    Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.

  19. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon

    PubMed Central

    Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup

    2016-01-01

    Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, ‘SCNU1154’, ‘Edisto47’, ‘MR-1’, and ‘PMR5’. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon. PMID:27311063

  1. Source quality variations tied to sequence development: Integration of physical and chemical aspects, Lower to Middle Triassic, western Barents Sea

    SciTech Connect

    Bohacs, K.M.; Isaksen, G.H.

    1991-03-01

    Triassic mudrocks from the Barents Sea area demonstrate the covariance of physical and chemical properties of mudrocks deposited in shelfal environments and the aspect of depositional sequences in distal settings. The tie of physical parameters to chemical character within a detailed sequence-stratigraphic framework enables the construction of depositional-facies models to predict organic-matter content and quality. This allows the explorer to more closely constrain and predict the nature of potential source rocks using seismic and well-log data. Changes in lithology, bedding geometry, sedimentary structures, body and trace-fossil assemblages, and inorganic, bulk-organic, and molecular geochemistry revealed the detailed depositional environments. The depositional environments stack predictably, according to their position in the depositional sequence: from aerobic lower-shoreface to offshore transition environments in lowstand systems tracts to dysaerobic-anaerobic distal open-marine-shelf environments in transgressive and early highstand systems tracts. Quantitative molecular geochemistry also revealed variations within this distal setting and strong covariance with sequence position. Input of organic matter from terrigenous higher plants dominates the lowstands whereas marine-algal organic matter is most prevalent within transgressive and highstand systems tracts. Specifically, the abundance of C30 steranes, total steranes, and moretane reflected development of the sequences.

  2. Maintenance of Sperm Variation in a Highly Promiscuous Wild Bird

    PubMed Central

    Calhim, Sara; Double, Michael C.; Margraf, Nicolas; Birkhead, Tim R.; Cockburn, Andrew

    2011-01-01

    Postcopulatory sexual selection is an important force in the evolution of reproductive traits, including sperm morphology. In birds, sperm morphology is known to be highly heritable and largely condition-independent. Theory predicts, and recent comparative work corroborates, that strong selection in such traits reduces intraspecific phenotypic variation. Here we show that some variation can be maintained despite extreme promiscuity, as a result of opposing, copulation-role-specific selection forces. After controlling for known correlates of siring success in the superb fairy-wren (Malurus cyaneus), we found that (a) lifetime extra-pair paternity success was associated with sperm with a shorter flagellum and relatively large head, and (b) males whose sperm had a longer flagellum and a relatively smaller head achieved higher within-pair paternity. In this species extrapair copulations occur in the same morning, but preceding, pair copulations during a female's fertile period, suggesting that shorter and relatively larger-headed sperm are most successful in securing storage (defense), whereas the opposite phenotype might be better at outcompeting stored sperm (offense). Furthermore, since cuckolding ability is a major contributor to differential male reproductive output, stronger selection on defense sperm competition traits might explain the short sperm of malurids relative to other promiscuous passerines. PMID:22194918

  3. A combination of PhP typing and β-d-glucuronidase gene sequence variation analysis for differentiation of Escherichia coli from humans and animals.

    PubMed

    Masters, N; Christie, M; Katouli, M; Stratton, H

    2015-06-01

    We investigated the usefulness of the β-d-glucuronidase gene variance in Escherichia coli as a microbial source tracking tool using a novel algorithm for comparison of sequences from a prescreened set of host-specific isolates using a high-resolution PhP typing method. A total of 65 common biochemical phenotypes belonging to 318 E. coli strains isolated from humans and domestic and wild animals were analysed for nucleotide variations at 10 loci along a 518 bp fragment of the 1812 bp β-d-glucuronidase gene. Neighbour-joining analysis of loci variations revealed that 86 (76.8%) human isolates and 91.2% of animal isolates were correctly identified. Pairwise hierarchical clustering improved assignment, with 92 (82.1%) human and 204 (99%) animal strains assigned to their respective clusters. Our data show that initial typing of isolates and selection of common types from different hosts prior to analysis of the β-d-glucuronidase gene sequence improves source identification. We also concluded that numerical profiling of the nucleotide variations can be used as a valuable approach to differentiate human from animal E. coli. This study signifies the usefulness of the β-d-glucuronidase gene as a marker for differentiating human faecal pollution from animal sources.
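    The assignment rates in this record can be checked with a short calculation. The abstract gives the overall total (318 strains) but not the group sizes; the values 112 human and 206 animal strains used below are inferred from the reported counts and percentages and should be read as an assumption, not as figures stated by the authors.

```python
def percent(correct, total):
    """Percentage of correctly assigned isolates, rounded to one decimal."""
    return round(100 * correct / total, 1)


# Group sizes are NOT given in the abstract; they are inferred so that the
# reported counts reproduce the reported percentages and sum to 318.
HUMAN_TOTAL = 112
ANIMAL_TOTAL = 206

results = {
    "human, neighbour-joining": percent(86, HUMAN_TOTAL),
    "human, hierarchical clustering": percent(92, HUMAN_TOTAL),
    "animal, hierarchical clustering": percent(204, ANIMAL_TOTAL),
}

for label, value in results.items():
    print(f"{label}: {value}%")
```

    With these inferred group sizes, the three count-based figures match the reported percentages, and hierarchical clustering raises the human assignment rate by 6 isolates relative to neighbour-joining.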

  4. Space and time variations of crustal anisotropy during the 1997 Umbria-Marche, central Italy, seismic sequence

    NASA Astrophysics Data System (ADS)

    Piccinini, D.; Margheriti, L.; Chiaraluce, L.; Cocco, M.

    2006-12-01

    We measure crustal anisotropy parameters from several hundreds of aftershocks (ML > 2.5) of the 1997 Umbria-Marche seismic sequence which occurred in a carbonatic fold and thrust belt in the shallow crust of central Apennines (Italy). The analysis of shear wave polarization shows clear S-wave splitting with prevalent fast direction ~140°N and average delay times of 0.06 s. The observed fast direction is parallel to the strike of the activated normal-fault system and to the maximum horizontal stress (σ2) active in the region. This is explained by the presence of stress-aligned microcracks or stress-opened fluid-filled cracks and fractures within the sedimentary coverage, even if the role of structural anisotropy cannot be completely ruled out since the maximum horizontal stress is subparallel to the major structural features of the area (main thrusts and normal faults). The peculiar spatio-temporal evolution of the seismic sequence gives us also the opportunity to investigate temporal variations of anisotropic parameters. We analyse those seismograms whose ray paths sample the crustal volume containing two of the major fault zones, before and after the occurrence of normal faulting mainshocks (Mw > 5). We observe variations of the anisotropic parameters during the days before and after the occurrence of mainshocks and we interpret them in terms of temporal variations of anisotropic parameters. This interpretation is consistent with temporal variations of the local stress condition and of the fluid pressure in the studied crustal volume proposed in the literature. However, since the spatial sampling of the selected ray paths varies with time, we cannot exclude the contribution of spatial variations of anisotropic parameters.

  5. The use of museum specimens with high-throughput DNA sequencers

    PubMed Central

    Burrell, Andrew S.; Disotell, Todd R.; Bergey, Christina M.

    2015-01-01

    Natural history collections have long been used by morphologists, anatomists, and taxonomists to probe the evolutionary process and describe biological diversity. These biological archives also offer great opportunities for genetic research in taxonomy, conservation, systematics, and population biology. They allow assays of past populations, including those of extinct species, giving context to present patterns of genetic variation and direct measures of evolutionary processes. Despite this potential, museum specimens are difficult to work with because natural postmortem processes and preservation methods fragment and damage DNA. These problems have restricted geneticists’ ability to use natural history collections primarily by limiting how much of the genome can be surveyed. Recent advances in DNA sequencing technology, however, have radically changed this, making truly genomic studies from museum specimens possible. We review the opportunities and drawbacks of the use of museum specimens, and suggest how to best execute projects when incorporating such samples. Several high-throughput (HT) sequencing methodologies, including whole genome shotgun sequencing, sequence capture, and restriction digests (demonstrated here), can be used with archived biomaterials. PMID:25532801

  7. Analysis of genetic variation within clonal lineages of grape phylloxera (Daktulosphaira vitifoliae Fitch) using AFLP fingerprinting and DNA sequencing.

    PubMed

    Vorwerk, S; Forneck, A

    2007-07-01

    Two AFLP fingerprinting methods were employed to estimate the potential of AFLP fingerprints for the detection of genetic diversity within single founder lineages of grape phylloxera (Daktulosphaira vitifoliae Fitch). Eight clonal lineages, reared under controlled conditions in a greenhouse and reproducing asexually throughout a minimum of 15 generations, were monitored and mutations were scored as polymorphisms between the founder individual and individuals of succeeding generations. Genetic variation was detected within all lineages, from early generations on. Six to 15 polymorphic loci (from a total of 141 loci) were detected within the lineages, making up 4.3% of the total amount of genetic variation. The presence of contaminating extra-genomic sequences (e.g., viral material, bacteria, or ingested chloroplast DNA) was excluded as a source of intraclonal variation. Sequencing of 37 selected polymorphic bands confirmed their origin in mostly noncoding regions of the grape phylloxera genome. AFLP techniques were revealed to be powerful for the identification of reproducible banding patterns within clonal lineages.

  8. [Strategies of genome-wide association study based on high-throughput sequencing].

    PubMed

    Zhou, Jiapeng; Pei, Zhiyong; Chen, Yubao; Chen, Runsheng

    2014-11-01

    Genome-wide association studies (GWASs) have played an important role in research on complex human diseases. Generally speaking, a GWAS tries to detect relationships between genome-wide genetic variants and measurable traits at the population level. Although fruitful, array-based GWASs still have some problems, for example the so-called missing heritability: significantly associated SNPs can explain only a small part of phenotypic variation. Other problems are that, for some traits, significantly associated SNPs from one study are hard to replicate in other studies, and that the functions of significantly associated SNPs are often difficult to interpret. High-throughput sequencing, also known as next-generation sequencing (NGS), could be one of the most promising technologies for solving these problems, because it quickly and accurately detects variants in a high-throughput way. NGS-based GWASs (NGS-GWASs), to some extent, provide a better solution than traditional array-based GWASs. We systematically review the strategies and methods for NGS-GWASs, pick out the most feasible and efficient among them, and discuss their applications in personalized medicine. PMID:25567868

  9. Estimation of Response Functions Based on Variational Bayes Algorithm in Dynamic Images Sequences

    PubMed Central

    2016-01-01

    We proposed a nonparametric Bayesian model based on the variational Bayes algorithm to estimate the response functions in dynamic medical imaging. In dynamic renal scintigraphy, the impulse response or retention functions are rather complicated, and finding a suitable parametric form is problematic. In this paper, we estimated the response functions using nonparametric Bayesian priors. These priors were designed to favor desirable properties of the functions, such as sparsity or smoothness. These assumptions were used within hierarchical priors of the variational Bayes algorithm. We ran our algorithm on a real online dataset of dynamic renal scintigraphy. The results demonstrated that this algorithm improved the estimation of response functions with nonparametric priors. PMID:27631007

  11. Understanding Gene Sequence Variation in the Context of Transcription Regulation in Yeast

    NASA Astrophysics Data System (ADS)

    Gat-Viks, Irit; Meller, Renana; Kupiec, Martin; Shamir, Ron

    The availability of expression quantitative trait loci (eQTL) data can help in understanding the genetic basis of variation in gene expression. However, it has proven difficult to accurately predict functional genetic changes due to low statistical power. To address this challenge, we developed a novel computational approach for combining eQTL data with a complementary regulatory network to identify modules of genes, their underlying genetic polymorphisms, and the activity of their shared regulatory proteins. The resulting eQTL model implicates novel central protein complexes that share not only a regulatory protein but also an underlying genetic variation. Our method shows higher sensitivity than prior computational efforts.

  12. [Hormonal variation during physical exertion at high altitude].

    PubMed

    Sutton, J; Garmendia, F

    1977-01-01

    The influence of physical exercise at high altitude on endocrine function was studied in 8 normal men native to sea level and in 8 men native to high altitude. The sea-level dwellers were studied at sea level, during acute exposure to low barometric pressure, and after 3 months of acclimatization to altitudes over 3,500 meters above sea level. The experiments at high altitude were conducted at 4,500 meters above sea level. Two types of exercise, sub-maximal and maximal, were carried out in the fasting state between 8 and 10 a.m. During acute exposure to altitude, physical exercise produced a marked rise in glucose, cortisol, and growth hormone and a fall in the insulin content of plasma. In the sea-level dwellers acclimatized to altitude for 3 months, an elevation of growth hormone was observed only during maximal physical effort. Marked variations in glucose and cortisol were observed during both types of exercise. This shows that some adaptive changes had occurred in these subjects, but to a lesser extent than those observed in altitude natives. In the high-altitude natives, higher basal concentrations of growth hormone and glucagon, as well as a lower blood glucose concentration, were found. During exercise the high-altitude dwellers showed no significant changes in somatotropin, whereas an important elevation of cortisol occurred. These findings indicate that the high-altitude native has metabolic and endocrine responses to exercise similar to those found in well-trained sea-level athletes. Exposure to altitude provoked a rise in glucagon concentration directly proportional to the duration of exposure. Physical exercise did not elicit any change in the glucagon content of blood. PMID:753199

  13. Sequence variation in the Trichuris trichiura beta-tubulin locus: implications for the development of benzimidazole resistance.

    PubMed

    Bennett, A B; Anderson, T J C; Barker, G C; Michael, E; Bundy, D A P

    2002-11-01

    Benzimidazole resistance has evolved in a variety of organisms and typically results from mutations in the beta-tubulin locus at specific amino acid sites. Despite widespread treatment of human intestinal nematodes with benzimidazole drugs, there have been no unambiguous reports of resistance. However, since beta-tubulin mutations conferring resistance are generally recessive, frequencies of resistance alleles less than 30% would be difficult to detect on the basis of drug treatment failures. Here we investigate sequence variation in a 1079 bp segment of the beta-tubulin locus in the human whipworm Trichuris trichiura from 72 individual nematodes from seven countries. We did not observe any alleles with amino acid mutations indicative of resistance, and of 40 point mutations there were only four non-synonymous mutations all of which were singletons. Estimated effective population sizes are an order of magnitude lower than those from another nematode species in which benzimidazole resistance has developed (Haemonchus contortus). Both the lower diversity and reduced population sizes suggest that benzimidazole resistance is likely to evolve less rapidly in Trichuris than in trichostrongyle parasites of livestock. We observed moderate levels of population subdivision (Phi(ST)=0.26) comparable with that previously observed in Ascaris lumbricoides, and identical alleles were frequently found in parasites from different continents, suggestive of recent admixture. A particularly interesting feature of the data is the high nucleotide diversities observed in nematodes from the Caribbean. This genetic complexity may be a direct result of extensive admixture and complex history of human populations in this region of the world. These data should encourage (but not make complacent) those involved in large-scale benzimidazole treatment of human intestinal nematodes.
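
    The detection argument in this record can be illustrated with a short calculation. The sketch below assumes Hardy-Weinberg proportions and a fully recessive resistance allele; neither assumption is stated in the abstract, and the function name is hypothetical.

```python
def resistant_fraction(allele_freq: float) -> float:
    """Expected fraction of homozygous (phenotypically resistant) worms.

    Assumes Hardy-Weinberg proportions and a fully recessive allele,
    which is an illustrative assumption, not a claim from the abstract.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return allele_freq ** 2


# Show how slowly the resistant phenotype rises with allele frequency.
for q in (0.05, 0.10, 0.30):
    print(f"q = {q:.2f}: resistant fraction = {resistant_fraction(q):.4f}")
```

    Under these assumptions, an allele frequency of 30% corresponds to only about 9% phenotypically resistant worms, which fits the abstract's point that such frequencies would be hard to detect from treatment failures alone.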

  14. The Use of Coded PCR Primers Enables High-Throughput Sequencing of Multiple Homolog Amplification Products by 454 Parallel Sequencing

    PubMed Central

    Bollback, Jonathan P.; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske

    2007-01-01

    Background: The invention of the Genome Sequence 20™ DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. Methodology: We use conventional PCR with 5′-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20™ DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5′ tag analysis. Conclusions: We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (misassignment rate <0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5′ nucleotide of the tag. In particular, primers 5′ labelled with a cytosine are heavily overrepresented among the final sequences, while those 5′ labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5′ primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of
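
    The tag-based assignment described in this record can be sketched as a simple demultiplexing step. This is a minimal illustration, not the authors' pipeline: the tags, specimen names, and reads are hypothetical, and real data would also need handling of sequencing errors in the tag.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_source, tag_len=2):
    """Group reads by their 5' tag and strip the tag from each read.

    Reads whose tag is not in tag_to_source are collected under None.
    tag_len=2 mirrors the dinucleotide tags mentioned in the abstract.
    """
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        bins[tag_to_source.get(tag)].append(insert)
    return dict(bins)


# Hypothetical tags and reads, for illustration only.
tags = {"AC": "specimen_1", "GT": "specimen_2"}
reads = ["ACTTGACC", "GTTTGACA", "ACTTGACT", "NNTTGACC"]
result = demultiplex(reads, tags)
print(result)
```

    Keeping unassigned reads in their own bin, rather than discarding them, makes it possible to estimate a misassignment or failure rate afterwards.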

  15. A pedigree-based study of mitochondrial D-loop DNA sequence variation among Arabian horses.

    PubMed

    Bowling, A T; Del Valle, A; Bowling, M

    2000-02-01

    Through DNA sequence comparisons of a mitochondrial D-loop hypervariable region, we investigated matrilineal diversity for Arabian horses in the United States. Sixty-two horses were tested. From published pedigrees they traced in the maternal line to 34 mares acquired primarily in the mid to late 19th century from nomadic Bedouin tribes. Compared with the reference sequence (GenBank X79547), these samples showed 27 haplotypes with altogether 31 base substitution sites within 397 bp of sequence. Based on examination of pedigrees from a random sampling of 200 horses in current studbooks of the Arabian Horse Registry of America, we estimated that this study defined the expected mtDNA haplotypes for at least 89% of Arabian horses registered in the US. The reliability of the studbook recorded maternal lineages of Arabian pedigrees was demonstrated by haplotype concordance among multiple samplings in 14 lines. Single base differences observed within two maternal lines were interpreted as representing alternative fixations of past heteroplasmy. The study also demonstrated the utility of mtDNA sequence studies to resolve historical maternity questions without access to biological material from the horses whose relationship was in question, provided that representatives of the relevant female lines were available for comparison. The data call into question the traditional assumption that Arabian horses of the same strain necessarily share a common maternal ancestry.
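
    Counting haplotypes and substitution sites against a reference, as this record describes, can be sketched as follows. The sequences and horse names are toy values invented for illustration; the sketch assumes pre-aligned sequences of equal length and considers base substitutions only.

```python
def substitution_sites(reference, sample):
    """Return a tuple of (1-based position, reference base, sample base)."""
    return tuple(
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    )


def summarize(reference, samples):
    """Collapse samples into haplotypes and list all variable positions."""
    haplotypes = {}
    for name, seq in samples.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name}: sequence length differs from reference")
        haplotypes.setdefault(substitution_sites(reference, seq), []).append(name)
    variable = sorted({pos for hap in haplotypes for pos, _, _ in hap})
    return haplotypes, variable


# Toy data, not taken from the study.
reference = "ACGTACGT"
samples = {
    "horse_a": "ACGTACGT",
    "horse_b": "ACATACGT",
    "horse_c": "ACATACGT",
    "horse_d": "ACGTACTT",
}
haplotypes, variable_sites = summarize(reference, samples)
print(len(haplotypes), "haplotypes; variable sites:", variable_sites)
```

    Samples that share the same set of differences from the reference fall into the same haplotype, which is the basis for checking concordance within a recorded maternal line.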

  16. Comparative Effectiveness of Variations in the Demonstration Method of Teaching a Complex Manipulative Sequence.

    ERIC Educational Resources Information Center

    Blankenbaker, E. Keith

    There are so many methods and approaches to teaching that it is sometimes difficult to choose the approach best suited to the needs of the students. This study sought to ascertain the relative effectiveness and efficiency of selected approaches to the demonstration of complex manipulative sequences, and to test the theory that students of high…

  17. Mitogenome sequence variation in migratory and stationary ecotypes of North-east Atlantic cod.

    PubMed

    Karlsen, Bård O; Emblem, Åse; Jørgensen, Tor E; Klingan, Kevin A; Nordeide, Jarle T; Moum, Truls; Johansen, Steinar D

    2014-06-01

    Sequencing of mitochondrial gene fragments from specimens representing a wide range of geographical locations has indicated limited population structuring in Atlantic cod (Gadus morhua). We recently performed whole genome analysis based on next-generation sequencing of two pooled ecotype samples representing offshore migratory and inshore stationary cod from the North-east Atlantic Ocean. Here we report molecular features and variability of the 16.7kb mitogenome component that was collected from the datasets. These sequences represented more than 25 times coverage of each individual and more than 1100 times coverage of each ecotype sample. We estimated the mitogenome to have evolved 14 times more rapidly than the nuclear genome. Among the 365 single nucleotide polymorphism (SNP) sites identified, 121 were shared between ecotypes, and 151 and 93 were private within the migratory and stationary cod, respectively. We found 323 SNPs to be located in protein coding genes, of which 29 were non-synonymous. One synonymous site in ND2 was likely to be under positive selection. FST measurements indicated weak differentiation in ND1 and ND2 between ecotypes. We conclude that the Atlantic cod mitogenome and the nuclear genome apparently evolved by distinct evolutionary constraints, and that the reproductive isolation observed from whole genome analysis was not visible in the mtDNA sequences.
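
    The split into shared and private SNP sites reported in this record amounts to set operations on site positions. The sketch below uses invented positions; only the three reported counts (121, 151, 93) come from the abstract.

```python
def partition_snps(migratory, stationary):
    """Split SNP positions into shared and ecotype-private sets."""
    return {
        "shared": migratory & stationary,
        "private_migratory": migratory - stationary,
        "private_stationary": stationary - migratory,
    }


# Invented positions, for illustration only.
migratory = {101, 250, 3120, 7777}
stationary = {250, 3120, 9000}
parts = partition_snps(migratory, stationary)
counts = {name: len(sites) for name, sites in parts.items()}
print(counts)

# Consistency check on the counts reported in the abstract.
reported_total = 121 + 151 + 93
print(reported_total)
```

    The three reported categories sum to 365, matching the total number of SNP sites given in the abstract.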

  18. Bovine herpesvirus-1: comparison and differentiation of vaccine and field strains based on genomic sequence variation.

    PubMed

    Fulton, R W; d'Offay, J M; Eberle, R

    2013-03-01

    Bovine herpesvirus-1 (BoHV-1) causes significant disease in cattle, including respiratory disease, fetal disease, and reproductive tract infections. Control programs usually include vaccination with a modified live viral (MLV) vaccine. On occasion BoHV-1 strains are isolated from diseased animals or fetuses postvaccination. Currently there are no markers for differentiating MLV strains from field strains of BoHV-1. In this study several BoHV-1 strains were sequenced using whole-genome sequencing technologies and the data analyzed to identify single nucleotide polymorphisms (SNPs). Strains sequenced included the reference BoHV-1 Cooper strain (GenBank Accession JX898220), eight commercial MLV vaccine strains, and 14 field strains from cases presented for diagnosis. Based on SNP analyses, the viruses could be classified into groups having similar SNP patterns. The eight MLV strains could be differentiated from one another although some were closely related to each other. A number of field strains isolated from animals with a history of prior vaccination had SNP patterns similar to specific MLV viruses, while other field isolates were very distinct from all vaccine strains. The results indicate that some BoHV-1 isolates from clinically ill cattle/fetuses can be associated with a prior MLV vaccination history, but more information is needed on the rate of BoHV-1 genome sequence change before irrefutable associations can be drawn. PMID:23333211

  19. Gene sequencing reveals heterokaryotic variations and evolutionary mechanisms in Puccinia striiformis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Puccinia striiformis (Ps), the causal agent of stripe rust, is an obligate biotrophic fungus. The objective of this study was to identify polymorphic genes for determining the mechanisms of the pathogen variation. Primers were designed for seven important putative genes including beta-tubulin (BT), ...

  20. Towards Experimental Annotation of Genes by High Throughput Sequencing

    SciTech Connect

    Bradbury, Andrew

    2010-06-03

    Andrew Bradbury of Los Alamos National Laboratory discusses turning annotation into a sequencing pipeline on June 3, 2010, at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  1. Oxytocin receptor gene sequences in owl monkeys and other primates show remarkable interspecific regulatory and protein coding variation.

    PubMed

    Babb, Paul L; Fernandez-Duque, Eduardo; Schurr, Theodore G

    2015-10-01

    The oxytocin (OT) hormone pathway is involved in numerous physiological processes, and one of its receptor genes (OXTR) has been implicated in pair bonding behavior in mammalian lineages. This observation is important for understanding social monogamy in primates, which occurs in only a small subset of taxa, including Azara's owl monkey (Aotus azarae). To examine the potential relationship between social monogamy and OXTR variation, we sequenced its 5' regulatory (4936bp) and coding (1167bp) regions in 25 owl monkeys from the Argentinean Gran Chaco, and examined OXTR sequences from 1092 humans from the 1000 Genomes Project. We also assessed interspecific variation of OXTR in 25 primate and rodent species that represent a set of phylogenetically and behaviorally disparate taxa. Our analysis revealed substantial variation in the putative 5' regulatory region of OXTR, with marked structural differences across primate taxa, particularly for humans and chimpanzees, which exhibited unique patterns of large motifs of dinucleotide A+T repeats upstream of the OXTR 5' UTR. In addition, we observed a large number of amino acid substitutions in the OXTR CDS region among New World primate taxa that distinguish them from Old World primates. Furthermore, primate taxa traditionally defined as socially monogamous (e.g., gibbons, owl monkeys, titi monkeys, and saki monkeys) all exhibited different amino acid motifs for their respective OXTR protein coding sequences. These findings support the notion that monogamy has evolved independently in Old World and New World primates, and that it has done so through different molecular mechanisms, not exclusively through the oxytocin pathway. PMID:26025428

  3. Sequence variation of the rDNA ITS regions within and between anastomosis groups in Rhizoctonia solani.

    PubMed

    Kuninaga, S; Natsuaki, T; Takeuchi, T; Yokosawa, R

    1997-09-01

    Sequence analysis of the rDNA region containing the internal transcribed spacer (ITS) regions and the 5.8S rDNA coding sequence was used to evaluate the genetic diversity of 45 isolates within and between anastomosis groups (AGs) in Rhizoctonia solani. The 5.8S rDNA sequence was completely conserved across all the AGs examined, whereas the ITS rDNA sequence was found to be highly variable among isolates. The sequence homology in the ITS regions was above 96% for isolates of the same subgroup, 66-100% for isolates of different subgroups within an AG, and 55-96% for isolates of different AGs. In neighbor-joining trees based on distances derived from ITS-5.8S rDNA sequences, subgroups IA, IB and IC within AG-1 and subgroups HG-I and HG-II within AG-4 were placed on statistically significant branches as assessed by bootstrap analysis. These results suggest that sequence analysis of ITS rDNA regions of R. solani may be a valuable tool for identifying AG subgroups of biological significance.
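
    Percentages like the homology values in this record are typically computed from pairwise alignments. The abstract does not say how its values were calculated, so the sketch below shows one common convention under stated assumptions: sequences are already aligned to equal length, a gap opposite a base counts as a mismatch, and columns where both sequences have a gap are skipped. The sequences are invented.

```python
def percent_identity(seq_a, seq_b):
    """Percent identical columns between two aligned sequences.

    Columns where both sequences carry a gap ('-') are ignored;
    a gap opposite a base is counted as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Invented aligned fragments, for illustration only.
identity = percent_identity("ACGTTGCA-A", "ACGTAGCAGA")
print(f"{identity:.1f}% identity")
```

    Different gap-handling conventions can shift the result by several percentage points for highly variable regions such as ITS, so the convention should be reported alongside the values.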

  4. Striking structural dynamism and nucleotide sequence variation of the transposon Galileo in the genome of Drosophila mojavensis

    PubMed Central

    2013-01-01

    Background: Galileo is a transposable element responsible for the generation of three chromosomal inversions in natural populations of Drosophila buzzatii. Although the most characteristic feature of Galileo is the long internally-repetitive terminal inverted repeats (TIRs), which resemble the Drosophila Foldback element, its transposase-coding sequence has led to its classification as a member of the P-element superfamily (Class II, subclass 1, TIR order). Furthermore, Galileo has a wide distribution in the genus Drosophila, since it has been found in 6 of the 12 Drosophila sequenced genomes. Among these species, D. mojavensis, the one closest to D. buzzatii, presented the highest diversity in sequence and structure of Galileo elements. Results: In the present work, we carried out a thorough search and annotation of all the Galileo copies present in the D. mojavensis sequenced genome. In our set of 170 Galileo copies we have detected 5 Galileo subfamilies (C, D, E, F, and X) with different structures ranging from nearly complete, to only 2 TIR or solo TIR copies. Finally, we have explored the structural and length variation of the Galileo copies, which points to relatively frequent rearrangements within and between Galileo elements. Different mechanisms responsible for these rearrangements are discussed. Conclusions: Although Galileo is a transposable element with an ancient history in the D. mojavensis genome, our data indicate a recent transpositional activity. Furthermore, the dynamism in sequence and structure, mainly affecting the TIRs, suggests an active exchange of sequences among the copies. This exchange could lead to new subfamilies of the transposon, which could be crucial for the long-term survival of the element in the genome. PMID:23374229

  5. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs

    PubMed Central

    Dilthey, Alexander T.; Gourraud, Pierre-Antoine; McVean, Gil

    2016-01-01

    Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30–250 CPU hours per sample) remain a significant

  6. Identification of eight mutations and three sequence variations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene

    SciTech Connect

    Ghanem, N.; Costes, B.; Girodon, E.; Martin, J.; Fanen, P.; Goossens, M.

    1994-05-15

    To determine cystic fibrosis (CF) defects in a sample of 224 non-ΔF508 CF chromosomes, the authors used denaturing gradient gel multiplex analysis of CF transmembrane conductance regulator gene segments, a strategy based on blind exhaustive analysis rather than a search for known mutations. This process allowed detection of 11 novel variations comprising two nonsense mutations (Q890X and W1204X), a splice defect (405+4 A→G), a frameshift (3293delA), four presumed missense mutations (S912L, H949Y, L1065P, Q1071P), and three sequence polymorphisms (R31C or 223 C/T, 3471 T/C, and T1220I or 3791 C/T). The authors describe these variations, together with the associated phenotype when defects on both CF chromosomes were identified. 8 refs., 1 fig., 1 tab.

  7. Fast T2-weighted MR imaging: impact of variation in pulse sequence parameters on image quality and artifacts.

    PubMed

    Li, Tao; Mirowitz, Scott A

    2003-09-01

    The purpose of this study was to quantitatively evaluate in a phantom model the practical impact of alteration of key imaging parameters on image quality and artifacts for the most commonly used fast T2-weighted MR sequences. These include fast spin-echo (FSE), single shot fast spin-echo (SSFSE), and spin-echo echo-planar imaging (EPI) pulse sequences. We developed a composite phantom with different T1 and T2 values, which was evaluated while stationary as well as during periodic motion. Experiments involved controlled variations in key parameters including effective TE, TR, echo spacing (ESP), receive bandwidth (BW), echo train length (ETL), and shot number (SN). Quantitative analysis consisted of signal-to-noise ratio (SNR), image nonuniformity, full-width-at-half-maximum (i.e., blurring or geometric distortion) and ghosting ratio. Among the fast T2-weighted sequences, EPI was most sensitive to alterations in imaging parameters. Among imaging parameters that we tested, effective TE, ETL, and shot number most prominently affected image quality and artifacts. Short T2 objects were more sensitive to alterations in imaging parameters in terms of image quality and artifacts. Optimal clinical application of these fast T2-weighted imaging pulse sequences requires careful attention to selection of imaging parameters.
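
    Two of the image-quality metrics named in this record, SNR and ghosting ratio, can be sketched from region-of-interest (ROI) pixel values. The abstract does not give its exact formulas, so the definitions below are common conventions chosen as assumptions, and the pixel values are invented.

```python
from statistics import mean, stdev


def snr(signal_roi, noise_roi):
    """Mean signal divided by the standard deviation of background noise."""
    return mean(signal_roi) / stdev(noise_roi)


def ghosting_ratio(ghost_roi, background_roi, signal_roi):
    """Ghost intensity above background, relative to the mean signal."""
    return (mean(ghost_roi) - mean(background_roi)) / mean(signal_roi)


# Invented pixel values, for illustration only.
signal = [100.0, 102.0, 98.0, 100.0]
noise = [1.0, 3.0, 1.0, 3.0]
ghost = [7.0, 7.0, 7.0, 7.0]

snr_value = snr(signal, noise)
ghost_value = ghosting_ratio(ghost, noise, signal)
print(f"SNR = {snr_value:.1f}, ghosting ratio = {ghost_value:.2%}")
```

    Because both metrics depend on where the ROIs are drawn, comparisons across parameter settings are only meaningful when the same ROIs are reused for every acquisition.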

  8. Effect of sequence variation on the mechanical response of amyloid fibrils probed by steered molecular dynamics simulation.

    PubMed

    Ndlovu, Hlengisizwe; Ashcroft, Alison E; Radford, Sheena E; Harris, Sarah A

    2012-02-01

    The mechanical failure of mature amyloid fibers produces fragments that act as seeds for the growth of new fibrils. Fragmentation may also be correlated with cytotoxicity. We have used steered atomistic molecular dynamics simulations to study the mechanical failure of fibrils formed by the amyloidogenic fragment of human amylin hIAPP20-29 subjected to force applied in a variety of directions. By introducing systematic variations to this peptide sequence in silico, we have also investigated the role of the amino-acid sequence in determining the mechanical stability of amyloid fibrils. Our calculations show that the force required to induce mechanical failure depends on the direction of the applied stress and upon the degree of structural order present in the β-sheet assemblies, which in turn depends on the peptide sequence. The results have implications for the importance of sequence-dependent mechanical properties on seeding the growth of new fibrils and the role of breakage events in cytotoxicity.

  9. Naturally occurring variations in sequence length creates microRNA isoforms that differ in argonaute effector complex specificity

    PubMed Central

    2010-01-01

    Background: Micro(mi)RNAs are short RNA sequences, ranging from 16 to 35 nucleotides (miRBase; http://www.mirbase.org). The majority of the identified sequences are 21 or 22 nucleotides in length. Despite the range of sequence lengths for different miRNAs, individual miRNAs were thought to have a specific sequence of a particular length. A recent report describing a longer variant of a previously identified miRNA in Arabidopsis thaliana prompted this investigation for variations in the length of other miRNAs. Results: In this paper, we demonstrate that a fifth of annotated A. thaliana miRNAs recorded in miRBase V.14 have stable miRNA isoforms that are one or two nucleotides longer than their respective recorded miRNA. Further, we demonstrate that miRNA isoforms are co-expressed and often show differential argonaute complex association. We postulate that these extensions are caused by differential cleavage of the parent precursor miRNA. Conclusions: Our systematic analysis of A. thaliana miRNAs reveals that miRNA length isoforms are relatively common. This finding not only has implications for miRBase and miRNA annotation, but also extends to miRNA validation experiments and miRNA localization studies. Further, we predict that miRNA isoforms are also present in other plant species. PMID:20534119
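
    A search for length isoforms of the kind this record describes can be sketched as a filter over small-RNA read counts. The abstract does not say at which end the extra nucleotides occur, so this sketch assumes 3' extensions only; the sequences and counts are invented, and the function name is hypothetical.

```python
def length_isoforms(annotated, read_counts, max_extension=2):
    """Return reads that start with the annotated miRNA and extend it
    at the 3' end by 1..max_extension nucleotides (assumed direction)."""
    hits = {}
    for read, count in read_counts.items():
        extra = len(read) - len(annotated)
        if 1 <= extra <= max_extension and read.startswith(annotated):
            hits[read] = count
    return hits


# Invented 21-nt sequence and read counts, for illustration only.
annotated = "ACGUACGUACGUACGUACGUA"
read_counts = {
    annotated: 500,
    annotated + "C": 120,    # one nucleotide longer
    annotated + "CU": 40,    # two nucleotides longer
    annotated + "CUG": 5,    # three longer, outside the window
    "G" + annotated: 9,      # 5' extension, ignored by this sketch
}
isoforms = length_isoforms(annotated, read_counts)
print(isoforms)
```

    A real analysis would also map candidate reads back to the precursor to confirm that the extra nucleotides match the genomic sequence rather than untemplated additions.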

  10. Genetic Control of Environmental Variation of Two Quantitative Traits of Drosophila melanogaster Revealed by Whole-Genome Sequencing

    PubMed Central

    Sørensen, Peter; de los Campos, Gustavo; Morgante, Fabio; Mackay, Trudy F. C.; Sorensen, Daniel

    2015-01-01

    Genetic studies usually focus on quantifying and understanding the existence of genetic control on expected phenotypic outcomes. However, there is compelling evidence suggesting the existence of genetic control at the level of environmental variability, with some genotypes exhibiting more stable and others more volatile performance. Understanding the mechanisms responsible for environmental variability not only informs medical questions but is relevant in evolution and in agricultural science. In this work fully sequenced inbred lines of Drosophila melanogaster were analyzed to study the nature of genetic control of environmental variance for two quantitative traits: starvation resistance (SR) and startle response (SL). The evidence for genetic control of environmental variance is compelling for both traits. Sequence information is incorporated in random regression models to study the underlying genetic signals, which are shown to be different in the two traits. Genomic variance in sexual dimorphism was found for SR but not for SL. Indeed, the proportion of variance captured by sequence information and the contribution to this variance from four chromosome segments differ between sexes in SR but not in SL. The number of studies of environmental variation, particularly in humans, is limited. The availability of full sequence information and modern computationally intensive statistical methods provides opportunities for rigorous analyses of environmental variability. PMID:26269504

  11. Chimeric proteins for detection and quantitation of DNA mutations, DNA sequence variations, DNA damage and DNA mismatches

    DOEpatents

    McCutchen-Maloney, Sandra L.

    2002-01-01

    Chimeric proteins having both DNA mutation binding activity and nuclease activity are synthesized by recombinant technology. The proteins are of the general formula A-L-B and B-L-A where A is a peptide having DNA mutation binding activity, L is a linker and B is a peptide having nuclease activity. The chimeric proteins are useful for detection and identification of DNA sequence variations including DNA mutations (including DNA damage and mismatches) by binding to the DNA mutation and cutting the DNA once the DNA mutation is detected.

  12. Lack of sequence variation in sporadic bovine leucosis in regions of tumour suppressor genes p53 and p16.

    PubMed

    Mayr, B; Grüneis, C; Brem, G; Reifinger, M; Schaffner, G; Hochsteiner, W

    2001-08-01

    Regions of the promoter and exons 5-8 of the tumour suppressor gene p53 were analysed in 25 cases of sporadic bovine leucosis. The study included 17 cases of juvenile leucosis, five cases of adult leucosis and three cases of skin leucosis. Exon 2 of tumour suppressor gene p16 was also investigated in the same samples. No sequence variations were present in the analysed areas of the genes. In p53, this fact represents a clear difference in comparison with enzootic bovine leucosis. In p16, no comparative data are available. PMID:11554494

  13. The non-obese diabetic mouse sequence, annotation and variation resource: an aid for investigating type 1 diabetes

    PubMed Central

    Steward, Charles A.; Gonzalez, Jose M.; Trevanion, Steve; Sheppard, Dan; Kerry, Giselle; Gilbert, James G. R.; Wicker, Linda S.; Rogers, Jane; Harrow, Jennifer L.

    2013-01-01

    Model organisms are becoming increasingly important for the study of complex diseases such as type 1 diabetes (T1D). The non-obese diabetic (NOD) mouse is an experimental model for T1D having been bred to develop the disease spontaneously in a process that is similar to humans. Genetic analysis of the NOD mouse has identified around 50 disease loci, which have the nomenclature Idd for insulin-dependent diabetes, distributed across at least 11 different chromosomes. In total, 21 Idd regions across 6 chromosomes, that are major contributors to T1D susceptibility or resistance, were selected for finished sequencing and annotation at the Wellcome Trust Sanger Institute. Here we describe the generation of 40.4 mega base-pairs of finished sequence from 289 bacterial artificial chromosomes for the NOD mouse. Manual annotation has identified 738 genes in the diabetes sensitive NOD mouse and 765 genes in homologous regions of the diabetes resistant C57BL/6J reference mouse across 19 candidate Idd regions. This has allowed us to call variation consequences between homologous exonic sequences for all annotated regions in the two mouse strains. We demonstrate the importance of this resource further by illustrating the technical difficulties that regions of inter-strain structural variation between the NOD mouse and the C57BL/6J reference mouse can cause for current next generation sequencing and assembly techniques. Furthermore, we have established that the variation rate in the Idd regions is 2.3 times higher than the mean found for the whole genome assembly for the NOD/ShiLtJ genome, which we suggest reflects the fact that positive selection for functional variation in immune genes is beneficial in regard to host defence. In summary, we provide an important resource, which aids the analysis of potential causative genes involved in T1D susceptibility. Database URLs: http://www.sanger.ac.uk/resources/mouse/nod/; http://vega

  14. Patterns of structural and sequence variation within isotype lineages of the Neisseria meningitidis transferrin receptor system

    PubMed Central

    Adamiak, Paul; Calmettes, Charles; Moraes, Trevor F; Schryvers, Anthony B

    2015-01-01

    Neisseria meningitidis inhabits the human upper respiratory tract and is an important cause of sepsis and meningitis. A surface receptor composed of transferrin-binding proteins A and B (TbpA and TbpB) is responsible for acquiring iron from host transferrin. Sequence and immunological diversity divides TbpBs into two distinct lineages: isotype I and isotype II. Two representative isotype I and II strains, B16B6 and M982, differ in their dependence on TbpB for in vitro growth on exogenous transferrin. The crystal structure of TbpB and a structural model for TbpA from the representative isotype I N. meningitidis strain B16B6 were obtained. The structures were integrated with a comprehensive analysis of the sequence diversity of these proteins to probe for potential functional differences. A distinct isotype I TbpA was identified that co-varied with TbpB and lacked sequence in the region for the loop 3 α-helix that is proposed to be involved in iron removal from transferrin. The tightly associated isotype I TbpBs had a distinct anchor peptide region, a distinct, smaller linker region between the lobes and lacked the large loops in the isotype II C-lobe. Sequences of the intact TbpB, the TbpB N-lobe, the TbpB C-lobe, and TbpA were subjected to phylogenetic analyses. The phylogenetic clustering of TbpA and the TbpB C-lobe was similar, with two main branches comprising the isotype I and isotype II TbpBs, possibly suggesting an association between TbpA and the TbpB C-lobe. The intact TbpB and TbpB N-lobe had 4 main branches, one consisting of the isotype I TbpBs. One isotype II TbpB cluster appeared to consist of isotype I N-lobe sequences and isotype II C-lobe sequences, indicating the swapping of N-lobes and C-lobes. Our findings should inform future studies on the interaction between TbpB and TbpA and the process of iron acquisition. PMID:25800619

  15. Genomic Analysis of Natural Selection and Phenotypic Variation in High-Altitude Mongolians

    PubMed Central

    Watkins, W. Scott; Witherspoon, David J.; Wu, Wilfred; Qin, Ga; Huff, Chad D.; Jorde, Lynn B.; Ge, Ri-Li

    2013-01-01

    Deedu (DU) Mongolians, who migrated from the Mongolian steppes to the Qinghai-Tibetan Plateau approximately 500 years ago, are challenged by environmental conditions similar to native Tibetan highlanders. Identification of adaptive genetic factors in this population could provide insight into coordinated physiological responses to this environment. Here we examine genomic and phenotypic variation in this unique population and present the first complete analysis of a Mongolian whole-genome sequence. High-density SNP array data demonstrate that DU Mongolians share genetic ancestry with other Mongolian as well as Tibetan populations, specifically in genomic regions related with adaptation to high altitude. Several selection candidate genes identified in DU Mongolians are shared with other Asian groups (e.g., EDAR), neighboring Tibetan populations (including high-altitude candidates EPAS1, PKLR, and CYP2E1), as well as genes previously hypothesized to be associated with metabolic adaptation (e.g., PPARG). Hemoglobin concentration, a trait associated with high-altitude adaptation in Tibetans, is at an intermediate level in DU Mongolians compared to Tibetans and Han Chinese at comparable altitude. Whole-genome sequence from a DU Mongolian (Tianjiao1) shows that about 2% of the genomic variants, including more than 300 protein-coding changes, are specific to this individual. Our analyses of DU Mongolians and the first Mongolian genome provide valuable insight into genetic adaptation to extreme environments. PMID:23874230

  16. Capturing chloroplast variation for molecular ecology studies: a simple next generation sequencing approach applied to a rainforest tree

    PubMed Central

    2013-01-01

    Background With high quantity and quality data production and low cost, next generation sequencing has the potential to provide new opportunities for plant phylogeographic studies on single and multiple species. Here we present an approach for in silico chloroplast DNA assembly and single nucleotide polymorphism detection from short-read shotgun sequencing. The approach is simple and effective and can be implemented using standard bioinformatic tools. Results The chloroplast genome of Toona ciliata (Meliaceae), 159,514 base pairs long, was assembled from shotgun sequencing on the Illumina platform using de novo assembly of contigs. To evaluate its practicality, value and quality, we compared the short read assembly with an assembly completed using 454 data obtained after chloroplast DNA isolation. Sanger sequence verifications indicated that the Illumina dataset outperformed the longer read 454 data. Pooling of several individuals during preparation of the shotgun library enabled detection of informative chloroplast SNP markers. Following validation, we used the identified SNPs for a preliminary phylogeographic study of T. ciliata in Australia and to confirm low diversity across the distribution. Conclusions Our approach provides a simple method for construction of whole chloroplast genomes from shotgun sequencing of whole genomic DNA using short-read data and no available closely related reference genome (e.g. from the same species or genus). The high coverage of Illumina sequence data also renders this method appropriate for multiplexing and SNP discovery and therefore a useful approach for landscape level studies of evolutionary ecology. PMID:23497206

  17. Anatomic Variations of Cervical and High Thoracic Ligamentum Flavum

    PubMed Central

    Yoon, Sang Pil; Kim, Hyun Jung

    2014-01-01

    Background Epidural blocks are widely used for the management of acute and chronic pain. The technique of loss of resistance is frequently adopted to determine the epidural space. A discontinuity of the ligamentum flavum may increase the risk of failure to identify the epidural space. The purpose of this study was to investigate the anatomic variations of the cervical and high thoracic ligamentum flavum in embalmed cadavers. Methods Vertebral column specimens of 15 human cadavers were obtained. After vertebral arches were detached from pedicles, the dural sac and epidural connective tissue were removed. The ligamentum flavum from C3 to T6 was directly examined anteriorly. Results The incidence of midline gaps in the ligamentum flavum was 87%-100% between C3 and T2. The incidence decreased below this level and was the lowest at T4-T5 (8%). Among the levels with a gap, the location of a gap in the caudal third of the ligamentum flavum was more frequent than in the middle or cephalic portion of the ligamentum flavum. Conclusions The cervical and high thoracic ligamentum flavum frequently has midline intervals with various features, especially in the caudal portion of the intervertebral space. Therefore, the ligamentum flavum is not always reliable as a perceptible barrier to identify the epidural space at these vertebral levels. Additionally, it may be more useful to insert the needle into the cephalic portion of the intervertebral space than in the caudal portion. PMID:25317280

  18. Advantages of Single-Molecule Real-Time Sequencing in High-GC Content Genomes

    PubMed Central

    Shin, Seung Chul; Ahn, Do Hwan; Kim, Su Jin; Lee, Hyoungseok; Oh, Tae-Jin; Lee, Jong Eun; Park, Hyun

    2013-01-01

    Next-generation sequencing has become the most widely used sequencing technology in genomics research, but it has inherent drawbacks when dealing with high-GC content genomes. Recently, single-molecule real-time sequencing technology (SMRT) was introduced as a third-generation sequencing strategy to compensate for this drawback. Here, we report that the unbiased and longer read length of SMRT sequencing markedly improved genome assembly with high GC content via gap filling and repeat resolution. PMID:23894349

  19. Amplification and thrifty single-molecule sequencing of recurrent somatic structural variations.

    PubMed

    Patel, Anand; Schwab, Richard; Liu, Yu-Tsueng; Bafna, Vineet

    2014-02-01

    Deletion of tumor-suppressor genes as well as other genomic rearrangements pervade cancer genomes across numerous types of solid tumor and hematologic malignancies. However, even for a specific rearrangement, the breakpoints may vary between individuals, such as the recurrent CDKN2A deletion. Characterizing the exact breakpoints for structural variants (SVs) is useful for designating patient-specific tumor biomarkers. We propose AmBre (Amplification of Breakpoints), a method to target SV breakpoints occurring in samples composed of heterogeneous tumor and germline DNA. Additionally, AmBre validates SVs called by whole-exome/genome sequencing and hybridization arrays. AmBre involves a PCR-based approach to amplify the DNA segment containing an SV's breakpoint and then confirms breakpoints using sequencing by Pacific Biosciences RS. To amplify breakpoints with PCR, primers tiling specified target regions are carefully selected with a simulated annealing algorithm to minimize off-target amplification and maximize efficiency at capturing all possible breakpoints within the target regions. To confirm correct amplification and obtain breakpoints, PCR amplicons are combined without barcoding and simultaneously long-read sequenced using a single SMRT cell. Our algorithm efficiently separates reads based on breakpoints. Each read group supporting the same breakpoint corresponds with an amplicon and a consensus amplicon sequence is called. AmBre was used to discover CDKN2A deletion breakpoints in cancer cell lines: A549, CEM, Detroit562, MOLT4, MCF7, and T98G. Also, we successfully assayed RUNX1-RUNX1T1 reciprocal translocations by finding both breakpoints in the Kasumi-1 cell line. AmBre successfully targets SVs where DNA harboring the breakpoints is present in 1:1000 mixtures.

  20. Sequence variation of cytotoxic T cell epitopes in different isolates of Epstein-Barr virus.

    PubMed

    Apolloni, A; Moss, D; Stumm, R; Burrows, S; Suhrbier, A; Misko, I; Schmidt, C; Sculley, T

    1992-01-01

    Previous results have identified two distinct cytotoxic T lymphocyte (CTL) epitopes encoded by Epstein-Barr virus (EBV), TETA (ORF BLRF3/BERF1 residues 329-353) and EENL (ORF BERF3/BERF4 residues 290-309). Measurement of the specificities of CTL clones (TETA-specific clone 13 and EENL-specific clone 7) directed against these epitopes indicated that the EENL epitope is conserved in all strains of EBV tested while the TETA epitope varied between individual virus strains. Sequencing of the DNA regions encoding these two CTL epitopes in different EBV isolates confirmed these interpretations and demonstrated that different TETA epitope sequences were encoded by B-type EBV strains and by the B95-8 isolate of EBV compared to the other A-type EBV strains. Titration of synthetic variants of the TETA epitope revealed that the epitope encoded by B95-8 was 15-fold less efficient as a T cell epitope than the sequence encoded by other A-type viral strains while the TETA variant encoded by the B-type strains displayed essentially no activity as a T cell epitope.

  1. Optimally recovering rate variation information from genomes and sequences: pattern filtering.

    PubMed

    Lake, J A

    1998-09-01

    Nucleotide substitution rates vary at different positions within genes and genomes, but rates are difficult to estimate, because they are masked by the stochastic nature of substitutions. In this paper, a linear method, pattern filtering, is described which can optimally separate the signals (related to substitution rates or to other measures of sequence change) from stochastic noise. Pattern filtering promises to be useful in both genomic and molecular evolution studies. In an example using mitochondrial genomes, it is shown that pattern filtering can reveal coding and non-coding regions without the need for prior identification of reading frames or other knowledge of the sequence and promises to be an important tool for genomic analysis. In a second example, it is shown that pattern filtering allows one to classify sites on the basis of an estimator of substitution rates. Using elongation factor EF-1 alpha sequences, it is shown that the fastest sites favor archaea as the sister taxon of eukaryotes, whereas the slower sites support the eocyte prokaryotes as the sister taxon of eukaryotes, suggesting that the former result is an artifact of "long branch attraction." PMID:9729887

  2. Functional consequences of sequence variation in the pheromone biosynthetic gene pgFAR for Ostrinia moths.

    PubMed

    Lassance, Jean-Marc; Liénard, Marjorie A; Antony, Binu; Qian, Shuguang; Fujii, Takeshi; Tabata, Jun; Ishikawa, Yukio; Löfstedt, Christer

    2013-03-01

    Pheromones are central to the mating systems of a wide range of organisms, and reproductive isolation between closely related species is often achieved by subtle differences in pheromone composition. In insects and moths in particular, the use of structurally similar components in different blend ratios is usually sufficient to impede gene flow between taxa. To date, the genetic changes associated with variation and divergence in pheromone signals remain largely unknown. Using the emerging model system Ostrinia, we show the functional consequences of mutations in the protein-coding region of the pheromone biosynthetic fatty-acyl reductase gene pgFAR. Heterologous expression confirmed that pgFAR orthologs encode enzymes exhibiting different substrate specificities that are the direct consequences of extensive nonsynonymous substitutions. When taking natural ratios of pheromone precursors into account, our data reveal that pgFAR substrate preference provides a good explanation of how species-specific ratios of pheromone components are obtained among Ostrinia species. Moreover, our data indicate that positive selection may have promoted the observed accumulation of nonsynonymous amino acid substitutions. Site-directed mutagenesis experiments substantiate the idea that amino acid polymorphisms underlie subtle or drastic changes in pgFAR substrate preference. Altogether, this study identifies the reduction step as a potential source of variation in pheromone signals in the moth genus Ostrinia and suggests that selection acting on particular mutations provides a mechanism allowing pheromone reductases to evolve new functional properties that may contribute to variation in the composition of pheromone signals.

  3. Seasonal diversity and dynamics of haptophytes in the Skagerrak, Norway, explored by high-throughput sequencing.

    PubMed

    Egge, Elianne Sirnaes; Johannessen, Torill Vik; Andersen, Tom; Eikrem, Wenche; Bittner, Lucie; Larsen, Aud; Sandaa, Ruth-Anne; Edvardsen, Bente

    2015-06-01

    Microalgae in the division Haptophyta play key roles in the marine ecosystem and in global biogeochemical processes. Despite their ecological importance, knowledge on seasonal dynamics, community composition and abundance at the species level is limited due to their small cell size and few morphological features visible under the light microscope. Here, we present unique data on haptophyte seasonal diversity and dynamics from two annual cycles, with the taxonomic resolution and sampling depth obtained with high-throughput sequencing. From outer Oslofjorden, S Norway, nano- and picoplanktonic samples were collected monthly for 2 years, and the haptophytes targeted by amplification of RNA/cDNA with Haptophyta-specific 18S rDNA V4 primers. We obtained 156 operational taxonomic units (OTUs), from c. 400,000 454 pyrosequencing reads, after rigorous bioinformatic filtering and clustering at 99.5%. Most OTUs represented uncultured and/or not yet 18S rDNA-sequenced species. Haptophyte OTU richness and community composition exhibited high temporal variation and significant yearly periodicity. Richness was highest in September-October (autumn) and lowest in April-May (spring). Some taxa were detected all year, such as Chrysochromulina simplex, Emiliania huxleyi and Phaeocystis cordata, whereas most calcifying coccolithophores only appeared from summer to early winter. We also revealed the seasonal dynamics of OTUs representing putative novel classes (clades HAP-3-5) or orders (clades D, E, F). Season, light and temperature accounted for 29% of the variation in OTU composition. Residual variation may be related to biotic factors, such as competition and viral infection. This study provides new, in-depth knowledge on seasonal diversity and dynamics of haptophytes in North Atlantic coastal waters. PMID:25893259

  5. Population genomics of Pacific lamprey: adaptive variation in a highly dispersive species.

    PubMed

    Hess, Jon E; Campbell, Nathan R; Close, David A; Docker, Margaret F; Narum, Shawn R

    2013-06-01

    Unlike most anadromous fishes that have evolved strict homing behaviour, Pacific lamprey (Entosphenus tridentatus) seem to lack philopatry as evidenced by minimal population structure across the species range. Yet unexplained findings of within-region population genetic heterogeneity coupled with the morphological and behavioural diversity described for the species suggest that adaptive genetic variation underlying fitness traits may be responsible. We employed restriction site-associated DNA sequencing to genotype 4439 quality filtered single nucleotide polymorphism (SNP) loci for 518 individuals collected across a broad geographical area including British Columbia, Washington, Oregon and California. A subset of putatively neutral markers (N = 4068) identified a significant amount of variation among three broad populations: northern British Columbia, Columbia River/southern coast and 'dwarf' adults (F(CT) = 0.02, P ≪ 0.001). Additionally, 162 SNPs were identified as adaptive through outlier tests, and inclusion of these markers revealed a signal of adaptive variation related to geography and life history. The majority of the 162 adaptive SNPs were not independent and formed four groups of linked loci. Analyses with matsam software found that 42 of these outlier SNPs were significantly associated with geography, run timing and dwarf life history, and 27 of these 42 SNPs aligned with known genes or highly conserved genomic regions using the genome browser available for sea lamprey. This study provides both neutral and adaptive context for observed genetic divergence among collections and thus reconciles previous findings of population genetic heterogeneity within a species that displays extensive gene flow.

  6. Complete Sequence Construction of the Highly Repetitive Ribosomal RNA Gene Repeats in Eukaryotes Using Whole Genome Sequence Data.

    PubMed

    Agrawal, Saumya; Ganley, Austen R D

    2016-01-01

    The ribosomal RNA genes (rDNA) encode the major rRNA species of the ribosome, and thus are essential across life. These genes are highly repetitive in most eukaryotes, forming blocks of tandem repeats that form the core of nucleoli. The primary role of the rDNA in encoding rRNA has been long understood, but more recently the rDNA has been implicated in a number of other important biological phenomena, including genome stability, cell cycle, and epigenetic silencing. Noncoding elements, primarily located in the intergenic spacer region, appear to mediate many of these phenomena. Although sequence information is available for the genomes of many organisms, in almost all cases rDNA repeat sequences are lacking, primarily due to problems in assembling these intriguing regions during whole genome assemblies. Here, we present a method to obtain complete rDNA repeat unit sequences from whole genome assemblies. Limitations of next generation sequencing (NGS) data make them unsuitable for assembling complete rDNA unit sequences; therefore, the method we present relies on the use of Sanger whole genome sequence data. Our method makes use of the Arachne assembler, which can assemble highly repetitive regions such as the rDNA in a memory-efficient way. We provide a detailed step-by-step protocol for generating rDNA sequences from whole genome Sanger sequence data using Arachne, for refining complete rDNA unit sequences, and for validating the sequences obtained. In principle, our method will work for any species where the rDNA is organized into tandem repeats. This will help researchers working on species without a complete rDNA sequence, those working on evolutionary aspects of the rDNA, and those interested in conducting phylogenetic footprinting studies with the rDNA. PMID:27576718

  7. A family of differentially amplified repetitive DNA sequences in the genus Beta reveals genetic variation in Beta vulgaris subspecies and cultivars.

    PubMed

    Kubis, S; Heslop-Harrison, J S; Schmidt, T

    1997-03-01

    Members of a highly abundant restriction satellite family have been isolated from the wild beet species Beta nana. The satellite DNA sequence is characterized by a conserved RsaI restriction site and is present in three of four sections of the genus Beta, namely Nanae, Corollinae, and Beta. It was not detected in species of the evolutionary old section Procumbentes, suggesting its amplification after separation of this section. Sequences of eight monomers were aligned revealing a size variation from 209 to 233 bp and an AT content ranging from 56.5% to 60.5%. The similarity between monomers in B. nana varied from 77.7% to 92.2%. Diverged subfamilies were identified by sequence analysis and Southern hybridization. A comparative study of this repetitive DNA element by fluorescent in situ hybridization and Southern analyses in three representative species was performed showing a variable genomic organization and heterogeneous localizations along metaphase chromosomes both within and between species. In B. nana the copy number of this satellite, with some 30,000 per haploid genome, is more than tenfold higher than in Beta lomatogona and up to 200 times higher than in Beta vulgaris, indicating different levels of sequence amplification during evolution in the genus Beta. In sugar beet (B. vulgaris), the large-scale organization of this tandem repeat was examined by pulsed-field gel electrophoresis. Southern hybridization to genomic DNA digested with DraI demonstrated that satellite arrays are located in AT-rich regions and the tandem repeat is a useful probe for the detection of genetic variation in closely related B. vulgaris cultivars, accessions, and subspecies.
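
    The two summary statistics quoted in this record, AT content and pairwise similarity between monomers, can be sketched as follows. This is a minimal illustration and not the authors' method: the abstract does not say how similarity was computed, so the treatment of gaps here (a gap opposite a base counts as a mismatch, gap-gap columns are skipped) is an assumption, as are the function names.

```python
def at_content(seq):
    """Percentage of A and T bases in a sequence, ignoring alignment gaps."""
    bases = seq.upper().replace("-", "")
    return 100.0 * (bases.count("A") + bases.count("T")) / len(bases)


def percent_identity(aligned_a, aligned_b):
    """Percent identity between two rows of an existing alignment.

    Both rows must have the same length. Columns where both rows have a gap
    are skipped; a gap opposite a base counts as a mismatch (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared
```

    Applied to every pair of rows in an alignment of the eight monomers, `percent_identity` would give a range of values comparable in kind to the one reported above, although the exact figures depend on the alignment and on the gap convention chosen.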

  8. Continuous record of the last 30 ka of Paleosecular Variation in a turbiditic marine sedimentary sequence off the NW Iberian Margin

    NASA Astrophysics Data System (ADS)

    Rey, Daniel; Mohamed, Kais Jacob; Coimbra, Rute

    2014-05-01

    Past variations of the geomagnetic field at decadal to centennial scales are recorded with exceptional quality in lava flows, but these are discontinuous and therefore high temporal resolution analyses of paleosecular variation of the geomagnetic field (PSV) are difficult. For such purposes, marine sediments hold a better potential since they are often regarded as continuous sedimentary archives of a range of environmental processes, in particular PSV. While this assumption is generally valid for the deep abyss, it may not be necessarily true for marginal settings and the vicinity of seamounts, where discontinuous sedimentary flows (e.g. turbidites) occur with a relatively high frequency. In this contribution, we present results from two gravity cores (TG8 and TG10) obtained from the flanks of the Galicia Bank, a structural high in the NW Iberian Margin. These cores are mostly comprised of a turbiditic sequence, with continuous pelagic sedimentation recorded over the last 16 ka. Contrary to what would be expected, Alternating Field demagnetization of the NRM showed a PSV record consistent with the behaviour of the geomagnetic field in this region, which could be correlated with a published record in the adjacent Portuguese Margin (Thouveny et al., 2004). These results show that even in an unstable marine sedimentary setting, affected by discontinuous mass flows and biological activity, the delayed and gradual lock-in of the magnetization allows for a continuous record of the geomagnetic field. References: Thouveny, N., Carcaillet, J., Moreno, E., Leduc, G., Nérini, D., 2004. Geomagnetic moment variation and paleomagnetic excursions since 400 kyr BP: a stacked record from sedimentary sequences of the Portuguese Margin. Earth and Planetary Science Letters 219, 377-396.

  9. An Analysis of Stimuli that Influence Compliance during the High-Probability Instruction Sequence

    ERIC Educational Resources Information Center

    Normand, Matthew P.; Kestner, Kathryn; Jessel, Joshua

    2010-01-01

    When we evaluated variables that influence the effectiveness of the high-probability (high-p) instruction sequence, the sequence was associated with a precipitous decrease in compliance with high-"p" instructions for 1 participant, thereby precluding continued use of the sequence. We investigated the reasons for this decrease. Stimuli associated…

  10. Sequence analysis and identification of new variations in the coding sequence of melatonin receptor gene (MTNR1A) of Indian Chokla sheep breed

    PubMed Central

    Saxena, Vijay Kumar; Jha, Bipul Kumar; Meena, Amar Singh; Naqvi, S.M.K.

    2014-01-01

    The melatonin receptor 1A gene is the prime receptor mediating the effect of melatonin at the neuroendocrine level for control of seasonal reproduction in sheep. The aims of this study were to examine the polymorphism pattern of the coding sequence of the MTNR1A gene in Chokla sheep, a breed of the Indian arid tract, and to identify new variations in relation to its aseasonal status. Genomic DNAs of 101 Chokla sheep were collected and an 824 bp coding sequence of Exon II was amplified. RFLP was performed with the enzymes RsaI and MnlI to assess the presence of polymorphism at positions C606T and G612A, respectively. Genotyping revealed a significantly higher frequency of M and R alleles than m and r alleles. RR and MM were found to be dominantly present in the studied population. Cloning and sequencing of Exon II followed by mutation/polymorphism analysis revealed ten mutations, of which three were non-synonymous (G706A, C893A, G931C). G706A leads to substitution of valine by isoleucine Val125I (U14109) in the fifth transmembrane domain. C893A leads to substitution of alanine by aspartic acid in the third extracellular loop. The G931C mutation brings about substitution of alanine by proline in the seventh transmembrane helix, which can affect the conformational stability of the molecule. Polyphen-2 analysis revealed that the polymorphism at position 931 is potentially damaging while the mutations at positions 706 and 893 were benign. It is concluded that the G931C mutation of the MTNR1A gene may explain, in part, the importance of melatonin structure integrity in influencing seasonality in sheep. PMID:25606429

  11. Variations of mitochondrial DNA sequence in three breeds of rabbit (Oryctolagus cuniculus).

    PubMed

    Terrance, Diamond G C; Thangamani, A; Srivastava, Varsha

    2007-07-01

    Sequencing of the 300 bp region of the mitochondrial cytochrome b gene was done. Genetic analysis was carried out for the first time in three exotic breeds (Giant White, Soviet Chinchilla, and German Angora) of European rabbit (Oryctolagus cuniculus) to determine intra- and interspecific variability and to measure the genetic distance. The frequencies of types of mutations (transition, transversion, deletion, and insertion) were also determined. This study throws light on matrilineage of breeds that arise due to interbreed crosses and the genetic management of a stocked rabbit breeding population. PMID:17630855

  12. Polarimetric Variations of Binary Stars. IV. Pre-Main-Sequence Spectroscopic Binaries Located in Taurus, Auriga, and Orion

    NASA Astrophysics Data System (ADS)

    Manset, N.; Bastien, P.

    2002-08-01

    We present polarimetric observations of 14 pre-main-sequence (PMS) binaries located in the Taurus, Auriga, and Orion star-forming regions. The majority of the average observed polarizations are below 0.5%, and none are above 0.9%. After removal of estimates of the interstellar polarization, about half the binaries have an intrinsic polarization above 0.5%, even though most of them show no other evidence of circumstellar dust. Various tests reveal that 77% of the PMS binaries have or possibly have a variable polarization. LkCa 3, Par 1540, and Par 2494 present detectable periodic and phase-locked variations. The periodic polarimetric variations are noisier and of a lesser amplitude (~0.1%) than for other types of binaries, such as hot stars. This could be due to stochastic events that produce deviations in the average polarization, an unfavorable geometry (circumbinary envelope), or the nature of the scatterers (dust grains are less efficient polarizers than electrons). Par 1540 is a weak-line T Tauri star but nonetheless has enough dust in its environment to produce detectable levels of polarization and variations. A fourth interesting case is W134, which displays rapid changes in polarization that could be due to eclipses. We compare the observations with some of our numerical simulations and also show that an analysis of the periodic polarimetric variations with the Brown, McLean, & Emslie (BME) formalism to find the orbital inclination is for the moment premature: nonperiodic events introduce stochastic noise that partially masks the periodic low-amplitude variations and prevents the BME formalism from finding a reasonable estimate of the orbital inclination.

  13. Variational Bayesian strategies for high-dimensional, stochastic design problems

    NASA Astrophysics Data System (ADS)

    Koutsourelakis, P. S.

    2016-03-01

    This paper is concerned with a lesser-studied problem in the context of model-based, uncertainty quantification (UQ), that of optimization/design/control under uncertainty. The solution of such problems is hindered not only by the usual difficulties encountered in UQ tasks (e.g. the high computational cost of each forward simulation, the large number of random variables) but also by the need to solve a nonlinear optimization problem involving large numbers of design variables and potentially constraints. We propose a framework that is suitable for a class of such problems and is based on the idea of recasting them as probabilistic inference tasks. To that end, we propose a Variational Bayesian (VB) formulation and an iterative VB-Expectation-Maximization scheme that is capable of identifying a local maximum as well as a low-dimensional set of directions in the design space, along which, the objective exhibits the largest sensitivity. We demonstrate the validity of the proposed approach in the context of two numerical examples involving thousands of random and design variables. In all cases considered the cost of the computations in terms of calls to the forward model was of the order of 100 or less. The accuracy of the approximations provided is assessed by information-theoretic metrics.

  14. cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate.

    PubMed

    Klambauer, Günter; Schwarzbauer, Karin; Mayr, Andreas; Clevert, Djork-Arné; Mitterecker, Andreas; Bodenhofer, Ulrich; Hochreiter, Sepp

    2012-05-01

    Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose 'Copy Number estimation by a Mixture Of PoissonS' (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise that are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1-FDR) and recall for both gains and losses in all benchmark data sets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.
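
    The mixture-of-Poissons idea in this abstract can be illustrated with a minimal sketch. It is not the cn.MOPS implementation: it assumes a fixed expected read count for copy number 2 (lam), hypothetical prior weights, and a small hypothetical rate factor (eps) for copy number 0, and it only computes per-sample posteriors over integer copy numbers for one genomic segment.

```python
import math

def poisson_logpmf(x, mean):
    """Log of the Poisson probability of observing count x given the mean."""
    return x * math.log(mean) - mean - math.lgamma(x + 1)

def copy_number_posteriors(counts, lam, priors, eps=0.05):
    """Posterior over copy numbers 0..len(priors)-1 for each sample.

    Assumption: copy number cn has Poisson mean lam * cn / 2, and
    copy number 0 uses lam * eps so the mean stays positive.
    """
    posteriors = []
    for x in counts:
        logs = []
        for cn, prior in enumerate(priors):
            mean = lam * (cn / 2 if cn > 0 else eps)
            logs.append(math.log(prior) + poisson_logpmf(x, mean))
        top = max(logs)  # subtract the maximum for numerical stability
        weights = [math.exp(v - top) for v in logs]
        total = sum(weights)
        posteriors.append([w / total for w in weights])
    return posteriors

# Hypothetical read counts for one segment across six samples.
counts = [98, 105, 51, 101, 148, 96]
lam = 100.0                               # assumed mean count at copy number 2
priors = [0.01, 0.04, 0.90, 0.04, 0.01]   # assumed prior favouring copy number 2

posts = copy_number_posteriors(counts, lam, priors)
calls = [max(range(len(p)), key=p.__getitem__) for p in posts]
print(calls)
```

    In this toy input the third sample is called as a loss and the fifth as a gain; the published method additionally estimates the mixture weights and a noise term, which this sketch leaves out.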

  15. Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

    DOE PAGESBeta

    Leung, Elo; Huang, Amy; Cadag, Eithon; Montana, Aldrin; Soliman, Jan Lorenz; Zhou, Carol L. Ecale

    2016-01-20

    In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.

  16. High Levels of Sample-to-Sample Variation Confound Data Analysis for Non-Invasive Prenatal Screening of Fetal Microdeletions.

    PubMed

    Chu, Tianjiao; Yeniterzi, Suveyda; Yatsenko, Svetlana A; Dunkel, Mary; Shaw, Patricia A; Bunce, Kimberly D; Peters, David G

    2016-01-01

    Our goal was to test the hypothesis that inter-individual genomic copy number variation in control samples is a confounding factor in the non-invasive prenatal detection of fetal microdeletions via the sequence-based analysis of maternal plasma DNA. The database of genomic variants (DGV) was used to determine the "Genomic Variants Frequency" (GVF) for each 50kb region in the human genome. Whole genome sequencing of fifteen karyotypically normal maternal plasma and six CVS DNA control samples was performed. The coefficient of variation of relative read counts (cv.RTC) for these samples was determined for each 50kb region. Maternal plasma from two pregnancies affected with a chromosome 5p microdeletion was also sequenced, and analyzed using the GCREM algorithm. We found strong correlation between high variance in read counts and GVF amongst controls. Consequently we were unable to confirm the presence of the microdeletion via sequencing of maternal plasma samples obtained from two sequential affected pregnancies. Caution should be exercised when performing NIPT for microdeletions. It is vital to develop our understanding of the factors that impact the sensitivity and specificity of these approaches. In particular, benign copy number variation amongst controls is a major confounder, and its effects should be corrected bioinformatically. PMID:27249650
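
    The per-region coefficient of variation of relative read counts mentioned above can be sketched as follows. This is an illustration only: the abstract does not state how counts were normalised, so dividing each sample's bin counts by that sample's total is an assumption, and the function names and data are hypothetical.

```python
import statistics

def relative_counts(sample_counts):
    """Scale one sample's per-bin read counts by that sample's total reads."""
    total = sum(sample_counts)
    return [c / total for c in sample_counts]

def bin_cv(samples):
    """Coefficient of variation (sample SD / mean) of relative counts per bin.

    samples: list of per-sample lists, one read count per genomic bin.
    """
    rel = [relative_counts(s) for s in samples]
    cvs = []
    for bin_values in zip(*rel):  # iterate over bins across all samples
        mean = statistics.mean(bin_values)
        sd = statistics.stdev(bin_values)
        cvs.append(sd / mean)
    return cvs

# Hypothetical counts: three control samples, three bins.
samples = [
    [100, 200, 100],
    [110, 190, 100],
    [90, 210, 100],
]
cvs = bin_cv(samples)
print([round(v, 3) for v in cvs])
```

    Bins whose CV is high across controls would be the ones where a deletion call in a test sample is hardest to separate from benign variation.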

  17. High Levels of Sample-to-Sample Variation Confound Data Analysis for Non-Invasive Prenatal Screening of Fetal Microdeletions

    PubMed Central

    Chu, Tianjiao; Yeniterzi, Suveyda; Yatsenko, Svetlana A.; Dunkel, Mary; Shaw, Patricia A.; Bunce, Kimberly D.; Peters, David G.

    2016-01-01

    Our goal was to test the hypothesis that inter-individual genomic copy number variation in control samples is a confounding factor in the non-invasive prenatal detection of fetal microdeletions via the sequence-based analysis of maternal plasma DNA. The database of genomic variants (DGV) was used to determine the “Genomic Variants Frequency” (GVF) for each 50kb region in the human genome. Whole genome sequencing of fifteen karyotypically normal maternal plasma and six CVS DNA control samples was performed. The coefficient of variation of relative read counts (cv.RTC) for these samples was determined for each 50kb region. Maternal plasma from two pregnancies affected with a chromosome 5p microdeletion was also sequenced, and analyzed using the GCREM algorithm. We found strong correlation between high variance in read counts and GVF amongst controls. Consequently we were unable to confirm the presence of the microdeletion via sequencing of maternal plasma samples obtained from two sequential affected pregnancies. Caution should be exercised when performing NIPT for microdeletions. It is vital to develop our understanding of the factors that impact the sensitivity and specificity of these approaches. In particular, benign copy number variation amongst controls is a major confounder, and its effects should be corrected bioinformatically. PMID:27249650

  18. BOOGIE: Predicting Blood Groups from High Throughput Sequencing Data

    PubMed Central

    Giollo, Manuel; Minervini, Giovanni; Scalzotto, Marta; Leonardi, Emanuela; Ferrari, Carlo; Tosatto, Silvio C. E.

    2015-01-01

    Over the last decade, we have witnessed an incredible growth in the amount of available genotype data due to high throughput sequencing (HTS) techniques. This information may be used to predict phenotypes of medical relevance, and pave the way towards personalized medicine. Blood phenotypes (e.g. ABO and Rh) are a purely genetic trait that has been extensively studied for decades, with currently over thirty known blood groups. Given the public availability of blood group data, it is of interest to predict these phenotypes from HTS data which may translate into more accurate blood typing in clinical practice. Here we propose BOOGIE, a fast predictor for the inference of blood groups from single nucleotide variant (SNV) databases. We focus on the prediction of thirty blood groups ranging from the well known ABO and Rh, to the less studied Junior or Diego. BOOGIE correctly predicted the blood group with 94% accuracy for the Personal Genome Project whole genome profiles where good quality SNV annotation was available. Additionally, our tool produces a high quality haplotype phase, which is of interest in the context of ethnicity-specific polymorphisms or traits. The versatility and simplicity of the analysis make it easily interpretable and allow easy extension of the protocol towards other phenotypes. BOOGIE can be downloaded from URL http://protein.bio.unipd.it/download/. PMID:25893845

  19. DUC-Curve, a highly compact 2D graphical representation of DNA sequences and its application in sequence alignment

    NASA Astrophysics Data System (ADS)

    Li, Yushuang; Liu, Qian; Zheng, Xiaoqi

    2016-08-01

    A highly compact and simple 2D graphical representation of DNA sequences, named DUC-Curve, is constructed through mapping four nucleotides to a unit circle with a cyclic order. DUC-Curve could directly detect nucleotide, di-nucleotide compositions and microsatellite structure from DNA sequences. Moreover, it also could be used for DNA sequence alignment. Taking geometric center vectors of DUC-Curves as sequence descriptor, we perform similarity analysis on the first exons of β-globin genes of 11 species, oncogene TP53 of 27 species and twenty-four Influenza A viruses, respectively. The obtained reasonable results illustrate that the proposed method is very effective in sequence comparison problems, and will at least play a complementary role in classification and clustering problems.
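
    The abstract does not give the DUC-Curve construction, so the following is only a simplified sketch of the general idea of placing the four nucleotides on a unit circle and comparing sequences by a geometric centre vector. The cyclic order A, C, G, T at 90-degree steps is an assumption, and this composition-only descriptor is not the published curve.

```python
import math

# Assumed placement: A -> C -> G -> T at 90-degree steps on the unit circle.
POINTS = {"A": (1.0, 0.0), "C": (0.0, 1.0), "G": (-1.0, 0.0), "T": (0.0, -1.0)}

def center_vector(seq):
    """Mean of the unit-circle points assigned to each nucleotide in seq."""
    if not seq:
        raise ValueError("empty sequence")
    pts = [POINTS[base] for base in seq.upper()]
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

def descriptor_distance(seq_a, seq_b):
    """Euclidean distance between the centre vectors of two sequences."""
    return math.dist(center_vector(seq_a), center_vector(seq_b))

print(center_vector("ACGT"))
print(descriptor_distance("AAAA", "ACGT"))
```

    Because this toy descriptor depends only on base composition, it cannot distinguish sequences that are permutations of each other; a curve that accumulates positions along the sequence would be needed for that.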

  20. Haplogroup Classification of Korean Cattle Breeds Based on Sequence Variations of mtDNA Control Region.

    PubMed

    Kim, Jae-Hwan; Lee, Seong-Su; Kim, Seung Chang; Choi, Seong-Bok; Kim, Su-Hyun; Lee, Chang Woo; Jung, Kyoung-Sub; Kim, Eun Sung; Choi, Young-Sun; Kim, Sung-Bok; Kim, Woo Hyun; Cho, Chang-Yeon

    2016-05-01

    Many studies have reported the frequency and distribution of haplogroups among various cattle breeds for verification of their origins and genetic diversity. In this study, 318 complete sequences of the mtDNA control region from four Korean cattle breeds were used for haplogroup classification. 71 polymorphic sites and 66 haplotypes were found in these sequences. Consistent with the genetic patterns in previous reports, four haplogroups (T1, T2, T3, and T4) were identified in Korean cattle breeds. In addition, T1a, T3a, and T3b sub-haplogroups were classified. In the phylogenetic tree, each haplogroup formed an independent cluster. The frequencies of T3, T4, T1 (containing T1a), and T2 were 66%, 16%, 10%, and 8%, respectively. Notably, the T1 haplogroup contained only one haplotype, found in a single sample. All four haplogroups were found in Chikso, Jeju black and Hanwoo. However, only the T3 and T4 haplogroups appeared in Heugu, and most Chikso populations showed only a subset of the four haplogroups. These results will be useful for stable conservation and efficient management of Korean cattle breeds. PMID:26954229

  1. Whole Genome Mapping with Feature Sets from High-Throughput Sequencing Data

    PubMed Central

    Pan, Yonglong; Wang, Xiaoming; Liu, Lin; Wang, Hao; Luo, Meizhong

    2016-01-01

    A good physical map is essential to guide sequence assembly in de novo whole genome sequencing, especially when sequences are produced by high-throughput sequencing such as next-generation-sequencing (NGS) technology. We here present a novel method, Feature sets-based Genome Mapping (FGM). With FGM, physical map and draft whole genome sequences can be generated, anchored and integrated using the same data set of NGS sequences, independent of restriction digestion. Method model was created and parameters were inspected by simulations using the Arabidopsis genome sequence. In the simulations, when ~4.8X genome BAC library including 4,096 clones was used to sequence the whole genome, ~90% of clones were successfully connected to physical contigs, and 91.58% of genome sequences were mapped and connected to chromosomes. This method was experimentally verified using the existing physical map and genome sequence of rice. Of 4,064 clones covering 115 Mb sequence selected from ~3 tiles of 3 chromosomes of a rice draft physical map, 3,364 clones were reconstructed into physical contigs and 98 Mb sequences were integrated into the 3 chromosomes. The physical map-integrated draft genome sequences can provide permanent frameworks for eventually obtaining high-quality reference sequences by targeted sequencing, gap filling and combining other sequences. PMID:27611682

  2. Whole Genome Mapping with Feature Sets from High-Throughput Sequencing Data.

    PubMed

    Pan, Yonglong; Wang, Xiaoming; Liu, Lin; Wang, Hao; Luo, Meizhong

    2016-01-01

    A good physical map is essential to guide sequence assembly in de novo whole genome sequencing, especially when sequences are produced by high-throughput sequencing such as next-generation-sequencing (NGS) technology. We here present a novel method, Feature sets-based Genome Mapping (FGM). With FGM, physical map and draft whole genome sequences can be generated, anchored and integrated using the same data set of NGS sequences, independent of restriction digestion. Method model was created and parameters were inspected by simulations using the Arabidopsis genome sequence. In the simulations, when ~4.8X genome BAC library including 4,096 clones was used to sequence the whole genome, ~90% of clones were successfully connected to physical contigs, and 91.58% of genome sequences were mapped and connected to chromosomes. This method was experimentally verified using the existing physical map and genome sequence of rice. Of 4,064 clones covering 115 Mb sequence selected from ~3 tiles of 3 chromosomes of a rice draft physical map, 3,364 clones were reconstructed into physical contigs and 98 Mb sequences were integrated into the 3 chromosomes. The physical map-integrated draft genome sequences can provide permanent frameworks for eventually obtaining high-quality reference sequences by targeted sequencing, gap filling and combining other sequences.

  3. Whole Genome Mapping with Feature Sets from High-Throughput Sequencing Data.

    PubMed

    Pan, Yonglong; Wang, Xiaoming; Liu, Lin; Wang, Hao; Luo, Meizhong

    2016-01-01

    A good physical map is essential to guide sequence assembly in de novo whole genome sequencing, especially when sequences are produced by high-throughput sequencing such as next-generation-sequencing (NGS) technology. We here present a novel method, Feature sets-based Genome Mapping (FGM). With FGM, physical map and draft whole genome sequences can be generated, anchored and integrated using the same data set of NGS sequences, independent of restriction digestion. Method model was created and parameters were inspected by simulations using the Arabidopsis genome sequence. In the simulations, when ~4.8X genome BAC library including 4,096 clones was used to sequence the whole genome, ~90% of clones were successfully connected to physical contigs, and 91.58% of genome sequences were mapped and connected to chromosomes. This method was experimentally verified using the existing physical map and genome sequence of rice. Of 4,064 clones covering 115 Mb sequence selected from ~3 tiles of 3 chromosomes of a rice draft physical map, 3,364 clones were reconstructed into physical contigs and 98 Mb sequences were integrated into the 3 chromosomes. The physical map-integrated draft genome sequences can provide permanent frameworks for eventually obtaining high-quality reference sequences by targeted sequencing, gap filling and combining other sequences. PMID:27611682

  4. High-resolution transcriptome analysis with long-read RNA sequencing.

    PubMed

    Cho, Hyunghoon; Davis, Joe; Li, Xin; Smith, Kevin S; Battle, Alexis; Montgomery, Stephen B

    2014-01-01

    RNA sequencing (RNA-seq) enables characterization and quantification of individual transcriptomes as well as detection of patterns of allelic expression and alternative splicing. Current RNA-seq protocols depend on high-throughput short-read sequencing of cDNA. However, as ongoing advances are rapidly yielding increasing read lengths, a technical hurdle remains in identifying the degree to which differences in read length influence various transcriptome analyses. In this study, we generated two paired-end RNA-seq datasets of differing read lengths (2×75 bp and 2×262 bp) for lymphoblastoid cell line GM12878 and compared the effect of read length on transcriptome analyses, including read-mapping performance, gene and transcript quantification, and detection of allele-specific expression (ASE) and allele-specific alternative splicing (ASAS) patterns. Our results indicate that, while the current long-read protocol is considerably more expensive than short-read sequencing, there are important benefits that can only be achieved with longer read length, including lower mapping bias and reduced ambiguity in assigning reads to genomic elements, such as mRNA transcript. We show that these benefits ultimately lead to improved detection of cis-acting regulatory and splicing variation effects within individuals.

  5. Fungal community analysis by high-throughput sequencing of amplified markers – a user's guide

    PubMed Central

    Lindahl, Björn D; Nilsson, R Henrik; Tedersoo, Leho; Abarenkov, Kessy; Carlsen, Tor; Kjøller, Rasmus; Kõljalg, Urmas; Pennanen, Taina; Rosendahl, Søren; Stenlid, Jan; Kauserud, Håvard

    2013-01-01

    Novel high-throughput sequencing methods outperform earlier approaches in terms of resolution and magnitude. They enable identification and relative quantification of community members and offer new insights into fungal community ecology. These methods are currently taking over as the primary tool to assess fungal communities of plant-associated endophytes, pathogens, and mycorrhizal symbionts, as well as free-living saprotrophs. Taking advantage of the collective experience of six research groups, we here review the different stages involved in fungal community analysis, from field sampling via laboratory procedures to bioinformatics and data interpretation. We discuss potential pitfalls, alternatives, and solutions. Highlighted topics are challenges involved in: obtaining representative DNA/RNA samples and replicates that encompass the targeted variation in community composition, selection of marker regions and primers, options for amplification and multiplexing, handling of sequencing errors, and taxonomic identification. Without awareness of methodological biases, limitations of markers, and bioinformatics challenges, large-scale sequencing projects risk yielding artificial results and misleading conclusions. PMID:23534863

  6. Fungal community analysis by high-throughput sequencing of amplified markers--a user's guide.

    PubMed

    Lindahl, Björn D; Nilsson, R Henrik; Tedersoo, Leho; Abarenkov, Kessy; Carlsen, Tor; Kjøller, Rasmus; Kõljalg, Urmas; Pennanen, Taina; Rosendahl, Søren; Stenlid, Jan; Kauserud, Håvard

    2013-07-01

    Novel high-throughput sequencing methods outperform earlier approaches in terms of resolution and magnitude. They enable identification and relative quantification of community members and offer new insights into fungal community ecology. These methods are currently taking over as the primary tool to assess fungal communities of plant-associated endophytes, pathogens, and mycorrhizal symbionts, as well as free-living saprotrophs. Taking advantage of the collective experience of six research groups, we here review the different stages involved in fungal community analysis, from field sampling via laboratory procedures to bioinformatics and data interpretation. We discuss potential pitfalls, alternatives, and solutions. Highlighted topics are challenges involved in: obtaining representative DNA/RNA samples and replicates that encompass the targeted variation in community composition, selection of marker regions and primers, options for amplification and multiplexing, handling of sequencing errors, and taxonomic identification. Without awareness of methodological biases, limitations of markers, and bioinformatics challenges, large-scale sequencing projects risk yielding artificial results and misleading conclusions. PMID:23534863

  7. High-Resolution Transcriptome Analysis with Long-Read RNA Sequencing

    PubMed Central

    Cho, Hyunghoon; Davis, Joe; Li, Xin; Smith, Kevin S.; Battle, Alexis; Montgomery, Stephen B.

    2014-01-01

    RNA sequencing (RNA-seq) enables characterization and quantification of individual transcriptomes as well as detection of patterns of allelic expression and alternative splicing. Current RNA-seq protocols depend on high-throughput short-read sequencing of cDNA. However, as ongoing advances are rapidly yielding increasing read lengths, a technical hurdle remains in identifying the degree to which differences in read length influence various transcriptome analyses. In this study, we generated two paired-end RNA-seq datasets of differing read lengths (2×75 bp and 2×262 bp) for lymphoblastoid cell line GM12878 and compared the effect of read length on transcriptome analyses, including read-mapping performance, gene and transcript quantification, and detection of allele-specific expression (ASE) and allele-specific alternative splicing (ASAS) patterns. Our results indicate that, while the current long-read protocol is considerably more expensive than short-read sequencing, there are important benefits that can only be achieved with longer read length, including lower mapping bias and reduced ambiguity in assigning reads to genomic elements, such as mRNA transcript. We show that these benefits ultimately lead to improved detection of cis-acting regulatory and splicing variation effects within individuals. PMID:25251678

  8. Next-Gen phylogeography of rainforest trees: exploring landscape-level cpDNA variation from whole-genome sequencing.

    PubMed

    van der Merwe, M; McPherson, H; Siow, J; Rossetto, M

    2014-01-01

    Standardized phylogeographic studies across codistributed taxa can identify important refugia and biogeographic barriers, and potentially uncover how changes in adaptive constraints through space and time impact on the distribution of genetic diversity. The combination of next-generation sequencing and methodologies that enable uncomplicated analysis of the full chloroplast genome may provide an invaluable resource for such studies. Here, we assess the potential of a shotgun-based method across twelve nonmodel rainforest trees sampled from two evolutionarily distinct regions. Whole genomic shotgun sequencing libraries consisting of pooled individuals were used to assemble species-specific chloroplast references (in silico). For each species, the pooled libraries allowed for the detection of variation within and between data sets (each representing a geographic region). The potential use of nuclear rDNA as an additional marker from the NGS libraries was investigated by mapping reads against available references. We successfully obtained phylogeographically informative sequence data from a range of previously unstudied rainforest trees. Greater levels of diversity were found in northern refugial rainforests than in southern expansion areas. The genetic signatures of varying evolutionary histories were detected, and interesting associative patterns between functional characteristics and genetic diversity were identified. This approach can suit a wide range of landscape-level studies. As the key laboratory-based steps do not require prior species-specific knowledge and can be easily outsourced, the techniques described here are even suitable for researchers without access to wet-laboratory facilities, making evolutionary ecology questions increasingly accessible to the research community.

  9. DNA sequence and haplotype variation in two candidate genes for dilated cardiomyopathy in the turkey Meleagris gallopavo.

    PubMed

    Lin, Kuan-chin; Xu, Jun; Kamara, Davida; Geng, Tuoyu; Gyenai, Kwaku; Reed, Kent M; Smith, Edward J

    2007-05-01

    Determining variation in genes is fundamental to understanding their function in the disease state. Cardiac troponin T (cTnT) and phospholamban (PLN) genes have been implicated in dilated cardiomyopathy (DCM) in human and model species. To investigate the role of these 2 candidate genes in DCM in the turkey Meleagris gallopavo, understanding sequence variants and map position distribution is necessary. To this end, a total of 1854 and 1771 bp of cTnT and PLN gene sequences, respectively, were scanned for single nucleotide polymorphisms (SNPs) in a randomly bred population. A total of 15 SNPs was identified in the cTnT and PLN genomic sequences. Nine haplotypes, 5 in cTnT and 4 in PLN, were identified. Observed heterozygosities (0.02-0.39) in the turkey population were low for both genes. Within each gene, 1 SNP corresponding to a restriction enzyme site was identified and used to develop a PCR-restriction fragment length polymorphism (RFLP) genotyping assay. The PLN gene was genetically mapped to turkey chromosome 2, equivalent to Gallus gallus chromosome 3, and cTnT mapped to a turkey microchromosome. Although limited because of the relatively small sample size of 55 birds, the data from this SNP analysis of PLN and cTnT provide a foundation from which to evaluate the function of cTnT and PLN in the turkey. Information about the distribution of the SNPs and haplotypes will facilitate future association and linkage studies.
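
    Observed heterozygosity, as reported in this abstract, is the fraction of genotyped individuals that carry two different alleles at a SNP. A minimal sketch follows; the genotype data are invented for illustration and only the sample size of 55 birds comes from the abstract.

```python
def observed_heterozygosity(genotypes):
    """Fraction of individuals whose two alleles differ at one SNP.

    genotypes: list of (allele_1, allele_2) tuples, one per individual.
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    heterozygous = sum(1 for a, b in genotypes if a != b)
    return heterozygous / len(genotypes)

# Hypothetical genotypes at one SNP for 55 birds.
genotypes = [("C", "C")] * 40 + [("C", "T")] * 11 + [("T", "T")] * 4
h_obs = observed_heterozygosity(genotypes)
print(round(h_obs, 2))  # 0.2
```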

  10. The thermostable direct hemolysin-related hemolysin (trh) gene of Vibrio parahaemolyticus: Sequence variation and implications for detection and function.

    PubMed

    Nilsson, William B; Turner, Jeffrey W

    2016-07-01

    Vibrio parahaemolyticus is a leading cause of bacterial food-related illness associated with the consumption of undercooked seafood. Only a small subset of strains is pathogenic. Most clinical strains encode the thermostable direct hemolysin (TDH) and/or the TDH-related hemolysin (TRH). In this work, we amplify and sequence the trh gene from over 80 trh+ strains of this bacterium and identify thirteen genetically distinct alleles, most of which have not been deposited in GenBank previously. Sequence data were used to design new primers for more reliable detection of trh by endpoint PCR. We also designed a new quantitative PCR assay to target a more conserved gene that is genetically linked to trh. This gene, ureR, encodes the transcriptional regulator for the urease gene cluster immediately upstream of trh. We propose that this ureR assay can be a useful screening tool as a surrogate for direct detection of trh that circumvents challenges associated with trh sequence variation.

  11. Improved detection of artifactual viral minority variants in high-throughput sequencing data.

    PubMed

    Welkers, Matthijs R A; Jonges, Marcel; Jeeninga, Rienk E; Koopmans, Marion P G; de Jong, Menno D

    2014-01-01

    High-throughput sequencing (HTS) of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification are limited by the capacity to distinguish biological from artificial variation. In this study, errors related to the Illumina HiSeq2000 library generation and HTS process were investigated by determining minority variant frequencies in an influenza A/WSN/1933(H1N1) virus reverse-genetics plasmid pool. Errors related to amplification and sequencing were determined using the same plasmid pool, by generation of infectious virus using reverse genetics followed by in duplo reverse-transcriptase PCR (RT-PCR) amplification and HTS in the same sequence run. Results showed that after "best practice" quality control (QC), within the plasmid pool, one minority variant with a frequency >0.5% was identified, while 84 and 139 were identified in the RT-PCR amplified samples, indicating RT-PCR amplification artificially increased variation. Detailed analysis showed that artifactual minority variants could be identified by two major technical characteristics: their predominant presence in a single read orientation and uneven distribution of mismatches over the length of the reads. We demonstrate that by addition of two QC steps 95% of the artifactual minority variants could be identified. When our analysis approach was applied to three clinical samples 68% of the initially identified minority variants were identified as artifacts. Our study clearly demonstrated that, without additional QC steps, overestimation of viral minority variants is very likely to occur, mainly as a consequence of the required RT-PCR amplification step. The improved ability to detect and correct for artifactual minority variants increases data resolution and could aid both past and future studies incorporating HTS. The source code has been made available through Sourceforge (https://sourceforge.net/projects/mva-ngs). PMID:25657642
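
    The two technical characteristics named in this abstract (support concentrated in one read orientation, and mismatches unevenly distributed along the reads) can be illustrated with a minimal Python sketch. This is not the authors' published code: the record layout, the function names, and the thresholds (0.9 for strand dominance, 0.5 for the share of mismatches near read ends, 10% read-end window) are all hypothetical values chosen for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class VariantSupport:
    """Hypothetical summary of the reads supporting one minority variant."""
    forward_reads: int         # supporting reads in forward orientation
    reverse_reads: int         # supporting reads in reverse orientation
    read_positions: list[int]  # 0-based mismatch position within each supporting read
    read_length: int


def strand_fraction(v: VariantSupport) -> float:
    """Share of supporting reads on the dominant strand (0.5 balanced, 1.0 single strand)."""
    total = v.forward_reads + v.reverse_reads
    if total == 0:
        return 1.0
    return max(v.forward_reads, v.reverse_reads) / total


def terminal_fraction(v: VariantSupport, edge: float = 0.1) -> float:
    """Share of mismatches that fall in the first or last `edge` portion of the read."""
    if not v.read_positions:
        return 1.0
    lo = v.read_length * edge
    hi = v.read_length * (1 - edge)
    terminal = sum(1 for p in v.read_positions if p < lo or p >= hi)
    return terminal / len(v.read_positions)


def looks_artifactual(v: VariantSupport,
                      max_strand: float = 0.9,
                      max_terminal: float = 0.5) -> bool:
    """Flag a variant if either illustrative threshold is exceeded."""
    return strand_fraction(v) > max_strand or terminal_fraction(v) > max_terminal
```

    A variant supported by 40 forward reads and 1 reverse read would be flagged by the strand check, whereas one with 50 forward and 48 reverse reads and mismatches spread across the read would pass both checks. Real thresholds would need to be calibrated against a control such as the plasmid pool described above.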

  12. CNV-CH: A Convex Hull Based Segmentation Approach to Detect Copy Number Variations (CNV) Using Next-Generation Sequencing Data.

    PubMed

    Sinha, Rituparna; Samaddar, Sandip; De, Rajat K

    2015-01-01

    Copy number variation (CNV) is a form of structural alteration in the mammalian DNA sequence, which is associated with many complex neurological diseases as well as cancer. The development of next generation sequencing (NGS) technology opens a new dimension in the detection of genomic locations with copy number variations. Here we develop an algorithm for detecting CNVs, which is based on depth of coverage data generated by NGS technology. In this work, we have used a novel way to represent the read count data as a two dimensional geometrical point. A key aspect of detecting the regions with CNVs is to devise a proper segmentation algorithm that will distinguish the genomic locations having a significant difference in read count data. We have designed a new segmentation approach in this context, using a convex hull algorithm on the geometrical representation of read count data. To our knowledge, most algorithms have used a single distribution model of read count data, but here in our approach, we have considered the read count data to follow two different distribution models independently, which adds to the robustness of detection of CNVs. In addition, our algorithm calls CNVs based on the multiple sample analysis approach resulting in a low false discovery rate with high precision. PMID:26291322
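
    The abstract does not say how read counts are mapped to two-dimensional points or how the hull drives segmentation, so the following is only a generic sketch of the convex hull building block (Andrew's monotone chain), not the CNV-CH method itself. Treating each point as a (window index, read count) pair is an assumption made purely for illustration.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o); > 0 means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a right turn or be collinear.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical (window index, read count) points with an elevated stretch.
windows = [(0, 10), (1, 11), (2, 10), (3, 30), (4, 29), (5, 31), (6, 10)]
hull = convex_hull(windows)
```

    In this toy input the hull vertices are the two flanking baseline windows plus the extremes of the elevated stretch, which hints at why a hull can expose abrupt changes in depth of coverage.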

  13. Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes

    PubMed Central

    Tennessen, Jacob A.; Bigham, Abigail W.; O'Connor, Timothy D.; Fu, Wenqing; Kenny, Eimear E.; Gravel, Simon; McGee, Sean; Do, Ron; Liu, Xiaoming; Jun, Goo; Kang, Hyun Min; Jordan, Daniel; Leal, Suzanne M.; Gabriel, Stacey; Rieder, Mark J.; Abecasis, Goncalo; Altshuler, David; Nickerson, Deborah A.; Boerwinkle, Eric; Sunyaev, Shamil; Bustamante, Carlos D.; Bamshad, Michael J.; Akey, Joshua M.

    2013-01-01

    As a first step toward understanding how rare variants contribute to risk for complex diseases, we sequenced 15,585 human protein-coding genes to an average median depth of 111× in 2440 individuals of European (n = 1351) and African (n = 1088) ancestry. We identified over 500,000 single-nucleotide variants (SNVs), the majority of which were rare (86% with a minor allele frequency less than 0.5%), previously unknown (82%), and population-specific (82%). On average, 2.3% of the 13,595 SNVs each person carried were predicted to affect protein function of ∼313 genes per genome, and ∼95.7% of SNVs predicted to be functionally important were rare. This excess of rare functional variants is due to the combined effects of explosive, recent accelerated population growth and weak purifying selection. Furthermore, we show that large sample sizes will be required to associate rare variants with complex traits. PMID:22604720

  14. Variations of the ISM Compactness Across the Main Sequence of Star Forming Galaxies: Observations and Simulations

    NASA Astrophysics Data System (ADS)

    Martínez-Galarza, J. R.; Smith, H. A.; Lanz, L.; Hayward, Christopher C.; Zezas, A.; Rosenthal, L.; Weiner, A.; Hung, C.; Ashby, M. L. N.; Groves, B.

    2016-01-01

    The majority of star-forming galaxies follow a simple empirical correlation in the star formation rate (SFR) versus stellar mass ($M_*$) plane, of the form $\mathrm{SFR} \propto M_*^{\alpha}$, usually referred to as the star formation main sequence (MS). The physics that sets the properties of the MS is currently a subject of debate, and no consensus has been reached regarding the fundamental difference between members of the sequence and its outliers. Here we combine a set of hydro-dynamical simulations of interacting galactic disks with state-of-the-art radiative transfer codes to analyze how the evolution of mergers is reflected upon the properties of the MS. We present Chiburst, a Markov Chain Monte Carlo spectral energy distribution (SED) code that fits the multi-wavelength, broad-band photometry of galaxies and derives stellar masses, SFRs, and geometrical properties of the dust distribution. We apply this tool to the SEDs of simulated mergers and compare the derived results with the reference output from the simulations. Our results indicate that changes in the SEDs of mergers as they approach coalescence and depart from the MS are related to an evolution of dust geometry in scales larger than a few hundred parsecs. This is reflected in a correlation between the specific star formation rate, and the compactness parameter $C$, that parametrizes this geometry and hence the evolution of dust temperature ($T_{\mathrm{dust}}$) with time. As mergers approach coalescence, they depart from the MS and increase their compactness, which implies that moderate outliers of the MS are consistent with late-type mergers. By further applying our method to real observations of luminous infrared galaxies (LIRGs), we show that the merger scenario is unable to explain these extreme outliers of the MS. Only by significantly increasing the gas fraction in the simulations are we able to reproduce the SEDs of LIRGs.

  16. Weather explains high annual variation in butterfly dispersal.

    PubMed

    Kuussaari, Mikko; Rytteri, Susu; Heikkinen, Risto K; Heliölä, Janne; von Bagh, Peter

    2016-07-27

    Weather conditions fundamentally affect the activity of short-lived insects. Annual variation in weather is therefore likely to be an important determinant of their between-year variation in dispersal, but conclusive empirical studies are lacking. We studied whether the annual variation of dispersal can be explained by the flight season's weather conditions in a Clouded Apollo (Parnassius mnemosyne) metapopulation. This metapopulation was monitored using the mark-release-recapture method for 12 years. Dispersal was quantified for each monitoring year using three complementary measures: emigration rate (fraction of individuals moving between habitat patches), average residence time in the natal patch, and average distance moved. There was much variation both in dispersal and average weather conditions among the years. Weather variables significantly affected the three measures of dispersal and together with adjusting variables explained 79-91% of the variation observed in dispersal. Different weather variables became selected in the models explaining variation in three dispersal measures apparently because of the notable intercorrelations. In general, dispersal rate increased with increasing temperature, solar radiation, proportion of especially warm days, and butterfly density, and decreased with increasing cloudiness, rainfall, and wind speed. These results help to understand and model annually varying dispersal dynamics of species affected by global warming. PMID:27440662

  17. Evaluation of a pooled strategy for high-throughput sequencing of cosmid clones from metagenomic libraries.

    PubMed

    Lam, Kathy N; Hall, Michael W; Engel, Katja; Vey, Gregory; Cheng, Jiujun; Neufeld, Josh D; Charles, Trevor C

    2014-01-01

    High-throughput sequencing methods have been instrumental in the growing field of metagenomics, with technological improvements enabling greater throughput at decreased costs. Nonetheless, the economy of high-throughput sequencing cannot be fully leveraged in the subdiscipline of functional metagenomics. In this area of research, environmental DNA is typically cloned to generate large-insert libraries from which individual clones are isolated, based on specific activities of interest. Sequence data are required for complete characterization of such clones, but the sequencing of a large set of clones requires individual barcode-based sample preparation; this can become costly, as the cost of clone barcoding scales linearly with the number of clones processed, and thus sequencing a large number of metagenomic clones often remains cost-prohibitive. We investigated a hybrid Sanger/Illumina pooled sequencing strategy that omits barcoding altogether, and we evaluated this strategy by comparing the pooled sequencing results to reference sequence data obtained from traditional barcode-based sequencing of the same set of clones. Using identity and coverage metrics in our evaluation, we show that pooled sequencing can generate high-quality sequence data, without producing problematic chimeras. Though caveats of a pooled strategy exist and further optimization of the method is required to improve recovery of complete clone sequences and to avoid circumstances that generate unrecoverable clone sequences, our results demonstrate that pooled sequencing represents an effective and low-cost alternative for sequencing large sets of metagenomic clones. PMID:24911009

  18. A 454 multiplex sequencing method for rapid and reliable genotyping of highly polymorphic genes in large-scale studies

    PubMed Central

    2010-01-01

    Background High-throughput sequencing technologies offer new perspectives for biomedical, agronomical and evolutionary research. Promising progress now concerns the application of these technologies to large-scale studies of genetic variation. Such studies require the genotyping of high numbers of samples. This is theoretically possible using 454 pyrosequencing, which generates billions of base pairs of sequence data. However several challenges arise: first in the attribution of each read produced to its original sample, and second, in bioinformatic analyses to distinguish true from artifactual sequence variation. This pilot study proposes a new application for the 454 GS FLX platform, allowing the individual genotyping of thousands of samples in one run. A probabilistic model has been developed to demonstrate the reliability of this method. Results DNA amplicons from 1,710 rodent samples were individually barcoded using a combination of tags located in forward and reverse primers. Amplicons consisted of 222 bp fragments corresponding to DRB exon 2, a highly polymorphic gene in mammals. A total of 221,789 reads were obtained, of which 153,349 were finally assigned to original samples. Rules based on a probabilistic model and a four-step procedure were developed to validate sequences and provide a confidence level for each genotype. The method gave promising results, with the genotyping of DRB exon 2 sequences for 1,407 samples from 24 different rodent species and the sequencing of 392 variants in one half of a 454 run. Using replicates, we estimated that the reproducibility of genotyping reached 95%. Conclusions This new approach is a promising alternative to classical methods involving electrophoresis-based techniques for variant separation and cloning-sequencing for sequence determination. The 454 system is less costly and time consuming and may enhance the reliability of genotypes obtained when high numbers of samples are studied. It opens up new perspectives
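
    The read-to-sample attribution step described in this record, where each sample is identified by a combination of a forward-primer tag and a reverse-primer tag, can be sketched as a simple lookup. The tag sequences, tag length, sample names, and the assumption that a read spans the whole amplicon with exact tag matches are all hypothetical; the study's probabilistic validation rules are not reproduced here.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def assign_sample(read: str, tag_pairs: dict, tag_len: int):
    """Return the sample for a full-length read, or None if the tag pair is unknown.

    The forward tag is read from the 5' end; the reverse tag appears at the
    3' end as its reverse complement (assumed read layout).
    """
    if len(read) < 2 * tag_len:
        return None
    fwd_tag = read[:tag_len]
    rev_tag = revcomp(read[-tag_len:])
    return tag_pairs.get((fwd_tag, rev_tag))


# Hypothetical tag combinations: the same forward tag paired with two reverse tags.
tag_pairs = {
    ("ACGTAC", "TGCAGT"): "sample_A",
    ("ACGTAC", "CATGGA"): "sample_B",
}
insert = "GATTACA" * 3  # stand-in for the amplified fragment
read_a = "ACGTAC" + insert + revcomp("TGCAGT")
read_unknown = "TTTTTT" + insert + revcomp("TGCAGT")
```

    Combining tags lets a modest number of tagged primers label many samples, and reads with unrecognized tag pairs are left unassigned. By the counts quoted above, 153,349 of 221,789 reads (about 69%) were ultimately assigned to a sample.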

  19. Next-generation sequencing analysis of off-ladder alleles due to migration shift caused by sequence variation at D12S391 locus.

    PubMed

    Fujii, Koji; Watahiki, Haruhiko; Mita, Yusuke; Iwashima, Yasuki; Miyaguchi, Hajime; Kitayama, Tetsushi; Nakahara, Hiroaki; Mizuno, Natsuko; Sekiguchi, Kazumasa

    2016-09-01

    In short tandem repeat (STR) analysis, length polymorphisms are detected by capillary electrophoresis (CE). At most STR loci, mobility shift due to sequence variation in the repeat region was thought not to affect the typing results. In our recent population studies of 1501 Japanese individuals, off-ladder calls were observed at the D12S391 locus using PowerPlex Fusion in nine samples for allele 22, one sample for allele 25, and one sample for allele 26. However, these samples were typed as ordinary alleles within the bins using GlobalFiler. In this study, next-generation sequencing analysis using MiSeq was performed for the D12S391 locus from the 11 off-ladder samples and 33 other samples, as well as the allelic ladders of PowerPlex Fusion and GlobalFiler. In all nine samples, off-ladder allele 22 had [AGAT]11[AGAC]11 as its repeat structure, while the corresponding allele was [AGAT]15[AGAC]6[AGAT] for the PowerPlex Fusion ladder, and [AGAT]13[AGAC]9 for the GlobalFiler ladder. Overall, as the number of [AGAT] in the repeat structure decreased at the D12S391 locus, the peak migrated more slowly using PowerPlex Fusion, the reverse strand of which was labeled, and it migrated more rapidly using GlobalFiler, the forward strand of which was labeled. The allelic ladders of both STR kits were reamplified with our small amplicon D12S391 primers and their mobility was also examined. In conclusion, off-ladder observations of allele 22 at the D12S391 locus using PowerPlex Fusion were mainly attributed to a relatively large difference of the repeat structure between its allelic ladder and off-ladder allele 22. PMID:27591542
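
    The three repeat structures quoted in this abstract all have the same total length but different internal composition, which a short parser makes explicit. The sketch below assumes the bracket notation used above, with a missing count meaning one copy, and assumes each bracketed unit contributes one repeat to the allele number; the function names are illustrative.

```python
import re

_UNIT = re.compile(r"\[([ACGT]+)\](\d*)")


def parse_repeat_structure(structure: str):
    """'[AGAT]15[AGAC]6[AGAT]' -> [('AGAT', 15), ('AGAC', 6), ('AGAT', 1)]"""
    return [(motif, int(count) if count else 1)
            for motif, count in _UNIT.findall(structure)]


def total_repeats(structure: str) -> int:
    """Total number of repeat units across all blocks."""
    return sum(count for _, count in parse_repeat_structure(structure))


def motif_count(structure: str, motif: str) -> int:
    """Number of copies of one motif, summed over all of its blocks."""
    return sum(count for m, count in parse_repeat_structure(structure) if m == motif)


structures = {
    "off-ladder allele 22": "[AGAT]11[AGAC]11",
    "PowerPlex Fusion ladder": "[AGAT]15[AGAC]6[AGAT]",
    "GlobalFiler ladder": "[AGAT]13[AGAC]9",
}
summary = {name: (total_repeats(s), motif_count(s, "AGAT"))
           for name, s in structures.items()}
```

    Each structure sums to 22 repeat units, while the [AGAT] count is 11 for the off-ladder allele against 16 and 13 for the two ladders, which is the compositional difference the abstract links to the mobility shift.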

  20. Phylogeny, Floral Evolution, and Inter-Island Dispersal in Hawaiian Clermontia (Campanulaceae) Based on ISSR Variation and Plastid Spacer Sequences

    PubMed Central

    Givnish, Thomas J.; Bean, Gregory J.; Ames, Mercedes; Lyon, Stephanie P.; Sytsma, Kenneth J.

    2013-01-01

    Previous studies based on DNA restriction-site and sequence variation have shown that the Hawaiian lobeliads are monophyletic and that the two largest genera, Cyanea and Clermontia, diverged from each other ca. 9.7 Mya. Sequence divergence among species of Clermontia is quite limited, however, and extensive hybridization is suspected, which has interfered with production of a well-resolved molecular phylogeny for the genus. Clermontia is of considerable interest because several species possess petal-like sepals, raising the question of whether such a homeotic mutation has arisen once or several times. In addition, morphological and molecular studies have implied different patterns of inter-island dispersal within the genus. Here we use nuclear ISSRs (inter-simple sequence repeat polymorphisms) and five plastid non-coding sequences to derive biparental and maternal phylogenies for Clermontia. Our findings imply that (1) Clermontia is not monophyletic, with Cl. pyrularia nested within Cyanea and apparently an intergeneric hybrid; (2) the earliest divergent clades within Clermontia are native to Kauaʻi, then Oʻahu, then Maui, supporting the progression rule of dispersal down the chain toward progressively younger islands, although that rule is violated in later-evolving taxa in the ISSR tree; (3) almost no sequence divergence among several Clermontia species in 4.5 kb of rapidly evolving plastid DNA; (4) several apparent cases of hybridization/introgression or incomplete lineage sorting (i.e., Cl. oblongifolia, peleana, persicifolia, pyrularia, samuelii, tuberculata), based on extensive conflict between the ISSR and plastid phylogenies; and (5) two origins and two losses of petaloid sepals, or, perhaps more plausibly, a single origin and two losses of this homeotic mutation, with its introgression into Cl. persicifolia. Our phylogenies are better resolved and geographically more informative than others based on ITS and 5S-NTS sequences and nuclear SNPs, but agree

  1. Single Cell RNA-Sequencing of Pluripotent States Unlocks Modular Transcriptional Variation

    PubMed Central

    Kolodziejczyk, Aleksandra A.; Kim, Jong Kyoung; Tsang, Jason C.H.; Ilicic, Tomislav; Henriksson, Johan; Natarajan, Kedar N.; Tuck, Alex C.; Gao, Xuefei; Bühler, Marc; Liu, Pentao; Marioni, John C.; Teichmann, Sarah A.

    2015-01-01

    Summary Embryonic stem cell (ESC) culture conditions are important for maintaining long-term self-renewal, and they influence cellular pluripotency state. Here, we report single cell RNA-sequencing of mESCs cultured in three different conditions: serum, 2i, and the alternative ground state a2i. We find that the cellular transcriptomes of cells grown in these conditions are distinct, with 2i being the most similar to blastocyst cells and including a subpopulation resembling the two-cell embryo state. Overall levels of intercellular gene expression heterogeneity are comparable across the three conditions. However, this masks variable expression of pluripotency genes in serum cells and homogeneous expression in 2i and a2i cells. Additionally, genes related to the cell cycle are more variably expressed in the 2i and a2i conditions. Mining of our dataset for correlations in gene expression allowed us to identify additional components of the pluripotency network, including Ptma and Zfp640, illustrating its value as a resource for future discovery. PMID:26431182

  2. High-throughput sequencing characterizes intertidal meiofaunal communities in northern Gulf of Mexico (Dauphin Island and Mobile Bay, Alabama).

    PubMed

    Brannock, Pamela M; Waits, Damien S; Sharma, Jyotsna; Halanych, Kenneth M

    2014-10-01

    Meiofauna are important components of food webs and of nutrient exchange between the benthos and water column. Recent studies have focused on these communities in the Gulf of Mexico due to potential impacts of the Deepwater Horizon Oil Spill (DWHOS). In particular, intertidal meiofaunal communities from Mobile Bay and Dauphin Island, Alabama, were previously shown to shift from predominately metazoan taxa prior to DWHOS to a fungal-dominated community after the spill. However, variability within these communities remains unknown. Herein, we used Illumina high-throughput amplicon sequencing to examine variation throughout a year for the same locations for which the organismal shift was noted. Sediment samples were collected bi-monthly for a year (July 2011-July 2012) from which the meiofaunal community was examined by sequencing the eukaryotic hypervariable V9 region of the 18S rRNA gene. Results showed that the presence of fungal taxa was limited within these communities, suggesting that previously reported acute impacts of the DWHOS on meiofauna were apparently short term. However, these meiofaunal communities show shifts in proportions of metazoan taxa compared to pre-spill samples. Whether this change is due to prolonged impacts of the spill or variation in community composition is unclear. Taxonomic variation within and between sampled locations throughout the study was observed, suggesting potential yearly variation in communities. Continued sampling over a longer timeframe will provide a more complete understanding of seasonality and variation within these communities. Such a baseline is required to assess future anthropogenic impacts.

  3. HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment

    PubMed Central

    Johnson, Matthew G.; Gardner, Elliot M.; Liu, Yang; Medina, Rafael; Goffinet, Bernard; Shaw, A. Jonathan; Zerega, Nyree J. C.; Wickett, Norman J.

    2016-01-01

    Premise of the study: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). Methods and Results: HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. Conclusions: HybPiper was designed for Linux and Mac OS X and is freely available at https://github.com/mossmatters/HybPiper. PMID:27437175

  4. Next-generation-sequencing of recurrent childhood high hyperdiploid acute lymphoblastic leukemia reveals mutations typically associated with high risk patients.

    PubMed

    Chen, Cai; Bartenhagen, Christoph; Gombert, Michael; Okpanyi, Vera; Binder, Vera; Röttgers, Silja; Bradtke, Jutta; Teigler-Schlegel, Andrea; Harbott, Jochen; Ginzel, Sebastian; Thiele, Ralf; Husemann, Peter; Krell, Pina F I; Borkhardt, Arndt; Dugas, Martin; Hu, Jianda; Fischer, Ute

    2015-09-01

    20% of children suffering from high hyperdiploid acute lymphoblastic leukemia develop recurrent disease. The molecular mechanisms are largely unknown. Here, we analyzed the genetic landscape of five patients at relapse, who developed recurrent disease without prior high-risk indication using whole-exome- and whole-genome-sequencing. Oncogenic mutations of RAS pathway genes (NRAS, KRAS, FLT3, n=4) and deactivating mutations of major epigenetic regulators (CREBBP, EP300, each n=2 and ARID4B, EZH2, MACROD2, MLL2, each n=1) were prominent in these cases and virtually absent in non-recurrent cases (n=6) or other pediatric acute lymphoblastic leukemia cases (n=18). In relapse nucleotide variations were detected in cell fate determining transcription factors (GLIS1, AKNA). Structural genomic alterations affected genes regulating B-cell development (IKZF1, PBX1, RUNX1). Eleven novel translocations involved the genes ART4, C12orf60, MACROD2, TBL1XR1, LRRN4, KIAA1467, and ELMO1/MIR1200. Typically, patients harbored only single structural variations, except for one patient who displayed massive rearrangements in the context of a germline tumor suppressor TP53 mutation and a Li-Fraumeni syndrome-like family history. Another patient harbored a germline mutation in the DNA repair factor ATM. In summary, the relapse patients of our cohort were characterized by somatic mutations affecting the RAS pathway, epigenetic and developmental programs and germline mutations in DNA repair pathways. PMID:26189108

  5. Construction of High-Density Genetic Map in Barley through Restriction-Site Associated DNA Sequencing.

    PubMed

    Zhou, Gaofeng; Zhang, Qisen; Zhang, Xiao-Qi; Tan, Cong; Li, Chengdao

    2015-01-01

    Genetic maps in barley are usually constructed from a limited number of molecular markers such as SSR (simple sequence repeat) and DArT (diversity arrays technology). These markers must be first developed before being used for genotyping. Here, we introduce a new strategy based on sequencing progeny of a doubled haploid population from Baudin × AC Metcalfe to construct a genetic map in barley. About 13,547 polymorphic SNP tags with >93% calling rate were selected to construct the genetic map. A total of 12,998 SNP tags were anchored to seven linkage groups which spanned a cumulative 967.6 cM genetic distance. The high-density genetic map can be used for QTL mapping and the assembly of WGS and BAC contigs. The genetic map was evaluated for its effectiveness and efficiency in QTL mapping and candidate gene identification. A major QTL for plant height was mapped at 105.5 cM on chromosome 3H. This QTL with LOD value of 13.01 explained 44.5% of phenotypic variation. This strategy will enable rapid and efficient establishment of high-density genetic maps in other species. PMID:26182149

  6. Analysis of X chromosome genomic DNA sequence copy number variation associated with premature ovarian failure (POF)

    PubMed Central

    Quilter, C.R.; Karcanias, A.C.; Bagga, M.R.; Duncan, S.; Murray, A.; Conway, G.S.; Sargent, C.A.; Affara, N.A.

    2013-01-01

    BACKGROUND Premature ovarian failure (POF) is a heterogeneous disease defined as amenorrhoea for >6 months before age 40, with an FSH serum level >40 mIU/ml (menopausal levels). While there is a strong genetic association with POF, familial studies have also indicated that idiopathic POF may also be genetically linked. Conventional cytogenetic analyses have identified regions of the X chromosome that are strongly associated with ovarian function, as well as several POF candidate genes. Cryptic chromosome abnormalities that have been missed might be detected by array comparative genomic hybridization. METHODS In this study, samples from 42 idiopathic POF patients were subjected to a complete end-to-end X/Y chromosome tiling path array to achieve a detailed copy number variation (CNV) analysis of X chromosome involvement in POF. The arrays also contained a 1 Mb autosomal tiling path as a reference control. Quantitative PCR for selected genes contained within the CNVs was used to confirm the majority of the changes detected. The expression pattern of some of these genes in human tissue RNA was examined by reverse transcription (RT)–PCR. RESULTS A number of CNVs were identified on both Xp and Xq, with several being shared among the POF cases. Some CNVs fall within known polymorphic CNV regions, and others span previously identified POF candidate regions and genes. CONCLUSIONS The new data reported in this study reveal further discrete X chromosome intervals not previously associated with the disease and therefore implicate new clusters of candidate genes. Further studies will be required to elucidate their involvement in POF. PMID:20570974

  7. Integration of high-resolution seismic with core data delineates sequence stratigraphy of a shelf-edge delta complex

    SciTech Connect

    Combes, J.M.; Nissen, S.E.; Scott, R.W.

    1995-12-31

    Correlation of high resolution seismic and corehole data sets obtained offshore Louisiana by a cooperative consortium of Louisiana State University and ten petroleum industry partners has resulted in a detailed sequence stratigraphic interpretation of a Late Pleistocene shelf margin delta system. High resolution stratal geometries have been interpreted within this framework of genetically related facies and key sequence surfaces have been identified both on the high resolution seismic lines and in the core data. Regional expressions of chronostratigraphically identified sequence-bounding unconformities and transgressive ravinement surfaces emphasize the importance of these surfaces in determining stratigraphic relationships. Several key conclusions resulted from this study: (1) The optimum location for interpretation of sequence surfaces is within or near the locus of maximum deposition. (2) At a distance from a depocenter the characteristic features of sequence surfaces lose seismic resolution and minor, subtle variations in the reflection character are the only seismic indicators of major boundaries. (3) Shelf edge deltaic deposits are known to contain important hydrocarbon reservoirs and this latest Pleistocene system provides an excellent model for older Cenozoic systems. (4) Potential deep sea fan reservoirs may accumulate seaward of shelf margin deltas during both falling and rising sea level stages depending upon local sedimentological conditions.

  8. Contamination-controlled high-throughput whole genome sequencing for influenza A viruses using the MiSeq sequencer

    PubMed Central

    Lee, Hong Kai; Lee, Chun Kiat; Tang, Julian Wei-Tze; Loh, Tze Ping; Koay, Evelyn Siew-Chuan

    2016-01-01

    Accurate full-length genomic sequences are important for viral phylogenetic studies. We developed a targeted high-throughput whole genome sequencing (HT-WGS) method for influenza A viruses, which utilized an enzymatic cleavage-based approach, the Nextera XT DNA library preparation kit, for library preparation. The entire library preparation workflow was adapted for the Sentosa SX101, a liquid handling platform, to automate this labor-intensive step. As the enzymatic cleavage-based approach generates low coverage reads at both ends of the cleaved products, we corrected this loss of sequencing coverage at the termini by introducing modified primers during the targeted amplification step to generate full-length influenza A sequences with even coverage across the whole genome. Another challenge of targeted HTS is the risk of specimen-to-specimen cross-contamination during the library preparation step that results in the calling of false-positive minority variants. We included an in-run, negative system control to capture contamination reads that may be generated during the liquid handling procedures. The upper limits of 99.99% prediction intervals of the contamination rate were adopted as cut-off values of contamination reads. Here, 148 influenza A/H3N2 samples were sequenced using the HTS protocol and were compared against a Sanger-based sequencing method. Our data showed that the rate of specimen-to-specimen cross-contamination was highly significant in HTS. PMID:27624998
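
    The cut-off idea in the abstract above (an upper prediction limit derived from negative-control contamination rates) can be sketched roughly as follows. The abstract does not say how the interval was computed, so everything here is an assumption for illustration: the function names, the example numbers, the two-sided interval, and the normal approximation (a z quantile in place of Student's t, which gives a somewhat lower limit for small control sets).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def contamination_cutoff(control_rates, level=0.9999):
    """Upper limit of a two-sided prediction interval for one new observation.

    Assumption: contamination rates from negative controls are roughly normal.
    A z quantile is used instead of Student's t, so this is an approximation.
    """
    n = len(control_rates)
    if n < 2:
        raise ValueError("need at least two negative-control measurements")
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # sqrt(1 + 1/n) widens the interval to cover a future observation,
    # not just the uncertainty of the mean.
    return mean(control_rates) + z * stdev(control_rates) * sqrt(1 + 1 / n)


def filter_variants(variant_freqs, cutoff):
    """Keep only minority variants whose frequency exceeds the cut-off."""
    return {pos: f for pos, f in variant_freqs.items() if f > cutoff}


# Hypothetical contamination rates observed in negative controls.
controls = [0.001, 0.002, 0.0015, 0.0012, 0.0018]
cutoff = contamination_cutoff(controls)
# Hypothetical minority-variant frequencies keyed by genome position.
kept = filter_variants({100: 0.002, 200: 0.05}, cutoff)
```

    In this sketch a variant seen at a frequency below the cut-off is treated as indistinguishable from contamination and dropped; only the variant at position 200 survives.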

  9. Contamination-controlled high-throughput whole genome sequencing for influenza A viruses using the MiSeq sequencer.

    PubMed

    Lee, Hong Kai; Lee, Chun Kiat; Tang, Julian Wei-Tze; Loh, Tze Ping; Koay, Evelyn Siew-Chuan

    2016-01-01

    Accurate full-length genomic sequences are important for viral phylogenetic studies. We developed a targeted high-throughput whole genome sequencing (HT-WGS) method for influenza A viruses, which utilized an enzymatic cleavage-based approach, the Nextera XT DNA library preparation kit, for library preparation. The entire library preparation workflow was adapted for the Sentosa SX101, a liquid handling platform, to automate this labor-intensive step. As the enzymatic cleavage-based approach generates low coverage reads at both ends of the cleaved products, we corrected this loss of sequencing coverage at the termini by introducing modified primers during the targeted amplification step to generate full-length influenza A sequences with even coverage across the whole genome. Another challenge of targeted HTS is the risk of specimen-to-specimen cross-contamination during the library preparation step that results in the calling of false-positive minority variants. We included an in-run, negative system control to capture contamination reads that may be generated during the liquid handling procedures. The upper limits of 99.99% prediction intervals of the contamination rate were adopted as cut-off values of contamination reads. Here, 148 influenza A/H3N2 samples were sequenced using the HTS protocol and were compared against a Sanger-based sequencing method. Our data showed that the rate of specimen-to-specimen cross-contamination was highly significant in HTS. PMID:27624998

  10. Computational methods for detecting copy number variations in cancer genome using next generation sequencing: principles and challenges

    PubMed Central

    Liu, Biao; Morrison, Carl D.; Johnson, Candace S.; Trump, Donald L.; Qin, Maochun; Conroy, Jeffrey C.; Wang, Jianmin; Liu, Song

    2013-01-01

    Accurate detection of somatic copy number variations (CNVs) is an essential part of cancer genome analysis, and plays an important role in oncotarget identifications. Next generation sequencing (NGS) holds the promise to revolutionize somatic CNV detection. In this review, we provide an overview of current analytic tools used for CNV detection in NGS-based cancer studies. We summarize the NGS data types used for CNV detection, decipher the principles for data preprocessing, segmentation, and interpretation, and discuss the challenges in somatic CNV detection. This review aims to provide a guide to the analytic tools used in NGS-based cancer CNV studies, and to discuss the important factors that researchers need to consider when analyzing NGS data for somatic CNV detections. PMID:24240121

  11. Computational methods for detecting copy number variations in cancer genome using next generation sequencing: principles and challenges.

    PubMed

    Liu, Biao; Morrison, Carl D; Johnson, Candace S; Trump, Donald L; Qin, Maochun; Conroy, Jeffrey C; Wang, Jianmin; Liu, Song

    2013-11-01

    Accurate detection of somatic copy number variations (CNVs) is an essential part of cancer genome analysis, and plays an important role in oncotarget identifications. Next generation sequencing (NGS) holds the promise to revolutionize somatic CNV detection. In this review, we provide an overview of current analytic tools used for CNV detection in NGS-based cancer studies. We summarize the NGS data types used for CNV detection, decipher the principles for data preprocessing, segmentation, and interpretation, and discuss the challenges in somatic CNV detection. This review aims to provide a guide to the analytic tools used in NGS-based cancer CNV studies, and to discuss the important factors that researchers need to consider when analyzing NGS data for somatic CNV detections.

  12. The intramolecular impact to the sequence specificity of B→A transition: low energy conformational variations in AA/TT and GG/CC steps.

    PubMed

    Il'icheva, I A; Vlasov, P K; Esipova, N G; Tumanyan, V G

    2010-04-01

    It is well known that local B→A transformation in DNA is involved in several biological processes. In vitro B↔A transition is sequence-specific. The physical basis of this specificity is not known yet. Here we analyze the effect of intramolecular interactions on the structural behavior of the GG/CC and AA/TT steps. These steps exemplify sequence specific bias to the B- or A-form structure. Optimization of potential energy of the molecular systems composed of an octanucleotide, neutralized by Na(+) and solvated with TIP3P water molecules in rectangular box with periodic boundary conditions gives the statistically representative sets of low energy structures for GG/CC and AA/TT steps in the middle of the diverse flanking sequences. Permissible 3D variations of GG/CC and AA/TT, and correlation of the relative motion of base pairs in these steps were analyzed. AA/TT step permits high variability for low energy conformers in the B-form DNA and small variability for low energy conformers in the A-form DNA. In contrast GG/CC step permits high variability for low energy conformers in the A-form DNA and small variability for low energy conformers in the B-form DNA. The relative motion of base pairs in GG/CC step is highly correlated, while in AA/TT step this correlation is notably less. Atom-atom interactions inside-the-step always favor the B-form and their stacking component (atom-atom interactions between nucleic bases) is crucial for the duplex stabilization. Formation of the A-form for both steps is a result of interactions with the flanking sequences and water-cation environment in the box. The average energy difference between conformations presenting B-form and A-form for the GG/CC step is high, while for the AA/TT step it is rather low. Thus, intramolecular interactions in GG/CC and AA/TT steps affect the possible conformational diversity ("conformational entropy") of the A- and B- type structures of DNA step. This determines the known bias of

  13. The Intramolecular Impact to the Sequence Specificity of B→A Transition: Low Energy Conformational Variations in AA/TT and GG/CC Steps.

    PubMed

    Il'icheva, I A; Vlasov, P K; Esipova, N G; Tumanyan, V G

    2010-04-01

    It is well known that local B→A transformation in DNA is involved in several biological processes. In vitro B↔A transition is sequence-specific. The physical basis of this specificity is not known yet. Here we analyze the effect of intramolecular interactions on the structural behavior of the GG/CC and AA/TT steps. These steps exemplify sequence specific bias to the B- or A-form structure. Optimization of potential energy of the molecular systems composed of an octanucleotide, neutralized by Na(+) and solvated with TIP3P water molecules in rectangular box with periodic boundary conditions gives the statistically representative sets of low energy structures for GG/CC and AA/TT steps in the middle of the diverse flanking sequences. Permissible 3D variations of GG/CC and AA/TT, and correlation of the relative motion of base pairs in these steps were analyzed. AA/TT step permits high variability for low energy conformers in the B-form DNA and small variability for low energy conformers in the A-form DNA. In contrast GG/CC step permits high variability for low energy conformers in the A-form DNA and small variability for low energy conformers in the B-form DNA. The relative motion of base pairs in GG/CC step is highly correlated, while in AA/TT step this correlation is notably less. Atom-atom interactions inside-the-step always favor the B-form and their stacking component (atom-atom interactions between nucleic bases) is crucial for the duplex stabilization. Formation of the A-form for both steps is a result of interactions with the flanking sequences and water-cation environment in the box. The average energy difference between conformations presenting B-form and A-form for the GG/CC step is high, while for the AA/TT step it is rather low. Thus, intramolecular interactions in GG/CC and AA/TT steps affect the possible conformational diversity ("conformational entropy") of the A- and B- type structures of DNA step. This determines the known

  14. A High-Definition View of Functional Genetic Variation from Natural Yeast Genomes

    PubMed Central

    Bergström, Anders; Simpson, Jared T.; Salinas, Francisco; Barré, Benjamin; Parts, Leopold; Zia, Amin; Nguyen Ba, Alex N.; Moses, Alan M.; Louis, Edward J.; Mustonen, Ville; Warringer, Jonas; Durbin, Richard; Liti, Gianni

    2014-01-01

    The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, metal transport, and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies. PMID:24425782

  15. Bats aloft: Variation in echolocation call structure at high altitudes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Bats alter their echolocation calls in response to changes in ecological and behavioral conditions, but little is known about how they adjust their call structure in response to changes in altitude. This study examines altitudinal variation in the echolocation calls of Brazilian free-tailed bats, T...

  16. High-throughput, high-fidelity HLA genotyping with deep sequencing.

    PubMed

    Wang, Chunlin; Krishnakumar, Sujatha; Wilhelmy, Julie; Babrzadeh, Farbod; Stepanyan, Lilit; Su, Laura F; Levinson, Douglas; Fernandez-Viña, Marcelo A; Davis, Ronald W; Davis, Mark M; Mindrinos, Michael

    2012-05-29

    Human leukocyte antigen (HLA) genes are the most polymorphic in the human genome. They play a pivotal role in the immune response and have been implicated in numerous human pathologies, especially autoimmunity and infectious diseases. Despite their importance, however, they are rarely characterized comprehensively because of the prohibitive cost of standard technologies and the technical challenges of accurately discriminating between these highly related genes and their many alleles. Here we demonstrate a high-resolution and cost-effective methodology to type HLA genes by sequencing, which combines the advantage of long-range amplification, the power of high-throughput sequencing platforms, and a unique genotyping algorithm. We calibrated our method for HLA-A, -B, -C, and -DRB1 genes with both reference cell lines and clinical samples and identified several previously undescribed alleles with mismatches, insertions, and deletions. We have further demonstrated the utility of this method in a clinical setting by typing five clinical samples in an Illumina MiSeq instrument with a 5-d turnaround. Overall, this technology has the capacity to deliver low-cost, high-throughput, and accurate HLA typing by multiplexing thousands of samples in a single sequencing run, which will enable comprehensive disease-association studies with large cohorts. Furthermore, this approach can also be extended to include other polymorphic genes.

  17. High-throughput, high-fidelity HLA genotyping with deep sequencing

    PubMed Central

    Wang, Chunlin; Krishnakumar, Sujatha; Wilhelmy, Julie; Babrzadeh, Farbod; Stepanyan, Lilit; Su, Laura F.; Levinson, Douglas; Fernandez-Viña, Marcelo A.; Davis, Ronald W.; Davis, Mark M.; Mindrinos, Michael

    2012-01-01

    Human leukocyte antigen (HLA) genes are the most polymorphic in the human genome. They play a pivotal role in the immune response and have been implicated in numerous human pathologies, especially autoimmunity and infectious diseases. Despite their importance, however, they are rarely characterized comprehensively because of the prohibitive cost of standard technologies and the technical challenges of accurately discriminating between these highly related genes and their many alleles. Here we demonstrate a high-resolution and cost-effective methodology to type HLA genes by sequencing, which combines the advantage of long-range amplification, the power of high-throughput sequencing platforms, and a unique genotyping algorithm. We calibrated our method for HLA-A, -B, -C, and -DRB1 genes with both reference cell lines and clinical samples and identified several previously undescribed alleles with mismatches, insertions, and deletions. We have further demonstrated the utility of this method in a clinical setting by typing five clinical samples in an Illumina MiSeq instrument with a 5-d turnaround. Overall, this technology has the capacity to deliver low-cost, high-throughput, and accurate HLA typing by multiplexing thousands of samples in a single sequencing run, which will enable comprehensive disease-association studies with large cohorts. Furthermore, this approach can also be extended to include other polymorphic genes. PMID:22589303

  18. Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data

    PubMed Central

    2014-01-01

    Background The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms. Results In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established. Conclusions A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the

  19. De Novo Assembly of Bitter Gourd Transcriptomes: Gene Expression and Sequence Variations in Gynoecious and Monoecious Lines.

    PubMed

    Shukla, Anjali; Singh, V K; Bharadwaj, D R; Kumar, Rajesh; Rai, Ashutosh; Rai, A K; Mugasimangalam, Raja; Parameswaran, Sriram; Singh, Major; Naik, P S

    2015-01-01

    Bitter gourd (Momordica charantia L.) is a nutritious vegetable crop of Asian origin, used as a medicinal herb in Indian and Chinese traditional medicine. Molecular breeding in bitter gourd is in its infancy, due to limited molecular resources, particularly on functional markers for traits such as gynoecy. We performed de novo transcriptome sequencing of bitter gourd using an Illumina next-generation sequencer, from root, flower bud, stem and leaf samples of a gynoecious line (Gy323) and a monoecious line (DRAR1). A total of 65,540 transcripts for Gy323 and 61,490 for DRAR1 were obtained. Comparisons revealed SNP and SSR variations between these lines and allowed the identification of gene classes. Based on available transcripts we identified 80 WRKY transcription factors, several reported in responses to biotic and abiotic stresses, and 56 ARF genes, which play a pivotal role in auxin-regulated gene expression and development. The data presented will be useful in both functional studies and breeding programs in bitter gourd. PMID:26047102

  20. Spatial thickness variation of Carboniferous coal-bearing sequences: A sedimentological response to varying levels of compactional and structural control

    SciTech Connect

    Liu, Yuejin; Ferm, J.C. (Dept. of Geological Sciences)

    1992-01-01

    A study of 1,120 borehole records from Carboniferous coal-bearing rocks in a 160 square mile area in southeastern Kentucky shows that within a stratigraphic interval of about 2,000 feet, the major lithic components are coarsening-upward sequences and coal groups. The latter are groups of rocks averaging 120 feet in thickness which include coal, thin mudstone and sandstone of channel or splay origin. The coarsening-upward sequences, which consist of mudstone overlain by sandstone, are of two types, one that is thick (mean thickness 170 feet/52 m) and very widespread and the other thin (mean thickness 70 feet/21 m) with only local distribution. Variogram and trend surface procedures were used to characterize the dimension and areal distribution of these rock bodies. The results show that thickness variation is a product of differential compaction and movement of deep seated structures contemporaneous with sedimentation. Structural effects on two scales can be recognized, one on the order of 6 to 10 miles and the other greater than 20 miles. Differential compaction effects are found to be closely associated with those produced by the smaller scale structures while some of the large scale structure effects are concordant with present Alleghenian structures.

  1. De Novo Assembly of Bitter Gourd Transcriptomes: Gene Expression and Sequence Variations in Gynoecious and Monoecious Lines

    PubMed Central

    Shukla, Anjali; Singh, V. K.; Bharadwaj, D. R.; Kumar, Rajesh; Rai, Ashutosh; Rai, A. K.; Mugasimangalam, Raja; Parameswaran, Sriram; Singh, Major; Naik, P. S.

    2015-01-01

    Bitter gourd (Momordica charantia L.) is a nutritious vegetable crop of Asian origin, used as a medicinal herb in Indian and Chinese traditional medicine. Molecular breeding in bitter gourd is in its infancy, due to limited molecular resources, particularly on functional markers for traits such as gynoecy. We performed de novo transcriptome sequencing of bitter gourd using an Illumina next-generation sequencer, from root, flower bud, stem and leaf samples of a gynoecious line (Gy323) and a monoecious line (DRAR1). A total of 65,540 transcripts for Gy323 and 61,490 for DRAR1 were obtained. Comparisons revealed SNP and SSR variations between these lines and allowed the identification of gene classes. Based on available transcripts we identified 80 WRKY transcription factors, several reported in responses to biotic and abiotic stresses, and 56 ARF genes, which play a pivotal role in auxin-regulated gene expression and development. The data presented will be useful in both functional studies and breeding programs in bitter gourd. PMID:26047102

  2. Phylogeny and chromosomal variations in East Asian Carex, Siderostictae group (Cyperaceae), based on DNA sequences and cytological data.

    PubMed

    Yano, Okihito; Ikeda, Hiroshi; Jin, Xiao-Feng; Hoshino, Takuji

    2014-01-01

    Carex (Cyperaceae) is one of the largest genera of flowering plants, and comprises more than 2,000 species. In Carex, section Siderostictae, which has broader leaves and is distributed in East Asia, is thought to be an ancestral group. We aimed to clarify the phylogenetic relationships and chromosomal variations within the section Siderostictae, and to examine the relationship of broad-leaved species of the sections Hemiscaposae and Surculosae from East Asia, inferred from DNA sequences and cytological data. Our results indicate that a monophyletic Siderostictae clade, including the sections Hemiscaposae, Siderostictae and Surculosae, is the earliest diverging group in the tribe Cariceae. Low chromosome numbers, 2n = 12 or 24, with large sizes were observed in these three sections. Our results suggest that the genus Carex might have originated in East Asia or become restricted there as a relict. Diploid species are restricted to narrower areas, while tetraploid species are more widely distributed in East Asia. It is concluded that chromosomal variations in the Siderostictae clade may have been caused by polyploidization and that tetraploid species may have been able to exploit their habitats by polyploidization.

  3. Variation in high-frequency wave radiation from small repeating earthquakes as revealed by cross-spectral analysis

    NASA Astrophysics Data System (ADS)

    Hatakeyama, Norishige; Uchida, Naoki; Matsuzawa, Toru; Okada, Tomomi; Nakajima, Junichi; Matsushima, Takeshi; Kono, Toshio; Hirahara, Satoshi; Nakayama, Takashi

    2016-11-01

    We examined the variation in the high-frequency wave radiation for three repeating earthquake sequences (M = 3.1-4.1) in the northeastern Japan subduction zone by waveform analyses. Earthquakes in each repeating sequence are located at almost the same place and show low-angle thrust type focal mechanisms, indicating that they represent repeated ruptures of a seismic patch on the plate boundary. We calculated cross-spectra of the waveforms and obtained the phases and coherences for pairs of events in the respective repeating sequences in order to investigate the waveform differences. We used waveform data sampled at 1 kHz that were obtained from temporary seismic observations we conducted immediately after the 2011 Tohoku earthquake near the source area. For two repeating sequences, we found that the interevent delay times for the two waveforms in a frequency band higher than the corner frequencies are different from those in a lower frequency band for particular event pairs. The phases and coherences show that there are coherent high-frequency waves for almost all the repeaters regardless of the high-frequency delays. These results indicate that high-frequency waves are always radiated from the same vicinity (subpatch) for these events but the time intervals between the ruptures of the subpatch and the centroid times can vary. We classified events in the sequence into two subgroups according to the high-frequency band interevent delays relative to the low-frequency band. For one sequence, we found that all the events that occurred just after (within 11 days) larger nearby earthquakes belong to one subgroup while other events belong to the other subgroup. This suggests that the high-frequency wave differences were caused by stress perturbations due to the nearby earthquakes. In summary, our observations suggest that high-frequency waves from the repeating sequence are radiated not from everywhere but from a long-duration subpatch within the seismic slip area. The

  4. PhyPA: Phylogenetic method with pairwise sequence alignment outperforms likelihood methods in phylogenetics involving highly diverged sequences.

    PubMed

    Xia, Xuhua

    2016-09-01

    While pairwise sequence alignment (PSA) by dynamic programming is guaranteed to generate one of the optimal alignments, multiple sequence alignment (MSA) of highly divergent sequences often results in poorly aligned sequences, plaguing all subsequent phylogenetic analysis. One way to avoid this problem is to use only PSA to reconstruct phylogenetic trees, which can only be done with distance-based methods. I compared the accuracy of this new computational approach (named PhyPA for phylogenetics by pairwise alignment) against the maximum likelihood method using MSA (the ML+MSA approach), based on nucleotide, amino acid and codon sequences simulated with different topologies and tree lengths. I present a surprising discovery that the fast PhyPA method consistently outperforms the slow ML+MSA approach for highly diverged sequences even when all optimization options were turned on for the ML+MSA approach. Only when sequences are not highly diverged (i.e., when a reliable MSA can be obtained) does the ML+MSA approach outperform PhyPA. The true topologies are always recovered by ML with the true alignment from the simulation. However, with MSA derived from alignment programs such as MAFFT or MUSCLE, the recovered topology consistently has higher likelihood than that for the true topology. Thus, the failure to recover the true topology by the ML+MSA is not because of insufficient search of tree space, but because of the distortion of phylogenetic signal by MSA methods. I have implemented in DAMBE PhyPA and two approaches making use of multi-gene data sets to derive phylogenetic support for subtrees equivalent to resampling techniques such as bootstrapping and jackknifing.
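
    The general idea of building distances from pairwise alignments only, as described in the abstract above, can be illustrated with a minimal sketch. This is not the PhyPA implementation in DAMBE: the scoring values, the linear gap penalty, the simple p-distance, and all function names are assumptions chosen for illustration. Tree construction from the resulting matrix (for example by neighbor-joining) is left out.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment by dynamic programming (linear gap penalty)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(a, b):
    """Proportion of mismatched sites among aligned, gap-free columns."""
    x, y = needleman_wunsch(a, b)
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        return 1.0  # no comparable sites; treated as maximally distant here
    return sum(p != q for p, q in pairs) / len(pairs)


def distance_matrix(seqs):
    """Pairwise distances for a dict of name -> sequence, one PSA per pair."""
    return {(s, t): p_distance(seqs[s], seqs[t]) for s in seqs for t in seqs}
```

    Because each pair is aligned independently, a poorly alignable sequence only affects its own distances rather than distorting a shared multiple alignment, which is the motivation the abstract gives for the pairwise approach.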

  5. PhyPA: Phylogenetic method with pairwise sequence alignment outperforms likelihood methods in phylogenetics involving highly diverged sequences.

    PubMed

    Xia, Xuhua

    2016-09-01

    While pairwise sequence alignment (PSA) by dynamic programming is guaranteed to generate one of the optimal alignments, multiple sequence alignment (MSA) of highly divergent sequences often results in poorly aligned sequences, plaguing all subsequent phylogenetic analysis. One way to avoid this problem is to use only PSA to reconstruct phylogenetic trees, which can only be done with distance-based methods. I compared the accuracy of this new computational approach (named PhyPA for phylogenetics by pairwise alignment) against the maximum likelihood method using MSA (the ML+MSA approach), based on nucleotide, amino acid and codon sequences simulated with different topologies and tree lengths. I present a surprising discovery that the fast PhyPA method consistently outperforms the slow ML+MSA approach for highly diverged sequences even when all optimization options were turned on for the ML+MSA approach. Only when sequences are not highly diverged (i.e., when a reliable MSA can be obtained) does the ML+MSA approach outperform PhyPA. The true topologies are always recovered by ML with the true alignment from the simulation. However, with MSA derived from alignment programs such as MAFFT or MUSCLE, the recovered topology consistently has higher likelihood than that for the true topology. Thus, the failure to recover the true topology by the ML+MSA is not because of insufficient search of tree space, but because of the distortion of phylogenetic signal by MSA methods. I have implemented in DAMBE PhyPA and two approaches making use of multi-gene data sets to derive phylogenetic support for subtrees equivalent to resampling techniques such as bootstrapping and jackknifing. PMID:27377322

  6. Next-Gen phylogeography of rainforest trees: exploring landscape-level cpDNA variation from whole-genome sequencing.

    PubMed

    van der Merwe, M; McPherson, H; Siow, J; Rossetto, M

    2014-01-01

    Standardized phylogeographic studies across codistributed taxa can identify important refugia and biogeographic barriers, and potentially uncover how changes in adaptive constraints through space and time impact on the distribution of genetic diversity. The combination of next-generation sequencing and methodologies that enable uncomplicated analysis of the full chloroplast genome may provide an invaluable resource for such studies. Here, we assess the potential of a shotgun-based method across twelve nonmodel rainforest trees sampled from two evolutionary distinct regions. Whole genomic shotgun sequencing libraries consisting of pooled individuals were used to assemble species-specific chloroplast references (in silico). For each species, the pooled libraries allowed for the detection of variation within and between data sets (each representing a geographic region). The potential use of nuclear rDNA as an additional marker from the NGS libraries was investigated by mapping reads against available references. We successfully obtained phylogeographically informative sequence data from a range of previously unstudied rainforest trees. Greater levels of diversity were found in northern refugial rainforests than in southern expansion areas. The genetic signatures of varying evolutionary histories were detected, and interesting associative patterns between functional characteristics and genetic diversity were identified. This approach can suit a wide range of landscape-level studies. As the key laboratory-based steps do not require prior species-specific knowledge and can be easily outsourced, the techniques described here are even suitable for researchers without access to wet-laboratory facilities, making evolutionary ecology questions increasingly accessible to the research community. PMID:24119022

  7. High resolution sequence stratigraphic analysis of the Late Miocene Abu Madi Formation, Northern Nile Delta Basin

    NASA Astrophysics Data System (ADS)

    Sarhan, Mohammad Abdelfattah

    2015-12-01

    The Abu Madi Formation represents the Upper Miocene Messinian age in the Nile Delta basin. It consists mainly of sandstones with shale intercalations and, because of its richness in hydrocarbons, has been subdivided by petroleum companies into Level-I, Level-II and Level-III according to the increase in the sandstone-to-shale ratio. The Miocene cycle in the northern subsurface section of the Nile Delta encompasses three main formations, from base to top: the Sidi Salim Formation, the Qawasim Formation and the Abu Madi Formation. A high-resolution sequence stratigraphic analysis, using gamma-ray responses, was carried out for the Late Miocene formation in the northern part of the Nile Delta subsurface section. For this purpose, the gamma-ray logs of ten deep wells, arranged in four cross-sections trending almost north-south across the northern region of the Nile Delta, were analyzed. The analysis revealed that the interpreted 4th-order depositional cycles within the Abu Madi Formation vary greatly in both number and gamma-ray response from well to well and cannot be traced laterally, even to the nearest well. These variations in the interpreted 4th-order depositional sequences could be attributed to normal faults buried in the areas lying between the investigated wells. This finding matches the conclusion that the Abu Madi Formation represents part of the Upper Miocene Nile Delta syn-rift megasequence, developed during the Upper Miocene rift phase of the Red Sea - Gulf of Suez province in Egypt. Accordingly, in the sequence stratigraphic approach, the depositional history of the Abu Madi Formation was strongly overprinted by tectonic controls rather than by relative sea-level changes, which are assumed to be of secondary influence. Regarding the hydrocarbon aspects of the Abu Madi Formation, the present work recommends directing drilling efforts into the stratigraphic traps

  8. PCR Strategies for Complete Allele Calling in Multigene Families Using High-Throughput Sequencing Approaches

    PubMed Central

    Marmesat, Elena; Soriano, Laura; Mazzoni, Camila J.; Sommer, Simone

    2016-01-01

    The characterization of multigene families with high copy number variation is often approached through PCR amplification with highly degenerate primers to account for all expected variants flanking the region of interest. Such an approach often introduces PCR biases that result in an unbalanced representation of targets in high-throughput sequencing libraries that eventually results in incomplete detection of the targeted alleles. Here we confirm this result and propose two different amplification strategies to alleviate this problem. The first strategy (called pooled-PCRs) targets different subsets of alleles in multiple independent PCRs using different moderately degenerate primer pairs, whereas the second approach (called pooled-primers) uses a custom-made pool of non-degenerate primers in a single PCR. We compare their performance to the common use of a single PCR with highly degenerate primers using the MHC class I of the Iberian lynx as a model. We found both novel approaches to work similarly well and better than the conventional approach. They significantly scored more alleles per individual (11.33 ± 1.38 and 11.72 ± 0.89 vs 7.94 ± 1.95), yielded more complete allelic profiles (96.28 ± 8.46 and 99.50 ± 2.12 vs 63.76 ± 15.43), and revealed more alleles at a population level (13 vs 12). Finally, we could link each allele’s amplification efficiency with the primer-mismatches in its flanking sequences and show that ultra-deep coverage offered by high-throughput technologies does not fully compensate for such biases, especially as real alleles may reach lower coverage than artefacts. Adopting either of the proposed amplification methods provides the opportunity to attain more complete allelic profiles at lower coverages, improving confidence over the downstream analyses and subsequent applications. PMID:27294261

  9. PCR Strategies for Complete Allele Calling in Multigene Families Using High-Throughput Sequencing Approaches.

    PubMed

    Marmesat, Elena; Soriano, Laura; Mazzoni, Camila J; Sommer, Simone; Godoy, José A

    2016-01-01

    The characterization of multigene families with high copy number variation is often approached through PCR amplification with highly degenerate primers to account for all expected variants flanking the region of interest. Such an approach often introduces PCR biases that result in an unbalanced representation of targets in high-throughput sequencing libraries that eventually results in incomplete detection of the targeted alleles. Here we confirm this result and propose two different amplification strategies to alleviate this problem. The first strategy (called pooled-PCRs) targets different subsets of alleles in multiple independent PCRs using different moderately degenerate primer pairs, whereas the second approach (called pooled-primers) uses a custom-made pool of non-degenerate primers in a single PCR. We compare their performance to the common use of a single PCR with highly degenerate primers using the MHC class I of the Iberian lynx as a model. We found both novel approaches to work similarly well and better than the conventional approach. They significantly scored more alleles per individual (11.33 ± 1.38 and 11.72 ± 0.89 vs 7.94 ± 1.95), yielded more complete allelic profiles (96.28 ± 8.46 and 99.50 ± 2.12 vs 63.76 ± 15.43), and revealed more alleles at a population level (13 vs 12). Finally, we could link each allele's amplification efficiency with the primer-mismatches in its flanking sequences and show that ultra-deep coverage offered by high-throughput technologies does not fully compensate for such biases, especially as real alleles may reach lower coverage than artefacts. Adopting either of the proposed amplification methods provides the opportunity to attain more complete allelic profiles at lower coverages, improving confidence over the downstream analyses and subsequent applications. PMID:27294261

  10. Genetic variation and population differentiation in a medical herb Houttuynia cordata in China revealed by inter-simple sequence repeats (ISSRs).

    PubMed

    Wei, Lin; Wu, Xian-Jin

    2012-01-01

    Houttuynia cordata is an important traditional Chinese herb with unresolved genetics and taxonomy, which leads to potential problems in the conservation and utilization of the resource. Inter-simple sequence repeat (ISSR) markers were used to assess the level and distribution of genetic diversity in 226 individuals from 15 populations of H. cordata in China. ISSR analysis revealed low genetic variation within populations but high genetic differentiation among populations. This genetic structure probably mainly reflects the historical association among populations. Genetic cluster analysis showed that the basal clade is composed of populations from Southwest China, and the other populations have continuous and eastward distributions. The structure of genetic diversity in H. cordata demonstrated that this species might have survived in Southwest China during the glacial age, and subsequently experienced an eastern postglacial expansion. Based on the results of genetic analysis, it was proposed that as many populations as possible be targeted for conservation. PMID:22942696

  11. Spatial stress variations in the aftershock sequence following the 2008 M6 earthquake doublet in the South Iceland Seismic Zone

    NASA Astrophysics Data System (ADS)

    Hensch, M.; Árnadóttir, Th.; Lund, B.; Brandsdóttir, B.

    2012-04-01

    The South Iceland Seismic Zone (SISZ) is an approximately 80 km wide E-W transform zone, bridging the offset between the Eastern Volcanic Zone and the Hengill triple junction to the west. The plate motion is accommodated in the brittle crust by faulting on many N-S trending right-lateral strike-slip faults of 2-5 km separation. Major sequences of large earthquakes (M>6) have occurred repeatedly in the SISZ since the settlement of Iceland more than a thousand years ago. On 29th May 2008, two M6 earthquakes hit the western part of the SISZ on two adjacent N-S faults within a few seconds. The intense aftershock sequence was recorded by the permanent Icelandic SIL network and a promptly installed temporary network of 11 portable seismometers in the source region. The network located thousands of aftershocks during the following days, illuminating a 12-17 km long region along both major fault ruptures as well as several smaller parallel faults along a diffuse E-W trending region west of the mainshock area without any preceding main rupture. This episode is suggested to be the continuation of an earthquake sequence which started with two M6.5 and several M5-6 events in June 2000. The time delay between the 2000 and 2008 events could be due to an inflation episode in Hengill during 1993-1998 that potentially locked N-S strike-slip faults in the western part of the SISZ. Around 300 focal solutions for aftershocks have been derived by analyzing P-wave polarities, showing predominantly strike-slip movements with occasional normal faulting components (unstable P-axis direction), which suggests an extensional stress regime as their driving force. A subsequent stress inversion of four different aftershock clusters reveals slight variations of the directions of the average σ3 axes. While for both southern clusters, including the E-W cluster, the σ3 axes are rather elongated perpendicular to the overall plate spreading axis, they are more northerly trending for shallower clusters

  12. Putting Physics First: Three Case Studies of High School Science Department and Course Sequence Reorganization

    ERIC Educational Resources Information Center

    Larkin, Douglas B.

    2016-01-01

    This article examines the process of shifting to a "Physics First" sequence in science course offerings in three school districts in the United States. This curricular sequence reverses the more common U.S. high school sequence of biology/chemistry/physics, and has gained substantial support in the physics education community over the…

  13. High-Quality Draft Genome Sequence of Bacillus subtilis Strain WAUSV36

    PubMed Central

    Town, Jennifer; Audy, Patrice; Boyetchko, Susan M.

    2016-01-01

    Bacillus subtilis strain WAUSV36 inhibits the growth of and decreases disease symptoms caused by the potato pathogen Phytophthora infestans. We determined the sequence of the 4.7-Mbp genome of this strain. WAUSV36 shared very high nucleotide sequence identity with previously sequenced strains of B. subtilis. PMID:27340068

  14. Genetic structure of the widespread and common Mediterranean bryophyte Pleurochaete squarrosa (Brid.) Lindb. (Pottiaceae) - evidence from nuclear and plastidic DNA sequence variation and allozymes.

    PubMed

    Grundmann, Michael; Ansell, Stephen W; Russell, Stephen J; Koch, Marcus A; Vogel, Johannes C

    2007-02-01

    The Mediterranean Basin, as one of the world's most biologically diverse regions, provides an interesting area for the study of plant evolution and spatial structure in plant populations. The dioecious moss Pleurochaete squarrosa is a widespread and common bryophyte in the Mediterranean Basin. Thirty populations were sampled for a study on molecular diversity and genetic structure, covering most major islands and mainland populations from Europe and Africa. A significant decline in nuclear and chloroplast sequence and allozyme variation within populations from west to east was observed. While DNA sequence data showed patterns of isolation by distance, allozyme markers did not. Instead, their considerable interpopulation genetic differentiation appeared to be unrelated to geographic distance. Similar high values for coefficients of gene diversity (G(ST)) in all data sets provided evidence of geographic isolation and limited gene flow among populations (i) within islands, (ii) within mainland areas, and (iii) between islands and mainland. Notably, populations in continental Spain are strongly genetically isolated from all other investigated areas. Surprisingly, there was no difference in gene diversity and G(ST) between islands and mainland areas. Thus, we conclude that large Mediterranean islands may function as 'mainland' for bryophytes. This hypothesis and its implication for conservation biology of cryptogamic plants warrant further investigation. While sexually reproducing populations were found all over the Mediterranean Basin, high levels of multilocus linkage disequilibrium provide evidence of mainly vegetative propagation even in populations where sexual reproduction was observed. PMID:17284206

  15. Palaeomagnetism and 40Ar/39Ar age of a Pliocene lava flow sequence in the Lesser Caucasus: record of a clockwise rotation and analysis of palaeosecular variation

    NASA Astrophysics Data System (ADS)

    Caccavari, Ana; Calvo-Rathert, Manuel; Goguitchaichvili, Avto; Huaiyu, He; Vashakidze, Goga; Vegas, Néstor

    2014-06-01

    A palaeomagnetic and rock-magnetic investigation has been carried out on a Pliocene lava flow sequence in the Djavakheti Highland in the central Lesser Caucasus in the Republic of Georgia. In addition, 40Ar/39Ar dating and electron microscopy studies were performed on samples of this sequence, named the Saro section, which consists of 39 successive lava flows of doleritic basalts. A characteristic magnetization could be isolated in all 39 studied flows, yielding reverse-polarity directions in all cases, with a mean direction of D = 202.2°, I = -60.6° (N = 39, α95 = 2.0°, k = 138). Thermomagnetic experiments (strong-field versus temperature curves) suggested low-Ti titanomagnetites and low Curie-temperature titanomagnetites with a rather high titanium content (x ≈ 0.5-0.7) as the main carriers of remanence. Their domain structure is characterized by a mixture of single- and multidomain grains. 40Ar/39Ar dating yielded an age of 1.73 ± 0.03 Ma, interpreted as the eruption age of the uppermost lava flow of the sequence. Analysis of palaeomagnetic results and radiometric data from the present and a previous study allows two different explanations for the time of emplacement of the section: (i) the lower 36 flows of the sequence might have been emitted between the normal-polarity Reunion and Olduvai chrons, and the upper three flows after the Olduvai chron, with a long hiatus in volcanic activity of more than 150 kyr, or (ii) the whole sequence was emitted between 1.778 and 1.73 ± 0.03 Ma, after the Olduvai chron. Comparison of the palaeomagnetic results obtained in this study with the expected direction shows that while inclination values agree well, declination shows an eastward deviation of 19.2° ± 5.8°. This discrepancy can be explained by a clockwise vertical-axis rotation of the sequence, which might have been produced by extensional structures with a strike-slip component, which can be found in the study area. Virtual geomagnetic pole
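
    The reported α95 can be cross-checked against N and k. The sketch below uses the standard Fisher (1953) confidence-cone formula and assumes the precision parameter was estimated as k = (N - 1)/(N - R); the abstract does not state which estimator was used, and the function name `fisher_alpha95` is hypothetical.

```python
import math

def fisher_alpha95(n: int, k: float, p: float = 0.05) -> float:
    """Half-angle (degrees) of the Fisher confidence cone at probability level p."""
    # Resultant vector length R implied by the assumed estimator k = (N - 1) / (N - R).
    r = n - (n - 1) / k
    # Fisher (1953): cos(alpha) = 1 - ((N - R) / R) * ((1 / p)^(1 / (N - 1)) - 1)
    cos_alpha = 1.0 - ((n - r) / r) * ((1.0 / p) ** (1.0 / (n - 1)) - 1.0)
    return math.degrees(math.acos(cos_alpha))

# Values quoted in the abstract: N = 39 flows, k = 138.
alpha95 = fisher_alpha95(n=39, k=138)
print(f"alpha95 = {alpha95:.2f} degrees")
```

    Under these assumptions the result rounds to the 2.0° quoted in the abstract.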

  16. Population genomics of Pacific lamprey: adaptive variation in a highly dispersive species.

    PubMed

    Hess, Jon E; Campbell, Nathan R; Close, David A; Docker, Margaret F; Narum, Shawn R

    2013-06-01

    Unlike most anadromous fishes that have evolved strict homing behaviour, Pacific lamprey (Entosphenus tridentatus) seem to lack philopatry as evidenced by minimal population structure across the species range. Yet unexplained findings of within-region population genetic heterogeneity coupled with the morphological and behavioural diversity described for the species suggest that adaptive genetic variation underlying fitness traits may be responsible. We employed restriction site-associated DNA sequencing to genotype 4439 quality filtered single nucleotide polymorphism (SNP) loci for 518 individuals collected across a broad geographical area including British Columbia, Washington, Oregon and California. A subset of putatively neutral markers (N = 4068) identified a significant amount of variation among three broad populations: northern British Columbia, Columbia River/southern coast and 'dwarf' adults (F(CT) = 0.02, P ≪ 0.001). Additionally, 162 SNPs were identified as adaptive through outlier tests, and inclusion of these markers revealed a signal of adaptive variation related to geography and life history. The majority of the 162 adaptive SNPs were not independent and formed four groups of linked loci. Analyses with matsam software found that 42 of these outlier SNPs were significantly associated with geography, run timing and dwarf life history, and 27 of these 42 SNPs aligned with known genes or highly conserved genomic regions using the genome browser available for sea lamprey. This study provides both neutral and adaptive context for observed genetic divergence among collections and thus reconciles previous findings of population genetic heterogeneity within a species that displays extensive gene flow. PMID:23205767

  17. Mitochondrial DNA sequence variation and phylogeography of Neotropic pumas (Puma concolor).

    PubMed

    Caragiulo, Anthony; Dias-Freedman, Isabela; Clark, J Alan; Rabinowitz, Salisa; Amato, George

    2014-08-01

    Pumas occupy the largest latitudinal range of any New World terrestrial mammal. Human population growth and associated habitat loss have reduced their North American range by nearly two-thirds, but the impact of human expansion in Central and South America on puma populations is not clear. We examined mitochondrial DNA diversity of pumas across the majority of their range, with a focus on Central and South America. Four mitochondrial gene regions (1140 base pairs) revealed 16 unique haplotypes differentiating pumas into three geographic groupings: North America, Central America and South America. These groups were highly differentiated as indicated by significant pairwise FST values. North American samples were genetically homogeneous compared to Central and South American samples, and South American pumas were the most diverse and ancestral. These findings support an earlier hypothesis that North America was recolonized by founding pumas from Central and South America.

  18. Variation in sequences containing microsatellite motifs in the perennial biomass and forage grass, Phalaris arundinacea (Poaceae).

    PubMed

    Barth, Susanne; Jankowska, Marta Jolanta; Hodkinson, Trevor Roland; Vellani, Tia; Klaas, Manfred

    2016-03-22

    Forty-three microsatellite markers were developed for further genetic characterisation of a forage and biomass grass crop, for which genomic resources are currently scarce. The microsatellite markers were developed from a normalized EST-SSR library. All 43 markers gave a clear banding pattern on 3% Metaphor agarose gels. Eight selected SSR markers were tested in detail for polymorphism across eleven DNA samples with a wide geographic distribution across Europe. The new set of 43 SSR markers will help future research to characterise the genetic structure and diversity of Phalaris arundinacea, with the potential to further understand its invasive character in North American wetlands, as well as aid in breeding work for desired biomass and forage traits. P. arundinacea is particularly valued at northern latitudes as a crop with high biomass potential, even more so on marginal lands.

  19. Performance and microbial ecology of a nitritation sequencing batch reactor treating high-strength ammonia wastewater

    PubMed Central

    Chen, Wenjing; Dai, Xiaohu; Cao, Dawen; Wang, Sha; Hu, Xiaona; Liu, Wenru; Yang, Dianhai

    2016-01-01

    The partial nitrification (PN) performance and the microbial community variations were evaluated in a sequencing batch reactor (SBR) for 172 days, with stepwise elevation of the ammonium concentration. Free ammonia (FA) and low dissolved oxygen inhibition of nitrite-oxidizing bacteria (NOB) were used to achieve nitritation in the SBR. During the 172-day operation, the nitrogen loading rate of the SBR was finally raised to 3.6 kg N/m3/d, corresponding to an influent ammonium concentration of 1500 mg/L, with an ammonium removal efficiency and nitrite accumulation rate of 94.12% and 83.54%, respectively, indicating that the syntrophic inhibition of FA and low dissolved oxygen contributed substantially to the stable nitrite accumulation. The results of the 16S rRNA high-throughput sequencing revealed that Nitrospira, the only nitrite-oxidizing bacteria in the system, were successively inhibited and eliminated, and the SBR was finally dominated by Nitrosomonas, the ammonium-oxidizing bacteria, which had a relative abundance of 83%, indicating that Nitrosomonas played the primary role in establishing and maintaining nitritation. After Nitrosomonas, Anaerolineae (7.02%) and Saprospira (1.86%) were the other main genera in the biomass. PMID:27762325

  20. Optimizing sparse sequencing of single cells for highly multiplex copy number profiling

    PubMed Central

    Baslan, Timour; Kendall, Jude; Ward, Brian; Cox, Hilary; Leotta, Anthony; Rodgers, Linda; Riggs, Michael; D'Italia, Sean; Sun, Guoli; Yong, Mao; Miskimen, Kristy; Gilmore, Hannah; Saborowski, Michael; Dimitrova, Nevenka; Krasnitz, Alexander; Harris, Lyndsay; Wigler, Michael; Hicks, James

    2015-01-01

    Genome-wide analysis at the level of single cells has recently emerged as a powerful tool to dissect genome heterogeneity in cancer, neurobiology, and development. To be truly transformative, single-cell approaches must affordably accommodate large numbers of single cells. This is feasible in the case of copy number variation (CNV), because CNV determination requires only sparse sequence coverage. We have used a combination of bioinformatic and molecular approaches to optimize single-cell DNA amplification and library preparation for highly multiplexed sequencing, yielding a method that can produce genome-wide CNV profiles of up to a hundred individual cells on a single lane of an Illumina HiSeq instrument. We apply the method to human cancer cell lines and biopsied cancer tissue, thereby illustrating its efficiency, reproducibility, and power to reveal underlying genetic heterogeneity and clonal phylogeny. The capacity of the method to facilitate the rapid profiling of hundreds to thousands of single-cell genomes represents a key step in making single-cell profiling an easily accessible tool for studying cell lineage. PMID:25858951
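
    To illustrate why sparse coverage can suffice for CNV calling, here is a minimal sketch that counts reads in fixed-width genomic bins and scales each bin by the median count. This is a generic read-depth illustration, not the authors' pipeline; the fixed bin width, the median normalization, the function names `bin_counts` and `copy_number_profile`, and the toy read positions are all assumptions made for the example.

```python
from statistics import median

def bin_counts(read_positions, genome_length, bin_size):
    """Count reads falling into consecutive fixed-width bins."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in read_positions:
        counts[pos // bin_size] += 1
    return counts

def copy_number_profile(counts, ploidy=2):
    """Scale bin counts so that the median bin corresponds to the assumed ploidy."""
    baseline = median(counts)  # assumes most bins are copy-neutral and non-empty
    return [round(ploidy * c / baseline) for c in counts]

# Toy data: four 100-bp bins; the third bin receives twice as many reads.
reads = (
    [i * 10 for i in range(10)]          # bin 0: 10 reads
    + [100 + i * 10 for i in range(10)]  # bin 1: 10 reads
    + [200 + i * 5 for i in range(20)]   # bin 2: 20 reads
    + [300 + i * 10 for i in range(10)]  # bin 3: 10 reads
)
counts = bin_counts(reads, genome_length=400, bin_size=100)
profile = copy_number_profile(counts)
print(counts, profile)
```

    Because each bin aggregates many reads, the relative depth per bin can be estimated even when individual bases are covered rarely or not at all.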

  1. Optimizing sparse sequencing of single cells for highly multiplex copy number profiling.

    PubMed

    Baslan, Timour; Kendall, Jude; Ward, Brian; Cox, Hilary; Leotta, Anthony; Rodgers, Linda; Riggs, Michael; D'Italia, Sean; Sun, Guoli; Yong, Mao; Miskimen, Kristy; Gilmore, Hannah; Saborowski, Michael; Dimitrova, Nevenka; Krasnitz, Alexander; Harris, Lyndsay; Wigler, Michael; Hicks, James

    2015-05-01

    Genome-wide analysis at the level of single cells has recently emerged as a powerful tool to dissect genome heterogeneity in cancer, neurobiology, and development. To be truly transformative, single-cell approaches must affordably accommodate large numbers of single cells. This is feasible in the case of copy number variation (CNV), because CNV determination requires only sparse sequence coverage. We have used a combination of bioinformatic and molecular approaches to optimize single-cell DNA amplification and library preparation for highly multiplexed sequencing, yielding a method that can produce genome-wide CNV profiles of up to a hundred individual cells on a single lane of an Illumina HiSeq instrument. We apply the method to human cancer cell lines and biopsied cancer tissue, thereby illustrating its efficiency, reproducibility, and power to reveal underlying genetic heterogeneity and clonal phylogeny. The capacity of the method to facilitate the rapid profiling of hundreds to thousands of single-cell genomes represents a key step in making single-cell profiling an easily accessible tool for studying cell lineage. PMID:25858951

  3. Assessing diversity of the female urine microbiota by high throughput sequencing of 16S rDNA amplicons

    PubMed Central

    2011-01-01

    Background: Urine within the urinary tract is commonly regarded as "sterile" in cultivation terms. Here, we present a comprehensive in-depth study of bacterial 16S rDNA sequences associated with urine from healthy females by means of culture-independent high-throughput sequencing techniques. Results: Sequencing of the V1V2 and V6 regions of the 16S ribosomal RNA gene using the 454 GS FLX system was performed to characterize the possible bacterial composition in 8 culture-negative (<100,000 CFU/ml) healthy female urine specimens. Sequences were compared to 16S rRNA databases and showed significant diversity, with the predominant genera detected being Lactobacillus, Prevotella and Gardnerella. The bacterial profiles in the female urine samples studied were complex; considerable variation between individuals was observed and a common microbial signature was not evident. Notably, a significant amount of sequences belonging to bacteria with a known pathogenic potential was observed. The number of operational taxonomic units (OTUs) for individual samples varied substantially and was in the range of 20 - 500. Conclusions: Normal female urine displays a noticeable and variable bacterial 16S rDNA sequence richness, which includes fastidious and anaerobic bacteria previously shown to be associated with female urogenital pathology. PMID:22047020

  4. High natural gene expression variation in the reef-building coral Acropora millepora: potential for acclimative and adaptive plasticity

    PubMed Central

    2013-01-01

    Background: Ecosystems worldwide are suffering the consequences of anthropogenic impact. The diverse ecosystem of coral reefs, for example, is globally threatened by increases in sea surface temperatures due to global warming. Studies to date have focused on determining genetic diversity, the sequence variability of genes in a species, as a proxy to estimate and predict the potential adaptive response of coral populations to environmental changes linked to climate change. However, the examination of natural gene expression variation has received less attention. This variation has been implicated as an important factor in evolutionary processes, upon which natural selection can act. Results: We acclimatized coral nubbins from six colonies of the reef-building coral Acropora millepora to a common garden in Heron Island (Great Barrier Reef, GBR) for a period of four weeks to remove any site-specific environmental effects on the physiology of the coral nubbins. By using a cDNA microarray platform, we detected a high level of gene expression variation, with 17% (488) of the unigenes differentially expressed across coral nubbins of the six colonies (jsFDR-corrected, p < 0.01). Among the main categories of biological processes found differentially expressed were transport, translation, response to stimulus, oxidation-reduction processes, and apoptosis. We found that the transcriptional profiles did not correspond to the genotype of the colony characterized using either an intron of the carbonic anhydrase gene or microsatellite loci markers. Conclusion: Our results provide evidence of high inter-colony variation at the transcriptomic level in A. millepora grown in a common garden, without a correspondence with genotypic identity. This finding highlights the importance of taking into account natural variation between reef corals when assessing experimental gene expression differences. The high transcriptional variation detected in this study is

  5. Conservation and variation of nucleotide sequences within related bacterial genomes: enterobacteria.

    PubMed Central

    Riley, M; Anilionis, A

    1980-01-01

    We have assessed the degree of relatedness of several portions of the Escherichia coli genome to the corresponding portions of the genomes of representative enteric bacteria, using the Southern transfer and hybridization technique (E. Southern, J. Mol. Biol. 98:503-517, 1975). The degree of relatedness varied among the regions examined. Judging both by the relative amounts of deoxyribonucleic acid in the various enteric genomes that are highly homologous and by the conservation of positions of restriction enzyme cleavage sites in these regions, the enteric genomes have diverged to greater extents in some parts of the genomes than in others. Portions of the genomes (including the tnaA and thyA genes, the trp operon, and one other unassigned segment) appear to have evolved in concert with the genome as a whole. By contrast, the lacZ gene and portions of the genome that are homologous to phage lambda vary more widely, perhaps reflecting a separate evolutionary origin for these segments of deoxyribonucleic acid. PMID:6447143

  6. Sequence variation and gene duplication at MHC DQB loci of baiji (Lipotes vexillifer), a Chinese river dolphin.

    PubMed

    Yang, G; Yan, J; Zhou, K; Wei, F

    2005-01-01

    The major histocompatibility complex (MHC) is a fundamental part of the vertebrate immune system, and the high variability in many MHC genes is thought to play an important role in the recognition of parasites. Baiji (Lipotes vexillifer) is one of the most endangered species in the world. Its wild population has declined to fewer than 100 individuals and has a very high risk of becoming extinct in the near future. In this study we present a first step in the molecular characterization of a DQB-like locus of baiji by nucleotide sequence analysis of the polymorphic exon 2 segments. In the examined 172 bp sequences from a group of 18 incidentally captured or stranded individuals, 48 variable sites were determined and 43 alleles were identified, many of which were represented by only one clone. Three to seven alleles were found in each individual, suggesting gene duplications. No deletion, insertion, or exceptional stop codon was detected, suggesting these alleles function in vivo. Phylogenetic reconstruction using neighbor joining grouped the 43 alleles into two distinct lineages, differing by seven nucleotides and four amino acids. Substitutions of amino acids tend to be clustered around sites postulated to be responsible for selective peptide recognition. In the peptide-binding region (PBR) of the DQB locus, the average number of nonsynonymous substitutions per site is greater than that of synonymous substitutions per site (0.1962 versus 0.0256, respectively). Nucleotide and amino acid sequences both showed a relatively high level of similarity (nucleotides 90.6%; amino acids 80.6%) to those of beluga whale (Delphinapterus leucas) and narwhal (Monodon monoceros). The high level of baiji MHC polymorphism revealed in the present study has not been reported in other cetaceans and could be a consequence of the small baiji population adapting to freshwater with a relatively high level of pathogens. PMID:15843636
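
    The two substitution rates quoted for the peptide-binding region can be combined into the usual dN/dS ratio. The short calculation below is only arithmetic on the figures in the abstract; the variable names are illustrative, and the interpretive threshold of 1 is the conventional one rather than something stated in the record.

```python
# Per-site substitution rates quoted in the abstract for the PBR of the DQB locus.
d_n = 0.1962  # nonsynonymous substitutions per nonsynonymous site
d_s = 0.0256  # synonymous substitutions per synonymous site

# Ratio of nonsynonymous to synonymous rates (often written omega).
omega = d_n / d_s
print(f"dN/dS = {omega:.2f}")
```

    A ratio well above 1, as here, is conventionally read as consistent with positive (diversifying) selection acting on the region.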

  7. Forecasting Ecological Genomics: High-Tech Animal Instrumentation Meets High-Throughput Sequencing

    PubMed Central

    Shafer, Aaron B. A.; Northrup, Joseph M.; Wikelski, Martin; Wittemyer, George; Wolf, Jochen B. W.

    2016-01-01

    Recent advancements in animal tracking technology and high-throughput sequencing are rapidly changing the questions and scope of research in the biological sciences. The integration of genomic data with high-tech animal instrumentation comes as a natural progression of traditional work in ecological genetics, and we provide a framework for linking the separate data streams from these technologies. Such a merger will elucidate the genetic basis of adaptive behaviors like migration and hibernation and advance our understanding of fundamental ecological and evolutionary processes such as pathogen transmission, population responses to environmental change, and communication in natural populations. PMID:26745372

  8. Forecasting Ecological Genomics: High-Tech Animal Instrumentation Meets High-Throughput Sequencing.

    PubMed

    Shafer, Aaron B A; Northrup, Joseph M; Wikelski, Martin; Wittemyer, George; Wolf, Jochen B W

    2016-01-01

    Recent advancements in animal tracking technology and high-throughput sequencing are rapidly changing the questions and scope of research in the biological sciences. The integration of genomic data with high-tech animal instrumentation comes as a natural progression of traditional work in ecological genetics, and we provide a framework for linking the separate data streams from these technologies. Such a merger will elucidate the genetic basis of adaptive behaviors like migration and hibernation and advance our understanding of fundamental ecological and evolutionary processes such as pathogen transmission, population responses to environmental change, and communication in natural populations.

  9. Phylogeography of East Asian Lespedeza buergeri (Fabaceae) based on chloroplast and nuclear ribosomal DNA sequence variations.

    PubMed

    Jin, Dong-Pil; Lee, Jung-Hyun; Xu, Bo; Choi, Byoung-Hee

    2016-09-01

    The dynamic changes in land configuration during the Quaternary that were accompanied by climatic oscillations have significantly influenced the current distribution and genetic structure of warm-temperate forests in East Asia. Although recent surveys have been conducted, the historical migration of forest species via land bridges and, especially, the origins of Korean populations remain conjectural. Here, we reveal the genetic structure of Lespedeza buergeri, a warm-temperate shrub that is disjunctively distributed around the East China Sea (ECS) in China, Korea, and Japan. Two non-coding regions (rpl32-trnL, psbA-trnH) of chloroplast DNA (cpDNA) and the internal transcribed spacer of nuclear ribosomal DNA (nrITS) were analyzed for 188 individuals from 16 populations, which covered almost all of its distribution. The nrITS data demonstrated a genetic structure that followed geographic boundaries. This examination utilized AMOVA, comparisons of genetic differentiation based on haplotype frequency/genetic mutations among haplotypes, and Mantel tests. However, the cpDNA data showed a contrasting genetic pattern, implying that this difference was due to a slower mutation rate in cpDNA than in nrITS. These results indicated frequent migration by this species via an ECS land bridge during the early Pleistocene that then tapered gradually toward the late Pleistocene. A genetic isolation between western and eastern Japan coincided with broad consensus that was suggested by the presence of other warm-temperate plants in that country. For Korean populations, high genetic diversity indicated the existence of refugia during the Last Glacial Maximum on the Korean Peninsula. However, their closeness with western Japanese populations at the level of haplotype clade implied that gene flow from western Japanese refugia was possible until post-glacial processing occurred through the Korea/Tsushima Strait land bridge. PMID:27206725

  10. Construction and Analysis of High-Density Linkage Map Using High-Throughput Sequencing Data

    PubMed Central

    Liu, Min; Liu, Hui; Zeng, Huaping; Deng, Dejing; Xin, Huaigen; Song, Jun; Xu, Chunhua; Sun, Xiaowen; Hou, Xilin; Wang, Xiaowu; Zheng, Hongkun

    2014-01-01

    Linkage maps enable the study of important biological questions. The construction of high-density linkage maps appears more feasible since the advent of next-generation sequencing (NGS), which eases SNP discovery and high-throughput genotyping of large populations. However, the marker number explosion and genotyping errors from NGS data challenge the computational efficiency and linkage map quality of linkage study methods. Here we report the HighMap method for constructing high-density linkage maps from NGS data. HighMap employs an iterative ordering and error correction strategy based on a k-nearest neighbor algorithm and a Monte Carlo multipoint maximum likelihood algorithm. A simulation study shows HighMap can create a linkage map with three times as many markers as ordering-only methods while offering more accurate marker orders and stable genetic distances. Using HighMap, we constructed a common carp linkage map with 10,004 markers. The singleton rate was less than one-ninth of that generated by JoinMap4.1. Its total map distance was 5,908 cM, consistent with reports on low-density maps. HighMap is an efficient method for constructing high-density, high-quality linkage maps from high-throughput population NGS data. It will facilitate genome assembly, comparative genomic analysis, and QTL studies. HighMap is available at http://highmap.biomarker.com.cn/. PMID:24905985

  11. Meiofaunal community analysis by high-throughput sequencing: comparison of extraction, quality filtering, and clustering methods.

    PubMed

    Brannock, Pamela M; Halanych, Kenneth M

    2015-10-01

    Using molecular tools to examine community composition of meiofauna, animals 45μm to 1mm in size living between sediment grains in aquatic environments, is relatively new in comparison to bacterial and archaeal microbial studies. Although high-throughput molecular approaches are starting to be applied to these communities, effectiveness of different approaches for nucleic acid extraction from meiofauna is poorly known and bioinformatic pipelines vary between studies. Given this situation, there is a need for protocols to be developed that promote consistency in sample collection and processing, sequence quality filtering, and Operational Taxonomic Unit (OTU) clustering methods. Herein, we assess different approaches used for DNA extraction (DNA extracted directly from sediment versus elutriated material retained on a 45μm sieve) as well as how different quality filtering methods of sequences and OTU clustering algorithms impact genetic assessment of meiofauna community composition. DNA extracted directly from sediment resulted in higher presence of non-metazoan eukaryotic taxa; in contrast, an elutriation (resuspension with decanting) approach increased meiofauna abundance and enriched metazoan OTUs. With regard to bioinformatics analyses, the number of overall OTUs varied by clustering algorithm, primarily due to the applied method of sequence quality filtering. However, alpha and beta diversity analyses showed similar trends regardless of bioinformatics pipeline utilized. Based on our results, we recommend studies of meiofauna communities first elutriate samples prior to DNA extraction and include multiple biological replicates to account for variation in community-level composition. The quality filtering method should be carefully considered as this step accounted for large discrepancy in the number of OTUs inferred.

  12. Novel genotype–phenotype associations demonstrated by high-throughput sequencing in patients with hypertrophic cardiomyopathy

    PubMed Central

    Lopes, Luis R; Syrris, Petros; Guttmann, Oliver P; O'Mahony, Constantinos; Tang, Hak Chiaw; Dalageorgou, Chrysoula; Jenkins, Sharon; Hubank, Mike; Monserrat, Lorenzo; McKenna, William J; Plagnol, Vincent; Elliott, Perry M

    2015-01-01

    Objective A predictable relation between genotype and disease expression is needed in order to use genetic testing for clinical decision-making in hypertrophic cardiomyopathy (HCM). The primary aims of this study were to examine the phenotypes associated with sarcomere protein (SP) gene mutations and test the hypothesis that variation in non-sarcomere genes modifies the phenotype. Methods Unrelated and consecutive patients were clinically evaluated and prospectively followed in a specialist clinic. High-throughput sequencing was used to analyse 41 genes implicated in inherited cardiac conditions. Variants in SP and non-SP genes were tested for associations with phenotype and survival. Results 874 patients (49.6±15.4 years, 67.8% men) were studied; likely disease-causing SP gene variants were detected in 383 (43.8%). Patients with SP variants were characterised by younger age and higher prevalence of family history of HCM, family history of sudden cardiac death, asymmetric septal hypertrophy, greater maximum LV wall thickness (all p values<0.0005) and an increased incidence of cardiovascular death (p=0.012). Similar associations were observed for individual SP genes. Patients with ANK2 variants had greater maximum wall thickness (p=0.0005). Associations at a lower level of significance were demonstrated with variation in other non-SP genes. Conclusions Patients with HCM caused by rare SP variants differ with respect to age at presentation, family history of the disease, morphology and survival from patients without SP variants. Novel associations for SP genes are reported and, for the first time, we demonstrate possible influence of variation in non-SP genes associated with other forms of cardiomyopathy and arrhythmia syndromes on the clinical phenotype of HCM. PMID:25351510

  13. Deep sequencing of the viral phoH gene reveals temporal variation, depth-specific composition, and persistent dominance of the same viral phoH genes in the Sargasso Sea.

    PubMed

    Goldsmith, Dawn B; Parsons, Rachel J; Beyene, Damitu; Salamon, Peter; Breitbart, Mya

    2015-01-01

    Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS) site in the summer (September) and winter (March) of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity) were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years.
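
    The observation that most sequences fall into a few large OTUs while most OTUs are small and rare can be quantified as the share of reads captured by the largest OTUs. The sketch below is illustrative only: the function name, the OTU labels, and the read counts are hypothetical and are not data from the study.

```python
from collections import Counter


def top_otu_share(otu_assignments, top_n):
    """Fraction of all reads that belong to the top_n most abundant OTUs.

    otu_assignments: iterable with one OTU label per read (hypothetical input).
    """
    counts = Counter(otu_assignments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    top_reads = sum(n for _, n in counts.most_common(top_n))
    return top_reads / total


# Made-up example: two large OTUs and three small ones (100 reads in total).
reads = ["otu_a"] * 60 + ["otu_b"] * 25 + ["otu_c", "otu_d", "otu_e"] * 5
share = top_otu_share(reads, top_n=2)
print(f"Top 2 of {len(set(reads))} OTUs hold {share:.0%} of reads")
```

    In this toy input, two of the five OTUs account for 85 of the 100 reads, which is the kind of skew the abstract describes.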

  14. Deep sequencing of the viral phoH gene reveals temporal variation, depth-specific composition, and persistent dominance of the same viral phoH genes in the Sargasso Sea.

    PubMed

    Goldsmith, Dawn B; Parsons, Rachel J; Beyene, Damitu; Salamon, Peter; Breitbart, Mya

    2015-01-01

    Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS) site in the summer (September) and winter (March) of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity) were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years. PMID:26157645

  15. Deep sequencing of the viral phoH gene reveals temporal variation, depth-specific composition, and persistent dominance of the same viral phoH genes in the Sargasso Sea

    PubMed Central

    Goldsmith, Dawn B.; Parsons, Rachel J.; Beyene, Damitu; Salamon, Peter

    2015-01-01

    Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS) site in the summer (September) and winter (March) of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity) were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years. PMID:26157645

  16. A strategy to recover a high-quality, complete plastid sequence from low-coverage whole-genome sequencing

    PubMed Central

    Garaycochea, Silvia; Speranza, Pablo; Alvarez-Valin, Fernando

    2015-01-01

    Premise of the study: We developed a bioinformatic strategy to recover and assemble a chloroplast genome using data derived from low-coverage 454 GS FLX/Roche whole-genome sequencing. Methods: A comparative genomics approach was applied to obtain the complete chloroplast genome from a weedy biotype of rice from Uruguay. We also applied appropriate filters to discriminate reads representing novel DNA transfer events between the chloroplast and nuclear genomes. Results: From a set of 295,159 reads (96 Mb data), we assembled the chloroplast genome into two contigs. This weedy rice was classified based on 23 polymorphic regions identified by comparison with reference chloroplast genomes. We detected recent and past events of genetic material transfer between the chloroplast and nuclear genomes and estimated their occurrence frequency. Discussion: We obtained a high-quality complete chloroplast genome sequence from low-coverage sequencing data. Intergenome DNA transfer appears to be more frequent than previously thought. PMID:26504677

  17. Construction of a high-density genetic map for grape using next generation restriction-site associated DNA sequencing

    PubMed Central

    2012-01-01

    Background Genetic mapping and QTL detection are powerful methodologies in plant improvement and breeding. Construction of a high-density and high-quality genetic map would be of great benefit in the production of superior grapes to meet human demand. High throughput and low cost of the recently developed next generation sequencing (NGS) technology have resulted in its wide application in genome research. Sequencing restriction-site associated DNA (RAD) might be an efficient strategy to simplify genotyping. Combining NGS with RAD has proven to be powerful for single nucleotide polymorphism (SNP) marker development. Results An F1 population of 100 individual plants was developed. In-silico digestion-site prediction was used to select an appropriate restriction enzyme for construction of a RAD sequencing library. Next generation RAD sequencing was applied to genotype the F1 population and its parents. Applying a cluster strategy for SNP modulation, a total of 1,814 high-quality SNP markers were developed: 1,121 of these were mapped to the female genetic map, 759 to the male map, and 1,646 to the integrated map. A comparison of the genetic maps to the published Vitis vinifera genome revealed both conservation and variations. Conclusions The applicability of next generation RAD sequencing for genotyping a grape F1 population was demonstrated, leading to the successful development of a genetic map with high density and quality using our designed SNP markers. Detailed analysis revealed that this newly developed genetic map can be used for a variety of genome investigations, such as QTL detection, sequence assembly and genome comparison. PMID:22908993

  18. Whole Genome Sequencing of Enterovirus species C Isolates by High-Throughput Sequencing: Development of Generic Primers

    PubMed Central

    Bessaud, Maël; Sadeuh-Mba, Serge A.; Joffret, Marie-Line; Razafindratsimandresy, Richter; Polston, Patsy; Volle, Romain; Rakoto-Andrianarivelo, Mala; Blondel, Bruno; Njouom, Richard; Delpeyroux, Francis

    2016-01-01

    Enteroviruses are among the most common viruses infecting humans and can cause diverse clinical syndromes ranging from minor febrile illness to severe and potentially fatal diseases. Enterovirus species C (EV-C) consists of more than 20 types, among which the three serotypes of polioviruses, the etiological agents of poliomyelitis, are included. Biodiversity and evolution of EV-C genomes are shaped by frequent recombination events. Therefore, identification and characterization of circulating EV-C strains require the sequencing of different genomic regions. A simple method was developed to quickly sequence the entire genome of EV-C isolates. Four overlapping fragments were produced separately by RT-PCR performed with generic primers. The four amplicons were then pooled and purified prior to being sequenced by a high-throughput technique. The method was assessed on a panel of EV-Cs belonging to a wide range of types. It can be used to determine full-length genome sequences through de novo assembly of thousands of reads. It was also able to discriminate reads from closely related viruses in mixtures. By decreasing the workload compared to classical Sanger-based techniques, this method will serve as a precious tool for sequencing large panels of EV-Cs isolated in cell cultures during environmental surveillance or from patients, including vaccine-derived polioviruses. PMID:27617004

  19. Whole Genome Sequencing of Enterovirus species C Isolates by High-Throughput Sequencing: Development of Generic Primers.

    PubMed

    Bessaud, Maël; Sadeuh-Mba, Serge A; Joffret, Marie-Line; Razafindratsimandresy, Richter; Polston, Patsy; Volle, Romain; Rakoto-Andrianarivelo, Mala; Blondel, Bruno; Njouom, Richard; Delpeyroux, Francis

    2016-01-01

    Enteroviruses are among the most common viruses infecting humans and can cause diverse clinical syndromes ranging from minor febrile illness to severe and potentially fatal diseases. Enterovirus species C (EV-C) consists of more than 20 types, among which the three serotypes of polioviruses, the etiological agents of poliomyelitis, are included. Biodiversity and evolution of EV-C genomes are shaped by frequent recombination events. Therefore, identification and characterization of circulating EV-C strains require the sequencing of different genomic regions. A simple method was developed to quickly sequence the entire genome of EV-C isolates. Four overlapping fragments were produced separately by RT-PCR performed with generic primers. The four amplicons were then pooled and purified prior to being sequenced by a high-throughput technique. The method was assessed on a panel of EV-Cs belonging to a wide range of types. It can be used to determine full-length genome sequences through de novo assembly of thousands of reads. It was also able to discriminate reads from closely related viruses in mixtures. By decreasing the workload compared to classical Sanger-based techniques, this method will serve as a precious tool for sequencing large panels of EV-Cs isolated in cell cultures during environmental surveillance or from patients, including vaccine-derived polioviruses. PMID:27617004

  20. Whole Genome Sequencing of Enterovirus species C Isolates by High-Throughput Sequencing: Development of Generic Primers

    PubMed Central

    Bessaud, Maël; Sadeuh-Mba, Serge A.; Joffret, Marie-Line; Razafindratsimandresy, Richter; Polston, Patsy; Volle, Romain; Rakoto-Andrianarivelo, Mala; Blondel, Bruno; Njouom, Richard; Delpeyroux, Francis

    2016-01-01

    Enteroviruses are among the most common viruses infecting humans and can cause diverse clinical syndromes ranging from minor febrile illness to severe and potentially fatal diseases. Enterovirus species C (EV-C) consists of more than 20 types, among which the three serotypes of polioviruses, the etiological agents of poliomyelitis, are included. Biodiversity and evolution of EV-C genomes are shaped by frequent recombination events. Therefore, identification and characterization of circulating EV-C strains require the sequencing of different genomic regions. A simple method was developed to quickly sequence the entire genome of EV-C isolates. Four overlapping fragments were produced separately by RT-PCR performed with generic primers. The four amplicons were then pooled and purified prior to being sequenced by a high-throughput technique. The method was assessed on a panel of EV-Cs belonging to a wide range of types. It can be used to determine full-length genome sequences through de novo assembly of thousands of reads. It was also able to discriminate reads from closely related viruses in mixtures. By decreasing the workload compared to classical Sanger-based techniques, this method will serve as a precious tool for sequencing large panels of EV-Cs isolated in cell cultures during environmental surveillance or from patients, including vaccine-derived polioviruses.

  1. Tracking TCRβ Sequence Clonotype Expansions during Antiviral Therapy Using High-Throughput Sequencing of the Hypervariable Region

    PubMed Central

    Robinson, Mark W.; Hughes, Joseph; Wilkie, Gavin S.; Swann, Rachael; Barclay, Stephen T.; Mills, Peter R.; Patel, Arvind H.; Thomson, Emma C.; McLauchlan, John

    2016-01-01

    To maintain a persistent infection viruses such as hepatitis C virus (HCV) employ a range of mechanisms that subvert protective T cell responses. The suppression of antigen-specific T cell responses by HCV hinders efforts to profile T cell responses during chronic infection and antiviral therapy. Conventional methods of detecting antigen-specific T cells utilize either antigen stimulation (e.g., ELISpot, proliferation assays, cytokine production) or antigen-loaded tetramer staining. This limits the ability to profile T cell responses during chronic infection due to suppressed effector function and the requirement for prior knowledge of antigenic viral peptide sequences. Recently, high-throughput sequencing (HTS) technologies have been developed for the analysis of T cell repertoires. In the present study, we have assessed the feasibility of HTS of the TCRβ complementarity determining region (CDR)3 to track T cell expansions in an antigen-independent manner. Using sequential blood samples from HCV-infected individuals undergoing antiviral therapy, we were able to measure the population frequencies of >35,000 TCRβ sequence clonotypes in each individual over the course of 12 weeks. TRBV/TRBJ gene segment usage varied markedly between individuals but remained relatively constant within individuals across the course of therapy. Despite this stable TRBV/TRBJ gene segment usage, a number of TCRβ sequence clonotypes showed dramatic changes in read frequency. These changes could not be linked to therapy outcomes in the present study; however, the TCRβ CDR3 sequences with the largest fold changes did include sequences with identical TRBV/TRBJ gene segment usage and high junction region homology to previously published CDR3 sequences from HCV-specific T cells targeting the HLA-B*0801-restricted 1395HSKKKCDEL1403 and HLA-A*0101-restricted 1435ATDALMTGY1443 epitopes. The pipeline developed in this proof of concept study provides a platform for the design of future

  2. DNA-Sequence Variation Among Schistosoma mekongi Populations and Related Taxa; Phylogeography and the Current Distribution of Asian Schistosomiasis

    PubMed Central

    Attwood, Stephen W.; Fatih, Farrah A.; Upatham, E. Suchart

    2008-01-01

    Background Schistosomiasis in humans along the lower Mekong River has proven a persistent public health problem in the region. The causative agent is the parasite Schistosoma mekongi (Trematoda: Digenea). A new transmission focus is reported, as well as the first study of genetic variation among S. mekongi populations. The aim is to confirm the identity of the species involved at each known focus of Mekong schistosomiasis transmission, to examine historical relationships among the populations and related taxa, and to provide data for use (a priori) in further studies of the origins, radiation, and future dispersal capabilities of S. mekongi. Methodology/Principal Findings DNA sequence data are presented for four populations of S. mekongi from Cambodia and southern Laos, three of which were distinguishable at the COI (cox1) and 12S (rrnS) mitochondrial loci sampled. A phylogeny was estimated for these populations and the other members of the Schistosoma sinensium group. The study provides new DNA sequence data for three new populations and one new locus/population combination. A Bayesian approach is used to estimate divergence dates for events within the S. sinensium group and among the S. mekongi populations. Conclusions/Significance The date estimates are consistent with phylogeographical hypotheses describing a Pliocene radiation of the S. sinensium group and a mid-Pleistocene invasion of Southeast Asia by S. mekongi. The date estimates also provide Bayesian priors for future work on the evolution of S. mekongi. The public health implications of S. mekongi transmission outside the lower Mekong River are also discussed. PMID:18350111

  3. A high-density remote reference magnetic variation profile in the Pacific northwest of North America

    USGS Publications Warehouse

    Hermance, J.F.; Lusi, S.; Slocum, W.; Neumann, G.A.; Green, A.W.

    1989-01-01

    During the summer of 1985, as part of the EMSLAB Project, Brown University conducted a detailed magnetic variation study of the Oregon Coast Range and Cascades volcanic system along an E-W profile in central Oregon. Comprised of a sequence of 75 remote reference magnetic variation (MV) stations spaced 3-4 km apart, the profile stretched for 225 km from Newport, on the Oregon coast, across the Coast Range, the Willamette Valley, and the High Cascades to a point approximately 50 km east of Santiam Pass. At all of the MV stations, data were collected for short periods (16-100 s), and at 17 of these stations data were also obtained at longer periods (100-1600 s). Data were monitored with a three-component ring core fluxgate magnetometer (Nanotesla), and were recorded with a microcomputer (DEC PDP 11/73) based data acquisition system. A 2-D generalized inversion of the magnetic transfer coefficients over the period range of 16-1600 s indicates four distinct conductors. First, we see the coast effect caused by a large sedimentary wedge offshore. Second, we see the effect of currents flowing in the conductive sediments of the Willamette Valley. Our inversion suggests that the Willamette Valley consists of two electrically distinct features, due perhaps to a horst-like structure imprinted on the valley sediments. Next we note an electric current system centered beneath the High Cascades. This latter feature may be associated with a sediment-filled graben beneath Santiam Pass as suggested by some of the gravity and MT results reported to date. Finally, we detect the presence of a deep conductor at mid-crustal depths which laterally extends westward from beneath the Basin and Range Province, and terminates beneath the western Cascades. One view of this last result is that it appears that modern Basin and Range structure is being imprinted on pre-existing Cascade structure.

  4. Heavy-light chain interrelations of MS-associated immunoglobulins probed by deep sequencing and rational variation.

    PubMed

    Lomakin, Yakov A; Zakharova, Maria Yu; Stepanov, Alexey V; Dronina, Maria A; Smirnov, Ivan V; Bobik, Tatyana V; Pyrkov, Andrey Yu; Tikunova, Nina V; Sharanova, Svetlana N; Boitsov, Vitali M; Vyazmin, Sergey Yu; Kabilov, Marsel R; Tupikin, Alexey E; Krasnov, Alexey N; Bykova, Nadezda A; Medvedeva, Yulia A; Fridman, Marina V; Favorov, Alexander V; Ponomarenko, Natalia A; Dubina, Michael V; Boyko, Alexey N; Vlassov, Valentin V; Belogurov, Alexey A; Gabibov, Alexander G

    2014-12-01

    The mechanisms triggering most autoimmune diseases are still obscure. Autoreactive B cells play a crucial role in the development of such pathologies and, in particular, production of autoantibodies of different specificities. The combination of deep-sequencing technology with functional studies of antibodies selected from highly representative immunoglobulin combinatorial libraries may provide unique information on specific features in the repertoires of autoreactive B cells. Here, we have analyzed cross-combinations of the variable regions of human immunoglobulins against the myelin basic protein (MBP) previously selected from a multiple sclerosis (MS)-related scFv phage-display library. On the other hand, we have performed deep sequencing of the sublibraries of scFvs against MBP, Epstein-Barr virus (EBV) latent membrane protein 1 (LMP1), and myelin oligodendrocyte glycoprotein (MOG). Bioinformatics analysis of sequencing data and surface plasmon resonance (SPR) studies have shown that it is the variable fragments of antibody heavy chains that mainly determine both the affinity of antibodies to the parent autoantigen and their cross-reactivity. It is suggested that LMP1-cross-reactive anti-myelin autoantibodies contain heavy chains encoded by certain germline gene segments, which may be a hallmark of the EBV-specific B cell subpopulation involved in MS triggering.

  5. Variations in IL-23 and IL-25 receptor gene structure, sequence and expression associated with the two disease forms of sheep paratuberculosis.

    PubMed

    Nicol, Louise; Gossner, Anton; Watkins, Craig; Chianini, Francesca; Dalziel, Robert; Hopkins, John

    2016-02-09

    The immunopathology of paucibacillary and multibacillary sheep paratuberculosis is characterized by inflammatory T cell and macrophage responses respectively. IL-23 and IL-25 are key to the development of these responses by interaction with their complex receptors, IL-23R/IL-12RB1 and IL-17RA/IL-17RB. In humans, variations in structure, sequence and/or expression of these genes have been implicated in the different pathological forms of tuberculosis and leprosy, and in gastrointestinal inflammatory disorders such as Crohn's disease. Sequencing has identified multiple transcript variants of sheep IL23R, IL12RB1 and IL17RB and a single IL17RA transcript. RT-qPCR assays were developed for all the identified variants and used to compare expression in the ileo-caecal lymph node of sheep with paucibacillary or multibacillary paratuberculosis and uninfected animals. With IL-23 receptor, only the IL12RB1v3 variant, which lacks the receptor activation motif, was differentially expressed and was significantly increased in multibacillary disease; this may contribute to high Th2 responses. Of the IL17RB variants, only full length IL17RB was differentially expressed and was significantly increased in multibacillary pathology, which may also contribute to Th2 polarization. IL17RA expression was significantly increased in paucibacillary disease. The contrast between the IL17RA and IL17RB results may indicate that, in addition to Th1 cells, Th17 T cells are also involved in paucibacillary pathology.

  6. Enhanced detection of DNA sequences using end-point PCR amplification and online gel electrophoresis (GE)-ICP-MS: determination of gene copy number variations.

    PubMed

    González, T Iglesias; Espina, M; Sierra, L M; Bettmer, J; Blanco-González, E; Montes-Bayón, M; Sanz-Medel, A

    2014-11-18

    The design and evaluation of analytical methods that permit quantitative analysis of specific DNA sequences is exponentially increasing. For this purpose, highly sensitive methodologies usually based on labeling protocols with fluorescent dyes or nanoparticles are often explored. Here, the possibility of label-free signal amplification using end-point polymerase chain reaction (PCR) is exploited using on-column agarose gel electrophoresis for separation and inductively coupled plasma-mass spectrometry (ICP-MS) for the detection of phosphorus in amplified DNA sequences. The calibration of the separation system with a DNA ladder permits direct estimation of the size of the amplified gene fragment after PCR. With this knowledge, and considering the compound-independent quantification capabilities exhibited by ICP-MS for phosphorus (it is only dependent on the number of P atoms per molecule), the correlation of the P-peak area of the amplified gene fragment, with respect to the gene copy numbers (in the starting DNA), is then established. Such a relationship would permit the determination of copy number variations (CNVs) in genomic DNA using ICP-MS measurements. The method detection limit, in terms of the required amount of starting DNA, is ∼6 ng (or 1000 cells if 100% extraction efficiency is expected). The suitability of the proposed label-free amplification strategy is demonstrated by monitoring CNVs in cells exposed to a chemical agent capable of inducing deletions, such as cisplatin. PMID:25312744
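
    The arithmetic behind this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the figure of about 6 pg of DNA per cell is an assumption inferred from the quoted "6 ng or 1000 cells", the count of two phosphorus atoms per base pair ignores terminal phosphates, and all function names are hypothetical.

```python
# Hypothetical helpers illustrating the arithmetic; not the published method.

def cells_for_dna_mass(mass_ng: float, pg_per_cell: float = 6.0) -> float:
    """Number of cells needed to supply mass_ng of DNA at 100% extraction.

    pg_per_cell = 6.0 is an assumption (roughly one diploid human genome).
    """
    return mass_ng * 1000.0 / pg_per_cell  # 1 ng = 1000 pg


def phosphorus_atoms(amplicon_bp: int) -> int:
    """Approximate P atoms in a double-stranded amplicon: one per nucleotide."""
    return 2 * amplicon_bp


def relative_molecules(peak_area: float, amplicon_bp: int) -> float:
    """P-peak area divided by P atoms per molecule.

    Because the ICP-MS phosphorus response depends only on the number of
    P atoms, this quantity is proportional to the number of amplicon molecules.
    """
    return peak_area / phosphorus_atoms(amplicon_bp)


if __name__ == "__main__":
    print(cells_for_dna_mass(6.0))
    # Compare two fragments of different length on a per-molecule basis.
    print(relative_molecules(900.0, 150) / relative_molecules(600.0, 200))
```

    Dividing the peak area by the fragment's phosphorus count is what lets fragments of different length be compared per molecule; the fragment size itself would come from the DNA-ladder calibration described in the abstract.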

  7. RecA-binding pilE G4 sequence essential for pilin antigenic variation forms monomeric and 5' end-stacked dimeric parallel G-quadruplexes.

    PubMed

    Kuryavyi, Vitaly; Cahoon, Laty A; Seifert, H Steven; Patel, Dinshaw J

    2012-12-01

    Neisseria gonorrhoeae is an obligate human pathogen that can escape immune surveillance through antigenic variation of surface structures such as pili. A G-quadruplex-forming (G4) sequence (5'-G(3)TG(3)TTG(3)TG(3)) located upstream of the N. gonorrhoeae pilin expression locus (pilE) is necessary for initiation of pilin antigenic variation, a recombination-based, high-frequency, diversity-generation system. We have determined NMR-based structures of the all parallel-stranded monomeric and 5' end-stacked dimeric pilE G-quadruplexes in monovalent cation-containing solutions. We demonstrate that the three-layered all parallel-stranded monomeric pilE G-quadruplex containing single-residue double-chain reversal loops, which can be modeled without steric clashes into the 3 nt DNA-binding site of RecA, binds and promotes E. coli RecA-mediated strand exchange in vitro. We discuss how interactions between RecA and monomeric pilE G-quadruplex could facilitate the specialized recombination reactions leading to pilin diversification.

  8. Intragenomic sequence variation at the ITS1 - ITS2 region and at the 18S and 28S nuclear ribosomal DNA genes of the New Zealand mud snail, Potamopyrgus antipodarum (Hydrobiidae: mollusca)

    USGS Publications Warehouse

    Hoy, Marshal S.; Rodriguez, Rusty J.

    2013-01-01

    Molecular genetic analysis was conducted on two populations of the invasive non-native New Zealand mud snail (Potamopyrgus antipodarum), one from a freshwater ecosystem in Devil's Lake (Oregon, USA) and the other from an ecosystem of higher salinity in the Columbia River estuary (Hammond Harbor, Oregon, USA). To elucidate potential genetic differences between the two populations, three segments of nuclear ribosomal DNA (rDNA), the ITS1-ITS2 regions and the 18S and 28S rDNA genes, were cloned and sequenced. Variant sequences within each individual were found in all three rDNA segments. Folding models were utilized for secondary structure analysis, and the results indicated that many sequences contained structure-altering polymorphisms, which suggests they could be nonfunctional pseudogenes. In addition, analysis of molecular variance (AMOVA) was used for hierarchical analysis of genetic variance to estimate variation within and among populations and within individuals. AMOVA revealed significant variation in the ITS region between the populations and among clones within individuals, while in the 5.8S rDNA significant variation was revealed among individuals within the two populations. High levels of intragenomic variation were found in the ITS regions, which are known to be highly variable in many organisms. More interestingly, intragenomic variation was also found in the 18S and 28S rDNA, which has rarely been observed in animals and is so far unreported in Mollusca. We postulate that in these P. antipodarum populations the effects of concerted evolution are diminished because not all of the rDNA genes in their polyploid genome should be essential for sustaining cellular function. This could lead to a lessening of selection pressures, allowing mutations to accumulate in some copies, changing them into variant sequences.

  9. Variation of b and p values from aftershocks sequences along the Mexican subduction zone and their relation to plate characteristics

    NASA Astrophysics Data System (ADS)

    Ávila-Barrientos, L.; Zúñiga, F. R.; Rodríguez-Pérez, Q.; Guzmán-Speziale, M.

    2015-11-01

    Aftershock sequences along the Mexican subduction margin (between coordinates 110°W and 91°W) were analyzed by means of the p value from the Omori-Utsu relation and the b value from the Gutenberg-Richter relation. We focused on recent medium to large (Mw > 5.6) events considered capable of generating aftershock sequences suitable for analysis. The main goal was to find a possible correlation between aftershock parameters and plate characteristics, such as displacement rate, age and segmentation. The subduction regime of Mexico is one of the most active regions of the world, with a high frequency of occurrence of medium to large events, and plate characteristics change along the subduction margin. Previous studies have observed differences in seismic source characteristics at the subduction regime, which may indicate a difference in rheology and possible segmentation. The results of the analysis of the aftershock sequences indicate a slight tendency for p values to decrease from west to east with increasing plate age, although statistical significance is undermined by the small number of aftershocks in the sequences, a feature distinctive of the region as compared to other world subduction regimes. The b values show an opposite, increasing trend towards the east, even though the statistical significance is not enough to warrant the validation of such a trend. A linear regression between both parameters provides additional support for the inverse relation. Moreover, we calculated the seismic coupling coefficient, showing a direct relation with the p and b values. While we cannot conclusively confirm the hypothesis that aftershock generation depends on certain tectonic characteristics (age, thickness, temperature), our results do not reject it, thus encouraging further study into this question.
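
    For readers unfamiliar with the two parameters, the sketch below shows one common way they are defined and computed. It is an illustration only and not necessarily the estimation procedure used in the study: the b value uses the Aki-Utsu maximum-likelihood formula b = log10(e) / (mean(M) - (Mc - dM/2)), and the Omori-Utsu relation is written as n(t) = K / (t + c)^p. The function names, the completeness magnitude `mc`, and the bin width `dm` are assumptions introduced for the example.

```python
import math


def b_value(magnitudes, mc, dm=0.0):
    """Aki-Utsu maximum-likelihood b value for events with M >= mc.

    dm is the magnitude bin width (0.0 for unbinned magnitudes).
    """
    sample = [m for m in magnitudes if m >= mc]
    if not sample:
        raise ValueError("no events at or above the completeness magnitude")
    mean_mag = sum(sample) / len(sample)
    denom = mean_mag - (mc - dm / 2.0)
    if denom <= 0.0:
        raise ValueError("mean magnitude must exceed mc - dm/2")
    return math.log10(math.e) / denom


def omori_rate(t, k, c, p):
    """Omori-Utsu aftershock rate n(t) = K / (t + c)**p at time t after the mainshock."""
    return k / (t + c) ** p


def omori_count(t, k, c, p):
    """Expected number of aftershocks in [0, t], the integral of omori_rate."""
    if abs(p - 1.0) < 1e-12:
        return k * math.log((t + c) / c)
    return k * ((t + c) ** (1.0 - p) - c ** (1.0 - p)) / (1.0 - p)


if __name__ == "__main__":
    print(b_value([5.0, 5.5, 6.0], mc=5.0))
    print(omori_count(1.0, k=10.0, c=1.0, p=2.0))
```

    A larger p means the aftershock rate decays faster with time, and a larger b means small events are more numerous relative to large ones, which is why trends in the two values along the margin are of interest.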

  10. Hfqs in Bacillus anthracis: Role of protein sequence variation in the structure and function of proteins in the Hfq family.

    PubMed

    Vrentas, Catherine; Ghirlando, Rodolfo; Keefer, Andrea; Hu, Zonglin; Tomczak, Aurelie; Gittis, Apostolos G; Murthi, Athulaprabha; Garboczi, David N; Gottesman, Susan; Leppla, Stephen H

    2015-11-01

    Hfq proteins in Gram-negative bacteria play important roles in bacterial physiology and virulence, mediated by binding of the Hfq hexamer to small RNAs and/or mRNAs to post-transcriptionally regulate gene expression. However, the physiological role of Hfqs in Gram-positive bacteria is less clear. Bacillus anthracis, the causative agent of anthrax, uniquely expresses three distinct Hfq proteins, two from the chromosome (Hfq1, Hfq2) and one from its pXO1 virulence plasmid (Hfq3). The protein sequences of Hfq1 and 3 are evolutionarily distinct from those of Hfq2 and of Hfqs found in other Bacilli. Here, the quaternary structure of each B. anthracis Hfq protein, as produced heterologously in Escherichia coli, was characterized. While Hfq2 adopts the expected hexamer structure, Hfq1 does not form similarly stable hexamers in vitro. The impact on the monomer-hexamer equilibrium of varying Hfq C-terminal tail length and other sequence differences among the Hfqs was examined, and a sequence region of the Hfq proteins that was involved in hexamer formation was identified. It was found that, in addition to the distinct higher-order structures of the Hfq homologs, they give rise to different phenotypes. Hfq1 has a disruptive effect on the function of E. coli Hfq in vivo, while Hfq3 expression at high levels is toxic to E. coli but also partially complements Hfq function in E. coli. These results set the stage for future studies of the roles of these proteins in B. anthracis physiology and for the identification of sequence determinants of phenotypic complementation.

  11. Depletion of Abundant Sequences by Hybridization (DASH): using Cas9 to remove unwanted high-abundance species in sequencing libraries and molecular counting applications.

    PubMed

    Gu, W; Crawford, E D; O'Donovan, B D; Wilson, M R; Chow, E D; Retallack, H; DeRisi, J L

    2016-03-04

    Next-generation sequencing has generated a need for a broadly applicable method to remove unwanted high-abundance species prior to sequencing. We introduce DASH (Depletion of Abundant Sequences by Hybridization). Sequencing libraries are 'DASHed' with recombinant Cas9 protein complexed with a library of guide RNAs targeting unwanted species for cleavage, thus preventing them from consuming sequencing space. We demonstrate a more than 99% reduction of mitochondrial rRNA in HeLa cells, and enrichment of pathogen sequences in patient samples. We also demonstrate an application of DASH in cancer. This simple method can be adapted for any sample type and increases sequencing yield without additional cost.
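
    As a rough illustration of what "a library of guide RNAs targeting unwanted species" involves, the sketch below enumerates candidate target sites in an unwanted sequence. It assumes the commonly used Streptococcus pyogenes Cas9, which requires an NGG motif immediately 3' of a 20-nt target; the abstract does not state which Cas9 or which guide-design rules were used, and the function names here are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_guides(seq: str, guide_len: int = 20):
    """List (strand, start, protospacer) for every site followed by NGG.

    Assumption: SpCas9-style NGG PAM. For the "-" strand, start is an
    offset into the reverse complement of seq, not into seq itself.
    """
    seq = seq.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})[ACGT]GG)")
    sites = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            sites.append((strand, m.start(), m.group(1)))
    return sites


if __name__ == "__main__":
    for site in find_guides("A" * 20 + "TGG"):
        print(site)
```

    A real design pipeline would also filter candidates for off-target matches against the sequences that should be kept; that step is omitted here.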

  12. Depletion of Abundant Sequences by Hybridization (DASH): using Cas9 to remove unwanted high-abundance species in sequencing libraries and molecular counting applications.

    PubMed

    Gu, W; Crawford, E D; O'Donovan, B D; Wilson, M R; Chow, E D; Retallack, H; DeRisi, J L

    2016-01-01

    Next-generation sequencing has generated a need for a broadly applicable method to remove unwanted high-abundance species prior to sequencing. We introduce DASH (Depletion of Abundant Sequences by Hybridization). Sequencing libraries are 'DASHed' with recombinant Cas9 protein complexed with a library of guide RNAs targeting unwanted species for cleavage, thus preventing them from consuming sequencing space. We demonstrate a more than 99% reduction of mitochondrial rRNA in HeLa cells, and enrichment of pathogen sequences in patient samples. We also demonstrate an application of DASH in cancer. This simple method can be adapted for any sample type and increases sequencing yield without additional cost. PMID:26944702

  13. CYP2D7 Sequence Variation Interferes with TaqMan CYP2D6*15 and *35 Genotyping

    PubMed Central

    Riffel, Amanda K.; Dehghani, Mehdi; Hartshorne, Toinette; Floyd, Kristen C.; Leeder, J. Steven; Rosenblatt, Kevin P.; Gaedigk, Andrea

    2016-01-01

    TaqMan™ genotyping assays are widely used to genotype CYP2D6, which encodes a major drug metabolizing enzyme. Assay design for CYP2D6 can be challenging owing to the presence of two pseudogenes, CYP2D7 and CYP2D8, structural and copy number variation, and numerous single nucleotide polymorphisms (SNPs), some of which reflect the wild-type sequence of the CYP2D7 pseudogene. The aim of this study was to identify the mechanism causing false-positive CYP2D6*15 calls and remediate those by redesigning and validating alternative TaqMan genotype assays. Among 13,866 DNA samples genotyped by the CompanionDx® lab on the OpenArray platform, 70 samples were identified as heterozygotes for 137Tins, the key SNP of CYP2D6*15. However, only 15 samples were confirmed when tested with the Luminex xTAG CYP2D6 Kit and sequencing of CYP2D6-specific long range (XL)-PCR products. Genotype and gene resequencing of CYP2D6 and CYP2D7-specific XL-PCR products revealed a CC>GT dinucleotide SNP in exon 1 of CYP2D7 that reverts the sequence to CYP2D6 and allows a TaqMan assay PCR primer to bind. Because CYP2D7 also carries a Tins, a false-positive mutation signal is generated. This CYP2D7 SNP was also responsible for generating false-positive signals for rs769258 (CYP2D6*35), which is also located in exon 1. Although alternative CYP2D6*15 and *35 assays resolved the issue, we discovered a novel CYP2D6*15 subvariant in one sample that carries additional SNPs preventing detection with the alternate assay. The frequency of CYP2D6*15 was 0.1% in this ethnically diverse U.S. population sample. In addition, we also discovered linkage between the CYP2D7 CC>GT dinucleotide SNP and the 77G>A (rs28371696) SNP of CYP2D6*43. The frequency of this tentatively functional allele was 0.2%. Taken together, these findings emphasize that regardless of how carefully genotyping assays are designed and evaluated before being commercially marketed, rare or unknown SNPs underneath primer and/or probe regions can impact

  14. CYP2D7 Sequence Variation Interferes with TaqMan CYP2D6*15 and *35 Genotyping.

    PubMed

    Riffel, Amanda K; Dehghani, Mehdi; Hartshorne, Toinette; Floyd, Kristen C; Leeder, J Steven; Rosenblatt, Kevin P; Gaedigk, Andrea

    2015-01-01

    TaqMan™ genotyping assays are widely used to genotype CYP2D6, which encodes a major drug metabolizing enzyme. Assay design for CYP2D6 can be challenging owing to the presence of two pseudogenes, CYP2D7 and CYP2D8, structural and copy number variation, and numerous single nucleotide polymorphisms (SNPs), some of which reflect the wild-type sequence of the CYP2D7 pseudogene. The aim of this study was to identify the mechanism causing false-positive CYP2D6*15 calls and remediate those by redesigning and validating alternative TaqMan genotype assays. Among 13,866 DNA samples genotyped by the CompanionDx® lab on the OpenArray platform, 70 samples were identified as heterozygotes for 137Tins, the key SNP of CYP2D6*15. However, only 15 samples were confirmed when tested with the Luminex xTAG CYP2D6 Kit and sequencing of CYP2D6-specific long range (XL)-PCR products. Genotype and gene resequencing of CYP2D6 and CYP2D7-specific XL-PCR products revealed a CC>GT dinucleotide SNP in exon 1 of CYP2D7 that reverts the sequence to CYP2D6 and allows a TaqMan assay PCR primer to bind. Because CYP2D7 also carries a Tins, a false-positive mutation signal is generated. This CYP2D7 SNP was also responsible for generating false-positive signals for rs769258 (CYP2D6*35), which is also located in exon 1. Although alternative CYP2D6*15 and *35 assays resolved the issue, we discovered a novel CYP2D6*15 subvariant in one sample that carries additional SNPs preventing detection with the alternate assay. The frequency of CYP2D6*15 was 0.1% in this ethnically diverse U.S. population sample. In addition, we also discovered linkage between the CYP2D7 CC>GT dinucleotide SNP and the 77G>A (rs28371696) SNP of CYP2D6*43. The frequency of this tentatively functional allele was 0.2%. Taken together, these findings emphasize that regardless of how carefully genotyping assays are designed and evaluated before being commercially marketed, rare or unknown SNPs underneath primer

  15. Sequence variation in the coding region of the melanocortin-1 receptor gene (MC1R) is not associated with plumage variation in the blue-crowned manakin (Lepidothrix coronata).

    PubMed

    Cheviron, Z A; Hackett, Shannon J; Brumfield, Robb T

    2006-07-01

    Avian plumage traits are the targets of both natural and sexual selection. Consequently, genetic changes resulting in plumage variation among closely related taxa might represent important evolutionary events. The molecular basis of such differences, however, is unknown in most cases. Sequence